(Update: Please note that cubic spline interpolation can only provide estimates of data between known data points. It cannot “create” unknown data. For example, if only annual figures exist for your data set, then annual observations are the only real data. Cubic splining between them to generate monthly estimates is only an approximation technique, and does not provide new actual monthly figures). For this reason, the below method may not be appropriate for all research purposes.)
A common problem economists face with time-series data is getting them into the right time interval. Some data are daily or weekly, while others are in monthly, quarterly or annual intervals. Since most regression models require consistent time intervals, an econometrician’s first job is usually getting data into the same frequency.
In this post I’ll explain how to solve a common problem we’ve run into: how to divide quarterly data into monthly data for econometric analysis. To do so, we’ll use a method known as “cubic spline interpolation.” In the example below we use Matlab and Excel. For Stata users, I’ve posted a Stata do file that illustrates how to work through the below example in Stata.
Cubic Spline Interpolation
One of the most widely used data sources in economics is the National Income and Product Accounts (NIPAs) from the U.S. Bureau of Economic Analysis. They’re the official source for U.S. GDP, personal income, trade flows and more. Unfortunately, most data are published only quarterly or annually. So if you’re hoping to run a regression using monthly observations — for example, this simple estimate of the price elasticity of demand for gasoline — you’ll need to split these quarterly data into monthly ones.
A common way to do this is by “cubic spline interpolation.” Here’s how it works. We start with n quarterly data points. That means we have n-1 spaces between them. Across each space, we draw a unique 3rd-degree (or “cubic”) polynomial connecting the two points. This is called a “piecewise polynomial” function.
To make sure our connecting lines form a smooth line, we force all our first and second derivatives to be continuous; that is, at each connecting point we make them equal to the derivitive on either side. When all these requirements are met — along with a couple end-point conditions you can read about here — we have a (4n-4) x (4n-4) linear system that can be solved for the coefficients of all n-1 cubic polynomials.
Once we have these n-1 piecewise polynomials, we can plug in x values for whatever time intervals we want: monthly, weekly or even daily. The polynomials will give us a pretty good interpolation between our known quarterly data points.
An Example Using MATLAB
While the above method seems simple, doing cubic splines by hand is not. A spline for just four data points requires setting up and solving a 12 x 12 linear system, then manually evaluating three different polynomials at the desired x values. That’s a lot of work. To get a sense of how hard this is, here’s an Excel file showing what’s involved in fitting a cubic spline to four data points by hand.
In practice, the best way to do a cubic spline is to use MATLAB. It takes about five minutes. Here’s how to do it.
MATLAB has a built-in “spline()” function that does the dirty work of cubic spline interpolation for you. It requires three inputs: a list of x values from the quarterly data you want to split; a list of y values from the quarterly data; and a list of x values for the monthly time intervals you want. The spline() function formulates the n-1 cubic polynomials, evaluates them at your desired x values, and gives you a list of interpolated monthly y values.
Here’s an Excel file showing how to use MATLAB to split quarterly data into monthly. In the file, the first two columns are quarterly values from BEA’s Personal Income series. Our goal is to convert these into monthly values. The next three columns (highlighted in yellow) are the three inputs MATLAB needs: the original quarterly x values (x); the original quarterly y values (y); and the desired monthly x values (xx).
In the Excel file, note that the first quarter is listed as month 2, the second quarter as month 5, and so on. Why is this? BEA’s quarterly data represent an average value over the three-month quarter. That means they should be treated as a mid-point of the quarter. For Q1 that’s month 2, for Q2 that’s month 5, and so on.
The next step is to open MATLAB and paste in these three columns of data. In MATLAB, type ” x = [ “, cut and paste the column of x values in from Excel, type ” ] ” and hit return. This creates an n x 1 vector with the x values. Repeat this for the y, and xx values in the Excel file.
Once you have x, y, and xx defined in MATLAB, type “yy = spline(x,y,xx)” and hit return. This will create a new vector yy with the interpolated monthly y values we’re looking for. Each entry in yy will correspond to one of the x values you specified in the xx vector.
Copy these yy values from MATLAB, paste them into Excel, and we’re done. We now have an estimated monthly Personal Income series.
Here’s an Excel file summarizing the above example for splitting quarterly Personal Income data into monthly using MATLAB. Also, here’s a MATLAB file with the x, y, xx, and yy vectors from the above exercise.
Note: For Stata users, here’s a “do” file with an example that performs the above cubic spline interpolation in mata. Note that Stata and Matlab use slightly different endpoint conditions for the cubic spline, so they’ll give slightly different results toward the beginning and end of the data set. (Once you generate the “yy” variable in Stata, be sure to save your file or else Stata will give an error when you run the above do file.)
An Econometric Warning: If you plan to do regression analysis with data that’s been interpolated by cubic splines, be aware that doing so will introduce a systematic source of serial correlation in your regressors. That is, the data points interpolated by cubic splines will be systematically related to each other by a cubic polynomial, so they will violate the standard OLS assumption about no autocorrelation. What does this mean? If you do any statistical inference using splined data — that is, if you test the significance of your regression coefficients or report standard errors — be sure to use Newey-West standard errors rather than the usual ones, as they’re robust to the autocorrelation introduced by cubic splines. You can do this using the “newey” command in STATA, rather than “reg”.