How Economists Convert Quarterly Data into Monthly: Cubic Spline Interpolation
January 20, 2010 61 Comments
(Update: Please note that cubic spline interpolation can only provide estimates of data between known data points. It cannot “create” unknown data. For example, if only annual figures exist for your data set, then annual observations are the only real data. Cubic splining between them to generate monthly estimates is only an approximation technique, and does not provide new actual monthly figures). For this reason, the below method may not be appropriate for all research purposes.)
A common problem economists face with time-series data is getting them into the right time interval. Some data are daily or weekly, while others are in monthly, quarterly or annual intervals. Since most regression models require consistent time intervals, an econometrician’s first job is usually getting data into the same frequency.
In this post I’ll explain how to solve a common problem we’ve run into: how to divide quarterly data into monthly data for econometric analysis. To do so, we’ll use a method known as “cubic spline interpolation.” In the example below we use Matlab and Excel. For Stata users, I’ve posted a Stata do file that illustrates how to work through the below example in Stata.
Cubic Spline Interpolation
One of the most widely used data sources in economics is the National Income and Product Accounts (NIPAs) from the U.S. Bureau of Economic Analysis. They’re the official source for U.S. GDP, personal income, trade flows and more. Unfortunately, most data are published only quarterly or annually. So if you’re hoping to run a regression using monthly observations — for example, this simple estimate of the price elasticity of demand for gasoline — you’ll need to split these quarterly data into monthly ones.
A common way to do this is by “cubic spline interpolation.” Here’s how it works. We start with n quarterly data points. That means we have n-1 spaces between them. Across each space, we draw a unique 3rd-degree (or “cubic”) polynomial connecting the two points. This is called a “piecewise polynomial” function.
To make sure our connecting lines form a smooth line, we force all our first and second derivatives to be continuous; that is, at each connecting point we make them equal to the derivitive on either side. When all these requirements are met — along with a couple end-point conditions you can read about here — we have a (4n-4) x (4n-4) linear system that can be solved for the coefficients of all n-1 cubic polynomials.
Once we have these n-1 piecewise polynomials, we can plug in x values for whatever time intervals we want: monthly, weekly or even daily. The polynomials will give us a pretty good interpolation between our known quarterly data points.
An Example Using MATLAB
While the above method seems simple, doing cubic splines by hand is not. A spline for just four data points requires setting up and solving a 12 x 12 linear system, then manually evaluating three different polynomials at the desired x values. That’s a lot of work. To get a sense of how hard this is, here’s an Excel file showing what’s involved in fitting a cubic spline to four data points by hand.
In practice, the best way to do a cubic spline is to use MATLAB. It takes about five minutes. Here’s how to do it.
MATLAB has a built-in “spline()” function that does the dirty work of cubic spline interpolation for you. It requires three inputs: a list of x values from the quarterly data you want to split; a list of y values from the quarterly data; and a list of x values for the monthly time intervals you want. The spline() function formulates the n-1 cubic polynomials, evaluates them at your desired x values, and gives you a list of interpolated monthly y values.
Here’s an Excel file showing how to use MATLAB to split quarterly data into monthly. In the file, the first two columns are quarterly values from BEA’s Personal Income series. Our goal is to convert these into monthly values. The next three columns (highlighted in yellow) are the three inputs MATLAB needs: the original quarterly x values (x); the original quarterly y values (y); and the desired monthly x values (xx).
In the Excel file, note that the first quarter is listed as month 2, the second quarter as month 5, and so on. Why is this? BEA’s quarterly data represent an average value over the three-month quarter. That means they should be treated as a mid-point of the quarter. For Q1 that’s month 2, for Q2 that’s month 5, and so on.
The next step is to open MATLAB and paste in these three columns of data. In MATLAB, type ” x = [ “, cut and paste the column of x values in from Excel, type ” ] ” and hit return. This creates an n x 1 vector with the x values. Repeat this for the y, and xx values in the Excel file.
Once you have x, y, and xx defined in MATLAB, type “yy = spline(x,y,xx)” and hit return. This will create a new vector yy with the interpolated monthly y values we’re looking for. Each entry in yy will correspond to one of the x values you specified in the xx vector.
Copy these yy values from MATLAB, paste them into Excel, and we’re done. We now have an estimated monthly Personal Income series.
Here’s an Excel file summarizing the above example for splitting quarterly Personal Income data into monthly using MATLAB. Also, here’s a MATLAB file with the x, y, xx, and yy vectors from the above exercise.
Note: For Stata users, here’s a “do” file with an example that performs the above cubic spline interpolation in mata. Note that Stata and Matlab use slightly different endpoint conditions for the cubic spline, so they’ll give slightly different results toward the beginning and end of the data set. (Once you generate the “yy” variable in Stata, be sure to save your file or else Stata will give an error when you run the above do file.)
An Econometric Warning: If you plan to do regression analysis with data that’s been interpolated by cubic splines, be aware that doing so will introduce a systematic source of serial correlation in your regressors. That is, the data points interpolated by cubic splines will be systematically related to each other by a cubic polynomial, so they will violate the standard OLS assumption about no autocorrelation. What does this mean? If you do any statistical inference using splined data — that is, if you test the significance of your regression coefficients or report standard errors — be sure to use Newey-West standard errors rather than the usual ones, as they’re robust to the autocorrelation introduced by cubic splines. You can do this using the “newey” command in STATA, rather than “reg”.
Hi Ranjini,
As a general rule, you should use the reported annual figures unless you have some highly compelling reason to interpolated the data between those points. The reason is simple: you don’t actually have knowledge of the data-generating process on a quarterly basis. You only know annual values. The data could be highly erratic on a quarterly basis, or perhaps linear, or follow some other unknown process. There is no reason a priori to assume the data-generating process follows a cubic polynomial between known points.
Another complication is that if you use cubic-spline interpolated values in regression analysis, you’ll be generating perfect autocorrelations between the quarterly values within each year. That is, they will be related perfectly by a cubic polynomial by design. Thus, you’ll need to use different standard errors that account for this, and be aware that autocorrelation is present in your residuals.
Bottom line: using cubic spline interpolation does not really give you “more” data than you originally began with. That’s simply not possible. So be careful about not giving a misleading impression that you “have” quarterly figures if you use a cubic spline, when in fact those are just approximated points based on the assumption of polynomial behavior of the underlying data-generating process between known points.
Hi Andrew!
Could I use polynomial interpolation to generate a single polynomial over the whole set of data and use that instead? What is the advantage of using cubic splines?
Thanks
Hi Jessica,
It all depends on what you’re trying to accomplish. If you’re trying to do forecasting, or trying to find a best-fitting polynomial for the entire data set, you can use non-parametric regression methods, and basically fit a high-degree polynomial (basically, a huge Taylor expansion) to your data set. Stata will do this.
The problem with this is that it can be very computationally intensive. Also, how do you know the correct degree of the polynomial you should be fitting to your data? 10th degree? 10,000th degree? For more on this, just Google around for non-linear regression methods.
However, if instead you’re simply trying to estimate data values between known points, there are many interpolation methods available. Cubic spline is just one of many, and is a pretty crude and simple method compared to some of the others.
Hi Andrew,
This is what I’ve been looking for. I tried it, and it works well.
By the way, I have a question for you. Does this method still work on data with seasonality?
I have a data set with seasonality, and I tried to verify if your method fits. When I plot two sets of data (the real and the splined one) in one figure, the splined data looks very smooth as compared to the real one, it seems like a trend of the real one. Is this good?
Thanks a lot!
I wish you also develop a method to disaggregate annual data into quarterly data
Thank you
Hi
I used the cubic spline method to interpolate a dependent variable for a regression. Is there any other way except for the Newey-West standard errors to deal with the serial correlation induced by the cubic spline interpolation?
Thank you
Hi Andrew,
I try to interpolate a variable (debt/GDP) ration, which is available on a annual basis into monthly. The data starts from January 1999 and ends September 2011.
Could you kindly advice what should be the first value of xx?
Many thanks,
Leon
Hi Andrew
I have data from 1973-2003…Now I want to work over the period 2010…How can I Interpolate the missing data in excel. Thanks
Hi Andrew,
Currently I’m working on HDI (human Development Index) data for my research paper about my country (Malaysia). I’m trying to convert the five-year data (1980, 1985, 1990 and so on) into yearly data as I’ve to get more observation for the time-series analysis. Would it be possible?..I’ve quarterized some yearly data using Eviews and it worked. I only have problem with this 5-year data. I’ve browsed through the post by Isriya above, but still couldn’t understand.
Thanks in advance for advice.
FP