# How Economists Convert Quarterly Data into Monthly: Cubic Spline Interpolation

January 20, 2010 61 Comments

(**Update**: Please note that cubic spline interpolation can only provide estimates of data between known data points. It cannot “create” unknown data. For example, if only annual figures exist for your data set, then annual observations are the only real data. Cubic splining between them to generate monthly estimates is only an approximation technique, and does not provide new actual monthly figures). For this reason, the below method may not be appropriate for all research purposes.)

A common problem economists face with time-series data is getting them into the right time interval. Some data are daily or weekly, while others are in monthly, quarterly or annual intervals. Since most regression models require consistent time intervals, an econometrician’s first job is usually getting data into the same frequency.

In this post I’ll explain how to solve a common problem we’ve run into: how to divide quarterly data into monthly data for econometric analysis. To do so, we’ll use a method known as “cubic spline interpolation.” In the example below we use Matlab and Excel. For Stata users, I’ve posted a Stata do file that illustrates how to work through the below example in Stata.

**Cubic Spline Interpolation**

One of the most widely used data sources in economics is the National Income and Product Accounts (NIPAs) from the U.S. Bureau of Economic Analysis. They’re the official source for U.S. GDP, personal income, trade flows and more. Unfortunately, most data are published only quarterly or annually. So if you’re hoping to run a regression using monthly observations — for example, this simple estimate of the price elasticity of demand for gasoline — you’ll need to split these quarterly data into monthly ones.

A common way to do this is by “cubic spline interpolation.” Here’s how it works. We start with n quarterly data points. That means we have n-1 spaces between them. Across each space, we draw a unique 3rd-degree (or “cubic”) polynomial connecting the two points. This is called a “piecewise polynomial” function.

To make sure our connecting lines form a smooth line, we force all our first and second derivatives to be continuous; that is, at each connecting point we make them equal to the derivitive on either side. When all these requirements are met — along with a couple end-point conditions you can read about here — we have a (4n-4) x (4n-4) linear system that can be solved for the coefficients of all n-1 cubic polynomials.

Once we have these n-1 piecewise polynomials, we can plug in x values for whatever time intervals we want: monthly, weekly or even daily. The polynomials will give us a pretty good interpolation between our known quarterly data points.

**An Example Using MATLAB**

While the above method seems simple, doing cubic splines by hand is not. A spline for just four data points requires setting up and solving a 12 x 12 linear system, then manually evaluating three different polynomials at the desired x values. That’s a lot of work. To get a sense of how hard this is, here’s an Excel file showing what’s involved in fitting a cubic spline to four data points by hand.

In practice, the best way to do a cubic spline is to use MATLAB. It takes about five minutes. Here’s how to do it.

MATLAB has a built-in “spline()” function that does the dirty work of cubic spline interpolation for you. It requires three inputs: a list of x values from the quarterly data you want to split; a list of y values from the quarterly data; and a list of x values for the monthly time intervals you want. The spline() function formulates the n-1 cubic polynomials, evaluates them at your desired x values, and gives you a list of interpolated monthly y values.

Here’s an Excel file showing how to use MATLAB to split quarterly data into monthly. In the file, the first two columns are quarterly values from BEA’s Personal Income series. Our goal is to convert these into monthly values. The next three columns (highlighted in yellow) are the three inputs MATLAB needs: the original quarterly x values (x); the original quarterly y values (y); and the desired monthly x values (xx).

In the Excel file, note that the first quarter is listed as month 2, the second quarter as month 5, and so on. Why is this? BEA’s quarterly data represent an average value over the three-month quarter. That means they should be treated as a mid-point of the quarter. For Q1 that’s month 2, for Q2 that’s month 5, and so on.

The next step is to open MATLAB and paste in these three columns of data. In MATLAB, type ” x = [ “, cut and paste the column of x values in from Excel, type ” ] ” and hit return. This creates an n x 1 vector with the x values. Repeat this for the y, and xx values in the Excel file.

Once you have x, y, and xx defined in MATLAB, type “yy = spline(x,y,xx)” and hit return. This will create a new vector yy with the interpolated monthly y values we’re looking for. Each entry in yy will correspond to one of the x values you specified in the xx vector.

Copy these yy values from MATLAB, paste them into Excel, and we’re done. We now have an estimated monthly Personal Income series.

Here’s an Excel file summarizing the above example for splitting quarterly Personal Income data into monthly using MATLAB. Also, here’s a MATLAB file with the x, y, xx, and yy vectors from the above exercise.

**Note:** For Stata users, here’s a “do” file with an example that performs the above cubic spline interpolation in mata. Note that Stata and Matlab use slightly different endpoint conditions for the cubic spline, so they’ll give slightly different results toward the beginning and end of the data set. (Once you generate the “yy” variable in Stata, be sure to *save your file* or else Stata will give an error when you run the above do file.)

**An Econometric Warning:** If you plan to do regression analysis with data that’s been interpolated by cubic splines, be aware that doing so will introduce a systematic source of serial correlation in your regressors. That is, the data points interpolated by cubic splines will be systematically related to each other by a cubic polynomial, so they will violate the standard OLS assumption about no autocorrelation. What does this mean? If you do any statistical inference using splined data — that is, if you test the significance of your regression coefficients or report standard errors — be sure to use Newey-West standard errors rather than the usual ones, as they’re robust to the autocorrelation introduced by cubic splines. You can do this using the “newey” command in STATA, rather than “reg”.

It is really a helpful guide. I was wondering whether we can use similar methodology to convert annual data into monthly data.Moreover, things would have been much clearer if explanation of the basis of ‘writing the desired monthly x values (xx)’ were provided.

Thanks

Hi Nirmal,

Yes, this methodology can also be used to convert annual data into monthly data. For annual data, the original (x) values will be 6.5 for the first year, 6.5 + 12 = 18.5 for the second year, and so on. And the desired (xx) values will simply be the months you’d like to split the data into: 1, 2, 3, and so on.

However, you should always be cautious about using cubic spline interpolation to divide data from long intervals (such as years) into very short intervals (such as months or even days). Be sure there is a good theoretical reason for why the underlying data should behave like a cubic function over very short intervals. For highly erratic data with large variance over short intervals, cubic spline may give misleadingly smooth estimates.

Thanks. It really helped me a lot. I was also wondering if the theoretical reasoning will be violated (or not)when the same is done using Quadratic Match sum Method through Eviews. For example, converting annual Gross domestic product (GDP) into monthly GDP.

Thank you once again.

Nirmal,

I’m not an EViews user — I mostly use STATA and MATLAB — so unfortunately I don’t know much about the quadradic match function.

My suggestion is that you carefully read the documentation for the version of EViews that you’re using, and make sure you totally understand the underlying theoretical method that’s being applied inside the software — and whether it makes sense for your data — before using it to interpolate quarterly data into monthly values.

Modern software has made it very easy to simply apply built-in interpolation functions to data, without really understanding the underlying mathematics behind it. However, different data interpolation methods have advantages and disadvantages, which you can only really understand if you take the time to work through the math yourself before using built-in functions.

Thanks for the suggestion.

Andrew,

Thanks for posting. Just wondering can we use the same methodology that convert the monthly data into weekly or daily data?

And did you see any disadvantages of this approach arising if we convert monthly data into daily data?

Many Thanks!

Julia,

Sure, you can use cubic spline interpolation to split monthly data into daily data. However, be sure to ask yourself, “How realistic are these daily data?”

For example, stock prices tend to be very volatile from one day to the next. If we instead only had monthly average stock prices, and used cubic spline interpolation to split them into daily data based on a smooth cubic polynomial function, it would give us the impression that stock prices are very smooth and stable from day to day.

The lesson is just to be aware of the assumptions underlying whatever method you use to interpolate between data points. And when splitting long intervals into short ones, be aware that you may be masking underlying volatility in the data — which may or may not be important for your analysis, depending on the problem you’re working on.

FYI, here is a useful thread discussing some of the methodological issues to be aware of when using cubic spline interpolation; i.e., that it may give the appearance that your data has smooth, polynomial-like properties between observed data points that it may not actually have:

http://www.phdcomics.com/proceedings/viewtopic.php?p=3316871&sid=abcf2cdfaab7161e675e3eba58bb9d78

I have data on input-output table for 5 years i.e, 2000 and 2005 How should i converted it annually (i.e the data in 2001, 2002,2003,2004?

Hi Isriya,

Cubic spline interpolation only allows you to estimate *between* two or more data points. It doesn’t allow you to split a five-year average into individual yearly data. So unfortunately it won’t work for your problem.

What country is your input-output table for? There may be good reasons why whatever statistical agency compiled the table that they’ve only produced a five-year average table. For example, in the U.S., benchmark tables are only produced every five years. The so-called “annual” U.S. input-output tables are really just the benchmark ones with various industry-level adjustments made based on annual GDP data.

Thank you very much, Andrew. The data on I-O table is for Thailand. It’s based on the survey at particular point in time, i.e. 2000 or 2005. I would like to get the annual series of the operating surplus of some sub-sectors. Can i use linear interporation to convert 5 years data to annual data?

Hi Isriya,

I see — these must be the Thailand input-output tables you’re working on:

http://www.nesdb.go.th/Default.aspx?tabid=97

If you start with a time-series of operating surplus by industry from these tables, you can definitely use cubic spline interpolation to estimate the annual values between the actual 1990, 1995, 2000 and 2005 values. The method should basically be the same as I’ve outlined above.

However, using cubic splines will only give you a very rough approximation of the annual values. In some ways, it won’t provide a much better approximation than simply doing a straight-line interpolation between your known points.

Here’s something to consider: Why should operating surplus for industries be expected to move as a cubic function over time, rather than a linear or some other functional form? Whatever analysis you’re planning to use these data for, just be aware of this issue, and that using cubic splines to divide data always involves some assumptions about the data that may or may not be accurate.

Good luck!

Hi there,

Interesting article. I have a problem that touches on interpolation that I’d welcome any thoughts on. I have a short monthly series (12 points) that I wish to backcast. I have a longer quarterly series (52 obs) that I can draw on. The series are in stock rather than flow domain.

Another angle is to constrain the monthly series to the quarterly (benchmark) series.

Does this temporal disaggregation method preserve the short term movements of the monthly series but be consistent with the benchmarks?

Thanks

Fida

Hi Fida,

I think the best approach is probably just to use cubic splines interpolation on your quarterly data, make a note of it in your analysis, and simply merge it with your 12 monthly observations.

The reason I say this is that having only 12 monthly observations probably isn’t enough data to reliably establish a monthly pattern of seasonality in your data.

Perhaps with 24 or 36 monthly points you could build a monthly index of seasonality, which could then be applied to the quarterly data, creating a series that matches your benchmarks but also displays the patterns visible in your monthly observations. Unfortunately, 12 observations probably isn’t enough to reasonably do that.

Just something to think about: how can you be sure the 12 months of data you happen to have aren’t unusual in some respect? Without at least another year of monthly data, it’s hard to know how reliable they are as a measure of true monthly variation in your data.

Good luck!

Thanks for your thoughts Andrew. From a quarterly perspective, we can say that the series is non-seasonal. I’ll attempt to interpolate it to give the m interpolates and splice on to the 12 m observations.

Fida

I am having problems opening the excel file. There are only xml files. The data I have are quarterly returns (which I am assuming are x values) and I want to convert them to monthly. I am not clear what the y and xx data represent? Thanks.

Hi there,

I have several doubts about cubic interpolation. I wonder if I can use this method for flow variables (for example, deficit as a percentabe of GDP) to convert frequency from annually to quarterly. Thank You in advance.

Mindi

Hi Mindi,

There’s nothing fundamentally different about using cubic spline interpolation on flow data compared to stock data. Keeping the limitations in mind in the above comments, cubic splines should work just fine this.

I’m curious — you mention having doubts about cubic splines interpolation. What are those doubts, if different from the ones I’ve mentioned above?

Dear Sir,

indeed my doubts were related with the ones you have mentioned above. Again thank You for Your answer and explanation.

Sincerely,

Mindi

would you by any chance have a do file or something for STATA to do this? I wasn’t able to open the files linked above and I don’t use matlab, so wasn’t sure how they were done. It would be really helpful, thanks.

Hi Jenn,

You may be having trouble opening the files if you’re using a version of Excel that’s older than 2007. I’ve created a pre-2007 version of the files, which you can try downloading here:

http://the-idea-shop.com/data/Cubic%20Spline%20Interpolation.xls

http://the-idea-shop.com/data/Cubic%20Spline%20by%20Hand.xls

STATA probably has a spline interpolation function similar to MATLAB’s, but I’ve never actually used it. You may want to do some searching through the STATA help files.

I’m using Excel 2007 but they still didn’t open. The new ones you sent works perfectly though- and it’s been a real help. FYI, you have to use the mata (matrix in STATA) to do it – and there’s a mata command file called spline3(x,y) which gives you the coefficients and then you have to run spline3eval(spline,xx) for the next step.

In any case, thanks a lot, especially for the prompt reply.

Hi,

I want to convert the annual schooling data into quarterly ones using the above method but I don’t know which values to start on x and xx? would you help me please, i really need to sort this out for my dissertation.

Thanks so much,

Hi Anh,

To interpolate quarterly values between annual data points, you’ll want to use the following values for X and XX.

Your original data are an annual average. Think of them as having been collected at the “average” time for the year. Since you’re interested in dividing the year into quarters, the average time (in quarters) for the first year (1 + 2 + 3 + 4)/4 = 2.5. So your original X value will be 2.5 for the first year, 2.5 + 4 = 6.5 for the second year, and so on.

For the desired quarterly XX values, simply use the list quarters you’d like to interpolate the data for: 1, 2, 3, and so on.

Hope that helps!

Hi, just to add to this – the common stats agencies method is to give guidelines to the shape between benchmarks. Terminology may differ but its often known as “interpolating” using an INDICATOR series, as opposed to cubic spline interpolation.

Cubic Spline interpolation is essentially the optimal solution of an interpolation using a CONSTANT 0 growth indicator series. (see Legendre polynomials in wikipedia for a v.good writeup).

In the far more preferable indicator method, instead of a constant 0 growth, you often choose an indicator series that would contain another data source that whilst not exactly measuring your series, provides a good approximation AND is available between the benchmarks for the periods you are interested in.

You maintain movements in that indicator series whilst ensuring your totals (reconcile) with the benchmarked series.

This has an infinite solution set so instead you assign some form of error measurement to minimise (I think mostly use a linear regression appx on the reconciled series and work a solution that has the lowest Rsquared), but also ensuring the totals meet the benchmarks.

This is a very powerful technique for filling in all the gaps in time series data and is a key tool (and strangely not well known outside stats agencies) in any data work I do with time series. With a well written function/macro you can apply this everytime to almost any set of time series data, and get a very consistent hierachical time series with data at all levels and all time periods you need.

Re: literature

I’ve just skimmed this but it looks to be a very solid writeup of some of the more commonly accepted methods:

http://www.scirp.org/journal/PaperInformation.aspx?paperID=3396

RE: code, I dont use STATA but this seems to do the job:

http://ideas.repec.org/c/boc/bocode/s422501.html

I think there’s matlab equivalents.

I use SAS but strangely enough there are no actual inbuilt SAS functions to do this and so most people end up having to code their own. And of course they keep their code under lock and key….

For my own work I think its simpler to just write an iterative solution

1. Get pro rata ratio (benchmark / indicator)

2. Cubic interpolate the PRO RATA RATIOS

3. Apply interpolated pro rata ratios to indicator

back to step 1…

rinse/repeat until its close enough for your own work.

Mathematically this probably yields the same solution as one of the other BLUE methods anyways.

hi.. how if i want to convert the monthly data to quarterly data? what should i key in for variable x and xx and yy?

Hi An,

You don’t need to use this interpolation procedure to move from monthly to quarterly data. You simply take a three-month average of the monthly data. That is, if A B and C are your first three months of data, your first quarterly data point is (A + B + C)/3.

Hi

I wonder if published quarterly data is a Qaurter-to-date value. For example GDP for Q1 is 100 in aggregate value, will the monthly data outputs from spline valid? For example, outputs are Jan = 90, Feb = 95 and Mar = 100 and total Q1 = 285. Do you have any suggestions on how to convert this kind of quarterly data in to monthly data?

Hi Bee,

Suppose you start with only quarterly data, and the data are cumulative. That is, each quarter shows the year-to-date value of something. Then Q1 = q1, Q2 = q1 + q2, Q3 = q1 + q2 +q3, and Q4 = q1 + q2 + q3 + q4, where the capital Qs are the cumulative data you’ve been given, and the small qs are the individual quarterly data you want. To use cubic spline interpolation, simply adjust the Q figures into quarterly values of q by subtracting out earlier values from each quarter. That is, let q1 = Q1, q2 = Q2 – Q1, q3 = Q3 – Q2, and q4 = Q4 – Q3. Then you can use cubic spline interpolation on q1, q2 q3 and q4 to estimate monthly values.

Pingback: Stata Do-File for Cubic Spline Interpolation « Chamberlain Economics, L.L.C.

Hi,

I would like to be more clear on your answer regarding conversion of annual data into quarterly data.To be more precise I would like to proceed with an example.Suppose I have 25 observations of annual data.So my desire number of “x” will be 2.5,6.5,10.5 and so on..

But my concerns is in the desired number “XX”.how will I proceed?

thanks

Hi,

I have a more general question regarding interpolation methods. How do you compare different methods of interpolation and choose the best method for a specific data? Is there any measure or test that evaluates interpolation methods and comes up with the best interpolation? Or we have just to rely on a graphical comparison?

Thanks.

Hi,

I am wondering if cubic spline interpolation is the best method to convert cumulative monthly data to weekly data. I have monthly data that are cumulative for the entire month (e.g. trade data), thus no average values. Or is the ‘ipolate’ method better for these kind of data? (in STATA).

Thanks in advance for your reply,

Hi Andrew,

I tried the way u guided to convert data from quarterly to monthly frequency, my data is of GDP from 2000 till 1st quarter of 2011, but Matlab is giving error when executing the spline function, which is as follows,

??? Error using ==> chckxy at 89

The number of sites, 45, is incompatible with

the number of values, 2.

Error in ==> spline at 55

[x,y,sizey,endslopes] = chckxy(x,y);

Could you help me what is the problem?

Many thanks,

Noaman

hii i have a data since 1970 till date andi want to make a regression with another data from 1992-

i dont want to use data from 1992 only since it wil not gve me an accurate estimation

any advise

thanks in advance

There is a free add-in for Excel that will add a cubic spline function directly to Excel. If you install the add-in then you can do cubic spline interpolation directly in Excel. The cubic spline function that is added to Excel is used just like all other built-in Excel functions.

Here is a link to the site:

http://www.srs1software.com/SRS1CubicSplineForExcel.aspx

They also have a demo workbook on the web page that shows an example of interpolating monthly data from quarterly data using the new cubic spline function.

The add-in also adds a linear interpolation function to Excel as well.

Stephen — Thanks for the link! I think that add-on will help a lot of people who find this post searching for help on the web.

Hi!

Thanks for your article!

I’m currently trying to apply your Stata code to convert my quarterly observations from Compustat into monthly data. However I need to do this for a series of firms with different number of quarter markers. Is there any way I can generalize the sting “X = st_data((1,36),”x”)”, where 36 would be replace with an appropriate number of markers for a current firm.

If I just type “X = st_data(.,”x”)” it won’t work.

I can’t figure out how to access stata variables in mata mode.

Many thanks!

Hi

I have a 20 periods yearly data. So starting x is 2.5. How do I go about setting xx? should it be 2.5, 3.5, 4.5, 5.5, 6.5? If yes, then I don’t know why but the yy result column is not getting displayed, It still shows all the yy entries as dots.

Thank You

could you please provide a bit detailed commands for yearly to quarterly interpolation command? Thank You

Hi Andrew,

Do you have any suggestions how to interpolate a quarterly number that is a sum of monthly data? I.e. Jan=10, Feb=30, March=4, so that the known Q1=44. Thank you.

Monika — I’m not sure I understand your question. Right now, you have a known quarterly figure of Q1 = 44, correct? And you want to divide that up into three estimated monthly figures, right? If so, that’s just a typical cubic spline interpolation problem like the one described above in this post. Maybe you can explain your problem better and I might be able to help out. Thanks.

Thanks, Andrew. In the problem you discribe X vector starts at 2 as a middle value. I understand the logic as the quarterly data is an average in your case. In my, it is a sum. In that case, I am not sure what are my starting numbers for a vector x and xx. Am I missing something?Thanks.

Hi Monika — In the case described in the above post, the quarterly data are the sum for the three months, not the average value of the three months. That is, quarterly GDP is GDP for the entire quarter (which consists of the three months), not a monthly average during that quarter. You use x = 2 just because that’s the centerpoint of the quarter. So that’s exactly what you should be doing also. Thanks!

got it, thanks!

Hi Andrew, I just did what you suggested, but I am still puzzled. Let say that the number I have is the number for GDP in Q1 and it is equal to 15265678. After interpolation I get the following numbers: Jan=15265678, Feb=14638131.87, March=15192567.09. The sum from Jan to March is not equal Q1. Thank you.

Hi Monika — Sorry about that… I really I misunderstood your question here. While cubic splines won’t give you an exact interpolation in your case, you can get a pretty decent cubic approximation if you do the following.

First, divide your quarterly figures by 3, making them a monthly average. So for Q1, you’ll divide by three, and that’s the “y” value for x = 2 (the centerpoint of the quarter). Do that for all your quarterly points. Then run cubic spline interpolation on those average values, just as described above.

The resulting “yy” values will be very close to summing to the quarterly total, but they won’t match exactly. So as a final step, simply adjust the figures like this. In an Excel file, sum each of the three yy values for the months in each quarter. This will be either larger or smaller than the quarterly figures you actually have.

To correct this error, divide the actual value for the quarter by the sum of the monthly spline estimates, to get an error ratio. Then just multiply each of the monthly interpolated estimates by this error ratio to “scale” them up or down. The resulting scaled monthly figures will sum to the quarterly total in each quarter.

It’s not a very elegant solution, but it is a decent approximation to a true cubic spline. After all, interpolations are all just arbitrary splitting of data — for example, you don’t know if your monthly data are actually really smooth, or highly volatile, since all you see are quarterly sums — so this approximation seems about as good as a true spline.

Hope that helps!

Thanks, a lot Andrew and sorry if I was not very good in asking questions I will do what you suggest and hopefully won’t trouble you again. It is the first time I am doing this and the literature on the issue is really not vast (or easily comprehensible for non mathematicians)…Warm wishes.

Hi Andrew, can I use spline function to interpolate the quarterly data into monthly when the quarterly data is build in a way that it uses the value of the last month of that quarter, i.e. if we have the following monthly data:

32837

32882

33441

35033

35442

34845, etc

the value for Q1 is 33441, for Q2 is 34845, etc?

Thank you for your advice.

Hi Monika,

In that case, you just need to change the “x” values to reflect that you have data for the third month rather than the second. So your “x” values should be (3, 6, 9, …). I think that will work for you. Do the spline interpolation that way and just check that all the point for the 3rd, 6th, etc. months match your original data. Thanks.

Hi Andrew, it works, but you also have to change the ‘xx’ vector, i.e. it has to start from 1 and not 3 or 2, which makes sense, I believe. Thank you.

Hi Andrew,

I have data from 1960-2011 to study Ecaudor’s import demand function. However, the data from 1960-1990 are in annual form; from 1990-2011 I can access the data in either quarterly or annual format. These data are taken from the IMF’s database.

Is it better to stick with annual data, or should I transform the annual data into quarterly data? Will this cause problems?

thank you very much