A Stata Do-File for Cubic Spline Interpolation

As a follow-up to our post on using cubic spline interpolation in MATLAB, we’ve posted a Stata do-file explaining how to divide quarterly data into monthly estimates. The do-file is here. For our original post explaining the data and methodology, see here.

Enjoy!

(Note: once you generate the “yy” variable, be sure to save your Stata file before running this do-file; otherwise Stata returns an error.)


New Methodology Paper on Building a Square U.S. Input-Output Table

We’ve posted an updated version of our methodology paper explaining how Chamberlain Economics, L.L.C. develops square U.S. input-output tables based on data from the B.E.A.’s benchmark input-output accounts. Check it out here.

New Study of the Kerry-Lieberman “American Power Act”

We’ve released our latest study today, an economic analysis of the “American Power Act” or Kerry-Lieberman cap-and-trade bill. The study explores the bill in detail, discussing several theoretical issues raised by the legislation as well as providing new distributional estimates of the bill’s cost to U.S. households. Check out the full study here.

Now Available: Carbon Content of U.S. Products and Industries

We’ve added a new data set to our “Data Products” page — detailed estimates of the fossil-fuel carbon emitted by U.S. industries.

The data are the most comprehensive estimates we’ve done to date, covering 426 industries and the more than 5,300 products they produce, ranging from soybeans to jet fuel refining. The data are formatted into two easy-to-use Excel files, one with the industry estimates and one with the product estimates. (See a preview of the data here.)

These data are a terrific shortcut for researchers, students and others looking to estimate the carbon impact of various products or industries without building their own input-output or CGE model. All estimates are based on our Chamberlain Economics, L.L.C. Input-Output Model, which is designed to model carbon flows in the U.S. economy.

See our “Data Products” page for ordering information.

How Economists Convert Quarterly Data into Monthly: Cubic Spline Interpolation

(Update: Please note that cubic spline interpolation can only provide estimates of data between known data points. It cannot “create” unknown data. For example, if only annual figures exist for your data set, then annual observations are the only real data. Cubic splining between them to generate monthly estimates is only an approximation technique, and does not provide new actual monthly figures. For this reason, the method below may not be appropriate for all research purposes.)

A common problem economists face with time-series data is getting them into the right time interval. Some data are daily or weekly, while others are in monthly, quarterly or annual intervals. Since most regression models require consistent time intervals, an econometrician’s first job is usually getting data into the same frequency.

In this post I’ll explain how to solve a common problem we’ve run into: how to divide quarterly data into monthly data for econometric analysis. To do so, we’ll use a method known as “cubic spline interpolation.” In the example below we use Matlab and Excel. For Stata users, I’ve posted a Stata do file that illustrates how to work through the below example in Stata.

Cubic Spline Interpolation
One of the most widely used data sources in economics is the National Income and Product Accounts (NIPAs) from the U.S. Bureau of Economic Analysis. They’re the official source for U.S. GDP, personal income, trade flows and more. Unfortunately, most data are published only quarterly or annually. So if you’re hoping to run a regression using monthly observations — for example, this simple estimate of the price elasticity of demand for gasoline — you’ll need to split these quarterly data into monthly ones.

A common way to do this is by “cubic spline interpolation.” Here’s how it works. We start with n quarterly data points. That means we have n-1 spaces between them. Across each space, we draw a unique 3rd-degree (or “cubic”) polynomial connecting the two points. This is called a “piecewise polynomial” function.

To make sure the pieces join into a smooth curve, we force the first and second derivatives to be continuous; that is, at each interior connecting point, the derivatives of the two adjoining polynomials are set equal. When all these requirements are met — along with a couple of end-point conditions you can read about here — we have a (4n-4) x (4n-4) linear system that can be solved for the coefficients of all n-1 cubic polynomials.

Once we have these n-1 piecewise polynomials, we can plug in x values for whatever time intervals we want: monthly, weekly or even daily. The polynomials will give us a pretty good interpolation between our known quarterly data points.
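To make the mechanics concrete, here is a minimal pure-Python sketch of the procedure described above, using the “natural” end-point condition (second derivative set to zero at both ends). Note that MATLAB’s spline() uses a different (“not-a-knot”) end condition, so results near the endpoints will differ slightly. All function and variable names, and the quarterly y values in the example, are illustrative only:

```python
def natural_cubic_spline(x, y):
    """Build a natural cubic spline through the knots (x[i], y[i]) and
    return a function that evaluates it.  x must be strictly increasing."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]

    # Tridiagonal system for the second derivatives M[i] at each knot;
    # the "natural" end conditions fix M[0] = M[n-1] = 0.
    a = [0.0] * n  # sub-diagonal
    b = [1.0] * n  # main diagonal
    c = [0.0] * n  # super-diagonal
    d = [0.0] * n  # right-hand side
    for i in range(1, n - 1):
        a[i] = h[i - 1]
        b[i] = 2.0 * (h[i - 1] + h[i])
        c[i] = h[i]
        d[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])

    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    M = [0.0] * n
    M[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        M[i] = (d[i] - c[i] * M[i + 1]) / b[i]

    def evaluate(t):
        # Locate the interval [x[i], x[i+1]] containing t.
        i = n - 2
        for j in range(n - 1):
            if t <= x[j + 1]:
                i = j
                break
        A = (x[i + 1] - t) / h[i]
        B = (t - x[i]) / h[i]
        return (A * y[i] + B * y[i + 1]
                + ((A ** 3 - A) * M[i] + (B ** 3 - B) * M[i + 1]) * h[i] ** 2 / 6.0)

    return evaluate


# Illustrative use: quarterly values placed at mid-quarter months 2, 5, 8, 11
# (the y values below are made up for the example).
s = natural_cubic_spline([2, 5, 8, 11], [100.0, 104.0, 103.0, 108.0])
monthly = [s(m) for m in range(2, 12)]  # interpolated values for months 2-11
```

The solver exploits the fact that requiring continuous first and second derivatives at the interior knots reduces the full (4n-4) x (4n-4) system to a small tridiagonal one in the knot second derivatives, which is how spline routines are typically implemented in practice.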

An Example Using MATLAB
While the above method seems simple, doing cubic splines by hand is not. A spline for just four data points requires setting up and solving a 12 x 12 linear system, then manually evaluating three different polynomials at the desired x values. That’s a lot of work. To get a sense of how hard this is, here’s an Excel file showing what’s involved in fitting a cubic spline to four data points by hand.

In practice, the best way to do a cubic spline is to use MATLAB. It takes about five minutes. Here’s how to do it.

MATLAB has a built-in “spline()” function that does the dirty work of cubic spline interpolation for you. It requires three inputs: a list of x values from the quarterly data you want to split; a list of y values from the quarterly data; and a list of x values for the monthly time intervals you want. The spline() function formulates the n-1 cubic polynomials, evaluates them at your desired x values, and gives you a list of interpolated monthly y values.

Here’s an Excel file showing how to use MATLAB to split quarterly data into monthly. In the file, the first two columns are quarterly values from BEA’s Personal Income series. Our goal is to convert these into monthly values. The next three columns (highlighted in yellow) are the three inputs MATLAB needs: the original quarterly x values (x); the original quarterly y values (y); and the desired monthly x values (xx).

In the Excel file, note that the first quarter is listed as month 2, the second quarter as month 5, and so on. Why is this? BEA’s quarterly data represent an average value over the three-month quarter. That means they should be treated as a mid-point of the quarter. For Q1 that’s month 2, for Q2 that’s month 5, and so on.
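The quarter-to-month mapping is easy to compute: quarter q (1-based) covers months 3q-2 through 3q, so its midpoint is month 3q-1. A one-line sketch (variable names are just for illustration):

```python
# Quarter q spans months 3q-2 .. 3q; the quarterly average sits at the midpoint.
mid_months = [3 * q - 1 for q in range(1, 5)]
print(mid_months)  # [2, 5, 8, 11] -> Q1 is month 2, Q2 is month 5, ...
```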

The next step is to open MATLAB and paste in these three columns of data. In MATLAB, type “x = [”, paste the column of x values in from Excel, type “]” and hit return. This creates an n x 1 vector of x values. Repeat this for the y and xx values in the Excel file.

Once you have x, y, and xx defined in MATLAB, type “yy = spline(x,y,xx)” and hit return. This will create a new vector yy with the interpolated monthly y values we’re looking for. Each entry in yy will correspond to one of the x values you specified in the xx vector.

Copy these yy values from MATLAB, paste them into Excel, and we’re done. We now have an estimated monthly Personal Income series.

Here’s an Excel file summarizing the above example for splitting quarterly Personal Income data into monthly using MATLAB. Also, here’s a MATLAB file with the x, y, xx, and yy vectors from the above exercise.

Note: For Stata users, here’s a “do” file with an example that performs the above cubic spline interpolation in Mata. Note that Stata and MATLAB use slightly different end-point conditions for the cubic spline, so they’ll give slightly different results toward the beginning and end of the data set. (Once you generate the “yy” variable in Stata, be sure to save your file, or else Stata will give an error when you run the above do-file.)

An Econometric Warning: If you plan to do regression analysis with data that have been interpolated by cubic splines, be aware that doing so introduces a systematic source of serial correlation. Because the interpolated points are generated from the same cubic polynomials as their neighbors, the regression errors will be serially correlated, violating the standard OLS assumption of no autocorrelation. What does this mean? If you do any statistical inference using splined data — that is, if you test the significance of your regression coefficients or report standard errors — be sure to use Newey-West standard errors rather than the usual ones, as they’re robust to the autocorrelation introduced by cubic splines. In Stata, use the “newey” command rather than “reg”.
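To illustrate the idea behind Newey-West standard errors (this is not the post’s own code, and in practice you would use Stata’s newey command or an econometrics library), here is a pure-Python sketch for a one-regressor OLS with Bartlett-kernel HAC weights; the function and variable names are hypothetical:

```python
import math

def ols_newey_west(x, y, lags):
    """OLS of y on a constant and x, with a Newey-West (HAC) standard
    error for the slope.  Bartlett kernel; pure-Python illustration."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    alpha = my - beta * mx
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]

    # Long-run variance of the score (demeaned regressor times residual),
    # with Bartlett weights that taper off linearly up to the chosen lag.
    g = [(xi - mx) * e for xi, e in zip(x, resid)]
    s = sum(gi * gi for gi in g)
    for L in range(1, lags + 1):
        w = 1.0 - L / (lags + 1.0)  # Bartlett kernel weight
        s += 2.0 * w * sum(g[t] * g[t - L] for t in range(L, n))
    se_beta = math.sqrt(s) / sxx  # sandwich: (X'X)^-1 S (X'X)^-1
    return alpha, beta, se_beta
```

With lags set to zero this collapses to White’s heteroskedasticity-robust standard error; the extra lag terms are what absorb the serial correlation that splined data introduce.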

New Study of the Waxman-Markey Cap-and-Trade Bill

We’re pleased to release our latest study today, examining the economic impact on U.S. households of the Waxman-Markey cap-and-trade bill (H.R. 2454).

The study is basically a critique of recent Congressional Budget Office (CBO) distributional estimates that suggest the bill’s impact is likely to be progressive across income groups. We find the bill is much more likely to be regressive once the microeconomic response of regulated public utilities is taken into account. Under this framework, we estimate the bill will result in net benefits to the nation’s highest-earning 20 percent of households, while imposing net costs on the lowest-earning 80 percent.

Check out the full study and news release here.

New Study of Household Burdens from a U.S. Cap-and-Trade System

We’ve released a new study today exploring the likely cost to U.S. households of a typical cap-and-trade system aimed at cutting carbon emissions by 15 percent. The study uses a standard input-output model to estimate how the costs of cap-and-trade regulations will be borne by households in various income groups, age groups, family types and U.S. regions. The study is No. 6 in the Working Paper series at the Tax Foundation in Washington, D.C.

You can view the full study here.