A Stata Do-File for Cubic Spline Interpolation

As a follow-up to our post on using cubic spline interpolation in MATLAB, we’ve posted a Stata do-file explaining how to divide quarterly data into monthly estimates. The do-file is here. For our original post explaining the data and methodology, see here.

Enjoy!

(Note: once you generate the “yy” variable, be sure to save your Stata file before running this do-file, as otherwise Stata returns an error.)

New Methodology Paper on Building a Square U.S. Input-Output Table

We’ve posted an updated version of our methodology paper explaining how Chamberlain Economics, L.L.C. develops square U.S. input-output tables based on data from the BEA’s benchmark input-output accounts. Check it out here.

New Study of the Kerry-Lieberman “American Power Act”

We’ve released our latest study today, an economic analysis of the “American Power Act” or Kerry-Lieberman cap-and-trade bill. The study explores the bill in detail, discussing several theoretical issues raised by the legislation as well as providing new distributional estimates of the bill’s cost to U.S. households. Check out the full study here.

Now Available: Carbon Content of U.S. Products and Industries

We’ve added a new data set to our “Data Products” page — detailed estimates of the fossil-fuel carbon emitted by U.S. industries.

The data are the most comprehensive estimates we’ve done to date, covering 426 industries and the more than 5,300 products they produce, ranging from soybeans to jet fuel refining. The data are formatted into two easy-to-use Excel files, one with the industry estimates and one with the product estimates. (See a preview of the data here.)

These data are a terrific shortcut for researchers, students and others looking to estimate the carbon impact of various products or industries without building their own input-output or CGE model. All estimates are based on our Chamberlain Economics, L.L.C. Input-Output Model, which is designed to model carbon flows in the U.S. economy.

See our “Data Products” page for ordering information.

How Economists Convert Quarterly Data into Monthly: Cubic Spline Interpolation

(Update: Please note that cubic spline interpolation can only provide estimates of data between known data points; it cannot “create” unknown data. For example, if only annual figures exist for your data set, then the annual observations are the only real data. Cubic splining between them to generate monthly estimates is an approximation technique, and does not produce new actual monthly figures. For this reason, the method below may not be appropriate for all research purposes.)

A common problem economists face with time-series data is getting them into the right time interval. Some data are daily or weekly, while others are in monthly, quarterly or annual intervals. Since most regression models require consistent time intervals, an econometrician’s first job is usually getting data into the same frequency.

In this post I’ll explain how to solve a common problem we’ve run into: how to divide quarterly data into monthly data for econometric analysis. To do so, we’ll use a method known as “cubic spline interpolation.” In the example below we use MATLAB and Excel. For Stata users, I’ve posted a Stata do-file that works through the same example in Stata.

Cubic Spline Interpolation
One of the most widely used data sources in economics is the National Income and Product Accounts (NIPAs) from the U.S. Bureau of Economic Analysis. They’re the official source for U.S. GDP, personal income, trade flows and more. Unfortunately, most data are published only quarterly or annually. So if you’re hoping to run a regression using monthly observations — for example, this simple estimate of the price elasticity of demand for gasoline — you’ll need to split these quarterly data into monthly ones.

A common way to do this is by “cubic spline interpolation.” Here’s how it works. We start with n quarterly data points. That means we have n-1 spaces between them. Across each space, we draw a unique 3rd-degree (or “cubic”) polynomial connecting the two points. This is called a “piecewise polynomial” function.

To make sure the pieces join into a smooth curve, we force the first and second derivatives to be continuous; that is, at each interior point the two adjoining polynomials must agree in value, slope and curvature. When all these requirements are met, along with a couple of end-point conditions you can read about here, we have a (4n-4) x (4n-4) linear system that can be solved for the coefficients of all n-1 cubic polynomials.
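To make the counting concrete, here is the standard system written out (the textbook formulation of a cubic spline, not anything specific to our files). Each piece is a cubic on its own interval, and the pieces must fit the data and match slopes and curvatures where they meet:

```latex
% One cubic per interval [x_i, x_{i+1}], for i = 1, ..., n-1:
%   S_i(x) = a_i + b_i (x - x_i) + c_i (x - x_i)^2 + d_i (x - x_i)^3
\begin{align*}
S_i(x_i) = y_i, \quad S_i(x_{i+1}) = y_{i+1}
  && \text{fit the data: } 2(n-1) \text{ equations} \\
S_i'(x_{i+1}) = S_{i+1}'(x_{i+1})
  && \text{continuous slope: } n-2 \text{ equations} \\
S_i''(x_{i+1}) = S_{i+1}''(x_{i+1})
  && \text{continuous curvature: } n-2 \text{ equations}
\end{align*}
```

That gives 4n-6 equations; the two end-point conditions bring the total to 4n-4, matching the 4n-4 unknown coefficients (four for each of the n-1 cubics).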

Once we have these n-1 piecewise polynomials, we can plug in x values for whatever time intervals we want: monthly, weekly or even daily. The polynomials will give us a pretty good interpolation between our known quarterly data points.

An Example Using MATLAB
While the above method seems simple, doing cubic splines by hand is not. A spline for just four data points requires setting up and solving a 12 x 12 linear system, then manually evaluating three different polynomials at the desired x values. That’s a lot of work. To get a sense of how hard this is, here’s an Excel file showing what’s involved in fitting a cubic spline to four data points by hand.

In practice, the best way to do a cubic spline is to use MATLAB. It takes about five minutes. Here’s how to do it.

MATLAB has a built-in “spline()” function that does the dirty work of cubic spline interpolation for you. It requires three inputs: a list of x values from the quarterly data you want to split; a list of y values from the quarterly data; and a list of x values for the monthly time intervals you want. The spline() function formulates the n-1 cubic polynomials, evaluates them at your desired x values, and gives you a list of interpolated monthly y values.
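As a minimal illustration of the call (all numbers here are made up):

```matlab
% Toy example: four known points, interpolated at half-unit steps.
x  = [1 2 3 4];          % known x values
y  = [10 12 11 15];      % known y values (made-up numbers)
xx = 1:0.5:4;            % x values where we want interpolated estimates
yy = spline(x, y, xx);   % one interpolated y value per entry of xx
```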

Here’s an Excel file showing how to use MATLAB to split quarterly data into monthly. In the file, the first two columns are quarterly values from BEA’s Personal Income series. Our goal is to convert these into monthly values. The next three columns (highlighted in yellow) are the three inputs MATLAB needs: the original quarterly x values (x); the original quarterly y values (y); and the desired monthly x values (xx).

In the Excel file, note that the first quarter is listed as month 2, the second quarter as month 5, and so on. Why is this? BEA’s quarterly data represent an average value over the three-month quarter. That means they should be treated as a mid-point of the quarter. For Q1 that’s month 2, for Q2 that’s month 5, and so on.

The next step is to open MATLAB and paste in these three columns of data. In MATLAB, type “x = [”, cut and paste the column of x values in from Excel, type “]” and hit return. This creates an n x 1 vector with the x values. Repeat this for the y and xx values in the Excel file.

Once you have x, y, and xx defined in MATLAB, type “yy = spline(x,y,xx)” and hit return. This will create a new vector yy with the interpolated monthly y values we’re looking for. Each entry in yy will correspond to one of the x values you specified in the xx vector.
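Putting the whole session together, it looks something like this (the y values below are placeholders; in the actual exercise they are the quarterly Personal Income figures from the Excel file):

```matlab
% Quarterly observations sit at the mid-month of each quarter
% (Q1 -> month 2, Q2 -> month 5, and so on).
x  = [2 5 8 11];                  % quarterly x values (mid-quarter months)
y  = [100.0 102.5 101.8 104.2];   % quarterly y values (placeholders)
xx = 1:12;                        % the monthly x values we want
yy = spline(x, y, xx);            % interpolated monthly series

% Note: months 1 and 12 fall outside the mid-quarter points, so those
% two values are extrapolated from the end cubics rather than interpolated.
```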

Copy these yy values from MATLAB, paste them into Excel, and we’re done. We now have an estimated monthly Personal Income series.

Here’s an Excel file summarizing the above example for splitting quarterly Personal Income data into monthly using MATLAB. Also, here’s a MATLAB file with the x, y, xx, and yy vectors from the above exercise.

Note: For Stata users, here’s a “do” file with an example that performs the above cubic spline interpolation in Mata. Note that Stata and MATLAB use slightly different end-point conditions for the cubic spline (MATLAB’s spline() uses the “not-a-knot” condition by default), so they’ll give slightly different results toward the beginning and end of the data set. (Once you generate the “yy” variable in Stata, be sure to save your file, or else Stata will give an error when you run the above do-file.)

An Econometric Warning: If you plan to do regression analysis with data that have been interpolated by cubic splines, be aware that doing so introduces a systematic source of serial correlation in your regressors. The interpolated data points are related to one another by a cubic polynomial, so they violate the standard OLS assumption of no autocorrelation. What does this mean? If you do any statistical inference using splined data (that is, if you test the significance of your regression coefficients or report standard errors), be sure to use Newey-West standard errors rather than the usual ones, as they’re robust to the autocorrelation introduced by cubic splines. You can do this using the “newey” command in Stata, rather than “reg”.
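Here’s a minimal sketch of what that looks like in Stata (the variables y, x and month are hypothetical, and the lag length is purely illustrative, not a recommendation):

```stata
* Declare the data as a time series (assumes a monthly time variable "month").
tsset month

* OLS with Newey-West standard errors; lag(3) is an illustrative choice.
newey y x, lag(3)

* For comparison, ordinary OLS with conventional standard errors:
reg y x
```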

New Study of the Waxman-Markey Cap-and-Trade Bill

We’re pleased to release our latest study today, examining the economic impact on U.S. households of the Waxman-Markey cap-and-trade bill (H.R. 2454).

The study is essentially a critique of recent Congressional Budget Office (CBO) distributional estimates suggesting the bill’s impact is likely to be progressive across income groups. We find the bill is much more likely to be regressive once the microeconomic response of regulated public utilities is taken into account. Under this framework, we estimate the bill will result in net benefits to the nation’s highest-earning 20 percent of households, while imposing net costs on the lowest-earning 80 percent.

Check out the full study and news release here.

New Study of Household Burdens from a U.S. Cap-and-Trade System

We’ve released a new study today exploring the likely cost to U.S. households of a typical cap-and-trade system aimed at cutting carbon emissions by 15 percent. The study uses a standard input-output model to estimate how the costs of cap-and-trade regulations will be borne by households in various income groups, age groups, family types and U.S. regions. The study is No. 6 in the Working Paper series at the Tax Foundation in Washington, D.C.

You can view the full study here.

Core Concepts: The Economics of Tax Incidence

All good economics starts with theory. The world is a complicated place—far too complex to make sense of directly. Economic theory helps collapse that complexity into a few key relationships we can work out mathematically and check against the facts. The first step in every analysis is to sit down with pencil and pad to work out the theory.

To help our clients better understand the economic theory underlying our work, we’ll be posting an ongoing series of articles titled “Core Concepts.” The goal is to provide a collection of simple and brief introductions to the core theoretical concepts used by Chamberlain Economics, L.L.C.

As the first in the series we’ve posted “Core Concepts: The Economics of Tax Incidence”. The piece is designed as a refresher on the basics of tax incidence and how it’s derived analytically from the elasticities of supply and demand in the marketplace. This idea serves as the foundation for nearly all of our work on tax modeling and policy analysis.
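As a one-line preview of the core result (the standard textbook formula, stated here without the derivation the article walks through):

```latex
% Pass-through of a small unit tax in a competitive market,
% with supply elasticity e_S and demand elasticity e_D:
\[
\text{consumers' share of the tax} \;=\; \frac{\varepsilon_S}{\varepsilon_S + |\varepsilon_D|}
\]
```

In words: the less elastic side of the market bears more of the burden, regardless of which side legally remits the tax.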

Check out the article here.

U.S. Input-Output Tables from BEA Data: Available Now

One of the hard parts of building Leontief input-output models is that the source data don’t come in the form the model requires.

Instead of producing a square industry-by-industry input-output table, the Bureau of Economic Analysis (BEA) produces rectangular “use” and “make” tables. The make table shows products produced by each industry, while the use table shows how products get used by industries, consumers, government and the rest of the world. However, what we need for Leontief models is a square table that shows only the industry-by-industry relationships.
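For readers curious about the general idea, here is a rough sketch of one standard way to square the tables, the “industry technology” transformation from the input-output literature (a sketch only; the exact procedure in our methodology paper may differ, and the matrix names here are our own):

```matlab
% USE  (c x k): dollar value of each commodity used by each industry.
% MAKE (k x c): dollar value of each commodity made by each industry.
[c, k] = size(USE);              % c commodities, k industries

q = sum(MAKE, 1)';               % total output of each commodity (c x 1)
g = sum(MAKE, 2);                % total output of each industry  (k x 1)

D = MAKE ./ repmat(q', k, 1);    % market shares: each industry's share of a commodity's output
B = USE  ./ repmat(g', c, 1);    % commodity inputs per dollar of industry output

A = D * B;                       % square (k x k) industry-by-industry coefficient table
```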

If you’re a researcher who has run into this problem, I have good news. Our economists have produced summary-level (134 industries) and detailed-level (426 industries) input-output tables from BEA data, which are now for sale on our “Data Products” page. We’ve also written a methodology paper explaining how the tables are derived, allowing you to reproduce them quickly and easily. See here to place an order today.

New Study of Business & Occupation Tax Pyramiding

We’ve released the latest Chamberlain Economics study this week, which examines tax pyramiding from Washington State’s Business & Occupation (B&O) tax.

A gross receipts tax like the B&O works like a sales tax, except it applies to business inputs as well as final goods. For a baker selling loaves of bread, the flour, electricity and packaging are all taxed first, then the loaf itself is taxed when sold to consumers. These extra layers of taxation get quietly built into the final selling price, something economists call “tax pyramiding.”
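A stylized example (with made-up numbers) shows how the layers add up. Suppose a 0.5 percent gross receipts tax applies at three stages: flour sold to the baker for $1.00, bread sold to the grocer for $2.00, and the loaf sold to the shopper for $3.00. The tax collects 0.005 × ($1.00 + $2.00 + $3.00) = $0.03, double the $0.015 a pure retail sales tax would collect on the final $3.00 sale. In that sense the tax has pyramided 2.0 times.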

Here’s the abstract for the piece:

Using newly released 2002 Washington State input-output data, we provide the first estimates of tax pyramiding from the state’s Business & Occupation (B&O) tax since 2001. We find tax pyramiding is more severe than found by previous studies that did not distinguish between imported and domestically produced products. We find the B&O tax pyramids an average of 3.0 times, ranging from 1.6 times on architectural, engineering and computing services to 16.7 times on petroleum and coal products manufacturing.

A file with some tables of findings is here. If you’d like to learn how we can develop an input-output model like this for your own study, give us a call today.