Making Regression Coefficient Plots in Stata
CGA’s John V. Kane provides a concise guide for making publication-quality graphs of regression results using Stata. The guide covers the most common aesthetic changes, combining multiple models, working with binary outcome models, and more, with plenty of other little tips and tricks along the way. See the full post here.
INTRO TO COEFPLOT
Traditionally, researchers reported results of regression analyses using tables. A more visually appealing way of presenting these results is by using a coefficient plot. Unlike a typical scatterplot with a fitted line (or a “marginsplot”), a coefficient plot displays multiple coefficients from the model — or from several models — at once.
To do this, Stata users can install Ben Jann’s extremely popular “coefplot” package:
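A minimal sketch of the installation, assuming you have internet access and want the version hosted on the SSC archive (the `replace` option simply updates any older copy you may already have):

```stata
* Install (or update) coefplot from the SSC archive
ssc install coefplot, replace

* Confirm that Stata can find the command, then open its help file
which coefplot
help coefplot
```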
There is an impressively comprehensive online resource for coefplot, which I have used a ton. Here is a link to it.
The purpose of this guide is to provide researchers with a concise resource that contains some of the most common commands and options for producing publication-quality coefficient plots using Stata. It’s the product of many years of me banging my head against my desk with a computer screen full of error messages. The goal is simply to make nicer coefficient plots that (ideally) don’t require any additional editing via Stata’s Graph Editor (though that is perfectly fine as a last resort).
The .do file containing all the code below is located here.
Notes before we get started:
1. Graphs will use schemes from “schemepack” by Asjad Naqvi. You can explore all your awesome new schemes by executing “graph query, schemes”.
2. All graphs use the “AbelPro-Regular” font, which can be downloaded here. For details on installing/using fonts that are not native to Stata, see here.
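The setup described in these notes can be sketched roughly as follows. This assumes schemepack is installed from the SSC archive and that the font has already been installed on your operating system under the name “AbelPro-Regular”; skip the last line if you would rather keep Stata's default font.

```stata
* Install (or update) the schemepack schemes from the SSC archive
ssc install schemepack, replace

* List every scheme Stata currently knows about, including the new ones
graph query, schemes

* Optional: use a non-native font in the Graph window
* (assumes the font is already installed on your system under this name)
graph set window fontface "AbelPro-Regular"
```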
BASIC COMMANDS & ESSENTIAL OPTIONS FOR EVERY COEFPLOT
We’ll begin by importing everybody’s favorite dataset: auto.dta
We’ll regress price onto mpg, weight, length, and foreign. To produce a basic coefficient plot, simply execute coefplot after the model:
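For readers following along without the .do file, these two steps might look like this (a minimal sketch; coefplot is run with no options, so it plots every coefficient from the most recent model, including the constant):

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* Regress price on the four predictors
regress price mpg weight length foreign

* Plot the coefficients from the model just estimated
coefplot
```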
If you are using a version of Stata earlier than 18, your graph will use the heartbreakingly dull “s2color” scheme, which is shown below. (Version 18 uses the new “stcolor” scheme. So as not to exclude pre-18 users, I will use “schemepack” schemes in the examples below.)
This graph shows point estimates for each variable, as well as the constant, with 95% confidence intervals. That said, it’s probably not the prettiest graph you’ve ever seen.
Some essential first options for making every coefplot nicer and more informative:
1. Remove the constant using “drop(_cons)”: the constant is rarely useful and will often (though not in the example above) extend the x-axis by a large amount to accommodate the value of the constant, making the coefficients difficult to read.
2. Add a vertical line at x=0 using “xline(0)”: This helps readers see the coefficients (and their CIs) in relation to 0, which is central to null hypothesis significance testing. The “lcolor( )” option controls the color of the line; the “lwidth( )” option controls the line’s width.
3. Change the scheme using “scheme( )”: Please don’t use s2color. Please. I’m begging you.
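Put together, the three options above might be combined like this. It is a sketch rather than the exact code from the .do file: the line color and width shown here are illustrative choices, and “white_jet” assumes schemepack is installed.

```stata
* Load the data and re-estimate the model
sysuse auto, clear
regress price mpg weight length foreign

* drop(_cons): omit the constant from the plot
* xline(0, ...): vertical reference line at zero, with color and width set
* scheme(white_jet): a schemepack scheme in place of the default
coefplot, drop(_cons) xline(0, lcolor(black) lwidth(medium)) scheme(white_jet)
```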
Implementing these options with “scheme(white_jet)” produces a graph with nicer colors, but in this case the graph looks a bit weird because the coefficient on “foreign” is enormous compared to those of the other variables (usually this is what happens when the constant is left in rather than removed; this just happens to be an unusual case, I promise!). As such, before making another graph, I am going to put all of the predictor variables on a 0 to 1 scale using a simple trick:
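The original code for this step is not reproduced here. One common way to do it, shown as a sketch, is min-max rescaling: subtract each variable's minimum and divide by its range. The “01” suffix on the new variable names is my own naming assumption, and “foreign” is left alone because it is already coded 0/1.

```stata
sysuse auto, clear

* Rescale each continuous predictor to run from 0 (its minimum) to 1 (its maximum)
foreach v of varlist mpg weight length {
    quietly summarize `v'
    generate `v'01 = (`v' - r(min)) / (r(max) - r(min))
}

* Check that the new variables really do span 0 to 1
summarize mpg01 weight01 length01
```

With this scaling, each coefficient is read as the change in price associated with moving from the variable's minimum to its maximum.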
Now that I have all the predictors on a 0-to-1 scale, I run the following:
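A sketch of what that might look like, written so it runs on its own: the rescaling loop is repeated, the “01” variable names are hypothetical, and “white_jet” assumes schemepack is installed.

```stata
sysuse auto, clear

* Recreate the 0-to-1 versions of the continuous predictors
foreach v of varlist mpg weight length {
    quietly summarize `v'
    generate `v'01 = (`v' - r(min)) / (r(max) - r(min))
}

* i.foreign treats foreign as a factor variable
regress price mpg01 weight01 length01 i.foreign

* Same essential options as before: no constant, reference line at zero, nicer scheme
coefplot, drop(_cons) xline(0) scheme(white_jet)
```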
Which produces:
This looks nicer than the original in many ways. But it can still be customized and improved upon quite a bit, starting with the labels for the variables/values. (Note: In the model, using “i.” before “foreign” leads to the value label being used in the graph; had only “foreign” been specified in the model, the variable label would appear in the graph instead.)