# Coefficient plots in `ggplot`

Drawing coefficient plots in R and `ggplot`

.

Recently a colleague asked how they could quickly draw a coefficient plot in R. In Stata, this is relatively simple - just run your regression and use the command `coefplot`

afterwards. This produces a graphic that is perfectly acceptable for exploratory data analysis, but leaves something to be desired if you want to use it in a publication.

This post shows you how to draw coefficient plots in R and **ggplot**, and is extensible for use with regressions beyond the basic `lm`

command.

### Data

For this example we will use data from the lovely **Palmer Penguins** package from Allison Horst. The penguins dataset is a great toy dataset for exploration and visualization, based on genuine data collected by Dr. Kristen Gorman at the Palmer Station in Antarctica.

### Basic regression

We begin with a basic regression where our dependent variable is penguin body weight in grams, and independent variables are the dimensions of the penguins’ bills, flipper length, as well as species and sex.

```
library(tidyverse)
theme_set(theme_light())
# read in data
<- palmerpenguins::penguins
df
# basic regression
<- df %>%
basic_reg lm(body_mass_g ~ bill_length_mm + bill_depth_mm + flipper_length_mm + species + sex, data = .)
basic_reg
```

```
Call:
lm(formula = body_mass_g ~ bill_length_mm + bill_depth_mm + flipper_length_mm +
species + sex, data = .)
Coefficients:
(Intercept) bill_length_mm bill_depth_mm flipper_length_mm
-1460.99 18.20 67.22 15.95
speciesChinstrap speciesGentoo sexmale
-251.48 1014.63 389.89
```

This produces a rather messy output. It includes both the regression specification and the coefficients.

### Tidy with **broom** package

We can use the **broom** package to return a `tibble`

, a neat data object that is easy to work with.

```
library(broom)
<- df %>%
tidy_reg lm(body_mass_g ~ bill_length_mm + bill_depth_mm + flipper_length_mm + species + sex, data = .) %>%
tidy()
tidy_reg
```

```
# A tibble: 7 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -1461. 571. -2.56 1.10e- 2
2 bill_length_mm 18.2 7.11 2.56 1.09e- 2
3 bill_depth_mm 67.2 19.7 3.40 7.45e- 4
4 flipper_length_mm 16.0 2.91 5.48 8.44e- 8
5 speciesChinstrap -251. 81.1 -3.10 2.09e- 3
6 speciesGentoo 1015. 130. 7.83 6.85e-14
7 sexmale 390. 47.8 8.15 7.97e-15
```

Great! This output is much easier to deal with.

### Coefficient plot

Let’s try and make a coefficient plot.

```
%>%
tidy_reg filter(term != "(Intercept)") %>%
# reorder the coefficients so that the largest is at the top of the plot
mutate(term = fct_reorder(term, estimate)) %>%
ggplot(aes(estimate, term)) +
geom_point() +
# add in a dotted line at zero
geom_vline(xintercept = 0, lty = 2) +
labs(
x = "Estimate of effect of variable on body mass (in grams)",
y = NULL,
title = "Coefficient plot"
)
```

We can see that relative to Adelie penguins (the base category), Gentoo penguins weigh more and Chinstrap penguins weigh less.

Further, male penguins weigh more than females.

### Error bars

To get the error bars, we specify that we want a confidence interval when we use the `tidy`

command from the **broom** package, like so: `tidy(conf.int = TRUE)`

```
<- df %>%
tidy_reg_conf_int lm(body_mass_g ~ bill_length_mm + bill_depth_mm + flipper_length_mm + species + sex, data = .) %>%
tidy(conf.int = TRUE)
tidy_reg_conf_int
```

```
# A tibble: 7 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -1461. 571. -2.56 1.10e- 2 -2585. -337.
2 bill_length_mm 18.2 7.11 2.56 1.09e- 2 4.22 32.2
3 bill_depth_mm 67.2 19.7 3.40 7.45e- 4 28.4 106.
4 flipper_length_mm 16.0 2.91 5.48 8.44e- 8 10.2 21.7
5 speciesChinstrap -251. 81.1 -3.10 2.09e- 3 -411. -92.0
6 speciesGentoo 1015. 130. 7.83 6.85e-14 760. 1270.
7 sexmale 390. 47.8 8.15 7.97e-15 296. 484.
```

Now in our `tibble`

we get columns called `conf.low`

and `conf.high`

.

To plot these, we use an additional geometry in our ggplot, called `geom_errorbarh`

. Here the `h`

at the end specifies we want it in the horizontal direction. We map the `conf.low`

and `conf.high`

variables to `xmin`

and `xmax`

respectively.

```
%>%
tidy_reg_conf_int filter(term != "(Intercept)") %>%
# reorder the coefficients so that the largest is at the top of the plot
mutate(term = fct_reorder(term, estimate)) %>%
ggplot(aes(estimate, term)) +
geom_point() +
geom_errorbarh(aes(xmin = conf.low, xmax = conf.high)) +
# add in a dotted line at zero
geom_vline(xintercept = 0, lty = 2) +
labs(
x = "Estimate of effect of variable on body mass (in grams)",
y = NULL,
title = "Coefficient plot with error bars"
)
```

Because none of the error bars cross the dotted line at zero, we conclude that the point estimates of the effects are significantly different from zero at the five percent level.

### Categories of predictor

Finally, let’s facet the variables by type.

We use the `case_when`

command from **dplyr** to call any variable containing `mm`

a numeric variable, and the others categorical. We colour our bars and points by variable type, and `facet_wrap`

to make a small multiple chart. This makes it easy to differentiate between the different types of variables.

```
%>%
tidy_reg_conf_int filter(term != "(Intercept)") %>%
# create a type variable with case_when
mutate(type = case_when(
str_detect(term, "mm") ~ "Numeric variable",
TRUE ~ "Categorical variable"
%>%
)) # reorder the coefficients so that the largest is at the top of the plot
mutate(term = fct_reorder(term, estimate)) %>%
ggplot(aes(estimate, term, colour = type)) +
geom_point() +
geom_errorbarh(aes(xmin = conf.low, xmax = conf.high)) +
# add in a dotted line at zero
geom_vline(xintercept = 0, lty = 2) +
# remove the legend as the facet show that information
theme(legend.position = "none") +
# facet by type
facet_wrap(~type, scales = "free_y") +
labs(
x = "Estimate of effect of variable on body mass (in grams)",
y = NULL,
title = "Coefficient plot with error bars",
subtitle = "By variable type"
)
```

In this way we can see that the scaling of the variables can have an impact on how we perceive the results. Because the bill and flipper dimensions are measured in mm, and the body mass in grams, we are seeing the effect of an additional mm of bill depth, for example, on body mass in grams, which appears quite small.

### Conclusion

Great - I hope that this was useful! It’s possible to easily customise your coefficient plots in **ggplot**, adding labels and colours to help your reader understand your regression results in a glance.