```{r, echo = F, cache = F}
knitr::opts_chunk$set(fig.retina = 2.5)
knitr::opts_chunk$set(fig.align = "center")
options(width = 110)
```
# Adventures in Covariance
> In this chapter, you'll see how to... specify **varying slopes** in combination with the varying intercepts of the previous chapter. This will enable pooling that will improve estimates of how different units respond to or are influenced by predictor variables. It will also improve estimates of intercepts, by borrowing information across parameter types. Essentially, varying slopes models are massive interaction machines. They allow every unit in the data to have its own response to any treatment or exposure or event, while also improving estimates via pooling. When the variation in slopes is large, the average slope is of less interest. Sometimes, the pattern of variation in slopes provides hints about omitted variables that explain why some units respond more or less. We'll see an example in this chapter.
>
> The machinery that makes such complex varying effects possible will be used later in the chapter to extend the varying effects strategy to more subtle model types, including the use of continuous categories, using **Gaussian process**. Ordinary varying effects work only with discrete, unordered categories, such as individuals, countries, or ponds. In these cases, each category is equally different from all of the others. But it is possible to use pooling with categories such as age or location. In these cases, some ages and some locations are more similar than others. You'll see how to model covariation among continuous categories of this kind, as well as how to generalize the strategy to seemingly unrelated types of models such as phylogenetic and network regressions. Finally, we'll circle back to causal inference and use our new powers over covariance to go beyond the tools of Chapter 6, introducing **instrumental variables**. Instruments are ways of inferring cause without closing backdoor paths. However they are very tricky both in design and estimation. [@mcelreathStatisticalRethinkingBayesian2020, pp. 436--437, **emphasis** in the original]
## Varying slopes by construction
> How should the robot pool information across intercepts and slopes? By modeling the joint population of intercepts and slopes, which means by modeling their covariance. In conventional multilevel models, the device that makes this possible is a joint multivariate Gaussian distribution for all of the varying effects, both intercepts and slopes. So instead of having two independent Gaussian distributions of intercepts and of slopes, the robot can do better by assigning a two-dimensional Gaussian distribution to both the intercepts (first dimension) and the slopes (second dimension). (p. 437)
#### Rethinking: Why Gaussian?
> There is no reason the multivariate distribution of intercepts and slopes must be Gaussian. But there are both practical and epistemological justifications. On the practical side, there aren't many multivariate distributions that are easy to work with. The only common ones are multivariate Gaussian and multivariate Student-t distributions. On the epistemological side, if all we want to say about these intercepts and slopes is their means, variances, and covariances, then the maximum entropy distribution is multivariate Gaussian. (p. 437)
As it turns out, **brms** does currently allow users to use the multivariate Student-$t$ distribution in this way. For details, check out [this discussion from the **brms** GitHub repository](https://github.com/paul-buerkner/brms/issues/231). Bürkner's exemplar syntax from his comment on May 13, 2018, was `y ~ x + (x | gr(g, dist = "student"))`. I haven't experimented with this, but if you do, do consider [sharing how it went](https://github.com/ASKurz/Statistical_Rethinking_with_brms_ggplot2_and_the_tidyverse_2_ed/issues).
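I haven't tried it myself, but to give a sense of the syntax, here is a hypothetical and untested sketch of what the café model we'll fit later in this chapter as `b14.1` might look like with multivariate Student-$t$ varying effects. Everything other than the `gr(cafe, dist = "student")` part is copied from the `b14.1` code below, and the data `d` are simulated in the next section.
```{r, eval = F}
# hypothetical and untested: the upcoming b14.1 model, but with multivariate
# Student-t varying effects by way of `gr(cafe, dist = "student")`
brm(data = d,
    family = gaussian,
    wait ~ 1 + afternoon + (1 + afternoon | gr(cafe, dist = "student")),
    prior = c(prior(normal(5, 2), class = Intercept),
              prior(normal(-1, 0.5), class = b),
              prior(exponential(1), class = sd),
              prior(exponential(1), class = sigma),
              prior(lkj(2), class = cor)),
    iter = 2000, warmup = 1000, chains = 4, cores = 4,
    seed = 14)
```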
### Simulate the population.
If you follow this section closely, it's a great template for simulating multilevel data for any of your future projects. You might think of this as an alternative to a frequentist power analysis. Vuorre has done [some nice work along these lines](https://gitlab.com/vuorre/bayesplan), I have a [blog series](https://solomonkurz.netlify.com/post/bayesian-power-analysis-part-i/) on Bayesian power analysis, and Kruschke covered the topic in Chapter 13 of his [-@kruschkeDoingBayesianData2015] [text](https://sites.google.com/site/doingbayesiandataanalysis/).
```{r}
a <- 3.5 # average morning wait time
b <- -1 # average difference afternoon wait time
sigma_a <- 1 # std dev in intercepts
sigma_b <- 0.5 # std dev in slopes
rho <- -.7 # correlation between intercepts and slopes
# the next three lines of code simply combine the terms, above
mu <- c(a, b)
cov_ab <- sigma_a * sigma_b * rho
sigma <- matrix(c(sigma_a^2, cov_ab,
cov_ab, sigma_b^2), ncol = 2)
```
It's common to refer to a covariance matrix as $\mathbf \Sigma$. The mathematical notation for those last couple lines of code is
$$
\mathbf \Sigma = \begin{bmatrix} \sigma_\alpha^2 & \sigma_\alpha \sigma_\beta \rho \\ \sigma_\alpha \sigma_\beta \rho & \sigma_\beta^2 \end{bmatrix}.
$$
Anyway, if you haven't used the `matrix()` function before, you might get a sense of the elements like so.
```{r}
matrix(1:4, nrow = 2, ncol = 2)
```
This next block of code will finally yield our café data.
```{r, message = F, warning = F}
library(tidyverse)
sigmas <- c(sigma_a, sigma_b) # standard deviations
rho <- matrix(c(1, rho, # correlation matrix
rho, 1), nrow = 2)
# now matrix multiply to get covariance matrix
sigma <- diag(sigmas) %*% rho %*% diag(sigmas)
# how many cafes would you like?
n_cafes <- 20
set.seed(5) # used to replicate example
vary_effects <-
MASS::mvrnorm(n_cafes, mu, sigma) %>%
data.frame() %>%
set_names("a_cafe", "b_cafe")
head(vary_effects)
```
Let's make sure we're keeping this all straight. `a_cafe` = our café-specific intercepts; `b_cafe` = our café-specific slopes. These aren't the actual data, yet. But at this stage, it might make sense to ask *What's the distribution of `a_cafe` and `b_cafe`?* Our variant of Figure 14.2 contains the answer.
For our plots in this chapter, we'll make our own custom **ggplot2** theme. The color palette will come from the "pearl_earring" palette of the [**dutchmasters** package](https://github.com/EdwinTh/dutchmasters) [@R-dutchmasters]. You can learn more about the original painting, Vermeer's [-@vermeerGirlPearlEarring1665] *Girl with a Pearl Earring*, [here](https://en.wikipedia.org/wiki/Girl_with_a_Pearl_Earring).
```{r, fig.width = 8, fig.height = 3.75}
# devtools::install_github("EdwinTh/dutchmasters")
library(dutchmasters)
dutchmasters$pearl_earring
scales::show_col(dutchmasters$pearl_earring)
```
We'll name our custom theme `theme_pearl_earring()`. I cobbled together this approach to defining a custom **ggplot2** theme with help from
* [Chapter 19](https://ggplot2-book.org/programming.html) of Wickham's [-@wickhamGgplot2ElegantGraphics2016] *ggplot2: Elegant graphics for data analysis*;
* [Section 4.6](https://bookdown.org/rdpeng/RProgDA/building-a-new-theme.html) of Peng, Kross, and Anderson's [-@pengMasteringSoftwareDevelopment2017] [*Mastering Software Development in R*](https://bookdown.org/rdpeng/RProgDA/building-a-new-theme.html);
* Lea Waniek's blog post, [*Custom themes in ggplot2*](https://www.statworx.com/de/blog/custom-themes-in-ggplot2/), and
* Joey Stanley's blog post of the same name, [*Custom themes in ggplot2*](https://joeystanley.com/blog/custom-themes-in-ggplot2).
```{r}
theme_pearl_earring <- function(light_color = "#E8DCCF",
dark_color = "#100F14",
my_family = "Courier",
...) {
theme(line = element_line(color = light_color),
text = element_text(color = light_color, family = my_family),
axis.line = element_blank(),
axis.text = element_text(color = light_color),
axis.ticks = element_line(color = light_color),
legend.background = element_rect(fill = dark_color, color = "transparent"),
legend.key = element_rect(fill = dark_color, color = "transparent"),
panel.background = element_rect(fill = dark_color, color = light_color),
panel.grid = element_blank(),
plot.background = element_rect(fill = dark_color, color = dark_color),
strip.background = element_rect(fill = dark_color, color = "transparent"),
strip.text = element_text(color = light_color, family = my_family),
...)
}
# now set `theme_pearl_earring()` as the default theme
theme_set(theme_pearl_earring())
```
Note how our custom `theme_pearl_earring()` function has a few adjustable parameters. Feel free to play around with alternative settings to see how they work. If we just use the defaults as we have defined them, here is our Figure 14.2.
```{r, fig.width = 3.25, fig.height = 3}
vary_effects %>%
ggplot(aes(x = a_cafe, y = b_cafe)) +
geom_point(color = "#80A0C7") +
geom_rug(color = "#8B9DAF", size = 1/7)
```
Again, these are not "data." Figure 14.2 shows a distribution of *parameters*. Here's their Pearson's correlation coefficient.
```{r}
cor(vary_effects$a_cafe, vary_effects$b_cafe)
```
Since there are only 20 rows in our `vary_effects` simulation, it shouldn't be a surprise that the Pearson's correlation is a bit off from the population value of $\rho = -.7$. If you rerun the simulation with `n_cafes <- 200`, the Pearson's correlation is much closer to the data generating value.
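Here's one way to check that claim; this quick re-simulation isn't in the text.
```{r}
# re-simulate with 200 cafés instead of 20 (not in the text)
set.seed(5)
MASS::mvrnorm(200, mu, sigma) %>%
  data.frame() %>%
  set_names("a_cafe", "b_cafe") %>%
  summarise(correlation = cor(a_cafe, b_cafe))
```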
### Simulate observations.
Here we put those simulated parameters to use and simulate actual data from them.
```{r}
n_visits <- 10
sigma <- 0.5 # std dev within cafes
set.seed(22) # used to replicate example
d <-
vary_effects %>%
mutate(cafe = 1:n_cafes) %>%
expand(nesting(cafe, a_cafe, b_cafe), visit = 1:n_visits) %>%
mutate(afternoon = rep(0:1, times = n() / 2)) %>%
mutate(mu = a_cafe + b_cafe * afternoon) %>%
mutate(wait = rnorm(n = n(), mean = mu, sd = sigma))
```
We might peek at the data.
```{r}
d %>%
glimpse()
```
Now that we've finally simulated our data, we are ready to make our version of Figure 14.1, from way back on page 436.
```{r, fig.width = 3.5, fig.height = 3.5}
d %>%
mutate(afternoon = ifelse(afternoon == 0, "M", "A"),
day = rep(rep(1:5, each = 2), times = n_cafes)) %>%
filter(cafe %in% c(3, 5)) %>%
mutate(cafe = str_c("café #", cafe)) %>%
ggplot(aes(x = visit, y = wait, group = day)) +
geom_point(aes(color = afternoon), size = 2) +
geom_line(color = "#8B9DAF") +
scale_color_manual(values = c("#80A0C7", "#EEDA9D")) +
scale_x_continuous(NULL, breaks = 1:10, labels = rep(c("M", "A"), times = 5)) +
scale_y_continuous("wait time in minutes", limits = c(0, NA)) +
theme_pearl_earring(axis.ticks.x = element_blank(),
legend.position = "none") +
facet_wrap(~ cafe, ncol = 1)
```
#### Rethinking: Simulation and misspecification.
> In this exercise, we are simulating data from a generative process and then analyzing that data with a model that reflects exactly the correct structure of that process. But in the real world, we're never so lucky. Instead we are always forced to analyze data with a model that is misspecified: The true data-generating process is different than the model. Simulation can be used however to explore misspecification. Just simulate data from a process and then see how a number of models, none of which match exactly the data-generating process, perform. And always remember that Bayesian inference does not depend upon data-generating assumptions, such as the likelihood, being true. (p. 441)
### The varying slopes model.
The statistical formula for our varying intercepts and slopes café model follows the form
\begin{align*}
\text{wait}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\
\mu_i & = \alpha_{\text{café}[i]} + \beta_{\text{café}[i]} \text{afternoon}_i \\
\begin{bmatrix} \alpha_\text{café} \\ \beta_\text{café} \end{bmatrix} & \sim \operatorname{MVNormal} \begin{pmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \mathbf \Sigma \end{pmatrix} \\
\mathbf \Sigma & = \begin{bmatrix} \sigma_\alpha & 0 \\ 0 & \sigma_\beta \end{bmatrix} \mathbf R \begin{bmatrix} \sigma_\alpha & 0 \\ 0 & \sigma_\beta \end{bmatrix} \\
\alpha & \sim \operatorname{Normal}(5, 2) \\
\beta & \sim \operatorname{Normal}(-1, 0.5) \\
\sigma & \sim \operatorname{Exponential}(1) \\
\sigma_\alpha & \sim \operatorname{Exponential}(1) \\
\sigma_\beta & \sim \operatorname{Exponential}(1) \\
\mathbf R & \sim \operatorname{LKJcorr}(2),
\end{align*}
where $\mathbf \Sigma$ is the covariance matrix and $\mathbf R$ is the corresponding correlation matrix, which we might more fully express as
$$\mathbf R = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}.$$
And according to our prior, $\mathbf R$ is distributed as $\operatorname{LKJcorr}(2)$. We'll use `rethinking::rlkjcorr()` to get a better sense of what that even is.
```{r, message = F, warning = F}
library(rethinking)
n_sim <- 1e4
set.seed(14)
r_1 <-
rlkjcorr(n_sim, K = 2, eta = 1) %>%
data.frame()
set.seed(14)
r_2 <-
rlkjcorr(n_sim, K = 2, eta = 2) %>%
data.frame()
set.seed(14)
r_4 <-
rlkjcorr(n_sim, K = 2, eta = 4) %>%
data.frame()
```
Here are the $\text{LKJcorr}$ distributions of Figure 14.3.
```{r, fig.width = 3, fig.height = 3}
# for annotation
text <-
tibble(x = c(.83, .625, .45),
y = c(.56, .75, 1.07),
label = c("eta = 1", "eta = 2", "eta = 4"))
# plot
r_1 %>%
ggplot(aes(x = X2)) +
geom_density(color = "transparent", fill = "#394165", alpha = 2/3, adjust = 1/2) +
geom_density(data = r_2,
color = "transparent", fill = "#DCA258", alpha = 2/3, adjust = 1/2) +
geom_density(data = r_4,
color = "transparent", fill = "#FCF9F0", alpha = 2/3, adjust = 1/2) +
geom_text(data = text,
aes(x = x, y = y, label = label),
color = "#A65141", family = "Courier") +
scale_y_continuous(NULL, breaks = NULL) +
labs(title = expression(LKJcorr(eta)),
x = "correlation")
```
As it turns out, the shape of the LKJ is sensitive to both $\eta$ and the $K$ dimensions of the correlation matrix. Our simulations only considered the shapes for when $K = 2$. We can use a combination of the `parse_dist()` and `stat_dist_halfeye()` functions from the **tidybayes** package to derive analytic solutions for different combinations of $\eta$ and $K$.
```{r, fig.width = 8, fig.height = 4}
library(tidybayes)
crossing(k = 2:5,
eta = 1:4) %>%
mutate(prior = str_c("lkjcorr_marginal(", k, ", ", eta, ")"),
strip = str_c("K==", k)) %>%
parse_dist(prior) %>%
ggplot(aes(y = eta, dist = .dist, args = .args)) +
stat_dist_halfeye(.width = c(.5, .95),
color = "#FCF9F0", fill = "#A65141") +
scale_x_continuous(expression(rho), limits = c(-1, 1),
breaks = c(-1, -.5, 0, .5, 1), labels = c("-1", "-.5", "0", ".5", "1")) +
scale_y_continuous(expression(eta), breaks = 1:4) +
ggtitle(expression("Marginal correlation for the LKJ prior relative to K and "*eta)) +
facet_wrap(~ strip, labeller = label_parsed, ncol = 4)
```
To learn more about this plotting method, check out Kay's [-@kayMarginalDistributionSingle2020] [*Marginal distribution of a single correlation from an LKJ distribution*](https://mjskay.github.io/ggdist/reference/lkjcorr_marginal.html). To get a better intuition about what that plot means, check out the illuminating blog post by [Stephen Martin](https://twitter.com/smartin2018), [*Is the LKJ(1) prior uniform? "Yes"*](http://srmart.in/is-the-lkj1-prior-uniform-yes/).
Okay, let's get ready to model and switch out **rethinking** for **brms**.
```{r, message = F, warning = F}
detach(package:rethinking, unload = T)
library(brms)
```
As defined above, our first model has both varying intercepts and `afternoon` slopes. I should point out that the `(1 + afternoon | cafe)` syntax specifies that we'd like `brm()` to fit the random effects for `1` (i.e., the intercept) and the `afternoon` slope as correlated. Had we wanted to fit a model in which they were orthogonal, we'd have coded `(1 + afternoon || cafe)`.
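If you'd like to see how those two syntaxes differ in terms of the parameters **brms** will set priors for, the `get_prior()` function can help. This little check isn't in the text, but notice how the `cor` class drops out of the output once the varying effects are set to be orthogonal.
```{r, warning = F, message = F}
# not in the text: compare the prior classes brms assigns to the two syntaxes
get_prior(data = d, family = gaussian,
          wait ~ 1 + afternoon + (1 + afternoon | cafe))
get_prior(data = d, family = gaussian,
          wait ~ 1 + afternoon + (1 + afternoon || cafe))
```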
```{r b14.1}
b14.1 <-
brm(data = d,
family = gaussian,
wait ~ 1 + afternoon + (1 + afternoon | cafe),
prior = c(prior(normal(5, 2), class = Intercept),
prior(normal(-1, 0.5), class = b),
prior(exponential(1), class = sd),
prior(exponential(1), class = sigma),
prior(lkj(2), class = cor)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 867530,
file = "fits/b14.01")
```
With Figure 14.4, we assess how the posterior for the correlation of the random effects compares to its prior.
```{r, fig.width = 3, fig.height = 3}
post <- as_draws_df(b14.1)
post %>%
ggplot() +
geom_density(data = r_2, aes(x = X2),
color = "transparent", fill = "#EEDA9D", alpha = 3/4) +
geom_density(aes(x = cor_cafe__Intercept__afternoon),
color = "transparent", fill = "#A65141", alpha = 9/10) +
annotate(geom = "text",
x = c(-0.15, 0), y = c(2.21, 0.85),
label = c("posterior", "prior"),
color = c("#A65141", "#EEDA9D"), family = "Courier") +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "Correlation between intercepts\nand slopes, prior and posterior",
x = "correlation")
```
McElreath then depicted multidimensional shrinkage by plotting the posterior means of the varying effects compared to their raw, unpooled estimates. With **brms**, we can get the `cafe`-specific intercepts and `afternoon` slopes with `coef()`, which returns a list of three-dimensional arrays, one array per grouping term.
```{r}
# coef(b14.1) %>% glimpse()
coef(b14.1)
```
Here's the code to extract the relevant elements from the `coef()` list, convert them to a tibble, and add the `cafe` index.
```{r}
partially_pooled_params <-
# with this line we select each of the 20 cafe's posterior mean (i.e., Estimate)
# for both `Intercept` and `afternoon`
coef(b14.1)$cafe[ , 1, 1:2] %>%
data.frame() %>% # convert the two vectors to a data frame
rename(Slope = afternoon) %>%
mutate(cafe = 1:nrow(.)) %>% # add the `cafe` index
select(cafe, everything()) # simply moving `cafe` to the leftmost position
```
Like McElreath, we'll compute the unpooled estimates directly from the data.
```{r, warning = F, message = F}
# compute unpooled estimates directly from data
un_pooled_params <-
d %>%
# with these two lines, we compute the mean value for each cafe's wait time
# in the morning and then the afternoon
group_by(afternoon, cafe) %>%
summarise(mean = mean(wait)) %>%
ungroup() %>% # ungrouping allows us to alter afternoon, one of the grouping variables
mutate(afternoon = ifelse(afternoon == 0, "Intercept", "Slope")) %>%
spread(key = afternoon, value = mean) %>% # use `spread()` just as in the previous block
mutate(Slope = Slope - Intercept) # finally, here's our slope!
# here we combine the partially-pooled and unpooled means into a single data object,
# which will make plotting easier.
params <-
# `bind_rows()` will stack the second tibble below the first
bind_rows(partially_pooled_params, un_pooled_params) %>%
# index whether the estimates are pooled
mutate(pooled = rep(c("partially", "not"), each = nrow(.)/2))
# here's a glimpse at what we've been working for
params %>%
slice(c(1:5, 36:40))
```
Finally, here's our code for Figure 14.5.a, showing shrinkage in two dimensions.
```{r, fig.width = 4.5, fig.height = 3}
p1 <-
ggplot(data = params, aes(x = Intercept, y = Slope)) +
stat_ellipse(geom = "polygon", type = "norm", level = 1/10, size = 0, alpha = 1/20, fill = "#E7CDC2") +
stat_ellipse(geom = "polygon", type = "norm", level = 2/10, size = 0, alpha = 1/20, fill = "#E7CDC2") +
stat_ellipse(geom = "polygon", type = "norm", level = 3/10, size = 0, alpha = 1/20, fill = "#E7CDC2") +
stat_ellipse(geom = "polygon", type = "norm", level = 4/10, size = 0, alpha = 1/20, fill = "#E7CDC2") +
stat_ellipse(geom = "polygon", type = "norm", level = 5/10, size = 0, alpha = 1/20, fill = "#E7CDC2") +
stat_ellipse(geom = "polygon", type = "norm", level = 6/10, size = 0, alpha = 1/20, fill = "#E7CDC2") +
stat_ellipse(geom = "polygon", type = "norm", level = 7/10, size = 0, alpha = 1/20, fill = "#E7CDC2") +
stat_ellipse(geom = "polygon", type = "norm", level = 8/10, size = 0, alpha = 1/20, fill = "#E7CDC2") +
stat_ellipse(geom = "polygon", type = "norm", level = 9/10, size = 0, alpha = 1/20, fill = "#E7CDC2") +
stat_ellipse(geom = "polygon", type = "norm", level = .99, size = 0, alpha = 1/20, fill = "#E7CDC2") +
geom_point(aes(group = cafe, color = pooled)) +
geom_line(aes(group = cafe), size = 1/4) +
scale_color_manual("Pooled?", values = c("#80A0C7", "#A65141")) +
coord_cartesian(xlim = range(params$Intercept),
ylim = range(params$Slope))
p1
```
Learn more about `stat_ellipse()`, [here](https://ggplot2.tidyverse.org/reference/stat_ellipse.html). Let's prep for Figure 14.5.b.
```{r, warning = F, message = F}
# retrieve the partially-pooled estimates with `coef()`
partially_pooled_estimates <-
coef(b14.1)$cafe[ , 1, 1:2] %>%
# convert the two vectors to a data frame
data.frame() %>%
# the Intercept is the wait time for morning (i.e., `afternoon == 0`)
rename(morning = Intercept) %>%
# `afternoon` wait time is the `morning` wait time plus the afternoon slope
mutate(afternoon = morning + afternoon,
cafe = 1:n()) %>% # add the `cafe` index
select(cafe, everything())
# compute unpooled estimates directly from data
un_pooled_estimates <-
d %>%
# as above, with these two lines, we compute each cafe's mean wait value by time of day
group_by(afternoon, cafe) %>%
summarise(mean = mean(wait)) %>%
# ungrouping allows us to alter the grouping variable, afternoon
ungroup() %>%
mutate(afternoon = ifelse(afternoon == 0, "morning", "afternoon")) %>%
# this separates out the values into morning and afternoon columns
spread(key = afternoon, value = mean)
estimates <-
bind_rows(partially_pooled_estimates, un_pooled_estimates) %>%
mutate(pooled = rep(c("partially", "not"), each = n() / 2))
```
The code for Figure 14.5.b.
```{r, fig.width = 4.7, fig.height = 3}
p2 <-
ggplot(data = estimates, aes(x = morning, y = afternoon)) +
# nesting `stat_ellipse()` within `mapply()` is a less redundant way to produce the
# ten-layered semitransparent ellipses we did with ten lines of `stat_ellipse()`
# functions in the previous plot
mapply(function(level) {
stat_ellipse(geom = "polygon", type = "norm",
size = 0, alpha = 1/20, fill = "#E7CDC2",
level = level)
},
# enter the levels here
level = c(1:9 / 10, .99)) +
geom_point(aes(group = cafe, color = pooled)) +
geom_line(aes(group = cafe), size = 1/4) +
scale_color_manual("Pooled?", values = c("#80A0C7", "#A65141")) +
labs(x = "morning wait (mins)",
y = "afternoon wait (mins)") +
coord_cartesian(xlim = range(estimates$morning),
ylim = range(estimates$afternoon))
```
Here we combine the two subplots together with **patchwork** syntax.
```{r, fig.width = 8, fig.height = 3.5}
library(patchwork)
(p1 + theme(legend.position = "none")) +
p2 +
plot_annotation(title = "Shrinkage in two dimensions")
```
> What I want you to appreciate in this plot is that shrinkage on the parameter scale naturally produces shrinkage where we actually care about it: on the outcome scale. And it also implies a population of wait times, shown by the [semitransparent ellipses]. That population is now positively correlated--cafés with longer morning waits also tend to have longer afternoon waits. They are popular, after all. But the population lies mostly below the dashed line where the waits are equal. You'll wait less in the afternoon, on average. (p. 446)
## Advanced varying slopes
In [Section 13.3][More than one type of cluster] we saw that data can be considered **cross-classified** if they have multiple grouping factors. We used the `chimpanzees` data in that section, but we only considered cross-classification by single intercepts. It turns out cross-classified models can be extended further. Let's load and wrangle those data.
```{r, warning = F, message = F}
data(chimpanzees, package = "rethinking")
d <- chimpanzees
rm(chimpanzees)
# wrangle
d <-
d %>%
mutate(actor = factor(actor),
block = factor(block),
treatment = factor(1 + prosoc_left + 2 * condition),
# this will come in handy, later
labels = factor(treatment,
levels = 1:4,
labels = c("r/n", "l/n", "r/p", "l/p")))
glimpse(d)
```
If I'm following along correctly with the text, McElreath's `m14.2` uses the centered parameterization. Recall from the last chapter that **brms** only supports the non-centered parameterization. Happily, McElreath's `m14.3` appears to use the non-centered parameterization. Thus, we'll skip making a `b14.2` and jump directly into making a `b14.3`. I believe one could describe the statistical model as
\begin{align*}
\text{left_pull}_i & \sim \operatorname{Binomial}(n_i = 1, p_i) \\
\operatorname{logit} (p_i) & = \gamma_{\text{treatment}[i]} + \alpha_{\text{actor}[i], \text{treatment}[i]} + \beta_{\text{block}[i], \text{treatment}[i]} \\
\gamma_j & \sim \operatorname{Normal}(0, 1), \;\;\; \text{for } j = 1, \dots, 4 \\
\begin{bmatrix} \alpha_{j, 1} \\ \alpha_{j, 2} \\ \alpha_{j, 3} \\ \alpha_{j, 4} \end{bmatrix} & \sim \operatorname{MVNormal} \begin{pmatrix} \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \mathbf \Sigma_\text{actor} \end{pmatrix} \\
\begin{bmatrix} \beta_{j, 1} \\ \beta_{j, 2} \\ \beta_{j, 3} \\ \beta_{j, 4} \end{bmatrix} & \sim \operatorname{MVNormal} \begin{pmatrix} \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \mathbf \Sigma_\text{block} \end{pmatrix} \\
\mathbf \Sigma_\text{actor} & = \mathbf{S_\alpha R_\alpha S_\alpha} \\
\mathbf \Sigma_\text{block} & = \mathbf{S_\beta R_\beta S_\beta} \\
\sigma_{\alpha, [1]}, \dots, \sigma_{\alpha, [4]} & \sim \operatorname{Exponential}(1) \\
\sigma_{\beta, [1]}, \dots, \sigma_{\beta, [4]} & \sim \operatorname{Exponential}(1) \\
\mathbf R_\alpha & \sim \operatorname{LKJ}(2) \\
\mathbf R_\beta & \sim \operatorname{LKJ}(2).
\end{align*}
In this model, we have four population-level intercepts, $\gamma_1, \dots, \gamma_4$, one for each of the four levels of `treatment`. There are two higher-level grouping variables, `actor` and `block`, making this a cross-classified model.
The term $\alpha_{\text{actor}[i], \text{treatment}[i]}$ is meant to convey that each of the `treatment` effects can vary by `actor`. The first line containing the $\operatorname{MVNormal}(\cdot)$ operator indicates that the `actor`-level deviations from the population-level estimates for $\gamma_j$ follow the multivariate normal distribution, where the four means are set to zero (i.e., they are deviations) and their spread around those zeros is controlled by $\Sigma_\text{actor}$. In the first line below the two lines containing $\operatorname{MVNormal}(\cdot)$, we learn that $\Sigma_\text{actor}$ can be decomposed into two kinds of terms, $\mathbf S_\alpha$ and $\mathbf R_\alpha$. It may not yet be clear from the notation, but $\mathbf S_\alpha$ is a $4 \times 4$ matrix,
$$
\mathbf S_\alpha = \begin{bmatrix} \sigma_{\alpha, [1]} & 0 & 0 & 0 \\ 0 & \sigma_{\alpha, [2]} & 0 & 0 \\ 0 & 0 & \sigma_{\alpha, [3]} & 0 \\ 0 & 0 & 0 & \sigma_{\alpha, [4]} \end{bmatrix}.
$$
In a similar way, $\mathbf R_\alpha$ is a $4 \times 4$ matrix,
$$
\mathbf R_\alpha = \begin{bmatrix} 1 & \rho_{\alpha, [1, 2]} & \rho_{\alpha, [1, 3]} & \rho_{\alpha, [1, 4]} \\ \rho_{\alpha, [2, 1]} & 1 & \rho_{\alpha, [2, 3]} & \rho_{\alpha, [2, 4]} \\ \rho_{\alpha, [3, 1]} & \rho_{\alpha, [3, 2]} & 1 & \rho_{\alpha, [3, 4]} \\ \rho_{\alpha, [4, 1]} & \rho_{\alpha, [4, 2]} & \rho_{\alpha, [4, 3]} & 1 \end{bmatrix}.
$$
The same overall pattern holds true for $\beta_{\text{block}[i], \text{treatment}[i]}$ and the associated $\beta$ parameters connected to the `block` grouping variable. All the population parameters $\sigma_{\alpha, [1]}, \dots, \sigma_{\alpha, [4]}$ and $\sigma_{\beta, [1]}, \dots, \sigma_{\beta, [4]}$ have individual $\operatorname{Exponential}(1)$ priors. The two $\mathbf R_{< \cdot >}$ matrices have the priors $\operatorname{LKJ}(2)$.
I know; this is a lot. This all takes time to grapple with. Here's how to fit such a model with **brms**.
```{r b14.3}
b14.3 <-
brm(data = d,
family = binomial,
pulled_left | trials(1) ~ 0 + treatment + (0 + treatment | actor) + (0 + treatment | block),
prior = c(prior(normal(0, 1), class = b),
prior(exponential(1), class = sd, group = actor),
prior(exponential(1), class = sd, group = block),
prior(lkj(2), class = cor, group = actor),
prior(lkj(2), class = cor, group = block)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 4387510,
file = "fits/b14.03")
```
Happily, we got no warnings about divergent transitions. Since it's been a while, we'll use `bayesplot::mcmc_rank_overlay()` to examine the primary parameters with a trank plot.
```{r, message = F, warning = F, fig.width = 8, fig.height = 7.5}
library(bayesplot)
# give the parameters fancy names
names <-
c(str_c("treatment[", 1:4, "]"),
str_c("sigma['actor[", 1:4, "]']"),
str_c("sigma['block[", 1:4, "]']"),
str_c("rho['actor:treatment[", c(1, 1:2, 1:3), ",", rep(2:4, times = 1:3), "]']"),
str_c("rho['block:treatment[", c(1, 1:2, 1:3), ",", rep(2:4, times = 1:3), "]']"),
"Chain")
# wrangle
as_draws_df(b14.3) %>%
select(b_treatment1:`cor_block__treatment3__treatment4`, .chain) %>%
set_names(names) %>%
# plot
mcmc_rank_overlay() +
scale_color_manual(values = c("#80A0C7", "#B1934A", "#A65141", "#EEDA9D")) +
scale_x_continuous(NULL, breaks = 0:4 * 1e3, labels = c(0, str_c(1:4, "K"))) +
coord_cartesian(ylim = c(30, NA)) +
ggtitle("McElreath would love this trank plot.") +
theme(legend.position = "bottom") +
facet_wrap(~ parameter, labeller = label_parsed, ncol = 4)
```
Because we only fit a non-centered version of the model, we aren't able to make a faithful version of McElreath's Figure 14.6. However, we can still use `posterior::summarise_draws()` to help make histograms of the two kinds of effective sample sizes for our `b14.3`.
```{r, fig.width = 6, fig.height = 2.5, warning = F, message = F}
library(posterior)
as_draws_df(b14.3) %>%
summarise_draws() %>%
pivot_longer(starts_with("ess")) %>%
ggplot(aes(x = value)) +
geom_histogram(binwidth = 250, fill = "#EEDA9D", color = "#DCA258") +
xlim(0, NA) +
facet_wrap(~ name)
```
Here is a summary of the model parameters.
```{r}
print(b14.3)
```
Like McElreath explained on page 450, our `b14.3` has 76 parameters:
* 4 average `treatment` effects, as listed in the 'Population-Level Effects' section;
* 7 $\times$ 4 = 28 varying effects on `actor`, as indicated in the '~actor:treatment (Number of levels: 7)' header multiplied by the four levels of `treatment`;
* 6 $\times$ 4 = 24 varying effects on `block`, as indicated in the '~block:treatment (Number of levels: 6)' header multiplied by the four levels of `treatment`;
* 8 standard deviations listed in the eight rows beginning with `sd(`; and
* 12 free correlation parameters listed in the 12 rows beginning with `cor(` (see the quick tally in the code sketch after this list).
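Here's one way to tally those parameters by their **brms** naming conventions. This check isn't in the text, but it should return the same total of 76.
```{r, warning = F}
# not in the text: count the population-level effects (b_), varying effects (r_),
# standard deviations (sd_), and correlations (cor_) by their name prefixes
as_draws_df(b14.3) %>%
  select(starts_with("b_"), starts_with("r_"),
         starts_with("sd_"), starts_with("cor_")) %>%
  ncol()
```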
Compute the WAIC estimate.
```{r, warning = F, message = F}
b14.3 <- add_criterion(b14.3, "waic")
waic(b14.3)
```
As indicated by the $p_\text{WAIC}$, our **brms** version of the model has about 27 effective parameters. Now we'll get a better sense of the model with a posterior predictive check in the form of our version of Figure 14.7. McElreath described his **R** code 14.22 as "a big chunk of code" (p. 451). I'll leave it up to the reader to decide whether our big code chunk is any better.
```{r, fig.width = 8, fig.height = 2.5}
# for annotation
text <-
distinct(d, labels) %>%
mutate(actor = 1,
prop = c(.07, .8, .08, .795))
nd <-
d %>%
distinct(actor, condition, labels, prosoc_left, treatment) %>%
mutate(block = 5)
# compute and wrangle the posterior predictions
fitted(b14.3,
newdata = nd) %>%
data.frame() %>%
bind_cols(nd) %>%
# add the empirical proportions
left_join(
d %>%
group_by(actor, treatment) %>%
mutate(proportion = mean(pulled_left)) %>%
distinct(actor, treatment, proportion),
by = c("actor", "treatment")
) %>%
mutate(condition = factor(condition)) %>%
# plot!
ggplot(aes(x = labels)) +
geom_hline(yintercept = .5, color = "#E8DCCF", alpha = 1/2, linetype = 2) +
# empirical proportions
geom_line(aes(y = proportion, group = prosoc_left),
size = 1/4, color = "#394165") +
geom_point(aes(y = proportion, shape = condition),
color = "#394165", fill = "#100F14", size = 2.5, show.legend = F) +
# posterior predictions
geom_line(aes(y = Estimate, group = prosoc_left),
size = 3/4, color = "#80A0C7") +
geom_pointrange(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5, shape = condition),
color = "#80A0C7", fill = "#100F14", fatten = 8, size = 1/3, show.legend = F) +
# annotation for the conditions
geom_text(data = text,
aes(y = prop, label = labels),
color = "#DCA258", family = "Courier", size = 3) +
scale_shape_manual(values = c(21, 19)) +
scale_x_discrete(NULL, breaks = NULL) +
scale_y_continuous("proportion left lever", breaks = 0:2 / 2, labels = c("0", ".5", "1")) +
labs(subtitle = "Posterior predictions, in light blue, against the raw data, in dark blue, for\nmodel b14.3, the cross-classified varying effects model.") +
facet_wrap(~ actor, nrow = 1, labeller = label_both)
```
> These chimpanzees simply did not behave in any consistently different way in the partner treatments. The model we've used here does have some advantages, though. Since it allows for some individuals to differ in how they respond to the treatments, it could reveal a situation in which a treatment has no effect on average, even though some of the individuals respond strongly. That wasn't the case here. But often we are more interested in the distribution of responses than in the average response, so a model that estimates the distribution of treatment effects is very useful. (p. 452)
For more practice with models of this kind, check out my blog post, [*Multilevel models and the index-variable approach*](https://solomonkurz.netlify.app/post/2020-12-09-multilevel-models-and-the-index-variable-approach/).
## Instruments and causal designs
> Of course sometimes it won't be possible to close all of the non-causal paths or rule out unobserved confounds. What can be done in that case? More than nothing. If you are lucky, there are ways to exploit a combination of natural experiments and clever modeling that allow causal inference even when non-causal paths cannot be closed. (p. 455)
### Instrumental variables.
Say we are interested in the causal impact of education $E$ on wages $W$, $E \rightarrow W$. Further imagine there is some unmeasured variable $U$ that has causal relations with both, $E \leftarrow U \rightarrow W$, creating a backdoor path. We might use good old **ggdag** to plot the DAG.
```{r, warning = F, message = F}
library(ggdag)
dag_coords <-
tibble(name = c("E", "U", "W"),
x = c(1, 2, 3),
y = c(1, 2, 1))
```
Before we make the plot, we'll make a custom theme, `theme_pearl_dag()`, to streamline our DAG plots.
```{r, fig.width = 3, fig.height = 1.75, message = F, warning = F}
theme_pearl_dag <- function(...) {
theme_pearl_earring() +
theme_dag() +
theme(panel.background = element_rect(fill = "#100F14"),
...)
}
dagify(E ~ U,
W ~ E + U,
coords = dag_coords) %>%
ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
geom_dag_point(aes(color = name == "U"),
shape = 21, stroke = 2, fill = "#FCF9F0", size = 6, show.legend = F) +
geom_dag_text(color = "#100F14", family = "Courier") +
geom_dag_edges(edge_colour = "#FCF9F0") +
scale_color_manual(values = c("#EEDA9D", "#A65141")) +
theme_pearl_dag()
```
Instrumental variables will solve some of the difficulties we have in not being able to condition on $U$. Here we'll call our instrumental variable $Q$. In the terms of the present example, the **instrumental variable** has the qualities that
* $Q$ is independent of $U$,
* $Q$ is not independent of $E$, and
* $Q$ can only influence $W$ through $E$ (i.e., the effect of $Q$ on $W$ is fully mediated by $E$).
Here is what this looks like in a DAG.
```{r, fig.width = 3.5, fig.height = 1.75, message = F, warning = F}
dag_coords <-
tibble(name = c("Q", "E", "U", "W"),
x = c(0, 1, 2, 3),
y = c(2, 1, 2, 1))
dagify(E ~ Q + U,
W ~ E + U,
coords = dag_coords) %>%
ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
geom_dag_point(aes(color = name == "U"),
shape = 21, stroke = 2, fill = "#FCF9F0", size = 6, show.legend = F) +
geom_dag_text(color = "#100F14", family = "Courier") +
geom_dag_edges(edge_colour = "#FCF9F0") +
scale_color_manual(values = c("#EEDA9D", "#A65141")) +
theme_pearl_dag()
```
Sadly, our condition that $Q$ can only influence $W$ through $E$--often called the **exclusion restriction**--generally cannot be tested. Given $U$ is unmeasured, by definition, we also cannot test that $Q$ is independent of $U$. These are model assumptions.
Let's simulate data based on @angristDoesCompulsorySchool1991 to get a sense of how this works.
```{r}
# make a standardizing function
standardize <- function(x) {
(x - mean(x)) / sd(x)
}
# simulate
set.seed(73)
n <- 500
dat_sim <-
tibble(u_sim = rnorm(n, mean = 0, sd = 1),
q_sim = sample(1:4, size = n, replace = T)) %>%
mutate(e_sim = rnorm(n, mean = u_sim + q_sim, sd = 1)) %>%
mutate(w_sim = rnorm(n, mean = u_sim + 0 * e_sim, sd = 1)) %>%
mutate(w = standardize(w_sim),
e = standardize(e_sim),
q = standardize(q_sim))
dat_sim
```
$Q$ in this context is like quarter in the school year, but inversely scaled such that larger numbers indicate more quarters. In this simulation, we have set the true effect of education on wages--$E \rightarrow W$--to be zero. Any univariate association is through the confounding variable $U$. Also, $Q$ has no direct effect on $W$ or $U$, but it does have a causal relation with $E$, which is $Q \rightarrow E \leftarrow U$. First we fit the univariable model corresponding to $E \rightarrow W$.
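Before fitting it, we might peek at the raw correlations. This quick check isn't in the text, but it foreshadows the problem: $E$ and $W$ are correlated through $U$ even though $E$ has no causal effect on $W$, whereas $Q$ and $W$ are d-separated and so should be nearly uncorrelated.
```{r}
# not in the text: raw correlations in the simulated data
dat_sim %>%
  summarise(`cor(e, w)` = cor(e, w),
            `cor(q, w)` = cor(q, w))
```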
```{r}
b14.4 <-
brm(data = dat_sim,
family = gaussian,
w ~ 1 + e,
prior = c(prior(normal(0, 0.2), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 14,
file = "fits/b14.04")
```
```{r}
print(b14.4)
```
Because we have not conditioned on $U$, the model suggests a moderately large spurious causal relation for $E \rightarrow W$. Now see what happens when we also condition directly on $Q$, as in $Q \rightarrow W \leftarrow E$.
```{r}
b14.5 <-
brm(data = dat_sim,
family = gaussian,
w ~ 1 + e + q,
prior = c(prior(normal(0, 0.2), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 14,
file = "fits/b14.05")
```
```{r}
print(b14.5)
```
Holy smokes that's a mess. This model suggests both $E$ and $Q$ have moderate to strong causal effects on $W$, even though we know neither do based on the true data-generating model. Like McElreath said, "bad stuff happens" when we condition on an instrumental variable this way.
> There is no backdoor path through $Q$, as you can see. But there is a non-causal path from $Q$ to $W$ through $U$: $Q \rightarrow E \leftarrow U \rightarrow W$. This is a non-causal path, because changing $Q$ doesn't result in any change in $W$ through this path. But since we are conditioning on $E$ in the same model, and $E$ is a collider of $Q$ and $U$, the non-causal path is open. This confounds the coefficient on $Q$. It won't be zero, because it'll pick up the association between $U$ and $W$. And then, as a result, the coefficient on $E$ can get even more confounded. Used this way, an instrument like $Q$ might be called a **bias amplifier**. (p. 456, **emphasis** in the original)
The statistical solution to this mess is to express the data-generating DAG as a multivariate statistical model following the form
\begin{align*}
\begin{bmatrix} W_i \\ E_i \end{bmatrix} & \sim \operatorname{MVNormal} \begin{pmatrix} \begin{bmatrix} \mu_{\text W,i} \\ \mu_{\text E,i} \end{bmatrix}, \color{#A65141}{\mathbf \Sigma} \end{pmatrix} \\
\mu_{\text W,i} & = \alpha_\text W + \beta_\text{EW} E_i \\
\mu_{\text E,i} & = \alpha_\text E + \beta_\text{QE} Q_i \\
\color{#A65141}{\mathbf\Sigma} & \color{#A65141}= \color{#A65141}{\begin{bmatrix} \sigma_\text W & 0 \\ 0 & \sigma_\text E \end{bmatrix} \mathbf R \begin{bmatrix} \sigma_\text W & 0 \\ 0 & \sigma_\text E \end{bmatrix}} \\
\color{#A65141}{\mathbf R} & \color{#A65141}= \color{#A65141}{\begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}} \\
\alpha_\text W \text{ and } \alpha_\text E & \sim \operatorname{Normal}(0, 0.2) \\
\beta_\text{EW} \text{ and } \beta_\text{QE} & \sim \operatorname{Normal}(0, 0.5) \\
\sigma_\text W \text{ and } \sigma_\text E & \sim \operatorname{Exponential}(1) \\
\rho & \sim \operatorname{LKJ}(2).
\end{align*}
You might not remember, but we've actually fit a model like this before. It was `b5.3_A` from way back in [Section 5.1.5.3][Counterfactual plots.]. The big difference between that earlier model and this one is that whereas the former did not include a residual correlation, $\rho$, this one will. Thus, this time we will make sure to include `set_rescor(TRUE)` in the `formula`. Within **brms** parlance, priors for residual correlations are of `class = rescor`.
```{r b14.6}
e_model <- bf(e ~ 1 + q)
w_model <- bf(w ~ 1 + e)
b14.6 <-
brm(data = dat_sim,
family = gaussian,
e_model + w_model + set_rescor(TRUE),
prior = c(# E model
prior(normal(0, 0.2), class = Intercept, resp = e),
prior(normal(0, 0.5), class = b, resp = e),
prior(exponential(1), class = sigma, resp = e),
# W model
prior(normal(0, 0.2), class = Intercept, resp = w),
prior(normal(0, 0.5), class = b, resp = w),
prior(exponential(1), class = sigma, resp = w),
# rho
prior(lkj(2), class = rescor)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 14,
file = "fits/b14.06")
```
```{r}
print(b14.6)
```
Now the parameter for $E \rightarrow W$, `w_e`, is just where it should be--near zero. The residual correlation between $E$ and $W$, `rescor(e,w)`, is positive and large in magnitude, indicating their common influence from the unmeasured variable $U$. Next we'll take McElreath's direction to "adjust the simulation and try other scenarios" (p. 459) by adjusting the causal relations, as in his **R** code 14.28.
```{r}
set.seed(73)
n <- 500
dat_sim <-
tibble(u_sim = rnorm(n, mean = 0, sd = 1),
q_sim = sample(1:4, size = n, replace = T)) %>%
mutate(e_sim = rnorm(n, mean = u_sim + q_sim, sd = 1)) %>%
mutate(w_sim = rnorm(n, mean = -u_sim + 0.2 * e_sim, sd = 1)) %>%
mutate(w = standardize(w_sim),
e = standardize(e_sim),
q = standardize(q_sim))
dat_sim
```
We'll use `update()` to avoid re-compiling the models.
```{r b14.4x, warning = F, message = F, results = "hide"}
b14.4x <-
update(b14.4,
newdata = dat_sim,
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 14,
file = "fits/b14.04x")
b14.6x <-
update(b14.6,
newdata = dat_sim,
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 14,
file = "fits/b14.06x")
```
Just for kicks, let's examine the results with a coefficient plot.
```{r, fig.width = 6, fig.height = 1.9}
text <-
tibble(Estimate = c(fixef(b14.4x)[2, 3], fixef(b14.6x)[4, 4]),
y = c(4.35, 3.65),
hjust = c(0, 1),
fit = c("b14.4x", "b14.6x"))
bind_rows(
# b_14.4x
posterior_summary(b14.4x)[1:3, ] %>%
data.frame() %>%
mutate(param = c("alpha[W]", "beta[EW]", "sigma[W]"),
fit = "b14.4x"),
# b_14.6x
posterior_summary(b14.6x)[1:7, ] %>%
data.frame() %>%
mutate(param = c("alpha[E]", "alpha[W]", "beta[QE]", "beta[EW]", "sigma[E]", "sigma[W]", "rho"),
fit = "b14.6x")) %>%
mutate(param = factor(param,
levels = c("rho", "sigma[W]", "sigma[E]", "beta[EW]", "beta[QE]", "alpha[W]", "alpha[E]"))) %>%
ggplot(aes(x = param, y = Estimate, color = fit)) +
geom_hline(yintercept = 0, color = "#E8DCCF", alpha = 1/4) +
geom_pointrange(aes(ymin = Q2.5, ymax = Q97.5),
fatten = 2, position = position_dodge(width = 0.5)) +
geom_text(data = text,
aes(x = y, label = fit, hjust = hjust)) +
scale_color_manual(NULL, values = c("#E7CDC2", "#A65141")) +
scale_x_discrete(NULL, labels = ggplot2:::parse_safe) +
ylab("marginal posterior") +
coord_flip() +
theme(axis.text.y = element_text(hjust = 0),
axis.ticks.y = element_blank(),
legend.position = "none")
```
With help from `b14.6x`, we found "that $E$ and $W$ have a negative correlation in their residual variance, because the confound positively influences one and negatively influences the other" (p. 459).
One can use the `dagitty()` and `instrumentalVariables()` functions from the **dagitty** package to first define a DAG and then query whether there are instrumental variables for a given exposure and outcome.
```{r}
library(dagitty)
dagIV <- dagitty("dag{Q -> E <- U -> W <- E}")
instrumentalVariables(dagIV, exposure = "E", outcome = "W")
```
> The hardest thing about instrumental variables is believing in any particular instrument. If you believe in your DAG, they are easy to believe. But should you believe in your DAG?...
>
> In general, it is not possible to statistically prove whether a variable is a good instrument. As always, we need scientific knowledge outside of the data to make sense of the data. (p. 460)
#### Rethinking: Two-stage worst squares.
"The instrumental variable model is often discussed with an estimation procedure known as **two-stage least squares** (2SLS)" (p. 460, **emphasis** in the original). For a nice introduction to instrumental variables via 2SLS, see this [practical introduction](https://evalf20.classes.andrewheiss.com/example/iv/), and also the [slides and video-lecture files](https://evalf20.classes.andrewheiss.com/content/11-content/), from the great [Andrew Heiss](https://twitter.com/andrewheiss).
### Other designs.
> There are potentially many ways to find natural experiments. Not all of them are strictly instrumental variables. But they can provide theoretically correct designs for causal inference, if you can believe the assumptions. Let's consider two more.
>
> In addition to the backdoor criterion you met in [Chapter 6][The Haunted DAG & The Causal Terror], there is something called the **front-door criterion**. (p. 460, **emphasis** in the original)
To get a sense of the front-door criterion, consider the following DAG with observed variables $X$, $Y$, and $Z$ and an unobserved variable, $U$.
```{r, fig.width = 3, fig.height = 1.75, message = F, warning = F}
dag_coords <-
tibble(name = c("X", "Z", "U", "Y"),
x = c(1, 2, 2, 3),
y = c(1, 1, 2, 1))
dagify(X ~ U,
Z ~ X,
Y ~ U + Z,
coords = dag_coords) %>%
ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
geom_dag_point(aes(color = name == "U"),
shape = 21, stroke = 2, fill = "#FCF9F0", size = 6, show.legend = F) +
geom_dag_text(color = "#100F14", family = "Courier") +
geom_dag_edges(edge_colour = "#FCF9F0") +
scale_color_manual(values = c("#EEDA9D", "#A65141")) +
theme_pearl_dag()
```
> We are interested, as usual, in the causal influence of $X$ on $Y$. But there is an unobserved confound $U$, again as usual. It turns out that, if we can find a perfect mediator $Z$, then we can possibly estimate the causal effect of $X$ on $Y$. It isn't crazy to think that causes are mediated by other causes. Everything has a mechanism. $Z$ in the DAG above is such a mechanism. If you have a believable $Z$ variable, then the causal effect of $X$ on $Y$ is estimated by expressing the generative model as a statistical model, similar to the instrumental variable example before. (p. 461)
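To make that last sentence a little more concrete, here is a hypothetical simulation sketch in the spirit of the DAG above. None of this is in the text and I have not actually fit the model, so consider it untested. The true causal effect of $X$ on $Y$ is $0.5 \times 0.5 = 0.25$, and the front-door logic recovers it as the product of the $X \rightarrow Z$ path and the $Z \rightarrow Y$ path (estimated while adjusting for $X$).
```{r, eval = F}
# a hypothetical front-door sketch, not in the text
set.seed(14)
n <- 500
dat_fd <-
  tibble(u = rnorm(n, mean = 0, sd = 1),
         x = rnorm(n, mean = u, sd = 1)) %>%
  mutate(z = rnorm(n, mean = 0.5 * x, sd = 1)) %>%
  mutate(y = rnorm(n, mean = 0.5 * z + u, sd = 1))
b_fd <-
  brm(data = dat_fd,
      family = gaussian,
      bf(z ~ 1 + x) + bf(y ~ 1 + z + x) + set_rescor(FALSE),
      prior = c(prior(normal(0, 1), class = b, resp = z),
                prior(normal(0, 1), class = b, resp = y),
                prior(exponential(1), class = sigma, resp = z),
                prior(exponential(1), class = sigma, resp = y)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 14)
# the front-door estimate is the product of the two path coefficients
as_draws_df(b_fd) %>%
  mutate(front_door = b_z_x * b_y_z) %>%
  mean_qi(front_door)
```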
McElreath's second example is the **regression discontinuity** approach. If you have a time series where the variable of interest is measured before and after some relevant intervention variable, you can estimate intercepts and slopes before and after the intervention, the cutoff. However,
> in practice, one trend is fit for individuals above the cutoff and another to those below the cutoff. Then an estimate of the causal effect is the average difference between individuals just above and just below the cutoff. While the difference near the cutoff is of interest, the entire function influences this difference. So some care is needed in choosing functions for the overall relationship between the exposure and the outcome. (p. 461)
McElreath's not kidding about the need for care when fitting regression discontinuity models. Gelman's blog is littered with awful examples (e.g., [here](https://statmodeling.stat.columbia.edu/2019/06/25/another-regression-discontinuity-disaster-and-what-can-we-learn-from-it/), [here](https://statmodeling.stat.columbia.edu/2018/08/02/38160/), [here](https://statmodeling.stat.columbia.edu/2020/07/02/no-i-dont-believe-that-claim-based-on-regression-discontinuity-analysis-that/), [here](https://statmodeling.stat.columbia.edu/2020/01/13/how-to-get-out-of-the-credulity-rut-regression-discontinuity-edition-getting-beyond-whack-a-mole/), [here](https://statmodeling.stat.columbia.edu/2020/07/18/please-socially-distance-me-from-this-regression-model/)). See also Gelman and Imbens' [-@gelmanWhyHighorderPolynomials2019] paper, [*Why high-order polynomials should not be used in regression discontinuity designs*](https://stat.columbia.edu/~gelman/research/published/2018_gelman_jbes.pdf), or [Nick HK](https://twitter.com/nickchk)'s informative [tweet](https://twitter.com/nickchk/status/1329338429360914433) on how this applies to autocorrelated data [^7].
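With those cautions in mind, here is a minimal regression-discontinuity sketch. It's not in the text and I haven't run it, so treat it as a starting point. The simulation assigns treatment at a cutoff of zero on the running variable `x` and the model fits separate linear trends on either side; the coefficient on `treated` estimates the jump at the cutoff, which is 0.3 in this simulation.
```{r, eval = F}
# a hypothetical regression discontinuity sketch, not in the text
set.seed(14)
rd_sim <-
  tibble(x = runif(500, min = -1, max = 1)) %>%
  mutate(treated = ifelse(x >= 0, 1, 0)) %>%
  mutate(y = rnorm(500, mean = 1 + 0.5 * x + 0.3 * treated, sd = 0.5))
b_rd <-
  brm(data = rd_sim,
      family = gaussian,
      y ~ 1 + x + treated + x:treated,
      prior = c(prior(normal(0, 1), class = Intercept),
                prior(normal(0, 1), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 14)
# the `treated` coefficient estimates the discontinuity at the cutoff
fixef(b_rd)
```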
## Social relations as correlated varying effects
It looks like **brms** is not set up to fit a model like this, at this time. See the [Social relations model (SRM) thread](https://discourse.mc-stan.org/t/social-relations-model-srm/17121) on the Stan Forums and [issue #502](https://github.com/paul-buerkner/brms/issues/502) on the **brms** GitHub repo for details. In short, the difficulty is that **brms** is not set up to allow covariances among distinct random effects with the same levels, and it looks like this will not change any time soon. So, in this section we will fit the model with **rethinking**, but still use **ggplot2** and friends in the post processing.
Let's load the `kl_dyads` data [@kosterFoodSharingNetworks2014].
```{r, warning = F, message = F}
library(rethinking)
data(KosterLeckie)
```
Take a look at the data.
```{r}
kl_dyads %>% glimpse()
# kl_households %>% glimpse()
```
"The variables `hidA` and `hidB` tell us the household IDs in each dyad, and `did` is a unique dyad ID number" (p. 462). To get a sense of the interrelation among those three ID variables, we'll make a tile plot.
```{r, fig.width = 6, fig.height = 3.75}
kl_dyads %>%
ggplot(aes(x = hidA, y = hidB, label = did)) +
geom_tile(aes(fill = did),