---
title: "Chapter 4. Linear Models"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, comment = "#>", dpi = 500)
library(glue)
library(mustashe)
library(rethinking)
library(patchwork)
library(tidyverse)
library(conflicted)
theme_set(theme_minimal())
conflict_prefer("filter", "dplyr")
set.seed(0)
```
- this chapter introduces linear regression as a Bayesian procedure
* under a probability interpretation (necessary for Bayesian work), linear regression uses a Gaussian distribution to describe uncertainty about the measurement of interest
## 4.1 Why normal distributions are normal
- example:
* you have 1,000 people stand at the half-way line of a soccer field
* they each flip a coin 16 times, moving left if the coin comes up heads, and right if it comes up tails
* the final distribution of people around the half-way line will be Gaussian even though the underlying model is binomial
### 4.1.1 Normal by addition
- we can simulate the above example
* to show that the underlying coin flip is nothing special, each person will instead take 16 steps of random size between -1 and 1
```{r}
pos <- replicate(1e3, sum(runif(16, -1, 1)))
plot(density(pos))
```
```{r}
set.seed(0)
n_steps <- 16
position_data <- tibble(person = 1:1e3) %>%
mutate(position = purrr::map(person, ~ c(0, cumsum(runif(n_steps, -1, 1))))) %>%
unnest(position) %>%
group_by(person) %>%
mutate(step = 0:n_steps) %>%
ungroup()
walks_plot <- position_data %>%
ggplot(aes(x = step, y = position, group = person)) +
geom_line(alpha = 0.2, size = 0.1, color = "dodgerblue") +
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0)) +
labs(x = "step", y = "position",
title = "Random-walks")
step_densities <- position_data %>%
filter(step %in% c(4, 8, 16)) %>%
ggplot(aes(x = position)) +
facet_wrap(~ step, scales = "free_y", nrow = 1) +
geom_density(fill = "grey90", color = "grey40") +
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = expansion(mult = c(0, 0.02))) +
labs(x = "position",
y = "density",
title = "Position distribution at several steps")
walks_plot / step_densities + plot_layout(heights = c(3, 2))
```
### 4.1.2 Normal by multiplication
- another example of how to get a normal distribution:
* the growth rate of an organism is influenced by a dozen loci, each with several alleles that code for more growth
* suppose that these loci interact with one another, and each increase growth by a percentage
* therefore, their effects multiply instead of add
* below is a simulation of sampling growth rates
* this distribution is approximately normal because multiplying small effects is approximately the same as adding them (e.g., $1.1 \times 1.1 = 1.21 \approx 1 + 0.1 + 0.1$)
```{r}
# A single growth rate
prod(1 + runif(12, 0, 0.1))
growth <- replicate(1e4, prod(1 + runif(12, 0, 0.1)))
dens(growth, norm.comp = TRUE)
```
### 4.1.3 Normal by log-multiplication
- large deviates multiplied together produce Gaussian distributions on the log-scale
```{r}
growth <- replicate(1e4, prod(1 + runif(12, 0, 0.5)))
dens(growth, norm.comp = TRUE, main = "Not scaled")
dens(log(growth), norm.comp = TRUE, main = "Log-scaled")
```
### 4.1.4 Using Gaussian distributions
- we will build models of measurements as aggregations of normal distributions
- this is appropriate because:
* the world is full of approximately normal distributions
* we often are quite ignorant of the underlying distribution so modeling it as a mean and variance is often the best we can do
## 4.2 A language for describing models
- here is an outline of the process commonly used:
1. recognize a set of measurements to predict or understand - the *outcome* variables
2. for each variable, define a likelihood distribution that defines the plausibility of individual observations
* this is always Gaussian for linear regression
3. recognize a set of other measurements to use to predict or understand the outcome - the *predictor* variables
4. relate the shape of the likelihood distribution to the predictor variables
5. choose priors for all parameters in the model; this is the initial state of the model before seeing any data
6. summarize the model with math expressions; for example:
$$
\text{outcome}_i \sim \text{Normal}(\mu_i, \sigma) \\
\mu_i = \beta \times \text{predictor}_i \\
\beta \sim \text{Normal}(0, 10) \\
\sigma \sim \text{HalfCauchy}(0, 1)
$$
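- as a preview, this notation translates almost line-for-line into the `alist()` formula syntax used with `quap()` later in this chapter; below is a minimal sketch (nothing is fit here; `outcome` and `predictor` are hypothetical column names)
```{r}
# A sketch only: `outcome` and `predictor` are hypothetical column names.
alist(
  outcome ~ dnorm(mu, sigma),   # outcome_i ~ Normal(mu_i, sigma)
  mu <- b * predictor,          # mu_i = beta * predictor_i
  b ~ dnorm(0, 10),             # beta ~ Normal(0, 10)
  sigma ~ dcauchy(0, 1)         # HalfCauchy(0, 1), since sigma must be positive
)
```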
### 4.2.1 Re-describing the globe tossing model
- this example was trying to estimate the proportion of water $p$ on a globe by tossing it and counting how often our finger was on water upon catching the globe
* it could be described as follows, where $w$ is the number of waters observed, $n$ is the total number of tosses, and $p$ is the proportion of water on the globe
$$
w \sim \text{Binomial}(n, p) \\
p \sim \text{Uniform}(0,1)
$$
- this should be read as:
* "The count $w$ is distributed binomially with sample size $n$ and probability $p$."
* "The prior for $p$ is assumed to be uniform between zero and one."
## 4.3 A Gaussian model of height
- we will now build a linear regression model
* this section will build the scaffold
* the next will construct the predictor variable
- model a single measurement variable as a Gaussian distribution
* two parameters: $\mu$ = mean; $\sigma$ = standard deviation
* Bayesian updating will consider each possible combination of $\mu$ and $\sigma$ and provide a score for the plausibility of each
### 4.3.1 The data
- we will use the `Howell1` data from the 'rethinking' package
* we will use height information for people older than 18 years
```{r}
data("Howell1")
d <- Howell1
str(d)
```
```{r}
d2 <- d[d$age >= 18, ]
nrow(d2)
```
### 4.3.2 The model
- the goal is to model these values using a Gaussian distribution
```{r}
dens(d2$height)
```
- our model is
$$
h_i \sim \text{Normal}(\mu, \sigma) \quad \text{or} \quad h_i \sim \mathcal{N}(\mu, \sigma)
$$
- the priors for the model parameters are below
* the mean and s.d. of the normal prior for $\mu$ were chosen by the author as a reasonable guess for average adult height
$$
\mu \sim \mathcal{N}(178, 20) \\
\sigma \sim \text{Uniform}(0, 50)
$$
- it is often good to plot the priors
```{r}
curve(dnorm(x, 178, 20), from = 100, to = 250)
```
```{r}
curve(dunif(x, 0, 50), from = 10, to = 60)
```
- we can sample from the priors to build our "expected" distribution of heights
* it is the relative plausibility of different heights before seeing any data
```{r}
sample_mu <- rnorm(1e4, 178, 20)
sample_sigma <- runif(1e4, 0, 50)
prior_h <- rnorm(1e4, sample_mu, sample_sigma)
dens(prior_h)
```
### 4.3.3 Grid approximation of the posterior distribution
- as an example, we will map the posterior distribution using brute force
* later, we will switch to the quadratic approximation that we will use for the next few chapters
* we use a few shortcuts here, including summing the log likelihood instead of multiplying the likelihoods
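- to see why summing log-likelihoods is necessary, compare the product of the 352 individual densities (which underflows to zero) with the sum of their logs; the values $\mu = 150$ and $\sigma = 7$ below are just illustrative
```{r}
# Multiplying many small densities underflows double precision to exactly 0 ...
prod(dnorm(d2$height, mean = 150, sd = 7))
# ... while the equivalent sum of log-densities remains a usable number.
sum(dnorm(d2$height, mean = 150, sd = 7, log = TRUE))
```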
```{r}
mu_list <- seq(from = 140, to = 160, length.out = 200)
sigma_list <- seq(from = 4, to = 9, length.out = 200)
post <- expand.grid(mu = mu_list, sigma = sigma_list)
head(post)
```
```{r}
set.seed(0)
post$LL <- pmap_dbl(post, function(mu, sigma, ...) {
sum(dnorm(
d2$height,
mean = mu,
sd = sigma,
log = TRUE
))
})
post$prod <- post$LL + dnorm(post$mu, 178, 20, log = TRUE) + dunif(post$sigma, 0, 50, log = TRUE)
post$prob <- exp(post$prod - max(post$prod))
```
```{r}
post %>%
as_tibble() %>%
ggplot(aes(x = mu, y = sigma, z = prob)) +
geom_contour()
```
```{r}
as_tibble(post) %>%
ggplot(aes(x = mu, y = sigma, fill = prob)) +
geom_tile(color = NA) +
scale_fill_viridis_b()
```
### 4.3.4 Sampling from the posterior
- we sample from the posterior just as before, except now we get pairs of parameters
```{r}
sample_rows <- sample(1:nrow(post),
size = 1e4,
replace = TRUE,
prob = post$prob)
sample_mu <- post$mu[sample_rows]
sample_sigma <- post$sigma[sample_rows]
tibble(mu = sample_mu, sigma = sample_sigma) %>%
ggplot(aes(x = mu, y = sigma)) +
geom_jitter(size = 1, alpha = 0.2, color = "grey30",
width = 0.1, height = 0.1)
```
- now we can describe these parameters just like data
* the distributions are the results
```{r}
tibble(name = c(rep("mu", 1e4), rep("sigma", 1e4)),
value = c(sample_mu, sample_sigma)) %>%
ggplot(aes(x = value)) +
facet_wrap(~ name, nrow = 1, scales = "free") +
geom_density(fill = "grey90", alpha = 0.5)
```
```{r}
cat("HPDI of mu:\n")
HPDI(sample_mu)
cat("\nHPDI of sigma:\n")
HPDI(sample_sigma)
```
### 4.3.5 Fitting the model with `map()`
**Note that the `map()` function has been renamed to `quap()` in the 2nd edition of the book.**
- now we can use the `quap()` function to conduct the quadratic approximation of the posterior
- recall that this is the model definition:
$$
h_i \sim \text{Normal}(\mu, \sigma) \\
\mu \sim \text{Normal}(178, 20) \\
\sigma \sim \text{Uniform}(0, 50)
$$
- first, we copy this formula into an `alist`
```{r}
formula_list <- alist(
height ~ dnorm(mu, sigma),
mu ~ dnorm(178, 20),
sigma ~ dunif(0, 50)
)
formula_list
```
- then we can fit the model to the data using `quap()` and the data in `d2`
```{r}
m4_1 <- quap(formula_list, data = d2)
summary(m4_1)
```
### 4.3.6 Sampling from a map fit
- the quadratic approximation to a posterior dist. with multiple parameters is just a multidimensional Gaussian distribution
* therefore, it can be described by its variance-covariance matrix
```{r}
vcov(m4_1)
```
- the variance-covariance matrix tells us how the parameters relate to each other
- it can be decomposed into two pieces:
1. the vector of variances for the parameters
2. a correlation matrix that tells us how the changes in one parameter lead to a correlated change in the others
```{r}
cat("Covariances:\n")
diag(vcov(m4_1))
cat("\nCorrelations:\n")
cov2cor(vcov(m4_1))
```
- instead of sampling single values from a simple Gaussian distribution, we sample vectors of values from a multi-dimensional Gaussian distribution
* the `extract.samples()` function from 'rethinking' does this for us
```{r}
post <- extract.samples(m4_1, n = 1e4)
head(post)
```
```{r}
precis(post)
```
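- under the hood, `extract.samples()` is essentially one call to `MASS::mvrnorm()`: a draw from a multivariate Gaussian whose mean is the vector of MAP estimates and whose covariance is `vcov(m4_1)`; a minimal sketch:
```{r}
# Roughly what extract.samples() does for a quap fit (a sketch, not the
# package's exact implementation).
post_manual <- MASS::mvrnorm(n = 1e4, mu = coef(m4_1), Sigma = vcov(m4_1))
head(post_manual)
```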
## 4.4 Adding a predictor
- above, we created a Gaussian model of height in a population of adults
* by adding a predictor, we can make a linear regression
* for this example, we will see how height covaries with weight
```{r}
d2 %>%
ggplot(aes(x = weight, y = height)) +
geom_point()
```
### 4.4.1 The linear model strategy
- the strategy is to make the parameter for the mean of a Gaussian dist., $\mu$, into a linear function of the predictor variable and other, new parameters we make
- some of the parameters of the linear model indicate the strength of association between the mean of the outcome and the value of the predictor
* the posterior provides relative plausibilities of the different possible strengths of association
- here is the formula for the linear model
* let $x$ be the mathematical name for the weight measurements
* we model the mean $\mu$ as a function of $x$
$$
h_i \sim \text{Normal}(\mu_i, \sigma) \\
\mu_i = \alpha + \beta x_i \\
\alpha \sim \text{Normal}(178, 100) \\
\beta \sim \text{Normal}(0, 10) \\
\sigma \sim \text{Uniform}(0, 50) \\
$$
### 4.4.2 Fitting the model
```{r}
m4_3 <- quap(
alist(
height ~ dnorm(mu, sigma),
mu <- a + b*weight,
a ~ dnorm(178, 100),
b ~ dnorm(0, 10),
sigma ~ dunif(0, 50)
),
data = d2
)
summary(m4_3)
```
### 4.4.3 Interpreting the model fit
- we can inspect fit models using tables and plots
- the following questions can be answered by plotting posterior distribution and posterior predictions
* whether or not the model fitting procedure worked correctly
* the absolute magnitude, rather than just relative magnitude, of a relationship between outcome and predictor
* the uncertainty around an average relationship
* the uncertainty surrounding the implied predictions of the model
#### 4.4.3.1 Tables of estimates
- models cannot in general be understood by tables of estimates
* only the simplest of models (such as our current example) can be
- here is how to understand the summary results of our weight-to-height model:
* $\beta$ is a slope of 0.90: "a person 1 kg heavier is expected to be 0.90 cm taller"
- 89% of the posterior probability lies between 0.84 and 0.97
- this suggests strong evidence for a positive relationship between weight and height
* $\alpha$ (intercept) indicates that a person of weight 0 should be 114 cm tall
* $\sigma$ informs us of the width of the distribution of heights around the mean
```{r}
precis(m4_3)
```
- we can also inspect the correlation of parameters
* there is strong correlation between `a` and `b` because this is such a simple model: changing the slope would also change the intercept in the opposite direction
* in more complex models, this can hinder fitting the model
```{r}
cov2cor(vcov(m4_3))
```
- one way to avoid correlation is by centering the data
* subtracting the mean of the variable from each value
* this removes the correlation between `a` and `b` in our model
* but also $\alpha$ (`a`, the y-intercept) became the value of the mean of the heights
- this is because the intercept is the value when the predictors are 0 and now the mean of the predictor is 0
```{r}
d2$weight_c <- d2$weight - mean(d2$weight)
m4_4 <- quap(
alist(
height ~ dnorm(mu, sigma),
mu <- a + b*weight_c,
a ~ dnorm(178, 100),
b ~ dnorm(0, 10),
sigma ~ dunif(0, 50)
),
data = d2
)
summary(m4_4)
cov2cor(vcov(m4_4))
```
#### 4.4.3.2 Plotting posterior inference against the data
- we can start by adding the MAP values for the mean height over the actual data
```{r}
a_map <- coef(m4_3)["a"]
b_map <- coef(m4_3)["b"]
d2 %>%
ggplot(aes(x = weight, y = height)) +
geom_point() +
geom_abline(slope = b_map, intercept = a_map, lty = 2, size = 2, color = "tomato")
```
#### 4.4.3.3 Adding uncertainty around the mean
- we could display uncertainty of the model by plotting many lines on the data
```{r}
post <- extract.samples(m4_3)
head(post)
```
```{r}
d2 %>%
ggplot(aes(x = weight, y = height)) +
geom_point(color = "grey30") +
geom_abline(data = post,
aes(slope = b, intercept = a),
alpha = 0.1, size = 0.1, color = "grey70") +
geom_abline(slope = b_map, intercept = a_map,
lty = 2, size = 1.3, color = "tomato")
```
#### 4.4.3.4 Plotting regression intervals and contours
- we can compute any interval using this cloud of regression lines and plot a shaded region around the MAP line
- let's begin by focusing on a single weight value, 50 kg
* we can make a list of 10,000 values of $\mu$ for an individual who weighs 50 kg
```{r}
# mu = a + b * x
mu_at_50 <- post$a + post$b * 50
plot(density(mu_at_50), xlab = "mu | weight=50")
```
- since the posterior for $\mu$ is a distribution, we can calculate the 89% highest posterior density interval (HPDI)
```{r}
HPDI(mu_at_50)
```
- we can use the `link()` function from 'rethinking' to sample from the posterior and compute $\mu$ for each case in the data
```{r}
mu <- link(m4_3)
# row: sample from the posterior; column: each individual in the data
dim(mu)
mu[1:5, 1:5]
```
- however, we want something slightly different, so we must pass `link()` each value from the x-axis (weight)
```{r}
weight_seq <- seq(25, 70, by = 1)
mu <- link(m4_3, data = data.frame(weight = weight_seq))
dim(mu)
```
```{r}
as_tibble(as.data.frame(mu)) %>%
set_names(as.character(weight_seq)) %>%
pivot_longer(tidyselect::everything(),
names_to = "weight",
values_to = "height") %>%
mutate(weight = as.numeric(weight)) %>%
ggplot(aes(weight, height)) +
geom_point(size = 0.5, alpha = 0.2)
```
- finally, we can get the HPDI at each value of weight
```{r}
mu_mean <- apply(mu, 2, mean)
mu_hpdi <- apply(mu, 2, HPDI, prob = 0.89)
mu_data <- tibble(weight = weight_seq,
mu = mu_mean) %>%
bind_cols(as.data.frame(t(mu_hpdi))) %>%
set_names(c("weight", "height", "hpdi_low", "hpdi_high"))
d2 %>%
ggplot(aes(x = weight, y = height)) +
geom_point() +
geom_ribbon(data = mu_data,
aes(x = weight, ymin = hpdi_low, ymax = hpdi_high),
alpha = 0.5, color = NA, fill = "grey50") +
geom_line(data = mu_data,
aes(weight, height),
color = "tomato", size = 1) +
labs(title = "Bayesian estimate for the relationship between weight and height",
subtitle = "Line is the MAP of the weight and height; ribbon is the 89% HPDI")
```
#### 4.4.3.5 Prediction intervals
- now we can generate an 89% prediction interval for actual heights
* so far we have been looking at the uncertainty in $\mu$ which is the mean for the heights
* there is also $\sigma$ in the equation for heights $h_i \sim \mathcal{N}(\mu_i, \sigma)$
- we can simulate height values for a given weight by sampling from a Gaussian with some mean and standard deviation sampled from the posterior
* this will provide a collection of simulated heights with the uncertainty in the posterior distribution and the Gaussian likelihood of the heights
* we can plot the 89% percentile interval of the simulated data, which represents where the model expects 89% of actual heights in the population to fall at each weight
```{r}
sim_height <- sim(m4_3, data = list(weight = weight_seq))
str(sim_height)
```
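- `sim()` is doing nothing magical: for each weight it samples parameter values from the posterior and then draws heights from the corresponding Gaussian; a manual sketch of the same computation:
```{r}
# sim() by hand: sample (a, b, sigma) from the posterior, then draw
# Gaussian heights at each weight in weight_seq.
post <- extract.samples(m4_3)
sim_height_manual <- sapply(weight_seq, function(w) {
  rnorm(n = nrow(post), mean = post$a + post$b * w, sd = post$sigma)
})
dim(sim_height_manual)  # rows: posterior samples; columns: weights
```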
```{r}
height_pi <- apply(sim_height, 2, PI, prob = 0.89)
```
```{r}
height_pi_data <- height_pi %>%
t() %>%
as.data.frame() %>%
as_tibble() %>%
set_names(c("low_pi", "high_pi")) %>%
mutate(weight = weight_seq)
simulated_heights <- as.data.frame(sim_height) %>%
as_tibble() %>%
set_names(as.character(weight_seq)) %>%
pivot_longer(tidyselect::everything(),
names_to = "weight",
values_to = "sim_height") %>%
mutate(weight = as.numeric(weight)) %>%
ggplot() +
geom_jitter(aes(x = weight, y = sim_height),
size = 0.2, alpha = 0.1) +
labs(x = "weight",
y = "height",
title = "Simulated heights from the model")
height_pi_ribbon <- d2 %>%
ggplot() +
geom_point(aes(x = weight, y = height)) +
geom_ribbon(data = height_pi_data,
aes(x = weight, ymin = low_pi, ymax = high_pi),
color = NA, fill = "black", alpha = 0.25) +
labs(x = "weight",
y = "height",
title = "89% percentile interval of the heights")
simulated_heights | height_pi_ribbon
```
## 4.5 Polynomial regression
- we can build a model using a curved function of a single predictor
- in the following example, we will use all of the height and weight data in `Howell1`, not just that of adults
```{r}
d %>%
ggplot(aes(x = weight, y = height)) +
geom_point() +
labs(title = "All data from `Howell1`")
```
- polynomials are often discouraged because they are hard to interpret
* it can be better to instead "build the non-linear relationship up from a principled beginning"
* this is explored in the practice problems
- here is the common polynomial regression
* a parabolic model of the mean
$$
\mu_i = \alpha + \beta_1 x_i + \beta_2 x_i^2
$$
- before fitting, we must standardize the predictor variable
* center and divide by std. dev.
* this helps with interpretation because one unit is equivalent to a change of one std. dev.
* this also helps the software fit the model
```{r}
d$weight_std <- (d$weight - mean(d$weight)) / sd(d$weight)
d %>%
ggplot(aes(x = weight_std, y = height)) +
geom_point() +
labs(title = "Standardized data from `Howell1`",
x = "standardized weight")
```
- here is the model we will fit
* it has very weak priors
$$
h_i \sim \text{Normal}(\mu_i, \sigma) \\
\mu_i = \alpha + \beta_1 x_i + \beta_2 x_i^2 \\
\alpha \sim \text{Normal}(178, 100) \\
\beta_1 \sim \text{Normal}(0, 10) \\
\beta_2 \sim \text{Normal}(0, 10) \\
\sigma \sim \text{Uniform}(0, 50)
$$
```{r}
d$weight_std2 <- d$weight_std^2
m4_5 <- quap(
alist(
height ~ dnorm(mu, sigma),
mu <- a + b1*weight_std + b2*weight_std2,
a ~ dnorm(178, 100),
b1 ~ dnorm(0, 10),
b2 ~ dnorm(0, 10),
sigma ~ dunif(0, 50)
),
data = d
)
summary(m4_5)
```
- we have to plot the fit of the model to make sense of these values
```{r}
weight_seq <- seq(from = -2.2, to = 2.0, length.out = 30)
pred_dat <- list(weight_std = weight_seq, weight_std2 = weight_seq^2)
mu <- link(m4_5, data = pred_dat)
mu_mean <- apply(mu, 2, mean)
mu_pi <- apply(mu, 2, PI, prob = 0.89)
sim_height <- sim(m4_5, data = pred_dat)
height_pi <- apply(sim_height, 2, PI, prob = 0.89)
```
```{r}
sim_height_data <- t(height_pi) %>%
as.data.frame() %>%
as_tibble() %>%
set_names(c("pi5", "pi94")) %>%
mutate(weight = weight_seq,
mu = mu_mean)
d %>%
ggplot() +
geom_point(aes(x = weight_std, y = height), color = "grey50") +
geom_line(data = sim_height_data,
aes(x = weight, y = mu),
size = 1, color = "blue") +
geom_ribbon(data = sim_height_data,
aes(x = weight, ymin = pi5, ymax = pi94),
alpha = 0.2, color = NA) +
scale_x_continuous(
labels = function(x) { round(x * sd(d$weight) + mean(d$weight), 1) }
) +
labs(title = "Polynomial model of height by weight",
x = "weight")
```
---
## 4.7 Practice
### Easy
**4E1. In the model definition below, which line is the likelihood?**
$$
y_i \sim \text{Normal}(\mu, \sigma) \quad \text{(likelihood)}\\
\mu \sim \text{Normal}(0, 10) \\
\sigma \sim \text{Uniform}(0, 10)
$$
**4E2. In the model definition just above, how many parameters are in the posterior distribution?**
2
**4E3. Using the model definition above, write down the appropriate form of Bayes’ theorem that includes the proper likelihood and priors.**
$$
\Pr(\mu, \sigma | y) = \frac{\prod_i \text{Normal}(y_i | \mu, \sigma) \times \text{Normal}(\mu | 0, 10) \times \text{Uniform}(\sigma | 0, 10)}{\int \int \prod_i \text{Normal}(y_i | \mu, \sigma) \times \text{Normal}(\mu | 0, 10) \times \text{Uniform}(\sigma | 0, 10) \, d\mu \, d\sigma}
$$
**4E4. In the model definition below, which line is the linear model?**
$$
y_i \sim \text{Normal}(\mu, \sigma) \\
\mu = \alpha + \beta x_i \quad \text{(linear model)} \\
\alpha \sim \text{Normal}(0, 10) \\
\beta \sim \text{Normal}(0, 1) \\
\sigma \sim \text{Uniform}(0, 10)
$$
**4E5. In the model definition just above, how many parameters are in the posterior distribution?**
3
### Medium
**4M1. For the model definition below, simulate observed heights from the prior (not the posterior).**
$$
y_i \sim \text{Normal}(\mu, \sigma) \\
\mu \sim \text{Normal}(0, 10) \\
\sigma \sim \text{Uniform}(0, 10) \\
$$
```{r}
mu_prior <- rnorm(1e4, 0, 10)
sigma_prior <- runif(1e4, 0, 10)
heights_prior <- rnorm(1e4, mu_prior, sigma_prior)
plot(density(heights_prior))
```
**4M2. Translate the model just above into a map formula.**
```{r}
alist(
y ~ dnorm(mu, sigma),
mu ~ dnorm(0, 10),
sigma ~ dunif(0, 10)
)
```
**4M3. Translate the map model formula below into a mathematical model definition.**
```{r}
flist <- alist(
y ~ dnorm(mu, sigma),
mu <- a + b*x,
a ~ dnorm(0, 50),
b ~ dunif(0, 10),
sigma ~ dunif(0, 50)
)
```
$$
y_i \sim \text{Normal}(\mu, \sigma) \\
\mu = \alpha + \beta x_i \\
\alpha \sim \text{Normal}(0, 50) \\
\beta \sim \text{Uniform}(0, 10) \\
\sigma \sim \text{Uniform}(0, 50)
$$
**4M4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors.**
$$
y_i \sim \text{Normal}(\mu, \sigma) \\
\mu = \alpha + \beta x_i \\
\alpha \sim \text{Normal}(0, 100) \\
\beta \sim \text{Normal}(0, 10) \\
\sigma \sim \text{Uniform}(0, 50)
$$
**4M5. Now suppose I tell you that the average height in the first year was 120 cm and that every student got taller each year. Does this information lead you to change your choice of priors? How?**
I would make the prior for $\beta$ have a positive mean.
$$
\beta \sim \text{Normal}(1, 10)
$$
**4M6. Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors?**
A variance of at most 64 cm² implies a standard deviation of at most 8 cm, so I would tighten the prior on $\sigma$.
$$
\sigma \sim \text{Uniform}(0, 8)
$$
### Hard
**4H1. The weights listed below were recorded in the !Kung census, but heights were not recorded for these individuals. Provide predicted heights and 89% intervals (either HPDI or PI) for each of these individuals. That is, fill in the table below, using model-based predictions.**
```{r}
# Model created and fit in the notes above.
m4_5 <- quap(
alist(
height ~ dnorm(mu, sigma),
mu <- a + b1*weight_std + b2*weight_std2,
a ~ dnorm(178, 100),
b1 ~ dnorm(0, 10),
b2 ~ dnorm(0, 10),
sigma ~ dunif(0, 50)
),
data = d
)
new_data <- tibble(weight = c(46.95, 43.72, 64.78, 32.59, 54.63)) %>%
mutate(individual = 1:n(),
weight_std = (weight - mean(d$weight)) / sd(d$weight),
weight_std2 = weight_std^2)
new_pred <- link(m4_5, new_data, n = 1e3)
new_data %>%
mutate(expected_height = apply(new_pred, 2, chainmode),
hpdi = apply(new_pred, 2, function(x) {
pi <- round(HPDI(x), 2)
glue("{pi[[1]]} - {pi[[2]]}")
})) %>%
select(individual, weight, expected_height, hpdi)
```
**4H2. Select out all the rows in the `Howell1` data with ages below 18 years of age. If you do it right, you should end up with a new data frame with 192 rows in it.**
```{r}
qh2_data <- Howell1[Howell1$age < 18, ]
dim(qh2_data)
qh2_data %>%
ggplot(aes(x = weight, y = height)) +
geom_point()
```
**(a) Fit a linear regression to these data, using map. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets?**
```{r}
qh2_data$weight_std <- (qh2_data$weight - mean(qh2_data$weight)) / sd(qh2_data$weight)
qh2_model <- quap(
alist(
height ~ dnorm(mu, sigma),
mu <- a + b * weight_std,
a ~ dnorm(100, 20),
b ~ dnorm(10, 20),
sigma ~ dunif(0, 10)
),
data = qh2_data
)
summary(qh2_model)
```
```{r}
# Change in 10 original units of weight.
weight_seq <- c(10, 20)
weight_seq_std <- (weight_seq - mean(qh2_data$weight)) / sd(qh2_data$weight)
diff(coef(qh2_model)["a"] + weight_seq_std * coef(qh2_model)["b"])
```
**(b) Plot the raw data, with height on the vertical axis and weight on the horizontal axis. Superimpose the MAP regression line and 89% HPDI for the mean. Also superimpose the 89% HPDI for predicted heights.**
```{r}
weight_seq <- seq(range(qh2_data$weight_std)[[1]],
range(qh2_data$weight_std)[[2]],
length.out = 100)
mu_est <- link(qh2_model, data = data.frame(weight_std = weight_seq))
mu_mean <- apply(mu_est, 2, mean)
mu_hpdi <- apply(mu_est, 2, function(x) {
y <- HPDI(x)
tibble(hpdi_low = y[[1]], hpdi_high = y[[2]])
}) %>%
bind_rows()
mu_pred <- tibble(weight_std = weight_seq,
mu = mu_mean) %>%
bind_cols(mu_hpdi) %>%
mutate(weight = weight_std * sd(qh2_data$weight) + mean(qh2_data$weight))
qh2_data %>%
ggplot() +
geom_point(aes(x = weight, y = height)) +
geom_ribbon(data = mu_pred,
aes(x = weight, ymin = hpdi_low, ymax = hpdi_high),
fill = "black", alpha = 0.2, color = NA) +
geom_line(data = mu_pred,
aes(x = weight, y = mu),
color = "dodgerblue", lty = 2, size = 1.2)
```
**(c) What aspects of the model fit concern you? Describe the kinds of assumptions you would change, if any, to improve the model. You don’t have to write any new code. Just explain what the model appears to be doing a bad job of, and what you hypothesize would be a better model.**
I don't like how confident the model is in its predictions of height.
The straight line is not a good fit for these clearly curved data.
I would change the model, perhaps by adding a polynomial term or other predictors such as age (a sketch follows below).
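For illustration, here is a minimal sketch of the quadratic alternative (the model name `qh2_poly` and the coefficients `b1`, `b2` are my own labels, following `m4_5` above).
```{r}
# Sketch only: a second-order polynomial on the standardized weights,
# analogous to m4_5 but fit to the under-18 data.
qh2_data$weight_std2 <- qh2_data$weight_std^2
qh2_poly <- quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b1 * weight_std + b2 * weight_std2,
    a ~ dnorm(100, 20),
    b1 ~ dnorm(10, 20),
    b2 ~ dnorm(0, 10),
    sigma ~ dunif(0, 20)
  ),
  data = qh2_data
)
precis(qh2_poly)
```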
**4H3. Suppose a colleague of yours, who works on allometry, glances at the practice problems just above. Your colleague exclaims, “That’s silly. Everyone knows that it’s only the logarithm of body weight that scales with height!” Let’s take your colleague’s advice and see what happens.**
**(a) Model the relationship between height (cm) and the natural logarithm of weight (log-kg). Use the entire `Howell1` data frame, all 544 rows, adults and non-adults. Fit this model, using quadratic approximation:**
$$
h_i \sim \text{Normal}(\mu_i, \sigma) \\
\mu_i = \alpha + \beta \log(w_i) \\
\alpha \sim \text{Normal}(178, 100) \\
\beta \sim \text{Normal}(0, 100) \\
\sigma \sim \text{Uniform}(0, 50)
$$
**where $h_i$ is the height of individual $i$ and $w_i$ is the weight (in kg) of individual $i$. The function for computing a natural log in R is just `log`. Can you interpret the resulting estimates?**
```{r}
qh3_data <- Howell1 %>%
mutate(weight_ln = log(weight),
weight_ln_std = (weight_ln - mean(weight_ln)) / sd(weight_ln))
qh3_model <- quap(
alist(
height ~ dnorm(mu, sigma),
mu <- a + b * weight_ln_std,
a ~ dnorm(178, 100),
b ~ dnorm(0, 100),
sigma ~ dunif(0, 50)
),
data = qh3_data
)
summary(qh3_model)
```
**(b) Begin with this plot:**
```r
plot(height ~ weight , data = Howell1, col = col.alpha(rangi2, 0.4))
```
**Then use samples from the quadratic approximate posterior of the model in (a) to superimpose on the plot: (1) the predicted mean height as a function of weight, (2) the 97% HPDI for the mean, and (3) the 97% HPDI for predicted heights.**
```{r}
weight_seq <- seq(range(qh3_data$weight_ln_std)[[1]],
range(qh3_data$weight_ln_std)[[2]],
length.out = 100)
mu_est <- link(qh3_model, data = data.frame(weight_ln_std = weight_seq))
mu_mean <- apply(mu_est, 2, mean)
mu_hpdi <- apply(mu_est, 2, function(x) {
y <- HPDI(x, prob = 0.97)
tibble(hpdi_low = y[[1]], hpdi_high = y[[2]])
}) %>%
bind_rows()
sim_height <- sim(qh3_model, data = list(weight_ln_std = weight_seq), n = 1e4)
height_pi <- apply(sim_height, 2, function(x) {
y <- HPDI(x, prob = 0.97)