ch01_intro.qmd

# Introduction {#intro}

```{r }
library(conflicted)
library(dplyr, quietly = TRUE)
library(ggplot2, quietly = TRUE)
library(ggdag, quietly = TRUE)
options(dplyr.summarise.inform = FALSE)
conflicts_prefer(dplyr::filter)
```

## A Brief History

## Data Examples

### Mortality Rates by Country

This dataset is available with `fciR::mortality`. The summary table is

```{r}
#| label: tbl-ch01_mortality
#| tbl-cap: "Mortality Rates by Age and Country"
data("mortality", package = "fciR")
mortality |>
  gt::gt()
```

### National Center for Education Statistics

This dataset is available with `fciR::nces`.

The statistical summary is

```{r}
#| label: tbl-ch01_nces_skim
data("nces", package = "fciR")
nces |>
  skimr::skim()
```

and the frequency table is

```{r}
#| label: tbl-ch01_nces
#| tbl-cap: "NCES Data"
nces |>
  dplyr::count(selective, female, highmathsat) |>
  gt::gt()
```

### Reducing Alcohol Consumption

#### The What-If? Study

This dataset is available with `fciR::whatifdat`.

The statistical summary is

```{r}
#| label: ch01_whatifdat
data("whatifdat", package = "fciR")
whatifdat |>
  skimr::skim()
```

The frequency table is

```{r}
#| label: tbl-ch01_whatifdat
#| tbl-cap: "The What-If? Study"
whatifdat |>
  dplyr::count(`T`, A, H, Y) |>
  gt::gt()
```

##### The Double What-If? Study

This dataset is available with `fciR::doublewhatifdat`.

The statistical summary is

```{r}
#| label: ch01_doublewhatifdat
data("doublewhatifdat", package = "fciR")
doublewhatifdat |>
  skimr::skim()
```

The DAG for the *Double What-If?* study in the `dagitty` version is

```{r}
#| label: fig-ch01_DoubleWhatIf
#| fig-cap: "The Double What-If? Study"
scm <- list()
scm <- within(scm, {
  the_nodes <- c("U" = "Unmeasured, healthy behavior (U=1)", 
                 "AD0" = "Adherence time 0", 
                 "VL0" = "Viral Load time 0", 
                 "T" = "Naltrexone (T=1)", 
                 "A" = "Reduced drinking (A=1)", 
                 "AD1" = "Adherence time 1", 
                 "VL1" = "Viral Load time 1")
  coords <- data.frame(
    name = names(the_nodes),
    x = c(2, 3, 4, 1, 2, 3, 4),
    y = c(2, 2, 2, 1, 1, 1, 1)
  )
  dag <- dagify(
    AD0 ~ U,
    VL0 ~ AD0,
    A ~ `T` + U,
    AD1 ~ A,
    VL1 ~ AD0 + AD1 + U,
  outcome = "VL1",
  exposure = "T",
  latent = "U",
  coords = coords,
  labels = the_nodes)
  
  # this is the only technique known to have a subscript in a DAG
  # IMPORTANT: the expression must be exactly in alphabetical order
  the_text_labels <- c(
    expression(bold(A)), expression(bold(AD[0])),expression(bold(AD[1])),
    expression(bold(T)), expression(bold(U)), expression(bold(VL[0])), 
    expression(bold(VL[1])))
  
  # status' colors
  colrs <- c("latent" = "palevioletred", "exposure" = "mediumspringgreen", 
             "outcome" = "cornflowerblue")
  # plot the DAG
  plot <- dag |> 
    tidy_dagitty() |>
    ggdag_status(color = status, text = FALSE) +
    geom_dag_text(size = 5, color = "white", fontface = "bold",
      parse = TRUE, label = the_text_labels) +
    scale_color_manual(values = colrs, na.value = "honeydew3") +
    scale_fill_manual(values = colrs, na.value = "honeydew3") +
    ggdag::theme_dag_blank(panel.background = 
                             element_rect(fill="snow", color="snow")) +
    theme(title = element_text(color = "darkblue"),
          legend.position = "bottom",
          legend.title = element_blank()) +
    labs(title = "The Double What-If? Study")
})
scm$plot
```

and the code for `doublewhatifsim.R` is

```{r }
#| file: "lib\\fci_01-B_doublewhatifsim.R"
```

### General Social Survey

This dataset is available with `fciR::gss`.

The statistical summary is

```{r}
#| label: ch01_gss
data("gss", package = "fciR")
gss |>
  skimr::skim()
```

### A Cancer Clinical Trial

This dataset is available with `fciR::cogdat`.

The statistical summary is

```{r}
#| label: ch01_cogat
data("cogdat", package = "fciR")
cogdat |>
  skimr::skim()
```

and the frequency table is

```{r}
#| label: tbl-ch01_cogdat
#| tbl-cap: "A Hypothetical Cancer Clinical Trial"
df <- cogdat |>
  filter(Y == 1) |>
  count(A1, H2, A2, name = "nY")
cogdat |>
  count(A1, H2, A2) |>
  left_join(df) |>
  mutate(nY = dplyr::if_else(is.na(nY), 0, nY),
         prop = round(nY / n, 2)) |>
  gt::gt()
rm(df)
```

## Exercises

{{< include _warn_ex.qmd >}}