diff --git a/_freeze/0_unpublished/danger_disturb/execute-results/html.json b/_freeze/0_unpublished/danger_disturb/execute-results/html.json index 48f47fe..b759c42 100644 --- a/_freeze/0_unpublished/danger_disturb/execute-results/html.json +++ b/_freeze/0_unpublished/danger_disturb/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "586e661582509636cc5191b23eecc79c", + "hash": "5492d2e93a488a3d0f09c2e8a19ce849", "result": { - "markdown": "---\ntitle: \"Code Along With Me (Episode 1)\"\nsubtitle: \"An assessment of the livestock numbers in the six counties declared to be 'dangerous and disturbed' in Kenya\"\nauthor: \"William Okech\"\ndate: \"2023-11-28\"\nimage: \"\"\ncategories: [RStudio, R, Tutorial, Blog]\ntoc: true\nformat: \n html:\n code-fold: show\n code-overflow: wrap\n warning: false\n---\n\n\n# Introduction\n\nIn February 2023, the government of Kenya described six counties as \"disturbed\" and \"dangerous.\" This is because in the preceding six months, over 100 civilians and 16 police officers have lost their lives as criminals engaged in banditry and livestock theft have terrorized the rural villages. One of the main causes of the conflict is livestock theft, therefore, the goal of this analysis is to perform an exploratory data analysis of livestock numbers from the Kenya Population and Housing Census (2019) report.\n\nReferences\n\n1. [Nation Media News Brief](https://nation.africa/kenya/news/killing-fields-kdf-police-versus-bandits-who-will-prevail--4122912)\n2. 
[Citizen TV Summary](https://www.youtube.com/watch?v=nOqzHeeSS2A)\n\n# Section 1: Load all the required libraries\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse) # a collection of packages used to model, transform, and visualize data\nlibrary(rKenyaCensus) # tidy datasets obtained from the Kenya Population and Housing Census results\nlibrary(patchwork) # combine separate ggplots into the same graphic\nlibrary(janitor) # initial data exploration and cleaning for a new data set\nlibrary(ggrepel)# repel overlapping text labels\nlibrary(ggthemes) # Extra Themes, Scales and Geoms for 'ggplot2'\nlibrary(scales) # tools to override the default breaks, labels, transformations and palettes\n# install.packages(\"treemapify\")\nlibrary(treemapify) # allows the creation of treemaps in ggplot2\n\nlibrary(sf) # simple features, a method to encode spatial vector data\n#install.packages(\"devtools\")\nlibrary(devtools) # helps to install packages not on CRAN\n#devtools::install_github(\"yutannihilation/ggsflabel\")\nlibrary(ggsflabel) # place labels on maps\n\nlibrary(knitr) # a tool for dynamic report generation in R\n#install.packages(\"kableExtra\")\nlibrary(kableExtra) # build common complex tables and manipulate table styles\n```\n:::\n\n\nNote: If you have package loading issues check the timeout with getOption('timeout') and use options(timeout = ) to increase package loading time.\n\n# Section 2: Create a map of the \"dangerous and disturbed\" counties\n\nThe rKenyaCensus package includes a built-in county boundaries dataset to facilitate mapping of the various indicators in the Census. 
The required shapefile for this analysis is KenyaCounties_SHP\n\n## a) Sample plot of the map of Kenya\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load the shapefile\nkenya_counties_sf <- st_as_sf(KenyaCounties_SHP)\n\n# Plot the map of Kenya\np0 <- ggplot(kenya_counties_sf) + \n geom_sf(fill = \"bisque2\", linewidth = 0.6, color = \"black\") + \n theme_void()\np0\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-2-1.png){width=672}\n:::\n:::\n\n\n## b) Highlight the dangerous and disturbed counties in Kenya\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# First, remove the \"/\" from the county names\n\nkenya_counties_sf$County <- gsub(\"/\", \n \" \", \n kenya_counties_sf$County)\n\n# Select the six counties to highlight\nhighlight_counties <- c(\"TURKANA\", \"WEST POKOT\", \"ELGEYO MARAKWET\", \"BARINGO\", \"LAIKIPIA\", \"SAMBURU\")\n\n# Filter the counties dataset to only include the highlighted counties\nhighlighted <- kenya_counties_sf %>% filter(County %in% highlight_counties)\n\n# Plot the highlighted counties in the map\np1 <- ggplot() + \n geom_sf(data = kenya_counties_sf, fill = \"bisque2\", linewidth = 0.6, color = \"black\") + \n geom_sf(data = highlighted, fill = \"chocolate4\", linewidth = 0.8, color = \"black\") +\n theme_void()\np1\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-3-1.png){width=672}\n:::\n:::\n\n\n## c) Plot only the required counties\n\n\n::: {.cell}\n\n```{.r .cell-code}\np2 <- ggplot(data = highlighted) +\n geom_sf(aes(fill = County), linewidth = 1, show.legend = FALSE) +\n geom_label_repel(aes(label = County, geometry = geometry), size = 3,\n stat = \"sf_coordinates\",\n force=10, # force of repulsion between overlapping text labels\n seed = 1,\n segment.size = 0.75,\n min.segment.length=0) +\n scale_fill_brewer(palette = \"OrRd\") +\n labs(title = \"\",\n caption = \"\") +\n theme(plot.title = element_text(family = \"Helvetica\",size = 10, hjust = 0.5),\n 
legend.title = element_blank(),\n legend.position = \"none\",\n plot.caption = element_text(family = \"Helvetica\",size = 12)) +\n theme_void() \np2\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n\n```{.r .cell-code}\n# Notes: geom_label_repel() with geometry and stat defined can be used as an \n# alternative to geom_sf_label_repel()\n```\n:::\n\n\n## d) Combine the plots using patchwork to clearly highlight the counties of interest\n\n\n::: {.cell}\n\n```{.r .cell-code}\np1 + \n p2 + \n plot_annotation(title = \"\",\n subtitle = \"\",\n caption = \"\",\n theme = theme(plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 25),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\")))\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n\n# Section 3: Load the livestock data from the census report and generate the dataframes required for analysis.\n\n## a) View the data available in the data catalogue\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"DataCatalogue\")\n```\n:::\n\n\n## b) Load the livestock data\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Select the livestock data from the census report\ndf_livestock <- V4_T2.24\nlivestock <- df_livestock[2:393, ]\nlivestock <- livestock %>%\n clean_names()\n\n# Remove the \"/\" from the county names in the dataset\nlivestock$county <- gsub(\"/\", \" \", livestock$county)\nlivestock$sub_county <- gsub(\"/\", \" \", livestock$sub_county)\n\n# Select the variables of interest from the dataset\n# These include the county, subcounty, land area, number of farming households, \n# sheep, goats, and indigenous cattle.\n\n# New variables listed below include:\n# pasto_livestock is the total number of sheep, goats, and indigenous cattle\n# ind_cattle_household is the number of 
indigenous cattle per household\n# goats_household is the number of goats per household\n# sheep_household is the number of sheep per household\n# pasto_livestock_household is the number of pastoral livestock per household\n\nlivestock_select <- livestock %>%\n select(county, sub_county, admin_area, farming, sheep, goats, indigenous_cattle) %>%\n mutate(pasto_livestock = sheep + goats + indigenous_cattle) %>% \n mutate(ind_cattle_household = round(indigenous_cattle/farming)) %>%\n mutate(goats_household = round(goats/farming)) %>%\n mutate(sheep_household = round(sheep/farming)) %>%\n mutate(pasto_livestock_household = round(pasto_livestock/farming))\n```\n:::\n\n\n## c) Filter data for the selected \"disturbed and dangerous\" counties\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Select the data for the \"dangerous and disturbed\" counties\ndan_dist <- c(\"TURKANA\", \"WEST POKOT\", \"ELGEYO MARAKWET\", \"BARINGO\", \"LAIKIPIA\", \"SAMBURU\")\n\nlivestock_select_county <- livestock_select %>%\n filter(admin_area == \"County\") %>%\n filter(county %in% dan_dist)\n\n# Select subcounty data for the \"disturbed and dangerous\" counties\nlivestock_select_subcounty <- livestock_select %>%\n filter(admin_area == \"SubCounty\") %>%\n filter(county %in% dan_dist)\n\n# Create an area dataset for the \"dangerous and disturbed\" counties\ndf_land_area <- V1_T2.7\nland_area <- df_land_area[2:396,]\nland_area <- land_area %>%\n clean_names()\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create a function to remove the \"/\", \" County\" from label, and change the label to UPPERCASE\n\nclean_county_names <- function(dataframe, column_name) {\n dataframe[[column_name]] <- toupper(gsub(\"/\", \" \", gsub(\" County\", \"\", dataframe[[column_name]])))\n return(dataframe)\n}\n\nland_area <- clean_county_names(land_area, 'county')\nland_area <- clean_county_names(land_area, 'sub_county')\n\n# The code above does the processes listed below:\n\n# land_area$county <- gsub(\"/\", \" \", 
land_area$county)\n# land_area$county <- gsub(\" County\", \"\", land_area$county)\n# land_area$county <- toupper(land_area$county)\n# land_area$sub_county <- toupper(land_area$sub_county)\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# Obtain the area data for \"disturbed and dangerous\" counties\nland_area_county <- land_area %>%\n filter(admin_area == \"County\") %>%\n select(county, land_area_in_sq_km) %>%\n filter(county %in% dan_dist)\n\n# Get the subcounty area data for \"disturbed and dangerous\" counties\nland_area_subcounty <- land_area %>%\n filter(admin_area == \"SubCounty\") %>%\n select(county, sub_county, land_area_in_sq_km) %>%\n filter(county %in% dan_dist) %>%\n select(-county)\n```\n:::\n\n\n## d) Create the final datasets to be used for analysis. Use inner_join() and creating new variables.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create a county dataset with area and livestock numbers for the disturbed and dangerous regions\n\nlivestock_area_county <- inner_join(livestock_select_county, land_area_county, by = \"county\")\n\n# New variables listed below include:\n# ind_cattle_area is the number of indigenous cattle per area_in_sq_km\n# goats_area is the number of goats per household per area_in_sq_km\n# sheep_area is the number of sheep per area_in_sq_km\n# pasto_livestock_area is the number of pastoral livestock per area_in_sq_km\n\nlivestock_area_county <- livestock_area_county %>%\n mutate(ind_cattle_area = round(indigenous_cattle/land_area_in_sq_km),\n sheep_area = round(sheep/land_area_in_sq_km),\n goats_area = round(goats/land_area_in_sq_km),\n pasto_livestock_area = round(pasto_livestock/land_area_in_sq_km))\n\n# Create a subcounty dataset with area and livestock numbers\n# for the disturbed and dangerous regions\n\nlivestock_area_subcounty <- inner_join(livestock_select_subcounty, land_area_subcounty, by = \"sub_county\")\n\n# New variables listed below include:\n# ind_cattle_area is the number of indigenous cattle per 
area_in_sq_km\n# goats_area is the number of goats per household per area_in_sq_km\n# sheep_area is the number of sheep per area_in_sq_km\n# pasto_livestock_area is the number of pastoral livestock per area_in_sq_km\n\nlivestock_area_subcounty <- livestock_area_subcounty %>%\n mutate(ind_cattle_area = round(indigenous_cattle/land_area_in_sq_km),\n sheep_area = round(sheep/land_area_in_sq_km),\n goats_area = round(goats/land_area_in_sq_km),\n pasto_livestock_area = round(pasto_livestock/land_area_in_sq_km))\n```\n:::\n\n\n# Section 4: Create a table with land area (sq. km) for the six counties\n\n**These six counties cover approximately one-fifth (1/5) of Kenya**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlivestock_area_county %>%\n select(county, land_area_in_sq_km) %>%\n mutate(county = str_to_title(county)) %>%\n arrange(desc(land_area_in_sq_km)) %>%\n adorn_totals(\"row\") %>%\n rename(\"County\" = \"county\",\n \"Land Area (sq. km)\" = \"land_area_in_sq_km\") %>%\n kbl(align = \"c\") %>%\n kable_classic() %>% \n row_spec(row = 0, font_size = 21, color = \"white\", background = \"#000000\") %>%\n row_spec(row = c(1:7), font_size = 15) %>%\n row_spec(row = 6, extra_css = \"border-bottom: 1px solid;\") %>%\n row_spec(row = 7, bold = T)\n```\n\n::: {.cell-output-display}\n`````{=html}\n\n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
County Land Area (sq. km)
Turkana 68232.9
Samburu 21065.1
Baringo 10976.4
Laikipia 9532.2
West Pokot 9123.2
Elgeyo Marakwet 3032.0
Total 121961.8
\n\n`````\n:::\n:::\n\n\n# Section 5: Perform an exploratory data analysis to gain key insights about the data\n\n## a) Number of Farming Households at the County Level\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlac_1 <- livestock_area_county %>%\n ggplot() + \n geom_col(aes(x= reorder(county, farming), y = farming, fill = county)) +\n geom_text(aes(x = county, y = 0, label = county),\n hjust = 0, nudge_y = 0.25) +\n geom_text(aes(x = county, y = farming, label = farming),\n hjust = 1, nudge_y = 15000) +\n scale_fill_brewer(palette = \"OrRd\") +\n coord_flip() + \n labs(x = \"County\",\n y = \"Number of Farming Households\",\n title = \"\",\n subtitle = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma) +\n theme(axis.title.x =element_blank(), \n axis.title.y =element_blank(),\n axis.text.x = element_blank(),\n axis.text.y = element_blank(),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_blank(),\n legend.position = \"none\",\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"))\n\n# Create a patchwork plot with the map and the bar graph\n\nlac_1 + p2\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-13-1.png){width=672}\n:::\n:::\n\n\n## b) Number of Farming Households at the Subcounty Level\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlivestock_area_subcounty %>%\n ggplot() + \n geom_col(aes(x= reorder(sub_county, farming), y = farming, fill = county), width = 0.95) + geom_text(aes(x = sub_county, y = farming, label = farming),\n hjust = 1, nudge_y = 2500, size = 4) +\n scale_fill_brewer(palette = \"OrRd\") +\n coord_flip() +\n guides(fill = guide_legend(nrow = 1)) +\n labs(x = \"County\",\n y = \"Number of Farming 
Households\",\n fill = \"County\",\n title = \"\",\n subtitle = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +\n theme(axis.title.x =element_blank(),\n axis.title.y =element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_text(size = 12),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_text(size = 10),\n legend.text=element_text(size=8),\n legend.position = \"top\")\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-14-1.png){width=864}\n:::\n:::\n\n\n## c) Number of pastoral livestock per county\n\n### Treemap\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(livestock_area_county, aes(area = pasto_livestock, fill = county, label = comma(pasto_livestock)\n )) +\n geom_treemap() +\n geom_treemap_text(colour = \"black\",\n place = \"centre\",\n size = 24) +\n scale_fill_brewer(palette = \"OrRd\") +\n labs(x = \"\",\n y = \"\",\n title = \"\",\n caption = \"\") +\n theme_minimal() +\n theme(axis.title.x =element_text(size = 20),\n axis.title.y =element_text(size = 20),\n axis.text.x = element_text(size = 15),\n axis.text.y = element_text(size = 15),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 28),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n legend.title = element_blank(),\n legend.text=element_text(size=12),\n legend.position = \"bottom\") \n```\n\n::: 
{.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-15-1.png){width=672}\n:::\n:::\n\n\n## d) Number of pastoral livestock per subcounty\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlivestock_area_subcounty %>%\n ggplot() + \n geom_col(aes(x= reorder(sub_county, pasto_livestock), y = pasto_livestock, fill = county), width = 0.95) + \n geom_text(aes(x = sub_county, y = pasto_livestock, label = pasto_livestock),\n hjust = 1, nudge_y = 125000) +\n scale_fill_brewer(palette = \"OrRd\") +\n coord_flip() + \n guides(fill = guide_legend(nrow = 1)) +\n labs(x = \"Subcounty\",\n y = \"Number of Pastoral Livestock\",\n fill = \"County\",\n title = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +\n theme(axis.title.x =element_blank(),\n axis.title.y =element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_text(size = 12),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_text(size = 10),\n legend.text=element_text(size=8),\n legend.position = \"top\")\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-16-1.png){width=864}\n:::\n:::\n\n\n## e) Number of pastoral livestock per household at the county level\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlac_2 <- livestock_area_county %>%\n ggplot() + \n geom_col(aes(x= reorder(county, pasto_livestock_household), y = pasto_livestock_household, fill = county)) + \ngeom_text(aes(x = county, y = pasto_livestock_household, label = pasto_livestock_household), hjust = 1, nudge_y = 5) +\n scale_fill_brewer(palette = \"OrRd\") +\n coord_flip() + \n labs(x = 
\"County\",\n y = \"Number of Pastoral Livestock per household\",\n title = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +\n theme(axis.title.x =element_blank(), \n axis.title.y =element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_text(size = 12),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_blank(),\n legend.position = \"none\",\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"))\n\n# Create a patchwork plot with the map and the bar graph\n\nlac_2 + p2\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-17-1.png){width=672}\n:::\n:::\n\n\n## f) Number of pastoral livestock per household at the subcounty level\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlivestock_area_subcounty %>%\n ggplot() + \n geom_col(aes(x= reorder(sub_county, pasto_livestock_household), y = pasto_livestock_household, fill = county), width = 0.95) + \n scale_fill_brewer(palette = \"OrRd\") +\n geom_text(aes(x = sub_county, y = pasto_livestock_household, label = pasto_livestock_household),\n hjust = 1, nudge_y = 5) +\n coord_flip() + \n guides(fill = guide_legend(nrow = 1)) +\n labs(x = \"Subcounty\",\n y = \"Number of Pastoral Livestock per household\",\n fill = \"County\",\n title = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +\n theme(axis.title.x =element_blank(),\n axis.title.y =element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_text(size = 12),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = 
element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_text(size = 10),\n legend.text=element_text(size=8),\n legend.position = \"top\")\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-18-1.png){width=864}\n:::\n:::\n\n\n# Section 5: Conclusion\n\nThe goal of this analysis was to look at pastoral livestock distributions....\n\nShow the reader how to plot....\n\nConclude that...", + "markdown": "---\ntitle: \"Code Along With Me (Episode 1)\"\nsubtitle: \"An assessment of the livestock numbers in the six counties declared to be 'disturbed and dangerous' in Kenya\"\nauthor: \"William Okech\"\ndate: \"2023-11-28\"\nimage: \"\"\ncategories: [RStudio, R, Tutorial, Blog]\ntoc: true\nformat: \n html:\n code-fold: show\n code-overflow: wrap\n warning: false\n---\n\n\n# Introduction\n\nIn February 2023, the government of Kenya described six counties as \"disturbed\" and \"dangerous.\" This is because in the preceding six months, over 100 civilians and 16 police officers have lost their lives, as criminals engaged in banditry have terrorized the rural villages. One of the main causes of the conflict is livestock theft, therefore, my goal is to perform an exploratory data analysis of livestock numbers from the Kenya Population and Housing Census (2019) report.\n\nReferences\n\n1. [Nation Media News Brief](https://nation.africa/kenya/news/killing-fields-kdf-police-versus-bandits-who-will-prevail--4122912)\n2. 
[Citizen TV Summary](https://www.youtube.com/watch?v=nOqzHeeSS2A)\n\n# Section 1: Load all the required libraries\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse) # a collection of packages used to model, transform, and visualize data\nlibrary(rKenyaCensus) # tidy datasets obtained from the Kenya Population and Housing Census results\nlibrary(patchwork) # combine separate ggplots into the same graphic\nlibrary(janitor) # initial data exploration and cleaning for a new data set\nlibrary(ggrepel)# repel overlapping text labels\nlibrary(ggthemes) # Extra Themes, Scales and Geoms for 'ggplot2'\nlibrary(scales) # tools to override the default breaks, labels, transformations and palettes\n# install.packages(\"treemapify\")\nlibrary(treemapify) # allows the creation of treemaps in ggplot2\n\nlibrary(sf) # simple features, a method to encode spatial vector data\n#install.packages(\"devtools\")\nlibrary(devtools) # helps to install packages not on CRAN\n#devtools::install_github(\"yutannihilation/ggsflabel\")\nlibrary(ggsflabel) # place labels on maps\n\nlibrary(knitr) # a tool for dynamic report generation in R\n#install.packages(\"kableExtra\")\nlibrary(kableExtra) # build common complex tables and manipulate table styles\n```\n:::\n\n\nNote: If you have package loading issues check the timeout with getOption('timeout') and use options(timeout = ) to increase package loading time.\n\n# Section 2: Create a map of the \"dangerous and disturbed\" counties\n\nThe rKenyaCensus package includes a built-in county boundaries dataset to facilitate mapping of the various indicators in the Census. 
The required shapefile for this analysis is KenyaCounties_SHP\n\n## a) Sample plot of the map of Kenya\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load the shapefile\nkenya_counties_sf <- st_as_sf(KenyaCounties_SHP)\n\n# Plot the map of Kenya\np0 <- ggplot(kenya_counties_sf) + \n geom_sf(fill = \"bisque2\", linewidth = 0.6, color = \"black\") + \n theme_void()\np0\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-2-1.png){width=672}\n:::\n:::\n\n\n## b) Highlight the dangerous and disturbed counties in Kenya\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# First, remove the \"/\" from the county names\n\nkenya_counties_sf$County <- gsub(\"/\", \n \" \", \n kenya_counties_sf$County)\n\n# Select the six counties to highlight\nhighlight_counties <- c(\"TURKANA\", \"WEST POKOT\", \"ELGEYO MARAKWET\", \"BARINGO\", \"LAIKIPIA\", \"SAMBURU\")\n\n# Filter the counties dataset to only include the highlighted counties\nhighlighted <- kenya_counties_sf %>% filter(County %in% highlight_counties)\n\n# Plot the highlighted counties in the map\np1 <- ggplot() + \n geom_sf(data = kenya_counties_sf, fill = \"bisque2\", linewidth = 0.6, color = \"black\") + \n geom_sf(data = highlighted, fill = \"chocolate4\", linewidth = 0.8, color = \"black\") +\n theme_void()\np1\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-3-1.png){width=672}\n:::\n:::\n\n\n## c) Plot only the required counties\n\n\n::: {.cell}\n\n```{.r .cell-code}\np2 <- ggplot(data = highlighted) +\n geom_sf(aes(fill = County), linewidth = 1, show.legend = FALSE) +\n geom_label_repel(aes(label = County, geometry = geometry), size = 3,\n stat = \"sf_coordinates\",\n force=10, # force of repulsion between overlapping text labels\n seed = 1,\n segment.size = 0.75,\n min.segment.length=0) +\n scale_fill_brewer(palette = \"OrRd\") +\n labs(title = \"\",\n caption = \"\") +\n theme(plot.title = element_text(family = \"Helvetica\",size = 10, hjust = 0.5),\n 
legend.title = element_blank(),\n legend.position = \"none\",\n plot.caption = element_text(family = \"Helvetica\",size = 12)) +\n theme_void() \np2\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n\n```{.r .cell-code}\n# Notes: geom_label_repel() with geometry and stat defined can be used as an \n# alternative to geom_sf_label_repel()\n```\n:::\n\n\n## d) Combine the plots using patchwork to clearly highlight the counties of interest\n\n\n::: {.cell}\n\n```{.r .cell-code}\np1 + \n p2 + \n plot_annotation(title = \"\",\n subtitle = \"\",\n caption = \"\",\n theme = theme(plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 25),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\")))\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n\n# Section 3: Load the livestock data from the census report and generate the dataframes required for analysis.\n\n## a) View the data available in the data catalogue\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"DataCatalogue\")\n```\n:::\n\n\n## b) Load the livestock data\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Select the livestock data from the census report\ndf_livestock <- V4_T2.24\nlivestock <- df_livestock[2:393, ]\nlivestock <- livestock %>%\n clean_names()\n\n# Remove the \"/\" from the county names in the dataset\nlivestock$county <- gsub(\"/\", \" \", livestock$county)\nlivestock$sub_county <- gsub(\"/\", \" \", livestock$sub_county)\n\n# Select the variables of interest from the dataset\n# These include the county, subcounty, land area, number of farming households, \n# sheep, goats, and indigenous cattle.\n\n# New variables listed below include:\n# pasto_livestock is the total number of sheep, goats, and indigenous cattle\n# ind_cattle_household is the number of 
indigenous cattle per household\n# goats_household is the number of goats per household\n# sheep_household is the number of sheep per household\n# pasto_livestock_household is the number of pastoral livestock per household\n\nlivestock_select <- livestock %>%\n select(county, sub_county, admin_area, farming, sheep, goats, indigenous_cattle) %>%\n mutate(pasto_livestock = sheep + goats + indigenous_cattle) %>% \n mutate(ind_cattle_household = round(indigenous_cattle/farming)) %>%\n mutate(goats_household = round(goats/farming)) %>%\n mutate(sheep_household = round(sheep/farming)) %>%\n mutate(pasto_livestock_household = round(pasto_livestock/farming))\n```\n:::\n\n\n## c) Filter data for the selected \"disturbed and dangerous\" counties\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Select the data for the \"dangerous and disturbed\" counties\ndan_dist <- c(\"TURKANA\", \"WEST POKOT\", \"ELGEYO MARAKWET\", \"BARINGO\", \"LAIKIPIA\", \"SAMBURU\")\n\nlivestock_select_county <- livestock_select %>%\n filter(admin_area == \"County\") %>%\n filter(county %in% dan_dist)\n\n# Select subcounty data for the \"disturbed and dangerous\" counties\nlivestock_select_subcounty <- livestock_select %>%\n filter(admin_area == \"SubCounty\") %>%\n filter(county %in% dan_dist)\n\n# Create an area dataset for the \"dangerous and disturbed\" counties\ndf_land_area <- V1_T2.7\nland_area <- df_land_area[2:396,]\nland_area <- land_area %>%\n clean_names()\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create a function to remove the \"/\", \" County\" from label, and change the label to UPPERCASE\n\nclean_county_names <- function(dataframe, column_name) {\n dataframe[[column_name]] <- toupper(gsub(\"/\", \" \", gsub(\" County\", \"\", dataframe[[column_name]])))\n return(dataframe)\n}\n\nland_area <- clean_county_names(land_area, 'county')\nland_area <- clean_county_names(land_area, 'sub_county')\n\n# The code above does the processes listed below:\n\n# land_area$county <- gsub(\"/\", \" \", 
land_area$county)\n# land_area$county <- gsub(\" County\", \"\", land_area$county)\n# land_area$county <- toupper(land_area$county)\n# land_area$sub_county <- toupper(land_area$sub_county)\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# Obtain the area data for \"disturbed and dangerous\" counties\nland_area_county <- land_area %>%\n filter(admin_area == \"County\") %>%\n select(county, land_area_in_sq_km) %>%\n filter(county %in% dan_dist)\n\n# Get the subcounty area data for \"disturbed and dangerous\" counties\nland_area_subcounty <- land_area %>%\n filter(admin_area == \"SubCounty\") %>%\n select(county, sub_county, land_area_in_sq_km) %>%\n filter(county %in% dan_dist) %>%\n select(-county)\n```\n:::\n\n\n## d) Create the final datasets to be used for analysis. Use inner_join() and creating new variables.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create a county dataset with area and livestock numbers for the disturbed and dangerous regions\n\nlivestock_area_county <- inner_join(livestock_select_county, land_area_county, by = \"county\")\n\n# New variables listed below include:\n# ind_cattle_area is the number of indigenous cattle per area_in_sq_km\n# goats_area is the number of goats per household per area_in_sq_km\n# sheep_area is the number of sheep per area_in_sq_km\n# pasto_livestock_area is the number of pastoral livestock per area_in_sq_km\n\nlivestock_area_county <- livestock_area_county %>%\n mutate(ind_cattle_area = round(indigenous_cattle/land_area_in_sq_km),\n sheep_area = round(sheep/land_area_in_sq_km),\n goats_area = round(goats/land_area_in_sq_km),\n pasto_livestock_area = round(pasto_livestock/land_area_in_sq_km))\n\n# Create a subcounty dataset with area and livestock numbers\n# for the disturbed and dangerous regions\n\nlivestock_area_subcounty <- inner_join(livestock_select_subcounty, land_area_subcounty, by = \"sub_county\")\n\n# New variables listed below include:\n# ind_cattle_area is the number of indigenous cattle per 
area_in_sq_km\n# goats_area is the number of goats per household per area_in_sq_km\n# sheep_area is the number of sheep per area_in_sq_km\n# pasto_livestock_area is the number of pastoral livestock per area_in_sq_km\n\nlivestock_area_subcounty <- livestock_area_subcounty %>%\n mutate(ind_cattle_area = round(indigenous_cattle/land_area_in_sq_km),\n sheep_area = round(sheep/land_area_in_sq_km),\n goats_area = round(goats/land_area_in_sq_km),\n pasto_livestock_area = round(pasto_livestock/land_area_in_sq_km))\n```\n:::\n\n\n# Section 4: Create a table with land area (sq. km) for the six counties\n\n**These six counties cover approximately one-fifth (1/5) of Kenya**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlivestock_area_county %>%\n select(county, land_area_in_sq_km) %>%\n mutate(county = str_to_title(county)) %>%\n arrange(desc(land_area_in_sq_km)) %>%\n adorn_totals(\"row\") %>%\n rename(\"County\" = \"county\",\n \"Land Area (sq. km)\" = \"land_area_in_sq_km\") %>%\n kbl(align = \"c\") %>%\n kable_classic() %>% \n row_spec(row = 0, font_size = 21, color = \"white\", background = \"#000000\") %>%\n row_spec(row = c(1:7), font_size = 15) %>%\n row_spec(row = 6, extra_css = \"border-bottom: 1px solid;\") %>%\n row_spec(row = 7, bold = T)\n```\n\n::: {.cell-output-display}\n`````{=html}\n\n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
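<table><thead><tr><th>County</th><th>Land Area (sq. km)</th></tr></thead><tbody>
<tr><td>Turkana</td><td>68232.9</td></tr>
<tr><td>Samburu</td><td>21065.1</td></tr>
<tr><td>Baringo</td><td>10976.4</td></tr>
<tr><td>Laikipia</td><td>9532.2</td></tr>
<tr><td>West Pokot</td><td>9123.2</td></tr>
<tr><td>Elgeyo Marakwet</td><td>3032.0</td></tr>
<tr><td>Total</td><td>121961.8</td></tr></tbody></table>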
\n\n`````\n:::\n:::\n\n\n# Section 5: Perform an exploratory data analysis to gain key insights about the data\n\n## a) Number of Farming Households at the County Level\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlac_1 <- livestock_area_county %>%\n ggplot() + \n geom_col(aes(x= reorder(county, farming), y = farming, fill = county)) +\n geom_text(aes(x = county, y = 0, label = county),\n hjust = 0, nudge_y = 0.25) +\n geom_text(aes(x = county, y = farming, label = farming),\n hjust = 1, nudge_y = 15000) +\n scale_fill_brewer(palette = \"OrRd\") +\n coord_flip() + \n labs(x = \"County\",\n y = \"Number of Farming Households\",\n title = \"\",\n subtitle = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma) +\n theme(axis.title.x =element_blank(), \n axis.title.y =element_blank(),\n axis.text.x = element_blank(),\n axis.text.y = element_blank(),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_blank(),\n legend.position = \"none\",\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"))\n\n# Create a patchwork plot with the map and the bar graph\n\nlac_1 + p2\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-13-1.png){width=672}\n:::\n:::\n\n\n## b) Number of Farming Households at the Subcounty Level\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlivestock_area_subcounty %>%\n ggplot() + \n geom_col(aes(x= reorder(sub_county, farming), y = farming, fill = county), width = 0.95) + geom_text(aes(x = sub_county, y = farming, label = farming),\n hjust = 1, nudge_y = 2500, size = 4) +\n scale_fill_brewer(palette = \"OrRd\") +\n coord_flip() +\n guides(fill = guide_legend(nrow = 1)) +\n labs(x = \"County\",\n y = \"Number of Farming 
Households\",\n fill = \"County\",\n title = \"\",\n subtitle = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +\n theme(axis.title.x =element_blank(),\n axis.title.y =element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_text(size = 12),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_text(size = 10),\n legend.text=element_text(size=8),\n legend.position = \"top\")\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-14-1.png){width=864}\n:::\n:::\n\n\n## c) Number of pastoral livestock per county\n\n### Treemap\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(livestock_area_county, aes(area = pasto_livestock, fill = county, label = comma(pasto_livestock)\n )) +\n geom_treemap() +\n geom_treemap_text(colour = \"black\",\n place = \"centre\",\n size = 24) +\n scale_fill_brewer(palette = \"OrRd\") +\n labs(x = \"\",\n y = \"\",\n title = \"\",\n caption = \"\") +\n theme_minimal() +\n theme(axis.title.x =element_text(size = 20),\n axis.title.y =element_text(size = 20),\n axis.text.x = element_text(size = 15),\n axis.text.y = element_text(size = 15),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 28),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n legend.title = element_blank(),\n legend.text=element_text(size=12),\n legend.position = \"bottom\") \n```\n\n::: 
{.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-15-1.png){width=672}\n:::\n:::\n\n\n## d) Number of pastoral livestock per subcounty\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlivestock_area_subcounty %>%\n ggplot() + \n geom_col(aes(x= reorder(sub_county, pasto_livestock), y = pasto_livestock, fill = county), width = 0.95) + \n geom_text(aes(x = sub_county, y = pasto_livestock, label = pasto_livestock),\n hjust = 1, nudge_y = 125000) +\n scale_fill_brewer(palette = \"OrRd\") +\n coord_flip() + \n guides(fill = guide_legend(nrow = 1)) +\n labs(x = \"Subcounty\",\n y = \"Number of Pastoral Livestock\",\n fill = \"County\",\n title = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +\n theme(axis.title.x =element_blank(),\n axis.title.y =element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_text(size = 12),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_text(size = 10),\n legend.text=element_text(size=8),\n legend.position = \"top\")\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-16-1.png){width=864}\n:::\n:::\n\n\n## e) Number of pastoral livestock per household at the county level\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlac_2 <- livestock_area_county %>%\n ggplot() + \n geom_col(aes(x= reorder(county, pasto_livestock_household), y = pasto_livestock_household, fill = county)) + \ngeom_text(aes(x = county, y = pasto_livestock_household, label = pasto_livestock_household), hjust = 1, nudge_y = 5) +\n scale_fill_brewer(palette = \"OrRd\") +\n coord_flip() + \n labs(x = 
\"County\",\n y = \"Number of Pastoral Livestock per household\",\n title = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +\n theme(axis.title.x =element_blank(), \n axis.title.y =element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_text(size = 12),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_blank(),\n legend.position = \"none\",\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"))\n\n# Create a patchwork plot with the map and the bar graph\n\nlac_2 + p2\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-17-1.png){width=672}\n:::\n:::\n\n\n## f) Number of pastoral livestock per household at the subcounty level\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlivestock_area_subcounty %>%\n ggplot() + \n geom_col(aes(x= reorder(sub_county, pasto_livestock_household), y = pasto_livestock_household, fill = county), width = 0.95) + \n scale_fill_brewer(palette = \"OrRd\") +\n geom_text(aes(x = sub_county, y = pasto_livestock_household, label = pasto_livestock_household),\n hjust = 1, nudge_y = 5) +\n coord_flip() + \n guides(fill = guide_legend(nrow = 1)) +\n labs(x = \"Subcounty\",\n y = \"Number of Pastoral Livestock per household\",\n fill = \"County\",\n title = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +\n theme(axis.title.x =element_blank(),\n axis.title.y =element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_text(size = 12),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = 
element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_text(size = 10),\n legend.text=element_text(size=8),\n legend.position = \"top\")\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-18-1.png){width=864}\n:::\n:::\n\n# Section 6: Conclusion\n\nIn this post, I have assessed the pastoral livestock (indigenous cattle, sheep, and goats) populations in the Kenyan counties described as “disturbed and dangerous.” Major contributors to this classification are banditry, livestock theft, and the limited pasture and water available to livestock owners. To get a better sense of the number of households engaged in farming and the pastoral livestock populations in these counties, I performed an exploratory data analysis and visualized my results.\n\nKey findings from the study were that:\n\n1. Turkana and Samburu had some of the highest numbers of pastoral livestock, yet they had the fewest households engaged in farming. As a result, these two counties had the highest average ratios of pastoral livestock to farming households in the region (Turkana: 55 per household and Samburu: 36 per household).\n2. The three (3) subcounties with the highest average ratios of pastoral livestock to farming households were all in Turkana: Turkana East (126), Kibish (96), and Turkana West (56). Surprisingly, subcounties such as Keiyo North (4), Koibatek (4), and Nyahururu (4) had very low ratios of pastoral livestock to farming households despite having relatively high numbers of farming households. 
This may have resulted from the unsuitability of the land for grazing animals, small average land sizes per farming household, a switch to exotic dairy and beef livestock, or simply, a preference for crop, rather than livestock farming.\n\nIn the next analysis, I will assess the livestock numbers per area (square kilometers), the numbers of indigenous cattle, sheep, and goats in every county and subcounty, as well as the other animal husbandry and/or crop farming activities that take place in the region. The reader is encouraged to use this code and data package (rKenyaCensus) to come up with their own analyses to share with the world. \n\n", "supporting": [ "danger_disturb_files" ], diff --git a/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/execute-results/html.json b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/execute-results/html.json new file mode 100644 index 0000000..ed020ef --- /dev/null +++ b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/execute-results/html.json @@ -0,0 +1,20 @@ +{ + "hash": "ec849772692d679ca5ae144a08b86ef2", + "result": { + "markdown": "---\ntitle: \"Code Along With Me (Episode 1)\"\nsubtitle: \"An assessment of the livestock numbers in the six counties declared to be 'disturbed and dangerous' in Kenya\"\nauthor: \"William Okech\"\ndate: \"2023-11-28\"\nimage: \"images/code_along_with_me_cover.png\"\ncategories: [RStudio, R, Tutorial, Blog]\ntoc: true\nformat: \n html:\n code-fold: show\n code-overflow: wrap\n warning: false\n---\n\n\n![*Image created using \"Bing Image Creator\" with the prompt keywords \"cows, sheep, indigenous cattle, running, dry savanna, river bed, traditional herdsman, nature photography, --ar 5:4 --style raw\"*](images/running_livestock.jpeg){fig-align=\"center\" width=\"80%\" fig-alt=\"Pastoral livestock running through a dry savanna followed by a traditional herder on horseback\"}\n\n# Introduction\n\nIn February 2023, the government of Kenya described six 
counties as \"disturbed\" and \"dangerous.\" In the preceding six months, over 100 civilians and 16 police officers had lost their lives as criminals engaged in banditry terrorized rural villages. One of the main causes of the conflict is livestock theft; therefore, my goal is to perform an exploratory data analysis of livestock numbers from the Kenya Population and Housing Census (2019) report.\n\nReferences\n\n1. [Nation Media News Brief](https://nation.africa/kenya/news/killing-fields-kdf-police-versus-bandits-who-will-prevail--4122912)\n2. [Citizen TV Summary](https://www.youtube.com/watch?v=nOqzHeeSS2A)\n\n# Section 1: Load all the required libraries\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse) # a collection of packages used to model, transform, and visualize data\nlibrary(rKenyaCensus) # tidy datasets obtained from the Kenya Population and Housing Census results\nlibrary(patchwork) # combine separate ggplots into the same graphic\nlibrary(janitor) # initial data exploration and cleaning for a new data set\nlibrary(ggrepel) # repel overlapping text labels\nlibrary(ggthemes) # Extra Themes, Scales and Geoms for 'ggplot2'\nlibrary(scales) # tools to override the default breaks, labels, transformations and palettes\n# install.packages(\"treemapify\")\nlibrary(treemapify) # allows the creation of treemaps in ggplot2\n\nlibrary(sf) # simple features, a method to encode spatial vector data\n#install.packages(\"devtools\")\nlibrary(devtools) # helps to install packages not on CRAN\n#devtools::install_github(\"yutannihilation/ggsflabel\")\nlibrary(ggsflabel) # place labels on maps\n\nlibrary(knitr) # a tool for dynamic report generation in R\n#install.packages(\"kableExtra\")\nlibrary(kableExtra) # build common complex tables and manipulate table styles\n```\n:::\n\n\nNote: If you have package loading issues, check the timeout with getOption('timeout') and use options(timeout = ) to increase package loading time.\n\n# Section 2: 
Create a map of the \"dangerous and disturbed\" counties\n\nThe rKenyaCensus package includes a built-in county boundaries dataset to facilitate mapping of the various indicators in the Census. The required shapefile for this analysis is KenyaCounties_SHP\n\n## a) Sample plot of the map of Kenya\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load the shapefile\nkenya_counties_sf <- st_as_sf(KenyaCounties_SHP)\n\n# Plot the map of Kenya\np0 <- ggplot(kenya_counties_sf) + \n geom_sf(fill = \"bisque2\", linewidth = 0.6, color = \"black\") + \n theme_void()\np0\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-2-1.png){width=672}\n:::\n:::\n\n\n## b) Highlight the dangerous and disturbed counties in Kenya\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# First, remove the \"/\" from the county names\n\nkenya_counties_sf$County <- gsub(\"/\", \n \" \", \n kenya_counties_sf$County)\n\n# Select the six counties to highlight\nhighlight_counties <- c(\"TURKANA\", \"WEST POKOT\", \"ELGEYO MARAKWET\", \"BARINGO\", \"LAIKIPIA\", \"SAMBURU\")\n\n# Filter the counties dataset to only include the highlighted counties\nhighlighted <- kenya_counties_sf %>% filter(County %in% highlight_counties)\n\n# Plot the highlighted counties in the map\np1 <- ggplot() + \n geom_sf(data = kenya_counties_sf, fill = \"bisque2\", linewidth = 0.6, color = \"black\") + \n geom_sf(data = highlighted, fill = \"chocolate4\", linewidth = 0.8, color = \"black\") +\n theme_void()\np1\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-3-1.png){width=672}\n:::\n:::\n\n\n## c) Plot only the required counties\n\n\n::: {.cell}\n\n```{.r .cell-code}\np2 <- ggplot(data = highlighted) +\n geom_sf(aes(fill = County), linewidth = 1, show.legend = FALSE) +\n geom_label_repel(aes(label = County, geometry = geometry), size = 3,\n stat = \"sf_coordinates\",\n force=10, # force of repulsion between overlapping text labels\n seed = 1,\n segment.size = 0.75,\n 
min.segment.length=0) +\n scale_fill_brewer(palette = \"OrRd\") +\n labs(title = \"\",\n caption = \"\") +\n theme(plot.title = element_text(family = \"Helvetica\",size = 10, hjust = 0.5),\n legend.title = element_blank(),\n legend.position = \"none\",\n plot.caption = element_text(family = \"Helvetica\",size = 12)) +\n theme_void() \np2\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n\n```{.r .cell-code}\n# Notes: geom_label_repel() with geometry and stat defined can be used as an \n# alternative to geom_sf_label_repel()\n```\n:::\n\n\n## d) Combine the plots using patchwork to clearly highlight the counties of interest\n\n\n::: {.cell}\n\n```{.r .cell-code}\np1 + \n p2 + \n plot_annotation(title = \"\",\n subtitle = \"\",\n caption = \"\",\n theme = theme(plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 25),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\")))\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n\n# Section 3: Load the livestock data from the census report and generate the dataframes required for analysis.\n\n## a) View the data available in the data catalogue\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"DataCatalogue\")\n```\n:::\n\n\n## b) Load the livestock data\n\nHere, pastoral livestock are defined as sheep, goats, and indigenous cattle.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Select the livestock data from the census report\ndf_livestock <- V4_T2.24\nlivestock <- df_livestock[2:393, ]\nlivestock <- livestock %>%\n clean_names()\n\n# Remove the \"/\" from the county names in the dataset\nlivestock$county <- gsub(\"/\", \" \", livestock$county)\nlivestock$sub_county <- gsub(\"/\", \" \", livestock$sub_county)\n\n# Select the variables of interest from the dataset\n# These include 
the county, subcounty, land area, number of farming households, \n# sheep, goats, and indigenous cattle.\n\n# New variables listed below include:\n# pasto_livestock is the total number of sheep, goats, and indigenous cattle\n# ind_cattle_household is the number of indigenous cattle per household\n# goats_household is the number of goats per household\n# sheep_household is the number of sheep per household\n# pasto_livestock_household is the number of pastoral livestock per household\n\nlivestock_select <- livestock %>%\n select(county, sub_county, admin_area, farming, sheep, goats, indigenous_cattle) %>%\n mutate(pasto_livestock = sheep + goats + indigenous_cattle) %>% \n mutate(ind_cattle_household = round(indigenous_cattle/farming)) %>%\n mutate(goats_household = round(goats/farming)) %>%\n mutate(sheep_household = round(sheep/farming)) %>%\n mutate(pasto_livestock_household = round(pasto_livestock/farming))\n```\n:::\n\n\n## c) Filter data for the selected \"disturbed and dangerous\" counties\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Select the data for the \"dangerous and disturbed\" counties\ndan_dist <- c(\"TURKANA\", \"WEST POKOT\", \"ELGEYO MARAKWET\", \"BARINGO\", \"LAIKIPIA\", \"SAMBURU\")\n\nlivestock_select_county <- livestock_select %>%\n filter(admin_area == \"County\") %>%\n filter(county %in% dan_dist)\n\n# Select subcounty data for the \"disturbed and dangerous\" counties\nlivestock_select_subcounty <- livestock_select %>%\n filter(admin_area == \"SubCounty\") %>%\n filter(county %in% dan_dist)\n\n# Create an area dataset for the \"dangerous and disturbed\" counties\ndf_land_area <- V1_T2.7\nland_area <- df_land_area[2:396,]\nland_area <- land_area %>%\n clean_names()\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create a function to remove the \"/\", \" County\" from label, and change the label to UPPERCASE\n\nclean_county_names <- function(dataframe, column_name) {\n dataframe[[column_name]] <- toupper(gsub(\"/\", \" \", gsub(\" County\", 
\"\", dataframe[[column_name]])))\n return(dataframe)\n}\n\nland_area <- clean_county_names(land_area, 'county')\nland_area <- clean_county_names(land_area, 'sub_county')\n\n# The code above does the processes listed below:\n\n# land_area$county <- gsub(\"/\", \" \", land_area$county)\n# land_area$county <- gsub(\" County\", \"\", land_area$county)\n# land_area$county <- toupper(land_area$county)\n# land_area$sub_county <- toupper(land_area$sub_county)\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# Obtain the area data for \"disturbed and dangerous\" counties\nland_area_county <- land_area %>%\n filter(admin_area == \"County\") %>%\n select(county, land_area_in_sq_km) %>%\n filter(county %in% dan_dist)\n\n# Get the subcounty area data for \"disturbed and dangerous\" counties\nland_area_subcounty <- land_area %>%\n filter(admin_area == \"SubCounty\") %>%\n select(county, sub_county, land_area_in_sq_km) %>%\n filter(county %in% dan_dist) %>%\n select(-county)\n```\n:::\n\n\n## d) Create the final datasets to be used for analysis. 
Use inner_join() and create new variables.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create a county dataset with area and livestock numbers for the disturbed and dangerous regions\n\nlivestock_area_county <- inner_join(livestock_select_county, land_area_county, by = \"county\")\n\n# New variables listed below include:\n# ind_cattle_area is the number of indigenous cattle per area_in_sq_km\n# goats_area is the number of goats per area_in_sq_km\n# sheep_area is the number of sheep per area_in_sq_km\n# pasto_livestock_area is the number of pastoral livestock per area_in_sq_km\n\nlivestock_area_county <- livestock_area_county %>%\n mutate(ind_cattle_area = round(indigenous_cattle/land_area_in_sq_km),\n sheep_area = round(sheep/land_area_in_sq_km),\n goats_area = round(goats/land_area_in_sq_km),\n pasto_livestock_area = round(pasto_livestock/land_area_in_sq_km))\n\n# Create a subcounty dataset with area and livestock numbers\n# for the disturbed and dangerous regions\n\nlivestock_area_subcounty <- inner_join(livestock_select_subcounty, land_area_subcounty, by = \"sub_county\")\n\n# New variables listed below include:\n# ind_cattle_area is the number of indigenous cattle per area_in_sq_km\n# goats_area is the number of goats per area_in_sq_km\n# sheep_area is the number of sheep per area_in_sq_km\n# pasto_livestock_area is the number of pastoral livestock per area_in_sq_km\n\nlivestock_area_subcounty <- livestock_area_subcounty %>%\n mutate(ind_cattle_area = round(indigenous_cattle/land_area_in_sq_km),\n sheep_area = round(sheep/land_area_in_sq_km),\n goats_area = round(goats/land_area_in_sq_km),\n pasto_livestock_area = round(pasto_livestock/land_area_in_sq_km))\n```\n:::\n\n\n# Section 4: Create a table with land area (sq. 
km) for the six counties\n\n**These six counties cover approximately one-fifth (1/5) of Kenya**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlivestock_area_county %>%\n select(county, land_area_in_sq_km) %>%\n mutate(county = str_to_title(county)) %>%\n arrange(desc(land_area_in_sq_km)) %>%\n adorn_totals(\"row\") %>%\n rename(\"County\" = \"county\",\n \"Land Area (sq. km)\" = \"land_area_in_sq_km\") %>%\n kbl(align = \"c\") %>%\n kable_classic() %>% \n row_spec(row = 0, font_size = 21, color = \"white\", background = \"#000000\") %>%\n row_spec(row = c(1:7), font_size = 15) %>%\n row_spec(row = 6, extra_css = \"border-bottom: 1px solid;\") %>%\n row_spec(row = 7, bold = T)\n```\n\n::: {.cell-output-display}\n`````{=html}\n\n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
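<table><thead><tr><th>County</th><th>Land Area (sq. km)</th></tr></thead><tbody>
<tr><td>Turkana</td><td>68232.9</td></tr>
<tr><td>Samburu</td><td>21065.1</td></tr>
<tr><td>Baringo</td><td>10976.4</td></tr>
<tr><td>Laikipia</td><td>9532.2</td></tr>
<tr><td>West Pokot</td><td>9123.2</td></tr>
<tr><td>Elgeyo Marakwet</td><td>3032.0</td></tr>
<tr><td>Total</td><td>121961.8</td></tr></tbody></table>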
\n\n`````\n:::\n:::\n\n\n# Section 5: Perform an exploratory data analysis to gain key insights about the data\n\n## a) Number of Farming Households at the County Level\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlac_1 <- livestock_area_county %>%\n ggplot() + \n geom_col(aes(x= reorder(county, farming), y = farming, fill = county)) +\n geom_text(aes(x = county, y = 0, label = county),\n hjust = 0, nudge_y = 0.25) +\n geom_text(aes(x = county, y = farming, label = farming),\n hjust = 1, nudge_y = 15000) +\n scale_fill_brewer(palette = \"OrRd\") +\n coord_flip() + \n labs(x = \"County\",\n y = \"Number of Farming Households\",\n title = \"\",\n subtitle = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma) +\n theme(axis.title.x =element_blank(), \n axis.title.y =element_blank(),\n axis.text.x = element_blank(),\n axis.text.y = element_blank(),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_blank(),\n legend.position = \"none\",\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"))\n\n# Create a patchwork plot with the map and the bar graph\n\nlac_1 + p2\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-13-1.png){width=672}\n:::\n:::\n\n\n## b) Number of Farming Households at the Subcounty Level\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlivestock_area_subcounty %>%\n ggplot() + \n geom_col(aes(x= reorder(sub_county, farming), y = farming, fill = county), width = 0.95) + geom_text(aes(x = sub_county, y = farming, label = farming),\n hjust = 1, nudge_y = 2500, size = 4) +\n scale_fill_brewer(palette = \"OrRd\") +\n coord_flip() +\n guides(fill = guide_legend(nrow = 1)) +\n labs(x = \"County\",\n y = \"Number of Farming 
Households\",\n fill = \"County\",\n title = \"\",\n subtitle = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +\n theme(axis.title.x =element_blank(),\n axis.title.y =element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_text(size = 12),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_text(size = 10),\n legend.text=element_text(size=8),\n legend.position = \"top\")\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-14-1.png){width=864}\n:::\n:::\n\n\n## c) Number of pastoral livestock per county\n\n### Treemap\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(livestock_area_county, aes(area = pasto_livestock, fill = county, label = comma(pasto_livestock)\n )) +\n geom_treemap() +\n geom_treemap_text(colour = \"black\",\n place = \"centre\",\n size = 24) +\n scale_fill_brewer(palette = \"OrRd\") +\n labs(x = \"\",\n y = \"\",\n title = \"\",\n caption = \"\") +\n theme_minimal() +\n theme(axis.title.x =element_text(size = 20),\n axis.title.y =element_text(size = 20),\n axis.text.x = element_text(size = 15),\n axis.text.y = element_text(size = 15),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 28),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n legend.title = element_blank(),\n legend.text=element_text(size=12),\n legend.position = \"bottom\") \n```\n\n::: 
{.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-15-1.png){width=672}\n:::\n:::\n\n\n## d) Number of pastoral livestock per subcounty\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlivestock_area_subcounty %>%\n ggplot() + \n geom_col(aes(x= reorder(sub_county, pasto_livestock), y = pasto_livestock, fill = county), width = 0.95) + \n geom_text(aes(x = sub_county, y = pasto_livestock, label = pasto_livestock),\n hjust = 1, nudge_y = 125000) +\n scale_fill_brewer(palette = \"OrRd\") +\n coord_flip() + \n guides(fill = guide_legend(nrow = 1)) +\n labs(x = \"Subcounty\",\n y = \"Number of Pastoral Livestock\",\n fill = \"County\",\n title = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +\n theme(axis.title.x =element_blank(),\n axis.title.y =element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_text(size = 12),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_text(size = 10),\n legend.text=element_text(size=8),\n legend.position = \"top\")\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-16-1.png){width=864}\n:::\n:::\n\n\n## e) Number of pastoral livestock per household at the county level\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlac_2 <- livestock_area_county %>%\n ggplot() + \n geom_col(aes(x= reorder(county, pasto_livestock_household), y = pasto_livestock_household, fill = county)) + \ngeom_text(aes(x = county, y = pasto_livestock_household, label = pasto_livestock_household), hjust = 1, nudge_y = 5) +\n scale_fill_brewer(palette = \"OrRd\") +\n coord_flip() + \n labs(x = 
\"County\",\n y = \"Number of Pastoral Livestock per household\",\n title = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +\n theme(axis.title.x =element_blank(), \n axis.title.y =element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_text(size = 12),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_blank(),\n legend.position = \"none\",\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"))\n\n# Create a patchwork plot with the map and the bar graph\n\nlac_2 + p2\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-17-1.png){width=672}\n:::\n:::\n\n\n## f) Number of pastoral livestock per household at the subcounty level\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlivestock_area_subcounty %>%\n ggplot() + \n geom_col(aes(x= reorder(sub_county, pasto_livestock_household), y = pasto_livestock_household, fill = county), width = 0.95) + \n scale_fill_brewer(palette = \"OrRd\") +\n geom_text(aes(x = sub_county, y = pasto_livestock_household, label = pasto_livestock_household),\n hjust = 1, nudge_y = 5) +\n coord_flip() + \n guides(fill = guide_legend(nrow = 1)) +\n labs(x = \"Subcounty\",\n y = \"Number of Pastoral Livestock per household\",\n fill = \"County\",\n title = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +\n theme(axis.title.x =element_blank(),\n axis.title.y =element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_text(size = 12),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = 
element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_text(size = 10),\n legend.text=element_text(size=8),\n legend.position = \"top\")\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-18-1.png){width=864}\n:::\n:::\n\n\n# Section 6: Conclusion\n\nIn this post, I have assessed the pastoral livestock (indigenous cattle, sheep, and goats) populations in the Kenyan counties described as \"disturbed and dangerous.\" Major contributors to this classification are banditry, livestock theft, and the limited pasture and water available to livestock owners. To get a better sense of the number of households engaged in farming and the pastoral livestock populations in these counties, I performed an exploratory data analysis and visualized my results.\n\nKey findings from the study were that:\n\n1. Turkana and Samburu had some of the highest numbers of pastoral livestock, yet they had the fewest households engaged in farming. As a result, these two counties had the highest average ratios of pastoral livestock to farming households in the region (Turkana: 55 per household and Samburu: 36 per household).\n2. The three (3) subcounties with the highest average ratios of pastoral livestock to farming households were all in Turkana: Turkana East (126), Kibish (96), and Turkana West (56). Surprisingly, subcounties such as Keiyo North (4), Koibatek (4), and Nyahururu (4) had very low ratios of pastoral livestock to farming households despite having relatively high numbers of farming households. 
This may have resulted from the unsuitability of the land for grazing animals, small average land sizes per farming household, a switch to exotic dairy and beef livestock, or simply, a preference for crop, rather than livestock farming.\n\nIn the next analysis, I will assess the livestock numbers per area (square kilometers), the numbers of indigenous cattle, sheep, and goats in every county and subcounty, as well as the other animal husbandry and/or crop farming activities that take place in the region. The reader is encouraged to use this code and data package (rKenyaCensus) to come up with their own analyses to share with the world.\n", + "supporting": [ + "danger_disturb_files" + ], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": { + "include-in-header": [ + "\r\n\r\n" + ] + }, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-13-1.png b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-13-1.png new file mode 100644 index 0000000..65c959e Binary files /dev/null and b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-13-1.png differ diff --git a/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-14-1.png b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-14-1.png new file mode 100644 index 0000000..641ac7a Binary files /dev/null and b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-14-1.png differ diff --git a/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-15-1.png b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-15-1.png new file mode 100644 index 0000000..159506c Binary files /dev/null and 
b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-15-1.png differ diff --git a/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-16-1.png b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-16-1.png new file mode 100644 index 0000000..61a5dc2 Binary files /dev/null and b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-16-1.png differ diff --git a/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-17-1.png b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-17-1.png new file mode 100644 index 0000000..d74d10f Binary files /dev/null and b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-17-1.png differ diff --git a/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-18-1.png b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-18-1.png new file mode 100644 index 0000000..db9b123 Binary files /dev/null and b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-18-1.png differ diff --git a/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-2-1.png b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-2-1.png new file mode 100644 index 0000000..aba528e Binary files /dev/null and b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-2-1.png differ diff --git a/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-3-1.png b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-3-1.png new file mode 100644 index 0000000..ea83c2e Binary files /dev/null and 
b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-3-1.png differ diff --git a/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-4-1.png b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-4-1.png new file mode 100644 index 0000000..90a5d22 Binary files /dev/null and b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-4-1.png differ diff --git a/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-5-1.png b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-5-1.png new file mode 100644 index 0000000..714f92c Binary files /dev/null and b/_freeze/posts/code_along_with_me/danger_disturb/danger_disturb/figure-html/unnamed-chunk-5-1.png differ diff --git a/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/execute-results/html.json b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/execute-results/html.json new file mode 100644 index 0000000..d456cd7 --- /dev/null +++ b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/execute-results/html.json @@ -0,0 +1,20 @@ +{ + "hash": "9d0ebbcadd9dd88c58f4deff83b5d098", + "result": { + "markdown": "---\ntitle: \"Code Along With Me (Episode 1)\"\nsubtitle: \"An assessment of the livestock numbers in the six counties declared to be 'disturbed and dangerous' in Kenya\"\nauthor: \"William Okech\"\ndate: \"2023-11-28\"\nimage: \"\"\ncategories: [RStudio, R, Tutorial, Blog]\ntoc: true\nformat: \n html:\n code-fold: show\n code-overflow: wrap\n warning: false\n---\n\n\n![*Image created using \"Bing Image Creator\" with the prompt keywords \"cows, sheep, indigenous cattle, running, dry savanna, river bed, traditional herdsman, nature photography, --ar 5:4 --style raw\"*](){fig-align=\"center\" width=\"80%\" fig-alt=\"Pastoral livestock running through a dry savanna followed by a 
traditional herder on horseback\"}\n\n# Introduction\n\nIn February 2023, the government of Kenya described six counties as \"disturbed\" and \"dangerous.\" This is because in the preceding six months, over 100 civilians and 16 police officers had lost their lives as criminals engaged in banditry terrorized rural villages. One of the main causes of the conflict is livestock theft; therefore, my goal is to perform an exploratory data analysis of livestock numbers from the Kenya Population and Housing Census (2019) report.\n\nReferences\n\n1. [Nation Media News Brief](https://nation.africa/kenya/news/killing-fields-kdf-police-versus-bandits-who-will-prevail--4122912)\n2. [Citizen TV Summary](https://www.youtube.com/watch?v=nOqzHeeSS2A)\n\n# Section 1: Load all the required libraries\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse) # a collection of packages used to model, transform, and visualize data\nlibrary(rKenyaCensus) # tidy datasets obtained from the Kenya Population and Housing Census results\nlibrary(patchwork) # combine separate ggplots into the same graphic\nlibrary(janitor) # initial data exploration and cleaning for a new data set\nlibrary(ggrepel) # repel overlapping text labels\nlibrary(ggthemes) # Extra Themes, Scales and Geoms for 'ggplot2'\nlibrary(scales) # tools to override the default breaks, labels, transformations and palettes\n# install.packages(\"treemapify\")\nlibrary(treemapify) # allows the creation of treemaps in ggplot2\n\nlibrary(sf) # simple features, a method to encode spatial vector data\n#install.packages(\"devtools\")\nlibrary(devtools) # helps to install packages not on CRAN\n#devtools::install_github(\"yutannihilation/ggsflabel\")\nlibrary(ggsflabel) # place labels on maps\n\nlibrary(knitr) # a tool for dynamic report generation in R\n#install.packages(\"kableExtra\")\nlibrary(kableExtra) # build common complex tables and manipulate table styles\n```\n:::\n\n\nNote: If you have package loading issues check the 
timeout with getOption('timeout') and use options(timeout = ) to increase package loading time.\n\n# Section 2: Create a map of the \"dangerous and disturbed\" counties\n\nThe rKenyaCensus package includes a built-in county boundaries dataset to facilitate mapping of the various indicators in the Census. The required shapefile for this analysis is KenyaCounties_SHP\n\n## a) Sample plot of the map of Kenya\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load the shapefile\nkenya_counties_sf <- st_as_sf(KenyaCounties_SHP)\n\n# Plot the map of Kenya\np0 <- ggplot(kenya_counties_sf) + \n geom_sf(fill = \"bisque2\", linewidth = 0.6, color = \"black\") + \n theme_void()\np0\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-2-1.png){width=672}\n:::\n:::\n\n\n## b) Highlight the dangerous and disturbed counties in Kenya\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# First, remove the \"/\" from the county names\n\nkenya_counties_sf$County <- gsub(\"/\", \n \" \", \n kenya_counties_sf$County)\n\n# Select the six counties to highlight\nhighlight_counties <- c(\"TURKANA\", \"WEST POKOT\", \"ELGEYO MARAKWET\", \"BARINGO\", \"LAIKIPIA\", \"SAMBURU\")\n\n# Filter the counties dataset to only include the highlighted counties\nhighlighted <- kenya_counties_sf %>% filter(County %in% highlight_counties)\n\n# Plot the highlighted counties in the map\np1 <- ggplot() + \n geom_sf(data = kenya_counties_sf, fill = \"bisque2\", linewidth = 0.6, color = \"black\") + \n geom_sf(data = highlighted, fill = \"chocolate4\", linewidth = 0.8, color = \"black\") +\n theme_void()\np1\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-3-1.png){width=672}\n:::\n:::\n\n\n## c) Plot only the required counties\n\n\n::: {.cell}\n\n```{.r .cell-code}\np2 <- ggplot(data = highlighted) +\n geom_sf(aes(fill = County), linewidth = 1, show.legend = FALSE) +\n geom_label_repel(aes(label = County, geometry = geometry), size = 3,\n stat = 
\"sf_coordinates\",\n force=10, # force of repulsion between overlapping text labels\n seed = 1,\n segment.size = 0.75,\n min.segment.length=0) +\n scale_fill_brewer(palette = \"OrRd\") +\n labs(title = \"\",\n caption = \"\") +\n theme(plot.title = element_text(family = \"Helvetica\",size = 10, hjust = 0.5),\n legend.title = element_blank(),\n legend.position = \"none\",\n plot.caption = element_text(family = \"Helvetica\",size = 12)) +\n theme_void() \np2\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n\n```{.r .cell-code}\n# Notes: geom_label_repel() with geometry and stat defined can be used as an \n# alternative to geom_sf_label_repel()\n```\n:::\n\n\n## d) Combine the plots using patchwork to clearly highlight the counties of interest\n\n\n::: {.cell}\n\n```{.r .cell-code}\np1 + \n p2 + \n plot_annotation(title = \"\",\n subtitle = \"\",\n caption = \"\",\n theme = theme(plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 25),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\")))\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n\n# Section 3: Load the livestock data from the census report and generate the dataframes required for analysis.\n\n## a) View the data available in the data catalogue\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"DataCatalogue\")\n```\n:::\n\n\n## b) Load the livestock data\n\nHere, pastoral livestock are defined as sheep, goats, and indigenous cattle.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Select the livestock data from the census report\ndf_livestock <- V4_T2.24\nlivestock <- df_livestock[2:393, ]\nlivestock <- livestock %>%\n clean_names()\n\n# Remove the \"/\" from the county names in the dataset\nlivestock$county <- gsub(\"/\", \" \", 
livestock$county)\nlivestock$sub_county <- gsub(\"/\", \" \", livestock$sub_county)\n\n# Select the variables of interest from the dataset\n# These include the county, subcounty, land area, number of farming households, \n# sheep, goats, and indigenous cattle.\n\n# New variables listed below include:\n# pasto_livestock is the total number of sheep, goats, and indigenous cattle\n# ind_cattle_household is the number of indigenous cattle per household\n# goats_household is the number of goats per household\n# sheep_household is the number of sheep per household\n# pasto_livestock_household is the number of pastoral livestock per household\n\nlivestock_select <- livestock %>%\n select(county, sub_county, admin_area, farming, sheep, goats, indigenous_cattle) %>%\n mutate(pasto_livestock = sheep + goats + indigenous_cattle) %>% \n mutate(ind_cattle_household = round(indigenous_cattle/farming)) %>%\n mutate(goats_household = round(goats/farming)) %>%\n mutate(sheep_household = round(sheep/farming)) %>%\n mutate(pasto_livestock_household = round(pasto_livestock/farming))\n```\n:::\n\n\n## c) Filter data for the selected \"disturbed and dangerous\" counties\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Select the data for the \"dangerous and disturbed\" counties\ndan_dist <- c(\"TURKANA\", \"WEST POKOT\", \"ELGEYO MARAKWET\", \"BARINGO\", \"LAIKIPIA\", \"SAMBURU\")\n\nlivestock_select_county <- livestock_select %>%\n filter(admin_area == \"County\") %>%\n filter(county %in% dan_dist)\n\n# Select subcounty data for the \"disturbed and dangerous\" counties\nlivestock_select_subcounty <- livestock_select %>%\n filter(admin_area == \"SubCounty\") %>%\n filter(county %in% dan_dist)\n\n# Create an area dataset for the \"dangerous and disturbed\" counties\ndf_land_area <- V1_T2.7\nland_area <- df_land_area[2:396,]\nland_area <- land_area %>%\n clean_names()\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create a function to remove the \"/\", \" County\" from label, and change the 
label to UPPERCASE\n\nclean_county_names <- function(dataframe, column_name) {\n dataframe[[column_name]] <- toupper(gsub(\"/\", \" \", gsub(\" County\", \"\", dataframe[[column_name]])))\n return(dataframe)\n}\n\nland_area <- clean_county_names(land_area, 'county')\nland_area <- clean_county_names(land_area, 'sub_county')\n\n# The code above does the processes listed below:\n\n# land_area$county <- gsub(\"/\", \" \", land_area$county)\n# land_area$county <- gsub(\" County\", \"\", land_area$county)\n# land_area$county <- toupper(land_area$county)\n# land_area$sub_county <- toupper(land_area$sub_county)\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# Obtain the area data for \"disturbed and dangerous\" counties\nland_area_county <- land_area %>%\n filter(admin_area == \"County\") %>%\n select(county, land_area_in_sq_km) %>%\n filter(county %in% dan_dist)\n\n# Get the subcounty area data for \"disturbed and dangerous\" counties\nland_area_subcounty <- land_area %>%\n filter(admin_area == \"SubCounty\") %>%\n select(county, sub_county, land_area_in_sq_km) %>%\n filter(county %in% dan_dist) %>%\n select(-county)\n```\n:::\n\n\n## d) Create the final datasets to be used for analysis. 
Use inner_join() and create new variables.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create a county dataset with area and livestock numbers for the disturbed and dangerous regions\n\nlivestock_area_county <- inner_join(livestock_select_county, land_area_county, by = \"county\")\n\n# New variables listed below include:\n# ind_cattle_area is the number of indigenous cattle per area_in_sq_km\n# goats_area is the number of goats per area_in_sq_km\n# sheep_area is the number of sheep per area_in_sq_km\n# pasto_livestock_area is the number of pastoral livestock per area_in_sq_km\n\nlivestock_area_county <- livestock_area_county %>%\n mutate(ind_cattle_area = round(indigenous_cattle/land_area_in_sq_km),\n sheep_area = round(sheep/land_area_in_sq_km),\n goats_area = round(goats/land_area_in_sq_km),\n pasto_livestock_area = round(pasto_livestock/land_area_in_sq_km))\n\n# Create a subcounty dataset with area and livestock numbers\n# for the disturbed and dangerous regions\n\nlivestock_area_subcounty <- inner_join(livestock_select_subcounty, land_area_subcounty, by = \"sub_county\")\n\n# New variables listed below include:\n# ind_cattle_area is the number of indigenous cattle per area_in_sq_km\n# goats_area is the number of goats per area_in_sq_km\n# sheep_area is the number of sheep per area_in_sq_km\n# pasto_livestock_area is the number of pastoral livestock per area_in_sq_km\n\nlivestock_area_subcounty <- livestock_area_subcounty %>%\n mutate(ind_cattle_area = round(indigenous_cattle/land_area_in_sq_km),\n sheep_area = round(sheep/land_area_in_sq_km),\n goats_area = round(goats/land_area_in_sq_km),\n pasto_livestock_area = round(pasto_livestock/land_area_in_sq_km))\n```\n:::\n\n\n# Section 4: Create a table with land area (sq. 
km) for the six counties\n\n**These six counties cover approximately one-fifth (1/5) of Kenya**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlivestock_area_county %>%\n select(county, land_area_in_sq_km) %>%\n mutate(county = str_to_title(county)) %>%\n arrange(desc(land_area_in_sq_km)) %>%\n adorn_totals(\"row\") %>%\n rename(\"County\" = \"county\",\n \"Land Area (sq. km)\" = \"land_area_in_sq_km\") %>%\n kbl(align = \"c\") %>%\n kable_classic() %>% \n row_spec(row = 0, font_size = 21, color = \"white\", background = \"#000000\") %>%\n row_spec(row = c(1:7), font_size = 15) %>%\n row_spec(row = 6, extra_css = \"border-bottom: 1px solid;\") %>%\n row_spec(row = 7, bold = T)\n```\n\n::: {.cell-output-display}\n`````{=html}\n\n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
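<table>
<thead>
<tr><th>County</th><th>Land Area (sq. km)</th></tr>
</thead>
<tbody>
<tr><td>Turkana</td><td>68232.9</td></tr>
<tr><td>Samburu</td><td>21065.1</td></tr>
<tr><td>Baringo</td><td>10976.4</td></tr>
<tr><td>Laikipia</td><td>9532.2</td></tr>
<tr><td>West Pokot</td><td>9123.2</td></tr>
<tr><td>Elgeyo Marakwet</td><td>3032.0</td></tr>
<tr><td>Total</td><td>121961.8</td></tr>
</tbody>
</table>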
\n\n`````\n:::\n:::\n\n\n# Section 5: Perform an exploratory data analysis to gain key insights about the data\n\n## a) Number of Farming Households at the County Level\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlac_1 <- livestock_area_county %>%\n ggplot() + \n geom_col(aes(x= reorder(county, farming), y = farming, fill = county)) +\n geom_text(aes(x = county, y = 0, label = county),\n hjust = 0, nudge_y = 0.25) +\n geom_text(aes(x = county, y = farming, label = farming),\n hjust = 1, nudge_y = 15000) +\n scale_fill_brewer(palette = \"OrRd\") +\n coord_flip() + \n labs(x = \"County\",\n y = \"Number of Farming Households\",\n title = \"\",\n subtitle = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma) +\n theme(axis.title.x =element_blank(), \n axis.title.y =element_blank(),\n axis.text.x = element_blank(),\n axis.text.y = element_blank(),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_blank(),\n legend.position = \"none\",\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"))\n\n# Create a patchwork plot with the map and the bar graph\n\nlac_1 + p2\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-13-1.png){width=672}\n:::\n:::\n\n\n## b) Number of Farming Households at the Subcounty Level\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlivestock_area_subcounty %>%\n ggplot() + \n geom_col(aes(x= reorder(sub_county, farming), y = farming, fill = county), width = 0.95) + geom_text(aes(x = sub_county, y = farming, label = farming),\n hjust = 1, nudge_y = 2500, size = 4) +\n scale_fill_brewer(palette = \"OrRd\") +\n coord_flip() +\n guides(fill = guide_legend(nrow = 1)) +\n labs(x = \"County\",\n y = \"Number of Farming 
Households\",\n fill = \"County\",\n title = \"\",\n subtitle = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +\n theme(axis.title.x =element_blank(),\n axis.title.y =element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_text(size = 12),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_text(size = 10),\n legend.text=element_text(size=8),\n legend.position = \"top\")\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-14-1.png){width=864}\n:::\n:::\n\n\n## c) Number of pastoral livestock per county\n\n### Treemap\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(livestock_area_county, aes(area = pasto_livestock, fill = county, label = comma(pasto_livestock)\n )) +\n geom_treemap() +\n geom_treemap_text(colour = \"black\",\n place = \"centre\",\n size = 24) +\n scale_fill_brewer(palette = \"OrRd\") +\n labs(x = \"\",\n y = \"\",\n title = \"\",\n caption = \"\") +\n theme_minimal() +\n theme(axis.title.x =element_text(size = 20),\n axis.title.y =element_text(size = 20),\n axis.text.x = element_text(size = 15),\n axis.text.y = element_text(size = 15),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 28),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n legend.title = element_blank(),\n legend.text=element_text(size=12),\n legend.position = \"bottom\") \n```\n\n::: 
{.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-15-1.png){width=672}\n:::\n:::\n\n\n## d) Number of pastoral livestock per subcounty\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlivestock_area_subcounty %>%\n ggplot() + \n geom_col(aes(x= reorder(sub_county, pasto_livestock), y = pasto_livestock, fill = county), width = 0.95) + \n geom_text(aes(x = sub_county, y = pasto_livestock, label = pasto_livestock),\n hjust = 1, nudge_y = 125000) +\n scale_fill_brewer(palette = \"OrRd\") +\n coord_flip() + \n guides(fill = guide_legend(nrow = 1)) +\n labs(x = \"Subcounty\",\n y = \"Number of Pastoral Livestock\",\n fill = \"County\",\n title = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +\n theme(axis.title.x =element_blank(),\n axis.title.y =element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_text(size = 12),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_text(size = 10),\n legend.text=element_text(size=8),\n legend.position = \"top\")\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-16-1.png){width=864}\n:::\n:::\n\n\n## e) Number of pastoral livestock per household at the county level\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlac_2 <- livestock_area_county %>%\n ggplot() + \n geom_col(aes(x= reorder(county, pasto_livestock_household), y = pasto_livestock_household, fill = county)) + \ngeom_text(aes(x = county, y = pasto_livestock_household, label = pasto_livestock_household), hjust = 1, nudge_y = 5) +\n scale_fill_brewer(palette = \"OrRd\") +\n coord_flip() + \n labs(x = 
\"County\",\n y = \"Number of Pastoral Livestock per household\",\n title = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +\n theme(axis.title.x =element_blank(), \n axis.title.y =element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_text(size = 12),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_blank(),\n legend.position = \"none\",\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"))\n\n# Create a patchwork plot with the map and the bar graph\n\nlac_2 + p2\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-17-1.png){width=672}\n:::\n:::\n\n\n## f) Number of pastoral livestock per household at the subcounty level\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlivestock_area_subcounty %>%\n ggplot() + \n geom_col(aes(x= reorder(sub_county, pasto_livestock_household), y = pasto_livestock_household, fill = county), width = 0.95) + \n scale_fill_brewer(palette = \"OrRd\") +\n geom_text(aes(x = sub_county, y = pasto_livestock_household, label = pasto_livestock_household),\n hjust = 1, nudge_y = 5) +\n coord_flip() + \n guides(fill = guide_legend(nrow = 1)) +\n labs(x = \"Subcounty\",\n y = \"Number of Pastoral Livestock per household\",\n fill = \"County\",\n title = \"\",\n caption = \"\") +\n theme_minimal() +\n scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +\n theme(axis.title.x =element_blank(),\n axis.title.y =element_blank(),\n axis.text.x = element_text(size = 12),\n axis.text.y = element_text(size = 12),\n plot.title = element_text(family=\"Helvetica\", face=\"bold\", size = 20),\n plot.subtitle = 
element_text(family=\"Helvetica\", face=\"bold\", size = 15),\n plot.caption = element_text(family = \"Helvetica\",size = 12, face = \"bold\"),\n plot.margin = unit(c(1, 1, 1, 1), \"cm\"),\n panel.grid.major = element_blank(),\n panel.grid.minor = element_blank(),\n legend.title = element_text(size = 10),\n legend.text=element_text(size=8),\n legend.position = \"top\")\n```\n\n::: {.cell-output-display}\n![](danger_disturb_files/figure-html/unnamed-chunk-18-1.png){width=864}\n:::\n:::\n\n# Section 6: Conclusion\n\nIn this post, I have assessed the pastoral livestock (indigenous cattle, sheep, and goats) populations in the Kenyan counties described as “disturbed and dangerous.” Major contributors to this classification are banditry, livestock theft, and the limited pasture and water available to livestock owners. To get a better sense of the number of households engaged in farming and the pastoral livestock populations in these counties, I performed an exploratory data analysis and visualized my results.\n\nKey findings from the study were that:\n\n1. Turkana and Samburu had some of the highest numbers of pastoral livestock, yet they had the fewest households engaged in farming. This meant that these two counties had the highest average ratios of pastoral livestock to farming households in the region (Turkana: 55 per household and Samburu: 36 per household).\n2. The top three (3) subcounties with the highest average ratios of pastoral livestock to farming households were in Turkana. They included Turkana East (126), Kibish (96), and Turkana West (56). Surprisingly, subcounties such as Keiyo North (4), Koibatek (4), and Nyahururu (4) had very low ratios of pastoral livestock to farming households despite having relatively high numbers of farming households. 
This may have resulted from the unsuitability of the land for grazing animals, small average land sizes per farming household, a switch to exotic dairy and beef livestock, or simply a preference for crop farming over livestock farming.\n\nIn the next analysis, I will assess the livestock numbers per area (square kilometers), the numbers of indigenous cattle, sheep, and goats in every county and subcounty, as well as the other animal husbandry and/or crop farming activities that take place in the region. The reader is encouraged to use this code and data package (rKenyaCensus) to come up with their own analyses to share with the world. \n\n", + "supporting": [ + "danger_disturb_files" + ], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": { + "include-in-header": [ + "\r\n\r\n" + ] + }, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-13-1.png b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-13-1.png new file mode 100644 index 0000000..65c959e Binary files /dev/null and b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-13-1.png differ diff --git a/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-14-1.png b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-14-1.png new file mode 100644 index 0000000..641ac7a Binary files /dev/null and b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-14-1.png differ diff --git a/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-15-1.png b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-15-1.png new file mode 100644 index 0000000..159506c Binary files /dev/null and 
b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-15-1.png differ diff --git a/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-16-1.png b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-16-1.png new file mode 100644 index 0000000..61a5dc2 Binary files /dev/null and b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-16-1.png differ diff --git a/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-17-1.png b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-17-1.png new file mode 100644 index 0000000..d74d10f Binary files /dev/null and b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-17-1.png differ diff --git a/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-18-1.png b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-18-1.png new file mode 100644 index 0000000..db9b123 Binary files /dev/null and b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-18-1.png differ diff --git a/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-2-1.png b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-2-1.png new file mode 100644 index 0000000..aba528e Binary files /dev/null and b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-2-1.png differ diff --git a/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-3-1.png b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-3-1.png new file mode 100644 index 0000000..ea83c2e Binary files /dev/null and b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-3-1.png differ diff --git 
a/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-4-1.png b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-4-1.png new file mode 100644 index 0000000..90a5d22 Binary files /dev/null and b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-4-1.png differ diff --git a/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-5-1.png b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-5-1.png new file mode 100644 index 0000000..714f92c Binary files /dev/null and b/_freeze/posts/code_along_with_me/new_post_1/danger_disturb/figure-html/unnamed-chunk-5-1.png differ diff --git a/_freeze/posts/r_rstudio/new_post_2/post_2/execute-results/html.json b/_freeze/posts/r_rstudio/new_post_2/post_2/execute-results/html.json new file mode 100644 index 0000000..5d8342e --- /dev/null +++ b/_freeze/posts/r_rstudio/new_post_2/post_2/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "24c6f9027f439b1d4fed6b1a287db4d2", + "result": { + "markdown": "---\ntitle: \"The Basics of R and RStudio\"\nsubtitle: \"Part 1: Simple Arithmetic\"\nauthor: \"William Okech\"\ndate: \"2022-06-15\"\nimage: \"r_and_rstudio.png\"\ncategories: [RStudio, R, Tutorial, Blog]\ntoc: true\ndraft: false\n---\n\n\n## Introduction\n\nThis is the first in a series of blog posts looking at the basics of R and RStudio. These programs allow us to perform various basic and complex calculations.\n\nTo get started, first, we will open R or RStudio. In R, go to the console, and in RStudio, head to the console pane. 
Next, type in a basic arithmetic calculation such as \"1 + 1\" after the angle bracket (\\>) and hit \"Enter.\"\n\nAn example of a basic calculation:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n1+1\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 2\n```\n:::\n:::\n\n\nThe output will be observed next to the square bracket containing the number 1 (\\[1\\]).\n\n![](r_console_1plus1.png){fig-align=\"center\" width=\"90%\"}\n\nAdditionally, to include comments in the code block, we use the hash (#) symbol. Anything written after the hash symbol on the same line is treated as a comment and is not run.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# A simple arithmetic calculation (this comment line is not run because of the hash symbol)\n1+1\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 2\n```\n:::\n:::\n\n\n## Arithmetic operators available in R/RStudio\n\nVarious arithmetic operators (listed below) can be used in R/RStudio.\n\n| Arithmetic Operator | Description |\n|:-------------------:|:----------------------------------:|\n| \\+ | Addition |\n| \\- | Subtraction |\n| \\* | Multiplication |\n| / | Division |\n| \\*\\* or \\^ | Exponentiation |\n| %% | Modulus (remainder after division) |\n| %/% | Integer division |\n\n## Examples\n\n### Addition\n\n\n::: {.cell}\n\n```{.r .cell-code}\n10+30\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 40\n```\n:::\n:::\n\n\n### Subtraction\n\n\n::: {.cell}\n\n```{.r .cell-code}\n30-24\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 6\n```\n:::\n:::\n\n\n### Multiplication\n\n\n::: {.cell}\n\n```{.r .cell-code}\n20*4\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 80\n```\n:::\n:::\n\n\n### Division\n\n\n::: {.cell}\n\n```{.r .cell-code}\n93/4\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 23.25\n```\n:::\n:::\n\n\n### Exponentiation\n\n\n::: {.cell}\n\n```{.r .cell-code}\n3^6\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 729\n```\n:::\n:::\n\n\n### Modulus (remainder with division)\n\n\n::: 
{.cell}\n\n```{.r .cell-code}\n94%%5\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 4\n```\n:::\n:::\n\n\n### Integer Division\n\n\n::: {.cell}\n\n```{.r .cell-code}\n54%/%7\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 7\n```\n:::\n:::\n\n\n### Slightly more complex arithmetic operations\n\n\n::: {.cell}\n\n```{.r .cell-code}\n5-1+(4*3)/16*3\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 6.25\n```\n:::\n:::\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/posts/r_rstudio/new_post_3/post_3/execute-results/html.json b/_freeze/posts/r_rstudio/new_post_3/post_3/execute-results/html.json new file mode 100644 index 0000000..b97514e --- /dev/null +++ b/_freeze/posts/r_rstudio/new_post_3/post_3/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "66087bd080a48fbfddc668e0ea367fa4", + "result": { + "markdown": "---\ntitle: \"The Basics of R and RStudio\"\nsubtitle: \"Part 2: Variables\"\nauthor: \"William Okech\"\ndate: \"2022-06-22\"\nimage: \"r_and_rstudio.png\"\ncategories: [RStudio, R, Tutorial, Blog]\ntoc: true\ndraft: false\n---\n\n\n## Introduction\n\nVariables are instrumental in programming because they are used as \"containers\" to store data values.\n\nTo assign a value to a variable, we can use `<-` or `=`. However, most R users prefer to use `<-`.\n\n## Variable assignment\n\n### 1. Using `<-`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvariable_1 <- 5\nvariable_1\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 5\n```\n:::\n:::\n\n\n### 2. Using `=`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvariable_2 = 10\nvariable_2\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 10\n```\n:::\n:::\n\n\n### 3. 
Reverse the value and variable with `->`\n\n\n::: {.cell}\n\n```{.r .cell-code}\n15 -> variable_3\nvariable_3\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 15\n```\n:::\n:::\n\n\n### 4. Assign one value to two variables\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvariable_4 <- variable_5 <- 30\nvariable_4\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 30\n```\n:::\n\n```{.r .cell-code}\nvariable_5\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 30\n```\n:::\n:::\n\n\n## Variable output\n\nThe output of the variable can then be obtained by:\n\n1. Typing the variable name and then pressing \"Enter,\"\n2. Typing \"print\" with the variable name in brackets, `print(variable)`, and\n3. Typing \"View\" with the variable name in brackets, `View(variable)`.\n\n`print()` and `View()` are two of the many built-in functions[^1] available in R.\n\n[^1]: Functions are a collection of statements (organized and reusable code) that perform a specific task, and R has many built-in functions.\n\nIn RStudio, the list of variables that have been loaded can be viewed in the environment pane.\n\n![](env_pane_1.png){fig-align=\"center\" width=\"90%\"}\n\nFigure 1: A screenshot of the environment pane with the stored variables.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nprint(variable_1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 5\n```\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nView(variable_2)\n```\n:::\n\n\nThe output of `View()` appears in the script pane.\n\n## The `assign()` and `rm()` functions\n\nIn addition to using the assignment operators (`<-` and `=`), we can use the `assign()` function to assign a value to a variable.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nassign(\"variable_6\", 555)\nvariable_6\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 555\n```\n:::\n:::\n\n\nTo remove the assignment of the value to the variable, either delete the variable in the \"environment pane\" or use the `rm()` function.\n\n\n::: 
{.cell}\n\n```{.r .cell-code}\nvariable_7 <- 159\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nrm(variable_7)\n```\n:::\n\n\nAfter running `rm()` look at the environment pane to confirm whether `variable_7` has been removed.\n\n## Naming variables\n\nAt this point, you may be wondering what conventions are used for naming variables. First, variables need to have meaningful names such as current_temp, time_24_hr, or weight_lbs. However, we need to be mindful of the [variable](https://web.stanford.edu/class/cs109l/unrestricted/resources/google-style.html) [style guide](http://adv-r.had.co.nz/Style.html) which provides us with the appropriate rules for naming variables.\n\nSome rules to keep in mind are:\n\n1. R is case-sensitive (`variable` is not the same as `Variable`),\n2. Names similar to typical outputs or functions (`TRUE`, `FALSE`, `if`, or `else`) cannot be used,\n3. Appropriate variable names can contain letters, numbers, dots, and underscores. However, you cannot start with an underscore, number, or dot followed by a number.\n\n## Valid and invalid names\n\n### Valid names:\n\n- time_24_hr\n- .time24_hr\n\n### Invalid names:\n\n- \\_24_hr.time\n- 24_hr_time\n- .24_hr_time\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/posts/r_rstudio/new_post_4/post_4/execute-results/html.json b/_freeze/posts/r_rstudio/new_post_4/post_4/execute-results/html.json new file mode 100644 index 0000000..c002cfb --- /dev/null +++ b/_freeze/posts/r_rstudio/new_post_4/post_4/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "422f19d53489a0b7f72a6fc11e2e947b", + "result": { + "markdown": "---\ntitle: \"The Basics of R and RStudio\"\nsubtitle: \"Part 3: Data Types\"\nauthor: \"William Okech\"\ndate: \"2022-06-23\"\nimage: \"r_and_rstudio.png\"\ncategories: [RStudio, R, Tutorial, Blog]\ntoc: true\ndraft: 
false\n---\n\n\n## Introduction\n\nR and RStudio utilize multiple data types to store different kinds of data.\n\nThe most common data types in R are listed below.\n\n| **Data Type** | **Description** |\n|---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| Numeric | The most common data type. The values can be numbers or decimals (all real numbers). |\n| Integer | Special case of numeric data without decimals. |\n| Logical | Boolean data type with only 2 values (`TRUE` or `FALSE`). |\n| Complex | Specifies imaginary values in R. |\n| Character | Assigns a character or string to a variable. The character variables are enclosed in single quotes ('character') while the string variables are enclosed in double quotes (\"string\"). |\n| Factor | Special type of character variable that represents a categorical such as gender. |\n| Raw | Specifies values as raw bytes. It uses built-in functions to convert between raw and character (charToRaw() or rawToChar()). |\n| Dates | Specifies the date variable. Date stores a date and POSIXct stores a date and time. The output is indicated as the number of days (Date) or number of seconds (POSIXct) since 01/01/1970. |\n\n## Data types\n\n### 1. Numeric\n\n\n::: {.cell}\n\n```{.r .cell-code}\n89.98\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 89.98\n```\n:::\n\n```{.r .cell-code}\n55\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 55\n```\n:::\n:::\n\n\n### 2. Integer\n\n\n::: {.cell}\n\n```{.r .cell-code}\n5L\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 5\n```\n:::\n\n```{.r .cell-code}\n5768L\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 5768\n```\n:::\n:::\n\n\n### 3. 
Logical\n\n\n::: {.cell}\n\n```{.r .cell-code}\nTRUE\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n\n```{.r .cell-code}\nFALSE\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n:::\n\n\n### 4. Complex\n\n\n::: {.cell}\n\n```{.r .cell-code}\n10 + 30i\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 10+30i\n```\n:::\n\n```{.r .cell-code}\n287 + 34i\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 287+34i\n```\n:::\n:::\n\n\n### 5. Character or String\n\n\n::: {.cell}\n\n```{.r .cell-code}\n'abc'\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"abc\"\n```\n:::\n\n```{.r .cell-code}\n\"def\"\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"def\"\n```\n:::\n\n```{.r .cell-code}\n\"I like learning R\"\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"I like learning R\"\n```\n:::\n:::\n\n\n### 6. Dates\n\n\n::: {.cell}\n\n```{.r .cell-code}\n\"2022-06-23 14:39:21 EAT\"\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"2022-06-23 14:39:21 EAT\"\n```\n:::\n\n```{.r .cell-code}\n\"2022-06-23\"\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"2022-06-23\"\n```\n:::\n:::\n\n\n## Examining various data types\n\nSeveral functions exist to examine the features of the various data types. These include:\n\n1. `typeof()` -- what is the data type of the object (low-level)?\n2. `class()` -- what is the data type of the object (high-level)?\n3. `length()` -- how long is the object?\n4. `attributes()` -- any metadata available?\n\nLet's look at how these functions work with a few examples\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- 45.84\nb <- 858L\nc <- TRUE\nd <- 89 + 34i\ne <- 'abc'\n```\n:::\n\n\n### 1. 
Examine the data type at a low-level with `typeof()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"double\"\n```\n:::\n\n```{.r .cell-code}\ntypeof(b)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"integer\"\n```\n:::\n\n```{.r .cell-code}\ntypeof(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"logical\"\n```\n:::\n\n```{.r .cell-code}\ntypeof(d)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"complex\"\n```\n:::\n\n```{.r .cell-code}\ntypeof(e)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"character\"\n```\n:::\n:::\n\n\n### 2. Examine the data type at a high-level with `class()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nclass(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"numeric\"\n```\n:::\n\n```{.r .cell-code}\nclass(b)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"integer\"\n```\n:::\n\n```{.r .cell-code}\nclass(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"logical\"\n```\n:::\n\n```{.r .cell-code}\nclass(d)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"complex\"\n```\n:::\n\n```{.r .cell-code}\nclass(e)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"character\"\n```\n:::\n:::\n\n\n### 3. 
Use the `is.____()` functions to determine the data type\n\nTo test whether the variable is of a specific type, we can use the `is.____()` functions.\n\nFirst, we test the variable `a` which is numeric.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.numeric(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n\n```{.r .cell-code}\nis.integer(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n\n```{.r .cell-code}\nis.logical(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n\n```{.r .cell-code}\nis.character(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n:::\n\n\nSecond, we test the variable `c` which is logical.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.numeric(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n\n```{.r .cell-code}\nis.integer(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n\n```{.r .cell-code}\nis.logical(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n\n```{.r .cell-code}\nis.character(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n:::\n\n\n## Converting between various data types\n\nTo convert between data types we can use the `as.____()` functions. These include: `as.Date()`, `as.numeric()`, and `as.factor()`. 
Additionally, other helpful functions include factor() which adds levels to the data and `nchar()` which provides the length of the data.\n\n### Examples\n\n\n::: {.cell}\n\n```{.r .cell-code}\nas.integer(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 45\n```\n:::\n\n```{.r .cell-code}\nas.logical(0)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n\n```{.r .cell-code}\nas.logical(1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n\n```{.r .cell-code}\nnchar(e)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 3\n```\n:::\n:::\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/posts/r_rstudio_basics/arithmetic/arithmetic/execute-results/html.json b/_freeze/posts/r_rstudio_basics/arithmetic/arithmetic/execute-results/html.json new file mode 100644 index 0000000..5d8342e --- /dev/null +++ b/_freeze/posts/r_rstudio_basics/arithmetic/arithmetic/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "24c6f9027f439b1d4fed6b1a287db4d2", + "result": { + "markdown": "---\ntitle: \"The Basics of R and RStudio\"\nsubtitle: \"Part 1: Simple Arithmetic\"\nauthor: \"William Okech\"\ndate: \"2022-06-15\"\nimage: \"r_and_rstudio.png\"\ncategories: [RStudio, R, Tutorial, Blog]\ntoc: true\ndraft: false\n---\n\n\n## Introduction\n\nThis is the first in a series of blog posts looking at the basics of R and RStudio. These programs allow us to perform various basic and complex calculations.\n\nTo get started, first, we will open R or RStudio. In R, go to the console, and in RStudio, head to the console pane. 
Next, type in a basic arithmetic calculation such as \"1 + 1\" after the angle bracket (\\>) and hit \"Enter.\"\n\nAn example of a basic calculation:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n1+1\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 2\n```\n:::\n:::\n\n\nThe output will be observed next to the square bracket containing the number 1 (\\[1\\]).\n\n![](r_console_1plus1.png){fig-align=\"center\" width=\"90%\"}\n\nAdditionally, to include comments in the code block, we use the hash (#) symbol. Anything written after the hash symbol on the same line is treated as a comment and is not run.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# A simple arithmetic calculation (this comment line is not run because of the hash symbol)\n1+1\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 2\n```\n:::\n:::\n\n\n## Arithmetic operators available in R/RStudio\n\nVarious arithmetic operators (listed below) can be used in R/RStudio.\n\n| Arithmetic Operator | Description |\n|:-------------------:|:----------------------------------:|\n| \\+ | Addition |\n| \\- | Subtraction |\n| \\* | Multiplication |\n| / | Division |\n| \\*\\* or \\^ | Exponentiation |\n| %% | Modulus (remainder after division) |\n| %/% | Integer division |\n\n## Examples\n\n### Addition\n\n\n::: {.cell}\n\n```{.r .cell-code}\n10+30\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 40\n```\n:::\n:::\n\n\n### Subtraction\n\n\n::: {.cell}\n\n```{.r .cell-code}\n30-24\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 6\n```\n:::\n:::\n\n\n### Multiplication\n\n\n::: {.cell}\n\n```{.r .cell-code}\n20*4\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 80\n```\n:::\n:::\n\n\n### Division\n\n\n::: {.cell}\n\n```{.r .cell-code}\n93/4\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 23.25\n```\n:::\n:::\n\n\n### Exponentiation\n\n\n::: {.cell}\n\n```{.r .cell-code}\n3^6\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 729\n```\n:::\n:::\n\n\n### Modulus (remainder with division)\n\n\n::: 
{.cell}\n\n```{.r .cell-code}\n94%%5\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 4\n```\n:::\n:::\n\n\n### Integer Division\n\n\n::: {.cell}\n\n```{.r .cell-code}\n54%/%7\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 7\n```\n:::\n:::\n\n\n### Slightly more complex arithmetic operations\n\n\n::: {.cell}\n\n```{.r .cell-code}\n5-1+(4*3)/16*3\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 6.25\n```\n:::\n:::\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/posts/r_rstudio_basics/data_structures/data_structures/execute-results/html.json b/_freeze/posts/r_rstudio_basics/data_structures/data_structures/execute-results/html.json new file mode 100644 index 0000000..42dfdf9 --- /dev/null +++ b/_freeze/posts/r_rstudio_basics/data_structures/data_structures/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "fbb5ceb936c7718a895cd8aa18a6cae8", + "result": { + "markdown": "---\ntitle: \"The Basics of R and RStudio\"\nsubtitle: \"Part 7: Data Structures\"\nauthor: \"William Okech\"\ndate: \"2022-11-16\"\nimage: \"r_and_rstudio.png\"\ncategories: [RStudio, R, Tutorial, Blog]\ntoc: true\ndraft: false\n---\n\n\n## Introduction\n\nData structures in R are tools for storing and organizing multiple values.\n\nThey help to organize stored data in a way that the data can be used more effectively. Data structures vary according to the number of dimensions and the data types (heterogeneous or homogeneous) contained. The primary data structures are:\n\n1. Vectors ([link](/posts/series_1/new_post_6/post_6.html))\n\n2. Lists\n\n3. Data frames\n\n4. Matrices\n\n5. Arrays\n\n6. Factors\n\n## Data structures\n\n### 1. Vectors\n\nDiscussed in a previous [post](/posts/series_1/new_post_6/post_6.html)\n\n### 2. 
Lists\n\nLists are objects/containers that hold elements of the same or different types. They can contain strings, numbers, vectors, matrices, functions, or other lists. Lists are created with the `list()` function.\n\n#### Examples\n\n#### a. Three element list\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlist_1 <- list(10, 30, 50)\n```\n:::\n\n\n#### b. Single element list\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlist_2 <- list(c(10, 30, 50))\n```\n:::\n\n\n#### c. Three element list\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlist_3 <- list(1:3, c(50,40), 3:-5)\n```\n:::\n\n\n#### d. List with elements of different types\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlist_4 <- list(c(\"a\", \"b\", \"c\"), 5:-1)\n```\n:::\n\n\n#### e. List which contains a list\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlist_5 <- list(c(\"a\", \"b\", \"c\"), 5:-1, list_1)\n```\n:::\n\n\n#### f. Set names for the list elements\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames(list_5)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nNULL\n```\n:::\n\n```{.r .cell-code}\nnames(list_5) <- c(\"character vector\", \"numeric vector\", \"list\")\nnames(list_5)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"character vector\" \"numeric vector\" \"list\" \n```\n:::\n:::\n\n\n#### g. Access elements\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlist_5[[1]]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"a\" \"b\" \"c\"\n```\n:::\n\n```{.r .cell-code}\nlist_5[[\"character vector\"]]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"a\" \"b\" \"c\"\n```\n:::\n:::\n\n\n#### h. Length of list\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlength(list_1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 3\n```\n:::\n\n```{.r .cell-code}\nlength(list_5)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 3\n```\n:::\n:::\n\n\n### 3. Data frames\n\nA data frame is one of the most common data objects used to store tabular data in R. 
Tabular data has rows representing observations and columns representing variables. Dataframes contain lists of equal-length vectors. Each column holds a different type of data, but within each column, the elements must be of the same type. The most common data frame characteristics are listed below:\n\n• Columns should have a name;\n\n• Row names should be unique;\n\n• Various data can be stored (such as numeric, factor, and character);\n\n• The individual columns should contain the same number of data items.\n\n### Creation of data frames\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlevel <- c(\"Low\", \"Mid\", \"High\")\nlanguage <- c(\"R\", \"RStudio\", \"Shiny\")\nage <- c(25, 36, 47)\n\ndf_1 <- data.frame(level, language, age)\n```\n:::\n\n\n### Functions used to manipulate data frames\n\n#### a. Number of rows\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnrow(df_1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 3\n```\n:::\n:::\n\n\n#### b. Number of columns\n\n\n::: {.cell}\n\n```{.r .cell-code}\nncol(df_1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 3\n```\n:::\n:::\n\n\n#### c. Dimensions\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndim(df_1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 3 3\n```\n:::\n:::\n\n\n#### d. Class of data frame\n\n\n::: {.cell}\n\n```{.r .cell-code}\nclass(df_1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"data.frame\"\n```\n:::\n:::\n\n\n#### e. Column names\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncolnames(df_1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"level\" \"language\" \"age\" \n```\n:::\n:::\n\n\n#### f. Row names\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrownames(df_1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"1\" \"2\" \"3\"\n```\n:::\n:::\n\n\n#### g. 
Top and bottom values\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhead(df_1, n=2)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n level language age\n1 Low R 25\n2 Mid RStudio 36\n```\n:::\n\n```{.r .cell-code}\ntail(df_1, n=2)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n level language age\n2 Mid RStudio 36\n3 High Shiny 47\n```\n:::\n:::\n\n\n#### h. Access columns\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf_1$level\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"Low\" \"Mid\" \"High\"\n```\n:::\n:::\n\n\n#### i. Access individual elements\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf_1[3,2]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"Shiny\"\n```\n:::\n\n```{.r .cell-code}\ndf_1[2, 1:2]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n level language\n2 Mid RStudio\n```\n:::\n:::\n\n\n#### j. Access columns with index\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf_1[, 3]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 25 36 47\n```\n:::\n\n```{.r .cell-code}\ndf_1[, c(\"language\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"R\" \"RStudio\" \"Shiny\" \n```\n:::\n:::\n\n\n#### k. Access rows with index\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf_1[2, ]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n level language age\n2 Mid RStudio 36\n```\n:::\n:::\n\n\n### 4. Matrices\n\nA matrix is a rectangular two-dimensional (2D) homogeneous data set containing rows and columns. It contains real numbers that are arranged in a fixed number of rows and columns. Matrices are generally used for various mathematical and statistical applications.\n\n#### a. Creation of matrices\n\n\n::: {.cell}\n\n```{.r .cell-code}\nm1 <- matrix(1:9, nrow = 3, ncol = 3) \nm2 <- matrix(21:29, nrow = 3, ncol = 3) \nm3 <- matrix(1:12, nrow = 2, ncol = 6)\n```\n:::\n\n\n#### b. 
Obtain the dimensions of the matrices\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# m1\nnrow(m1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 3\n```\n:::\n\n```{.r .cell-code}\nncol(m1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 3\n```\n:::\n\n```{.r .cell-code}\ndim(m1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 3 3\n```\n:::\n\n```{.r .cell-code}\n# m3\nnrow(m3)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 2\n```\n:::\n\n```{.r .cell-code}\nncol(m3)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 6\n```\n:::\n\n```{.r .cell-code}\ndim(m3)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 2 6\n```\n:::\n:::\n\n\n#### c. Arithmetic with matrices\n\n\n::: {.cell}\n\n```{.r .cell-code}\nm1+m2\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [,1] [,2] [,3]\n[1,] 22 28 34\n[2,] 24 30 36\n[3,] 26 32 38\n```\n:::\n\n```{.r .cell-code}\nm1-m2\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [,1] [,2] [,3]\n[1,] -20 -20 -20\n[2,] -20 -20 -20\n[3,] -20 -20 -20\n```\n:::\n\n```{.r .cell-code}\nm1*m2\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [,1] [,2] [,3]\n[1,] 21 96 189\n[2,] 44 125 224\n[3,] 69 156 261\n```\n:::\n\n```{.r .cell-code}\nm1/m2\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [,1] [,2] [,3]\n[1,] 0.04761905 0.1666667 0.2592593\n[2,] 0.09090909 0.2000000 0.2857143\n[3,] 0.13043478 0.2307692 0.3103448\n```\n:::\n\n```{.r .cell-code}\nm1 == m2\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [,1] [,2] [,3]\n[1,] FALSE FALSE FALSE\n[2,] FALSE FALSE FALSE\n[3,] FALSE FALSE FALSE\n```\n:::\n:::\n\n\n#### d. 
Matrix multiplication\n\n\n::: {.cell}\n\n```{.r .cell-code}\nm5 <- matrix(1:10, nrow = 5)\nm6 <- matrix(43:34, nrow = 5)\n\nm5*m6\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [,1] [,2]\n[1,] 43 228\n[2,] 84 259\n[3,] 123 288\n[4,] 160 315\n[5,] 195 340\n```\n:::\n\n```{.r .cell-code}\n# m5%*%m6 will not work because the dimensions do not conform (5x2 times 5x2).\n# the matrix m6 needs to be transposed.\n\n# Transpose\nm5%*%t(m6)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [,1] [,2] [,3] [,4] [,5]\n[1,] 271 264 257 250 243\n[2,] 352 343 334 325 316\n[3,] 433 422 411 400 389\n[4,] 514 501 488 475 462\n[5,] 595 580 565 550 535\n```\n:::\n:::\n\n\n#### e. Generate an identity matrix\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndiag(5)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [,1] [,2] [,3] [,4] [,5]\n[1,] 1 0 0 0 0\n[2,] 0 1 0 0 0\n[3,] 0 0 1 0 0\n[4,] 0 0 0 1 0\n[5,] 0 0 0 0 1\n```\n:::\n:::\n\n\n#### f. Column and row names\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncolnames(m5)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nNULL\n```\n:::\n\n```{.r .cell-code}\nrownames(m6)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nNULL\n```\n:::\n:::\n\n\n### 5. Arrays\n\nAn array is a multidimensional vector that stores homogeneous data. It can be thought of as a stacked matrix and stores data in more than 2 dimensions (n-dimensional). An array is composed of rows by columns by dimensions. Example: an array with dimensions, dim = c(2,3,3), has 2 rows, 3 columns, and 3 matrices.\n\n#### a. Creating arrays\n\n\n::: {.cell}\n\n```{.r .cell-code}\narr_1 <- array(1:12, dim = c(2,3,2))\n\narr_1\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n, , 1\n\n [,1] [,2] [,3]\n[1,] 1 3 5\n[2,] 2 4 6\n\n, , 2\n\n [,1] [,2] [,3]\n[1,] 7 9 11\n[2,] 8 10 12\n```\n:::\n:::\n\n\n#### b. 
Filter array by index\n\n\n::: {.cell}\n\n```{.r .cell-code}\narr_1[1, , ]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [,1] [,2]\n[1,] 1 7\n[2,] 3 9\n[3,] 5 11\n```\n:::\n\n```{.r .cell-code}\narr_1[1, ,1]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 1 3 5\n```\n:::\n\n```{.r .cell-code}\narr_1[, , 1]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [,1] [,2] [,3]\n[1,] 1 3 5\n[2,] 2 4 6\n```\n:::\n:::\n\n\n### 6. Factors\n\nFactors are used to store integers or strings which are categorical. They categorize data and store the data in different levels. This form of data storage is useful for statistical modeling. Examples include TRUE or FALSE and male or female.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvector <- c(\"Male\", \"Female\")\nfactor_1 <- factor(vector)\nfactor_1\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] Male Female\nLevels: Female Male\n```\n:::\n:::\n\n\nOR\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfactor_2 <- as.factor(vector)\nfactor_2\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] Male Female\nLevels: Female Male\n```\n:::\n\n```{.r .cell-code}\nas.numeric(factor_2)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 2 1\n```\n:::\n:::\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/posts/r_rstudio_basics/data_types/data_types/execute-results/html.json b/_freeze/posts/r_rstudio_basics/data_types/data_types/execute-results/html.json new file mode 100644 index 0000000..c002cfb --- /dev/null +++ b/_freeze/posts/r_rstudio_basics/data_types/data_types/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "422f19d53489a0b7f72a6fc11e2e947b", + "result": { + "markdown": "---\ntitle: \"The Basics of R and RStudio\"\nsubtitle: \"Part 3: Data Types\"\nauthor: \"William Okech\"\ndate: \"2022-06-23\"\nimage: \"r_and_rstudio.png\"\ncategories: 
[RStudio, R, Tutorial, Blog]\ntoc: true\ndraft: false\n---\n\n\n## Introduction\n\nR and RStudio utilize multiple data types to store different kinds of data.\n\nThe most common data types in R are listed below.\n\n| **Data Type** | **Description** |\n|---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| Numeric | The most common data type. The values can be numbers or decimals (all real numbers). |\n| Integer | Special case of numeric data without decimals. |\n| Logical | Boolean data type with only 2 values (`TRUE` or `FALSE`). |\n| Complex | Specifies imaginary values in R. |\n| Character | Assigns a character or string to a variable. The character variables are enclosed in single quotes ('character') while the string variables are enclosed in double quotes (\"string\"). |\n| Factor | Special type of character variable that represents a categorical such as gender. |\n| Raw | Specifies values as raw bytes. It uses built-in functions to convert between raw and character (charToRaw() or rawToChar()). |\n| Dates | Specifies the date variable. Date stores a date and POSIXct stores a date and time. The output is indicated as the number of days (Date) or number of seconds (POSIXct) since 01/01/1970. |\n\n## Data types\n\n### 1. Numeric\n\n\n::: {.cell}\n\n```{.r .cell-code}\n89.98\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 89.98\n```\n:::\n\n```{.r .cell-code}\n55\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 55\n```\n:::\n:::\n\n\n### 2. Integer\n\n\n::: {.cell}\n\n```{.r .cell-code}\n5L\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 5\n```\n:::\n\n```{.r .cell-code}\n5768L\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 5768\n```\n:::\n:::\n\n\n### 3. 
Logical\n\n\n::: {.cell}\n\n```{.r .cell-code}\nTRUE\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n\n```{.r .cell-code}\nFALSE\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n:::\n\n\n### 4. Complex\n\n\n::: {.cell}\n\n```{.r .cell-code}\n10 + 30i\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 10+30i\n```\n:::\n\n```{.r .cell-code}\n287 + 34i\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 287+34i\n```\n:::\n:::\n\n\n### 5. Character or String\n\n\n::: {.cell}\n\n```{.r .cell-code}\n'abc'\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"abc\"\n```\n:::\n\n```{.r .cell-code}\n\"def\"\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"def\"\n```\n:::\n\n```{.r .cell-code}\n\"I like learning R\"\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"I like learning R\"\n```\n:::\n:::\n\n\n### 6. Dates\n\n\n::: {.cell}\n\n```{.r .cell-code}\n\"2022-06-23 14:39:21 EAT\"\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"2022-06-23 14:39:21 EAT\"\n```\n:::\n\n```{.r .cell-code}\n\"2022-06-23\"\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"2022-06-23\"\n```\n:::\n:::\n\n\n## Examining various data types\n\nSeveral functions exist to examine the features of the various data types. These include:\n\n1. `typeof()` -- what is the data type of the object (low-level)?\n2. `class()` -- what is the data type of the object (high-level)?\n3. `length()` -- how long is the object?\n4. `attributes()` -- any metadata available?\n\nLet's look at how these functions work with a few examples\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- 45.84\nb <- 858L\nc <- TRUE\nd <- 89 + 34i\ne <- 'abc'\n```\n:::\n\n\n### 1. 
Examine the data type at a low-level with `typeof()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"double\"\n```\n:::\n\n```{.r .cell-code}\ntypeof(b)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"integer\"\n```\n:::\n\n```{.r .cell-code}\ntypeof(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"logical\"\n```\n:::\n\n```{.r .cell-code}\ntypeof(d)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"complex\"\n```\n:::\n\n```{.r .cell-code}\ntypeof(e)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"character\"\n```\n:::\n:::\n\n\n### 2. Examine the data type at a high-level with `class()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nclass(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"numeric\"\n```\n:::\n\n```{.r .cell-code}\nclass(b)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"integer\"\n```\n:::\n\n```{.r .cell-code}\nclass(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"logical\"\n```\n:::\n\n```{.r .cell-code}\nclass(d)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"complex\"\n```\n:::\n\n```{.r .cell-code}\nclass(e)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"character\"\n```\n:::\n:::\n\n\n### 3. 
Use the `is.____()` functions to determine the data type\n\nTo test whether the variable is of a specific type, we can use the `is.____()` functions.\n\nFirst, we test the variable `a` which is numeric.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.numeric(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n\n```{.r .cell-code}\nis.integer(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n\n```{.r .cell-code}\nis.logical(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n\n```{.r .cell-code}\nis.character(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n:::\n\n\nSecond, we test the variable `c` which is logical.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.numeric(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n\n```{.r .cell-code}\nis.integer(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n\n```{.r .cell-code}\nis.logical(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n\n```{.r .cell-code}\nis.character(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n:::\n\n\n## Converting between various data types\n\nTo convert between data types we can use the `as.____()` functions. These include: `as.Date()`, `as.numeric()`, and `as.factor()`. 
Additionally, other helpful functions include factor() which adds levels to the data and `nchar()` which provides the length of the data.\n\n### Examples\n\n\n::: {.cell}\n\n```{.r .cell-code}\nas.integer(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 45\n```\n:::\n\n```{.r .cell-code}\nas.logical(0)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n\n```{.r .cell-code}\nas.logical(1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n\n```{.r .cell-code}\nnchar(e)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 3\n```\n:::\n:::\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/posts/r_rstudio_basics/operators/operators/execute-results/html.json b/_freeze/posts/r_rstudio_basics/operators/operators/execute-results/html.json new file mode 100644 index 0000000..93e52ca --- /dev/null +++ b/_freeze/posts/r_rstudio_basics/operators/operators/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "0934fbefb92cd0989f2206d5ef1e85c1", + "result": { + "markdown": "---\ntitle: \"The Basics of R and RStudio\"\nsubtitle: \"Part 4: Operators\"\nauthor: \"William Okech\"\ndate: \"2022-11-09\"\nimage: \"r_and_rstudio.png\"\ncategories: [RStudio, R, Tutorial, Blog]\ntoc: true\ndraft: false\n---\n\n\n## Introduction\n\nR has many different types of operators that can perform different tasks.\n\nHere we will focus on 5 major types of operators. The major types of operators are:\n\n1. Arithmetic,\n\n2. Relational,\n\n3. Logical,\n\n4. Assignment, and\n\n5. Miscellaneous.\n\n## 1. Arithmetic Operators\n\nArithmetic operators are used to perform mathematical operations. These operators have been highlighted in [Part 1](/posts/series_1/new_post_2/post_2.html) of the series.\n\n## 2. 
Relational Operators\n\nRelational operators compare two values or objects. The output of these comparisons is Boolean (`TRUE` or `FALSE`). The table below describes the most common relational operators.\n\n| Relational Operator | Description |\n|:-------------------:|:------------------------:|\n| \\< | Less than |\n| \\> | Greater than |\n| \\<= | Less than or equal to |\n| \\>= | Greater than or equal to |\n| == | Equal to |\n| != | Not Equal to |\n\nAssign values to variables\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 227\ny <- 639\n```\n:::\n\n\n### a. Less than\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx < y\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n### b. Greater than\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx > y\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n:::\n\n\n### c. Less than or equal to\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <= 300\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n### d. Greater than or equal to\n\n\n::: {.cell}\n\n```{.r .cell-code}\ny >= 700\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n:::\n\n\n### e. Equal to\n\n\n::: {.cell}\n\n```{.r .cell-code}\ny == 639\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n### f. Not Equal to\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx != 227\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n:::\n\n\n## 3. Logical Operators\n\nLogical operators combine or negate conditions. They work with basic data types such as logical, numeric, and complex values and return `TRUE` or `FALSE`. When a number is used as a logical value, `0` is treated as `FALSE` and any nonzero number is treated as `TRUE`. The table below describes the most common logical operators.\n\n| Logical Operator | Description |\n|:----------------:|:------------------------:|\n| ! 
| Logical NOT |\n| \\| | Element-wise logical OR |\n| & | Element-wise logical AND |\n\nAssign vectors to variables\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvector_1 <- c(0,2)\nvector_2 <- c(1,0)\n```\n:::\n\n\n### a. Logical NOT\n\n\n::: {.cell}\n\n```{.r .cell-code}\n!vector_1\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE FALSE\n```\n:::\n\n```{.r .cell-code}\n!vector_2\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE TRUE\n```\n:::\n:::\n\n\n### b. Element-wise Logical OR\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvector_1 | vector_2\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE TRUE\n```\n:::\n:::\n\n\n### c. Element-wise Logical AND\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvector_1 & vector_2\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE FALSE\n```\n:::\n:::\n\n## 4. Assignment Operators\n\nThese operators assign values to variables. A more comprehensive review can be found in [Part 2](/posts/series_1/new_post_3/post_3.html) of the series.\n\n## 5. Miscellaneous Operators\n\nThese are helpful operators in R that perform a variety of tasks. A few common miscellaneous operators are described below.\n\n| Miscellaneous Operator | Description |\n|:-------------------:|:-------------------------------------------------:|\n| %\\*% | Matrix multiplication (to be discussed in subsequent chapters) |\n| %in% | Does an element belong to a vector |\n| : | Generate a sequence |\n\n### a. Sequence\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- 1:8\na\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 1 2 3 4 5 6 7 8\n```\n:::\n\n```{.r .cell-code}\nb <- 4:10\nb\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 4 5 6 7 8 9 10\n```\n:::\n:::\n\n\n### b. 
Element in a vector\n\n\n::: {.cell}\n\n```{.r .cell-code}\na %in% b\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE\n```\n:::\n\n```{.r .cell-code}\n9 %in% b\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n\n```{.r .cell-code}\n9 %in% a\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n:::\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/posts/r_rstudio_basics/variables/post_3/execute-results/html.json b/_freeze/posts/r_rstudio_basics/variables/post_3/execute-results/html.json new file mode 100644 index 0000000..b97514e --- /dev/null +++ b/_freeze/posts/r_rstudio_basics/variables/post_3/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "66087bd080a48fbfddc668e0ea367fa4", + "result": { + "markdown": "---\ntitle: \"The Basics of R and RStudio\"\nsubtitle: \"Part 2: Variables\"\nauthor: \"William Okech\"\ndate: \"2022-06-22\"\nimage: \"r_and_rstudio.png\"\ncategories: [RStudio, R, Tutorial, Blog]\ntoc: true\ndraft: false\n---\n\n\n## Introduction\n\nVariables are instrumental in programming because they are used as \"containers\" to store data values.\n\nTo assign a value to a variable, we can use `<−` or `=`. However, most R users prefer to use `<−`.\n\n## Variable assignment\n\n### 1. Using `<-`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvariable_1 <- 5\nvariable_1\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 5\n```\n:::\n:::\n\n\n### 2. Using `=`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvariable_2 = 10\nvariable_2\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 10\n```\n:::\n:::\n\n\n### 3. 
Reverse the value and variable with `->`\n\n\n::: {.cell}\n\n```{.r .cell-code}\n15 -> variable_3\nvariable_3\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 15\n```\n:::\n:::\n\n\n### 4. Assign one value to two variables\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvariable_4 <- variable_5 <- 30\nvariable_4\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 30\n```\n:::\n\n```{.r .cell-code}\nvariable_5\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 30\n```\n:::\n:::\n\n\n## Variable output\n\nThe output of the variable can then be obtained by:\n\n1. Typing the variable name and then pressing \"Enter,\"\n2. Typing \"print\" with the variable name in parentheses, `print(variable)`, and\n3. Typing \"View\" with the variable name in parentheses, `View(variable)`.\n\n`print()` and `View()` are two of the many built-in functions[^1] available in R.\n\n[^1]: Functions are a collection of statements (organized and reusable code) that perform a specific task, and R has many built-in functions.\n\nIn RStudio, the list of variables that have been loaded can be viewed in the environment pane.\n\n![](env_pane_1.png){fig-align=\"center\" width=\"90%\"}\n\nFigure 1: A screenshot of the environment pane with the stored variables.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nprint(variable_1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 5\n```\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nView(variable_2)\n```\n:::\n\n\nThe output of `View()` appears in the script pane.\n\n## The `assign()` and `rm()` functions\n\nIn addition to using the assignment operators (`<-` and `=`), we can use the `assign()` function to assign a value to a variable.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nassign(\"variable_6\", 555)\nvariable_6\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 555\n```\n:::\n:::\n\n\nTo remove a variable, either delete it in the \"environment pane\" or use the `rm()` function.\n\n\n::: 
{.cell}\n\n```{.r .cell-code}\nvariable_7 <- 159\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nrm(variable_7)\n```\n:::\n\n\nAfter running `rm()` look at the environment pane to confirm whether `variable_7` has been removed.\n\n## Naming variables\n\nAt this point, you may be wondering what conventions are used for naming variables. First, variables need to have meaningful names such as current_temp, time_24_hr, or weight_lbs. However, we need to be mindful of the [variable](https://web.stanford.edu/class/cs109l/unrestricted/resources/google-style.html) [style guide](http://adv-r.had.co.nz/Style.html) which provides us with the appropriate rules for naming variables.\n\nSome rules to keep in mind are:\n\n1. R is case-sensitive (`variable` is not the same as `Variable`),\n2. Names similar to typical outputs or functions (`TRUE`, `FALSE`, `if`, or `else`) cannot be used,\n3. Appropriate variable names can contain letters, numbers, dots, and underscores. However, you cannot start with an underscore, number, or dot followed by a number.\n\n## Valid and invalid names\n\n### Valid names:\n\n- time_24_hr\n- .time24_hr\n\n### Invalid names:\n\n- \\_24_hr.time\n- 24_hr_time\n- .24_hr_time\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/posts/r_rstudio_basics/vectors/vectors/execute-results/html.json b/_freeze/posts/r_rstudio_basics/vectors/vectors/execute-results/html.json new file mode 100644 index 0000000..777fcfa --- /dev/null +++ b/_freeze/posts/r_rstudio_basics/vectors/vectors/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "e061cccf533de571762be15211552083", + "result": { + "markdown": "---\ntitle: \"The Basics of R and RStudio\"\nsubtitle: \"Part 5: Vectors\"\nauthor: \"William Okech\"\ndate: \"2022-11-12\"\nimage: \"r_and_rstudio.png\"\ncategories: [RStudio, R, Tutorial, Blog]\ntoc: 
true\ndraft: false\n---\n\n\n## Introduction\n\nA vector is a collection of elements of the same data type and is a basic data structure in R programming.\n\nAn atomic vector cannot hold mixed data types. The most common way to create a vector is with `c()`, where \"c\" stands for combine. In R, vectors do not have dimensions; therefore, they cannot be defined by columns or rows. Vectors can be divided into atomic vectors and lists (discussed in [Part 7](https://www.williamokech.com/posts/series_1/new_post_8/post_8.html)). The atomic vectors include logical, character, and numeric (integer or double).\n\nAdditionally, R is a vectorized language because mathematical operations are applied to each element of the vector without the need to loop through the vector.\n\nExamples of vectors are shown below:\n\n• Numbers: `c(2, 10, 16, -5)`\n\n• Characters: `c(\"R\", \"RStudio\", \"Shiny\", \"Quarto\")`\n\n• Logicals: `c(TRUE, FALSE, TRUE)`\n\n## Sequence Generation\n\nTo generate a vector with a sequence of consecutive numbers, we can use `:`, `sequence()`, or `seq()`.\n\n### Generate a sequence using `:`\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- 9:18\na\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 9 10 11 12 13 14 15 16 17 18\n```\n:::\n\n```{.r .cell-code}\na_rev <- 18:9\na_rev\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 18 17 16 15 14 13 12 11 10 9\n```\n:::\n\n```{.r .cell-code}\na_rev_minus <- 5:-3\na_rev_minus\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 5 4 3 2 1 0 -1 -2 -3\n```\n:::\n:::\n\n\n### Generate a sequence using `sequence()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb <- sequence(7)\nb\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 1 2 3 4 5 6 7\n```\n:::\n\n```{.r .cell-code}\nc <- sequence(c(5,9))\nc\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 1 2 3 4 5 1 2 3 4 5 6 7 8 9\n```\n:::\n:::\n\n\n### Generate a sequence using `seq()`\n\nThe `seq()` function has four main arguments: 
`seq(from, to, by, length.out)`, where \"from\" and \"to\" are the starting and ending elements of the sequence. Additionally, \"by\" is the difference between consecutive elements, and \"length.out\" is the desired length of the resulting vector.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nd <- seq(2,20,by=2)\nd\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 2 4 6 8 10 12 14 16 18 20\n```\n:::\n\n```{.r .cell-code}\nf <- seq(2,20, length.out=5)\nf\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 2.0 6.5 11.0 15.5 20.0\n```\n:::\n\n```{.r .cell-code}\nh <- seq(20,2,by=-2)\nh\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 20 18 16 14 12 10 8 6 4 2\n```\n:::\n\n```{.r .cell-code}\nj <- seq(20, 2, length.out=3)\nj\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 20 11 2\n```\n:::\n:::\n\n\n## Repeating vectors\n\nTo create a repeating vector, we can use `rep()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nk <- rep(c(0,3,6), times = 3)\nk\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0 3 6 0 3 6 0 3 6\n```\n:::\n\n```{.r .cell-code}\nl <- rep(2:6, each = 3)\nl\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6\n```\n:::\n\n```{.r .cell-code}\nm <- rep(7:10, length.out = 20)\nm\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 7 8 9 10 7 8 9 10 7 8 9 10 7 8 9 10 7 8 9 10\n```\n:::\n:::\n\n\n## Vector Operations\n\nVectors of equal length can be operated on together. If one vector is shorter, it is recycled: its elements are repeated until its length matches that of the longer vector. 
When the lengths differ, the length of the longer vector should ideally be a multiple of the length of the shorter one; otherwise, R still recycles the shorter vector but issues a warning.\n\n### Basic Vector Operations\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvec_1 <- 1:10\n\nvec_1*12 # multiplication\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 12 24 36 48 60 72 84 96 108 120\n```\n:::\n\n```{.r .cell-code}\nvec_1+12 # addition\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 13 14 15 16 17 18 19 20 21 22\n```\n:::\n\n```{.r .cell-code}\nvec_1-12 # subtraction\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] -11 -10 -9 -8 -7 -6 -5 -4 -3 -2\n```\n:::\n\n```{.r .cell-code}\nvec_1/3 # division\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 0.3333333 0.6666667 1.0000000 1.3333333 1.6666667 2.0000000 2.3333333\n [8] 2.6666667 3.0000000 3.3333333\n```\n:::\n\n```{.r .cell-code}\nvec_1^4 # power\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 1 16 81 256 625 1296 2401 4096 6561 10000\n```\n:::\n\n```{.r .cell-code}\nsqrt(vec_1) # square root\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427\n [9] 3.000000 3.162278\n```\n:::\n:::\n\n\n### Operations on vectors of equal length\n\nAdditionally, we can perform operations on two vectors of equal length.\n\n1. Create two vectors\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvec_3 <- 5:14\nvec_3\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 5 6 7 8 9 10 11 12 13 14\n```\n:::\n\n```{.r .cell-code}\nvec_4 <- 12:3\nvec_4\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 12 11 10 9 8 7 6 5 4 3\n```\n:::\n:::\n\n\n2. 
Perform various arithmetic operations\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvec_3 + vec_4\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 17 17 17 17 17 17 17 17 17 17\n```\n:::\n\n```{.r .cell-code}\nvec_3 - vec_4\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] -7 -5 -3 -1 1 3 5 7 9 11\n```\n:::\n\n```{.r .cell-code}\nvec_3 / vec_4\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 0.4166667 0.5454545 0.7000000 0.8888889 1.1250000 1.4285714 1.8333333\n [8] 2.4000000 3.2500000 4.6666667\n```\n:::\n\n```{.r .cell-code}\nvec_3 * vec_4\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 60 66 70 72 72 70 66 60 52 42\n```\n:::\n\n```{.r .cell-code}\nvec_3 ^ vec_4\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 244140625 362797056 282475249 134217728 43046721 10000000 1771561\n [8] 248832 28561 2744\n```\n:::\n:::\n\n\n## Functions that can be applied to vectors\n\nThe functions listed below can be applied to vectors:\n\n1. `any()`\n\n2. `all()`\n\n3. `nchar()`\n\n4. `length()`\n\n5. 
`typeof()`\n\n### Examples\n\n\n::: {.cell}\n\n```{.r .cell-code}\nany(vec_3 > vec_4)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n\n```{.r .cell-code}\nany(vec_3 < vec_4)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nall(vec_3 > vec_4)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n\n```{.r .cell-code}\nall(vec_3 < vec_4)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] FALSE\n```\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nlength(vec_3)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 10\n```\n:::\n\n```{.r .cell-code}\nlength(vec_4)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 10\n```\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(vec_3)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"integer\"\n```\n:::\n\n```{.r .cell-code}\ntypeof(vec_4)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"integer\"\n```\n:::\n:::\n\n\nDetermine the number of letters in a character\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvec_5 <- c(\"R\", \"RStudio\", \"Shiny\", \"Quarto\")\nnchar(vec_5)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 1 7 5 6\n```\n:::\n:::\n\n\n## Recycling of vectors\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvec_3 + c(10, 20)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 15 26 17 28 19 30 21 32 23 34\n```\n:::\n\n```{.r .cell-code}\nvec_3 + c(10, 20, 30) # will result in a warning as the longer vector is not a multiple of the shorter one\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning in vec_3 + c(10, 20, 30): longer object length is not a multiple of\nshorter object length\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] 15 26 37 18 29 40 21 32 43 24\n```\n:::\n:::\n\n\n## Accessing elements of a vector\n\nTo access the elements of a vector, we can use numeric-, character-, or logical-based indexing.\n\n### Examples\n\n#### 1. 
Name the elements of a vector with `names()`.\n\nCreate the vector.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvec_name <- 1:5\nvec_name\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 1 2 3 4 5\n```\n:::\n:::\n\n\nName the individual elements.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames(vec_name) <- c(\"a\", \"c\", \"e\", \"g\", \"i\")\nvec_name\n```\n\n::: {.cell-output .cell-output-stdout}\n```\na c e g i \n1 2 3 4 5 \n```\n:::\n:::\n\n\n#### 2. Use the vector index to filter\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvec_index <- 1:5\nvec_index\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 1 2 3 4 5\n```\n:::\n:::\n\n\n##### a) Logical vector as an index\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvec_index[c(TRUE, FALSE, TRUE, FALSE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 1 3 5\n```\n:::\n:::\n\n\n##### b) Select a range of elements by index\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvec_index[1:3]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 1 2 3\n```\n:::\n:::\n\n\n##### c) Access elements by position\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvec_index[4]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 4\n```\n:::\n\n```{.r .cell-code}\nvec_index[c(2,4)]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 2 4\n```\n:::\n:::\n\n\n##### d) Modify a vector using indexing\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvec_index\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 1 2 3 4 5\n```\n:::\n\n```{.r .cell-code}\nvec_index[5] <- 1000\nvec_index\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 1 2 3 4 1000\n```\n:::\n:::\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_site/0_unpublished/danger_disturb.html b/_site/0_unpublished/danger_disturb.html index 9318e59..1dc1561 100644 --- a/_site/0_unpublished/danger_disturb.html +++ 
b/_site/0_unpublished/danger_disturb.html @@ -286,7 +286,7 @@

On this page

Code Along With Me (Episode 1)

-

An assessment of the livestock numbers in the six counties declared to be ‘dangerous and disturbed’ in Kenya

+

An assessment of the livestock numbers in the six counties declared to be ‘disturbed and dangerous’ in Kenya

RStudio
R
diff --git a/_site/posts/biotech/alt_protein/alt_protein_intro.html b/_site/posts/biotech/alt_protein/alt_protein_intro.html new file mode 100644 index 0000000..e6e1b09 --- /dev/null +++ b/_site/posts/biotech/alt_protein/alt_protein_intro.html @@ -0,0 +1,755 @@ + + + + + + + + + + + +William Okech - Biotechnology for the Global South + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

Biotechnology for the Global South

+

Developing alternative proteins to reduce childhood hunger and improve food production in low-income countries

+
+
RStudio
+
R
+
Biotechnology
+
Blog
+
Data Visualization
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

October 27, 2023

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+
+

Futuristic Bioreactor

+
Bioreactor image created using “Bing Image Creator” with the prompt keywords “bioreactor, computer screen, bubbling liquids, cabinet, with a dark purple, sky-blue, neon lights, carpetpunk, backlit photography, and trillwave theme.”
+
+
+
+

Key Points

+
  • Hunger is a significant problem that affects close to one-tenth (1/10) of the world’s population.
  • Low-income countries, particularly in Africa, bear the brunt of the global childhood hunger crisis.
  • Alternative protein technologies have the potential to transform the way humans produce and consume proteins.
  • Increased consumer education, reduced production costs, and innovative technologies are key to the widespread acceptance and consumption of alternative proteins.
+
+
+

Introduction

+

In the year 2022, between 691 and 783 million people were hungry 1. This alarming figure indicates that almost one-tenth (1/10) of the world’s population (in 2022) did not consume enough calories to maintain a healthy and active lifestyle. Hunger (also referred to as undernutrition) is “an uncomfortable or painful physical sensation caused by insufficient consumption of dietary energy.” 2 The United Nations Children’s Fund (UNICEF) estimates that for children under 5, 148.1 million suffered from stunting (low height-for-age) and 45 million from wasting (low weight-for-height), which placed them at increased risk for physical and/or mental disabilities 3. Moreover, in 2019, protein-energy undernutrition (an energy deficit resulting from the deficiencies of many macronutrients, primarily proteins) contributed to the deaths of 100,000 children (between 0–14 years), with approximately two-thirds from Africa 4 5. That a large proportion of the human population still faces the dual challenges of hunger and food insecurity is very disheartening. This is despite the continual increases in global agricultural productivity (resulting from the expansion of agricultural land area, better yielding crops, and improved animal production methods) that have been witnessed over the past century.

+

The problem of hunger/undernourishment affects the world in an income- and region-dependent manner. Between 2000 and 2020, low-income countries had the highest share of their populations (25%–35%) that were undernourished, which was significantly above the world average (5%–15%) (Figure 1).

+
+
+

+
Figure 1: Share of the population that is undernourished (grouped by income)
+
+
+

Additionally, we observe that the two regions that had the highest share of the population that was undernourished were Sub-Saharan Africa (20.9%) and South Asia (15.9%) (Figure 2).

+
+
+

+
Figure 2: Share of the population that is undernourished (grouped by region)
+
+
+

To reduce the share of the population that is undernourished, various countries and development institutions have introduced interventions and technologies that can boost crop yields and enhance livestock production. These interventions include irrigation, fertilizers, improved seeds, better insect and pest control strategies, and gene-editing. However, in Sub-Saharan Africa (which has the greatest burden of undernourishment and lowest crop yields), only 6% of cultivated land was irrigated and the rate of fertilizer application was approximately 17 kg/hectare in 2018, significantly below the world average of 135 kg/hectare 6. Additionally, improved livestock rearing methods (such as selective breeding and improved nutrition), and fish and seafood harvesting techniques (such as selective harvesting and aquaculture systems) have increased global production. Sadly, we note that the continent of Africa has seen very little improvement in its livestock, fish, and seafood production capacity (Figure 3), and this may leave its fast-growing population susceptible to protein deficiencies if not addressed in a timely manner.

+
+
+
+
+
+
+

+
(a) Global Meat Production
+
+
+
+
+
+
+

+
(b) Global Fish Production
+
+
+
+
+

Figure 3: Global Meat and Fish Production

+
+
+

It is widely believed that promoting technologies that can improve crop yields and enhance livestock production in low-income countries is the best method to boost food production and subsequently reduce undernourishment. However, many of these interventions have been detrimental to the climate and local environment. With regard to enhancing crop production, some of these effects include:

+
  1. Land degradation and deforestation 7,
  2. Poisoning of fresh water/marine ecosystems by chemical runoff, and,
  3. Depletion of fresh water sources resulting from overconsumption.
+

Moreover, boosting output in the livestock sector has contributed to a number of major environmental challenges and resource conflicts. These include:

+
    +
  1. Overgrazing, soil erosion, and deforestation 8,9,
  2. Contributing up to 15% of human-induced greenhouse gas (GHG) emissions 10,
  3. Conflict between pastoralists/farmers over grazing land/water 11, and
  4. Increased prevalence of antimicrobial resistance in livestock resulting from antibiotic misuse/overuse 12.
+

Overall, these findings suggest that enhancing livestock production and boosting crop yields to reduce undernourishment may not be the panacea we envision and may cause more long-term harm than good. Additionally, with more extreme weather events (such as heat waves, floods, and droughts) resulting from climate change 13, volatility in global crop and food prices, and rapidly increasing world populations, there is a major need to develop and adopt alternative protein sources that can reduce childhood hunger and increase food production in low-income countries. In this essay, I will examine new and innovative alternative protein production technologies that can aid in generating foods with sufficient dietary energy and nutrients to meet human consumption requirements while reducing dependence on animal-based proteins.

+
+
+

What are alternative proteins?

+

Alternative proteins are plant-based and food technology alternatives to animal-based proteins 14. Proteins are large, complex molecules made up of smaller units called amino acids, and they are a key component of quality nutrition that promotes normal growth and maintenance 15. A significant advantage of alternative protein production is the reduced impact on the environment resulting from a decreased dependence on livestock-based protein production. This reduced impact is seen in the decreased greenhouse gas emissions and environmental pollution as well as the decline in the amounts of land and water required for livestock. Major sources of alternative proteins include plant proteins, insects, cultured meat, and fermentation-derived proteins 16. Generally, plant, insect, and fermentation-derived proteins are commercially available, while cultivated meats are still in the research and development phase 17.

+
+
+

Plant proteins

+

Plant proteins are harnessed directly from protein-rich seeds, and the main sources include leguminous (such as soy and pea), cereal (such as wheat and corn), and oilseed (such as peanut and flaxseed) proteins 18. The three major processing steps include protein extraction (centrifugation), protein purification (precipitation and ultrafiltration), and heat treatment (pasteurization) 19. Using the isolated proteins, specific products such as plant-based meats can be developed. To create these meats, the proteins are mixed with fibers and fats, then structured using heat and agitation, and lastly color, flavor, and aroma components are added to make the product more palatable.

+
+

Notable Companies

+
    +
  1. Fry Family Food (South Africa)
  2. Moolec Science (Luxembourg)
  3. Beyond Meat (Los Angeles, California, USA)
  4. Impossible Foods (Redwood City, California, USA)
  5. New Wave Foods (Stamford, Connecticut, USA)
  6. Eat Just (Alameda, California, USA)
+
+
+

Challenges to be addressed

+
    +
  1. Develop crops optimized for plant-based meat that produce higher quantities of high-quality protein,
  2. Improve protein extraction and processing methods,
  3. Confirm that the taste, texture, and nutritional value are similar to conventional meats, and
  4. Ensure that the cost of production is competitive and the process is energy-efficient 20.
+
+
+
+

Insect proteins

+

Proteins derived from insects are referred to as insect proteins. Insects are rich in essential nutrients such as amino acids, vitamins, and minerals. Insect-derived proteins have a dual role, as they can be eaten directly by humans or used as animal feed. Significant advantages of insect-derived proteins are their negligible environmental footprint, low cost of production, and absence of disease-causing pathogens (post-processing) 21. Many communities across the world have traditionally consumed insects, and it is estimated that approximately 2,000 insect species are consumed in at least 113 countries 22. However, the reluctance to eat insects in many high-income countries and the abundance of other protein sources have prevented widespread acceptance. In contrast, insect-based proteins have shown great promise in the animal feed industry. Both black soldier fly and housefly larvae have been used to replace fish meal and broiler feed, significantly reducing costs without compromising final product quality 23.

+
+

Notable Companies

+
    +
  1. Next Protein (France/Tunisia)
  2. Biobuu (Tanzania)
  3. Inseco (Cape Town, South Africa)
  4. InsectiPro (Limuru, Kenya)
  5. Entocycle (United Kingdom)
  6. Ecodudu (Nairobi, Kenya)
  7. Ynsect (Paris, France)
  8. Protix (Dongen, Netherlands)
  9. All Things Bugs (Oklahoma City, Oklahoma, USA)
+
+
+

Challenges to be addressed

+
    +
  1. Develop tools for product scale-up,
  2. Lower the production costs, and
  3. Change negative consumer attitudes towards insect-based foods 24.
+
+
+
+

Fermentation-derived proteins

+

Fermentation involves the transformation of sugars into new products via chemical reactions carried out by microorganisms. This process has been referred to as “humanity’s oldest biotechnological tool” because humans have long used it to create foods, medicines, and fuels 25. The three main categories of fermentation are traditional, biomass, and precision fermentation 26.

+
    +
  1. Traditional fermentation uses intact live microorganisms and microbial anaerobic digestion to process plant-based foods. This results in a change in the flavor and function of plant-based foods and ingredients.
  2. Biomass fermentation uses the microorganisms that reproduce during the fermentation process as ingredients. The microorganisms naturally have a high protein content, and allowing them to reproduce efficiently yields large amounts of protein-rich food.
  3. Precision fermentation uses programmed microorganisms as “cellular production factories” to develop proteins, fats, and other nutrients 27.
+
+

Notable Companies

+
    +
  1. Essential Impact (East Africa)
  2. De Novo Foodlabs (Cape Town, South Africa)
  3. MycoTechnology (Aurora, Colorado, USA)
  4. Quorn (Stokesley, UK)
  5. Perfect Day (Berkeley, California, USA)
+
+
+

Challenges to be addressed

+
    +
  1. Identify the correct molecules to manufacture in a cost-effective manner,
  2. Develop the appropriate microbial strains for the relevant products,
  3. Determine the appropriate feedstocks,
  4. Design low-cost bioreactors and systems for scaling-up processes, and
  5. Improve end-product formulation to allow for better taste/texture 28.
+
+
+
+

Animal proteins from cultivated meat

+

The cultivated meat industry develops animal proteins that are grown from animal cells directly. Here, tissue-engineering techniques commonly used in regenerative medicine aid in product development 29. Cells obtained from an animal are put into a bioreactor to replicate, and when they reach the optimal density, they are harvested via centrifugation, and the resulting muscle and fat tissue are formed into the recognizable meat structure. The advantages of producing meat in this way include: reduced contamination, decreased antibiotic use, and a lower environmental footprint 30.

+
+

Notable Companies

+
    +
  1. WildBio (formerly Mogale Meat; Pretoria, South Africa)
  2. Newform Foods (formerly Mzansi Meat; Cape Town, South Africa)
  3. Mosa Meat (Maastricht, Netherlands)
  4. Bluu Seafood (Berlin, Germany)
  5. Eat Just (Singapore)
  6. Clear Meat (Delhi NCR, India)
  7. Sea-Stematic (Cape Town, South Africa)
+
+
+

Challenges to be addressed

+

Even though many companies have entered the cultivated meat space, few have received the requisite regulatory approval to sell their products, and some countries have temporarily halted development 31. Other challenges include:

+
    +
  1. Insufficient bioreactor capacity,
  2. High cost of growth media and factors required for cultivation,
  3. Lack of products in the market despite large investments 32, and
  4. High final product cost 33 and a major need for consumer education 34.
+
+
+
+

What factors will influence the adoption of alternative proteins in low-income countries?

+

The insect-based protein market has the potential to grow faster than the other three alternative protein segments (plant-based, fermentation-based, and cultivated meat) in low-income countries. This is because of fewer barriers to entry and lower setup costs. Therefore, to enhance the adoption of the other alternative protein segments in low-income countries, there is a need to build biomanufacturing capacity, increase R&D funding, and develop a strong workforce by recruiting more students and researchers to the field. Additionally, it would be important for national-level regulations and policies that support the sector to be implemented. On an individual level, several factors will affect the large-scale adoption of alternative proteins in low-income countries 35. These include:

+
    +
  1. Cost (dollars per kilogram of 100% protein), which will need to be similar to or lower than that of conventional animal-derived proteins,
  2. The protein digestibility-corrected amino acid score (PDCAAS), a measure of protein quality based on human amino acid requirements and the ability of humans to digest the protein,
  3. The economic impact on agricultural workers in the livestock and fishing industries, and
  4. Consumer adoption 36 (which is dependent on perception, taste, texture, safety, and convenience).
+
+
+
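
The PDCAAS named in the list above can be written out as a formula. What follows is the commonly cited definition, offered as a sketch rather than something taken from this essay's sources; the symbol $D$ is introduced here only for illustration:

$$
\mathrm{PDCAAS} = \min\left(1,\ \frac{\text{mg of the limiting amino acid in 1 g of test protein}}{\text{mg of the same amino acid in 1 g of reference protein}} \times D\right)
$$

Here $D$ is the true fecal digestibility of the protein, expressed as a fraction between 0 and 1, and the limiting amino acid is the essential amino acid in shortest supply relative to the reference pattern. Scores above 1 are truncated to 1, so a protein cannot score higher by supplying more amino acids than the reference requires.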

Conclusion

+

In summary, I hope that I have convinced you that there is an urgent need to address the crisis of hunger and undernourishment worldwide. Second, this essay should have demonstrated that optimizing conventional agricultural practices can also harm the environment. Third, you should now have a basic understanding of alternative proteins and their potential to address undernutrition. Lastly, to tackle the problem of hunger and undernourishment, it is imperative for society to embrace novel alternative protein production technologies that can enhance food production while minimizing environmental impact and the contribution to climate change.

+ + +
+ + +

Footnotes

+ +
  1. Hunger | FAO | Food and Agriculture Organization of the United Nations. https://www.fao.org/hunger/en/.
  2. I am hungry. What does it mean? https://unric.org/en/i-am-hungry-what-does-it-mean/.
  3. Malnutrition. https://www.who.int/health-topics/malnutrition#tab=tab_1.
  4. Deaths from protein-energy malnutrition, by age, World, 1990 to 2019. https://ourworldindata.org/grapher/malnutrition-deaths-by-age.
  5. Protein-Energy Undernutrition (PEU) - Nutritional Disorders - MSD Manual Professional Edition. https://www.msdmanuals.com/professional/nutritional-disorders/undernutrition/protein-energy-undernutrition-peu.
  6. Africa Fertilizer Map 2020 – AF-AP Partnership. https://afap-partnership.org/news/africa-fertilizer-map-2020/.
  7. Impact of Sustainable Agriculture and Farming Practices. https://www.worldwildlife.org/industries/sustainable-agriculture/.
  8. How Industrialized Meat Production Causes Land Degradation. https://populationeducation.org/industrialized-meat-production-and-land-degradation-3-reasons-to-shift-to-a-plant-based-diet/.
  9. Feltran-Barbieri, R. & Féres, J. G. Degraded pastures in Brazil: improving livestock production and forest restoration. R Soc Open Sci 8, (2021).
  10. Moving Towards Sustainability: The Livestock Sector and the World Bank. https://www.worldbank.org/en/topic/agriculture/brief/moving-towards-sustainability-the-livestock-sector-and-the-world-bank.
  11. Pastoral conflict in Kenya – ACCORD. https://www.accord.org.za/ajcr-issues/pastoral-conflict-in-kenya/.
  12. Antimicrobial resistance and agriculture - OECD. https://www.oecd.org/agriculture/topics/antimicrobial-resistance-and-agriculture/.
  13. Extreme Weather | Facts – Climate Change: Vital Signs of the Planet. https://climate.nasa.gov/extreme-weather/.
  14. Alternative proteins. https://sustainablecampus.unimelb.edu.au/sustainable-research/case-studies/alternative-proteins.
  15. Protein. https://www.genome.gov/genetics-glossary/Protein.
  16. The market for alternative protein: Pea protein, cultured meat, and more | McKinsey. https://www.mckinsey.com/industries/agriculture/our-insights/alternative-proteins-the-race-for-market-share-is-on.
  17. Defining alternative protein | GFI. https://gfi.org/defining-alternative-protein/.
  18. Chandran, A. S., Suri, S. & Choudhary, P. Sustainable plant protein: an up-to-date overview of sources, extraction techniques and utilization. Sustainable Food Technology 1, 466–483 (2023).
  19. Plant-based protein processing | Alfa Laval. https://www.alfalaval.com/industries/food-dairy-beverage/food-processing/protein-processing/plant-based-protein-processing/.
  20. The science of plant-based meat | GFI APAC. https://gfi-apac.org/science/the-science-of-plant-based-meat/.
  21. How Insect Protein can Revolutionize the Food Industry. https://mindthegraph.com/blog/insect-protein/.
  22. Yen, A. L. Edible insects: Traditional knowledge or western phobia? Entomol Res 39, 289–298 (2009).
  23. Kim, T. K., Yong, H. I., Kim, Y. B., Kim, H. W. & Choi, Y. S. Edible Insects as a Protein Source: A Review of Public Perception, Processing Technology, and Research Trends. Food Sci Anim Resour 39, 521 (2019).
  24. The Growing Animal Feed Insect Protein Market. https://nutrinews.com/en/the-growing-animal-feed-insect-protein-market-opportunities-and-challenges/.
  25. Taveira, I. C., Nogueira, K. M. V., Oliveira, D. L. G. de & Silva, R. do N. Fermentation: Humanity’s Oldest Biotechnological Tool. Front Young Minds 9, (2021).
  26. Fermentation for alternative proteins 101 | Resource guide | GFI. https://gfi.org/fermentation/.
  27. Fermentation for alternative proteins 101 | Resource guide | GFI. https://gfi.org/fermentation/.
  28. The science of fermentation (2023) | GFI. https://gfi.org/science/the-science-of-fermentation/.
  29. What is cultivated meat? | McKinsey. https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-cultivated-meat.
  30. The science of cultivated meat | GFI. https://gfi.org/science/the-science-of-cultivated-meat/.
  31. Che Sorpresa! Italy U-Turns on Cultivated Meat Ban – For Now. https://www.greenqueen.com.hk/italy-cultivated-meat-ban-lab-grown-food-cultured-protein-eu-tris-notification-francesco-lollobrigida/.
  32. Is overhype dooming the cultivated meat industry? https://www.fastcompany.com/90966338/hype-built-the-cultivated-meat-industry-now-it-could-end-it.
  33. Meat substitutes need to get a lot cheaper. https://www.sustainabilitybynumbers.com/p/meat-substitutes-price.
  34. What is cultivated meat? | McKinsey. https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-cultivated-meat.
  35. The market for alternative protein: Pea protein, cultured meat, and more | McKinsey. https://www.mckinsey.com/industries/agriculture/our-insights/alternative-proteins-the-race-for-market-share-is-on.
  36. Meat substitutes need to get a lot cheaper. https://www.sustainabilitybynumbers.com/p/meat-substitutes-price.
+
+ + +
+ + + + + \ No newline at end of file diff --git a/posts/series_3/new_post_1/images/africa_fish.png b/_site/posts/biotech/alt_protein/images/africa_fish.png similarity index 100% rename from posts/series_3/new_post_1/images/africa_fish.png rename to _site/posts/biotech/alt_protein/images/africa_fish.png diff --git a/posts/series_3/new_post_1/images/bioreactor.jpeg b/_site/posts/biotech/alt_protein/images/bioreactor.jpeg similarity index 100% rename from posts/series_3/new_post_1/images/bioreactor.jpeg rename to _site/posts/biotech/alt_protein/images/bioreactor.jpeg diff --git a/posts/series_3/new_post_1/biotech_cover.png b/_site/posts/biotech/alt_protein/images/biotech_cover.png similarity index 100% rename from posts/series_3/new_post_1/biotech_cover.png rename to _site/posts/biotech/alt_protein/images/biotech_cover.png diff --git a/posts/series_3/new_post_1/images/continent_meat_1.png b/_site/posts/biotech/alt_protein/images/continent_meat_1.png similarity index 100% rename from posts/series_3/new_post_1/images/continent_meat_1.png rename to _site/posts/biotech/alt_protein/images/continent_meat_1.png diff --git a/posts/series_3/new_post_1/images/undernourished_income.png b/_site/posts/biotech/alt_protein/images/undernourished_income.png similarity index 100% rename from posts/series_3/new_post_1/images/undernourished_income.png rename to _site/posts/biotech/alt_protein/images/undernourished_income.png diff --git a/posts/series_3/new_post_1/images/undernourished_region.png b/_site/posts/biotech/alt_protein/images/undernourished_region.png similarity index 100% rename from posts/series_3/new_post_1/images/undernourished_region.png rename to _site/posts/biotech/alt_protein/images/undernourished_region.png diff --git a/_site/posts/biotech/alt_protein/post_1.html b/_site/posts/biotech/alt_protein/post_1.html new file mode 100644 index 0000000..7bf78e3 --- /dev/null +++ b/_site/posts/biotech/alt_protein/post_1.html @@ -0,0 +1,741 @@ + + + + + + + + + + + +William Okech 
- Biotechnology for the Global South + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

Biotechnology for the Global South

+

Leveraging biotechnological innovations to reduce childhood hunger and improve food production in low-income countries

+
+
RStudio
+
R
+
Biotech for the Global South
+
Blog
+
Data Visualization
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

October 27, 2023

+
+
+ + +
+ + +
+ + + + +
+ + + + +


+
+
+

Futuristic Bioreactor

+
Bioreactor image created using “Bing Image Creator” with the prompt keywords “bioreactor, computer screen, bubbling liquids, cabinet, with a dark purple, sky-blue, neon lights, carpetpunk, backlit photography, and trillwave theme.”
+
+
+
+

Key Points

+
    +
  • Hunger is a significant problem affecting close to one-tenth (1/10) of the world’s population.
  • Low-income countries, particularly in Africa, bear the brunt of the global hunger crisis.
  • Alternative proteins have the potential to transform the way humans produce proteins.
  • Increased consumer education, reduced production costs, and efficient biotechnological protocols are key to the widespread adoption of alternative proteins.
+
+
+

Introduction

+

In the year 2022, between 691 and 783 million people were hungry (or undernourished) 1. Hunger results from an “insufficient consumption of dietary energy,” and this staggering figure indicates that almost one-tenth (1/10) of the world’s population (in 2022) did not consume enough calories to maintain a healthy and active lifestyle. In children, the two main signs of undernutrition are stunting (low height-for-age) and wasting (low weight-for-height), which can result in physical and/or mental disability 2. Moreover, in 2019, protein-energy undernutrition (an energy deficit resulting from deficiencies of many macronutrients, primarily proteins), which manifests in three main forms, namely marasmus (wasting), kwashiorkor (edema), and marasmic kwashiorkor (both edema and wasting), contributed to the deaths of 200,000 people worldwide (mainly under-5s and those aged 70 and over), with approximately half in Africa 3 4. Despite the continual increases in global agricultural productivity (resulting from the expansion of agricultural land area, better yielding crops, and improved animal production methods) that have been witnessed over the past century, a large proportion of the human population still faces the dual challenges of hunger and food insecurity. This is particularly true for low-income countries, which have the highest share of their populations affected by undernourishment (Figure 1).

Figure 1: Share of the population that is undernourished (grouped by income)

Additionally, we observe that the two regions that had the highest share of the population that was undernourished were Sub-Saharan Africa (20.9%) and South Asia (15.9%) (Figure 2).

Figure 2: Share of the population that is undernourished (grouped by region)

It is widely acknowledged that interventions such as irrigation, fertilizer use, insect and pest control, gene-editing, and improved seed can boost crop yields and agricultural productivity. However, on the continent of Africa (which has the greatest burden of undernourishment and the lowest crop yields), only 6% of cultivated land was irrigated, and the rate of fertilizer application was approximately 17 kg/hectare, well below the world average of 135 kg/hectare in 2018 5. Additionally, improved livestock rearing methods (such as selective breeding and improved nutrition) and fish and seafood harvesting techniques (such as selective harvesting and aquaculture systems) have increased global production significantly. Unfortunately, we observe again that the continent of Africa has seen very little improvement in its livestock, fish, and seafood production capacity, which leaves its fast-growing population susceptible to nutrient deficiencies if not addressed in a timely manner (Figure 3).

Figure 3: Global Meat and Fish Production. Panels: (a) Global Meat Production; (b) Global Fish Production.

One of the methods used to boost food production and subsequently reduce undernourishment has been to develop technologies that can directly improve agricultural productivity and crop yields. However, we now know that many of these interventions have been detrimental to the climate and local environment. With regard to crop production, some of these effects have included:

+
    +
  1. The poisoning of fresh water and marine ecosystems by chemical runoff,
  2. Excess water consumption and depletion of fresh water sources, and
  3. Deforestation, land degradation, and ecological destruction 6.
+

Moreover, we know that the livestock sector has contributed to a number of significant challenges facing the world today. These include:

+
    +
  1. Contributing up to 15% of the human-induced greenhouse gas (GHG) emissions 7,
  2. Overgrazing, soil erosion, and deforestation 8 9,
  3. Conflict between pastoralists over finite resources like grazing land/water, and theft 10, and
  4. The prevalence of antimicrobial resistance resulting from antibiotic misuse/overuse 11.
+

Overall, these observations indicate that simply enhancing agricultural productivity and crop yields to reduce undernourishment may not be the panacea we envision and may cause more long-term harm than good. Moreover, with an increasing number of extreme weather events (such as heat waves, floods, and droughts) resulting from climate change 12, international conflicts that influence global crop and food prices, and rapidly changing world populations, there is an increasing need to develop and adopt bespoke biotechnological solutions that will address hunger. In this essay, I will not address the numerous challenges that affect commodity markets and agriculture/food production systems worldwide, but I will examine new and innovative biotechnological solutions that can significantly improve access to dietary energy and nutrients. Specifically, the focus will be on how innovations in the development of alternative proteins can be an effective, low-cost tool for generating large amounts of nutrients for both human and animal consumption.

+
+
+

What are alternative proteins?

+

Alternative proteins are plant-based and food technology alternatives to animal-based proteins 13. Proteins are large, complex molecules made up of smaller units called amino acids; they are a key component of quality nutrition and promote normal growth and maintenance 14. A significant advantage of these alternative proteins is that they require fewer inputs (such as land and water) and their production is associated with lower greenhouse gas emissions and less environmental pollution. Major sources of alternative proteins include plant proteins, insects, cultured meat, and mycoproteins (via fermentation) 15. Generally, plant, insect, and fermentation-derived proteins are commercially available. However, cultivated meats are still in the research and development phase 16.

+
+
+

Plant proteins

+

Plant proteins are harnessed directly from protein-rich seeds, and the main sources include leguminous (such as soy and pea), cereal (such as wheat and corn), and oilseed (such as peanut and flaxseed) proteins 17. The three major steps include protein extraction (centrifugation), protein purification (precipitation and ultrafiltration), and heat treatment (pasteurization) 18. Using these isolated proteins, specific products such as plant-based meats can be developed. To create these meats, the proteins are mixed with fibers and fats, then structured using heat and agitation, and lastly color, flavor, and aroma components are added to make the product more palatable.

+
+

Companies and their products (notable examples)

+
    +
  1. Beyond Meat (Los Angeles, California, USA) – sells burgers that contain five main ingredients: pea protein isolate, canola and coconut oils, trace minerals, carbohydrates, and water, with beet juice included to simulate the red color of beef.
  2. Impossible Foods (Redwood City, California, USA) – sells burgers containing soy protein concentrate, soy protein isolate, and potato protein, as well as coconut and sunflower oils, natural flavors, and several hydrocolloids, minerals, and vitamins. The burgers also contain soy leghemoglobin, a heme-containing protein from the roots of soy plants.
  3. New Wave Foods (Stamford, Connecticut, USA) – produces a plant-based shrimp alternative containing seaweed, soy protein, natural flavors, and other flavor and aroma components to enhance taste 19.
+
+
+

Challenges to be addressed

+
    +
  1. Develop crops optimized for plant-based meat that produce higher quantities of high-quality protein,
  2. Improve protein extraction and processing methods,
  3. Confirm that the taste, texture, and nutritional value are similar to conventional meats, and
  4. Ensure that the cost of production is competitive and the process is energy-efficient 20.
+
+
+
+

Insect proteins

+

Proteins derived from insects are referred to as insect proteins. Insects are rich in essential nutrients such as amino acids, vitamins, and minerals. Insect-derived proteins have a dual role, as they can be eaten directly by humans or used as animal feed. Significant advantages of insect-derived proteins are their low environmental footprint, low cost of production, and absence of disease-causing pathogens (post-processing) 21. Many communities across the world have traditionally consumed insects, and it is estimated that approximately 2,000 insect species are consumed in at least 113 countries 22; however, the reluctance to eat insects in many high-income countries and the abundance of other protein sources have limited widespread adoption. In contrast, insect-based proteins have great potential for use in the animal feed industry. Both black soldier fly and housefly larvae have been used to replace fish meal and broiler feed, significantly reducing costs without compromising final product quality 23.

+
+

Companies and their products (notable examples)

+
    +
  1. Ynsect (Paris, France) – produces animal, human, and plant foods made from mealworm beetles,
  2. Protix (Dongen, Netherlands) – produces insect ingredients (proteins and other nutrients) and animal feed from black soldier flies,
  3. Hey Planet (Copenhagen, Denmark) – creates food products out of buffalo beetles and crickets, and
  4. All Things Bugs (Oklahoma City, Oklahoma, USA) – develops Griopro® Cricket Powder that is used in a wide variety of foods and drinks 24.
+
+
+

Challenges to be addressed

+
    +
  1. Develop tools for product scale-up,
  2. Lower the production costs, and
  3. Change negative consumer attitudes towards insect-based foods 25.
+
+
+
+

Fermentation-derived proteins

+

Fermentation involves the transformation of sugars into new products via chemical reactions carried out by microorganisms. This process has been referred to as “humanity’s oldest biotechnological tool” because humans have long used it to create foods, medicines, and fuels 26. The three main categories of fermentation are traditional, biomass, and precision fermentation 27.

+
    +
  1. Traditional fermentation uses intact live microorganisms and microbial anaerobic digestion to process plant-based foods. This results in a change in the flavor and function of plant-based foods and ingredients.
  2. Biomass fermentation uses the microorganisms that reproduce via this process as ingredients. The microorganisms naturally have a high protein content, and allowing them to reproduce efficiently yields large amounts of protein-rich food.
  3. Precision fermentation uses programmed microorganisms as “cellular production factories” to develop proteins, fats, and other nutrients 28.
+
+

Companies and their products (notable examples)

+
    +
  1. MycoTechnology (Aurora, Colorado, USA) – ferments plant-based proteins to enhance flavor.
  2. Quorn (Stokesley, UK) – produces a meat-free super-protein from unprocessed microbial biomass.
  3. Perfect Day (Berkeley, California, USA) – creates biosynthetic dairy proteins by fermentation using fungi in bioreactors.
+
+
+

Challenges to be addressed

+
    +
  1. Identify the correct molecules to manufacture in a cost-effective manner,
  2. Develop the appropriate microbial strains for the relevant products,
  3. Determine the appropriate feedstocks,
  4. Design low-cost bioreactors and systems for scaling-up processes, and
  5. Improve end-product formulation to allow for better taste/texture 29.
+
+
+
+

Animal proteins from cultivated meat

+

The cultivated meat industry develops animal proteins that are grown from animal cells directly. Here, tissue-engineering techniques commonly used in regenerative medicine aid in product development 30. Cells obtained from an animal are put into a bioreactor to replicate, and when they reach the optimal density, they are harvested via centrifugation, and the resulting muscle and fat tissue are formed into the recognizable meat structure. The advantages of producing meat in this way include: reduced contamination, decreased antibiotic use, and a lower environmental footprint 31.

+
+

Companies and their products (notable examples)

+
    +
  1. Mosa Meat (Maastricht, Netherlands), which is developing beef products,
  2. Bluu Seafood (Berlin, Germany), which is developing “lab-grown” seafood, and
  3. Eat Just (Singapore), which sells cultured chicken 32.
+
+
+

Challenges to be addressed

+

Even though many companies have entered the cultivated meat space, few have received the regulatory approval required to sell their products, and some countries have temporarily halted development33. Other challenges include:

+
  1. A lack of sufficient bioreactor capacity,
  2. The high cost of growth media and factors,
  3. Delays in releasing products resulting in accusations of hype34,
  4. High product cost35 and the need for consumer awareness36.
+
+
+
+

What factors will influence the adoption of alternative proteins in low-income countries?

+

From the examples of companies provided above, it is clear that most research and product development in the field of alternative proteins is concentrated in high-income countries in the Global North. This is despite the fact that the urgent need for alternative proteins is heavily concentrated in low-income countries in the Global South. I believe that the first steps to address this disparity would be to build biomanufacturing capacity, increase R&D funding, and recruit more students and researchers to the field. Additionally, it would be very important to implement regulations and policies that support the sector. On an individual level, a number of factors will affect the large-scale adoption of alternative proteins in low-income countries37. These include:

+
  1. Cost (dollars per kilogram of 100% protein) and economic viability for low-income earners, which means the cost will need to be lower than that of conventional proteins,
  2. The protein digestibility-corrected amino acid score (PDCAAS), which rates protein quality by how well its amino acid content meets human amino acid requirements and how well humans can digest it,
  3. The environmental impact and effect on the livelihoods of farmers, and,
  4. Consumer acceptance (which is dependent on perception, taste, texture, safety, and convenience).
+
+
+

Conclusion

+

In summary, I hope that I have convinced you that there is an urgent need to address hunger and undernourishment worldwide. Second, this essay should have demonstrated that optimizing conventional agricultural practices will not effectively yield the required results. Third, you should now have a basic understanding of alternative proteins and their potential to address undernutrition. Lastly, to tackle the problem of hunger and undernourishment, it is imperative for society to embrace novel biotechnologies that can enhance food production while minimizing environmental degradation and contributions to climate change.

+ + +
+ + +

Footnotes

+ +
  1. Hunger | FAO | Food and Agriculture Organization of the United Nations. https://www.fao.org/hunger/en/.
  2. Malnutrition. https://www.who.int/health-topics/malnutrition#tab=tab_1.
  3. Deaths from protein-energy malnutrition, by age, World, 1990 to 2019. https://ourworldindata.org/grapher/malnutrition-deaths-by-age.
  4. Protein-Energy Undernutrition (PEU) - Nutritional Disorders - MSD Manual Professional Edition. https://www.msdmanuals.com/professional/nutritional-disorders/undernutrition/protein-energy-undernutrition-peu.
  5. Africa Fertilizer Map 2020 – AF-AP Partnership. https://afap-partnership.org/news/africa-fertilizer-map-2020/.
  6. Impact of Sustainable Agriculture and Farming Practices. https://www.worldwildlife.org/industries/sustainable-agriculture/.
  7. Moving Towards Sustainability: The Livestock Sector and the World Bank. https://www.worldbank.org/en/topic/agriculture/brief/moving-towards-sustainability-the-livestock-sector-and-the-world-bank.
  8. How Industrialized Meat Production Causes Land Degradation. https://populationeducation.org/industrialized-meat-production-and-land-degradation-3-reasons-to-shift-to-a-plant-based-diet/.
  9. Feltran-Barbieri, R. & Féres, J. G. Degraded pastures in Brazil: improving livestock production and forest restoration. R Soc Open Sci 8, (2021).
  10. Pastoral conflict in Kenya – ACCORD. https://www.accord.org.za/ajcr-issues/pastoral-conflict-in-kenya/.
  11. Antimicrobial resistance and agriculture - OECD. https://www.oecd.org/agriculture/topics/antimicrobial-resistance-and-agriculture/.
  12. Extreme Weather | Facts – Climate Change: Vital Signs of the Planet. https://climate.nasa.gov/extreme-weather/.
  13. Alternative proteins. https://sustainablecampus.unimelb.edu.au/sustainable-research/case-studies/alternative-proteins.
  14. Protein. https://www.genome.gov/genetics-glossary/Protein.
  15. The market for alternative protein: Pea protein, cultured meat, and more | McKinsey. https://www.mckinsey.com/industries/agriculture/our-insights/alternative-proteins-the-race-for-market-share-is-on.
  16. Defining alternative protein | GFI. https://gfi.org/defining-alternative-protein/.
  17. Chandran, A. S., Suri, S. & Choudhary, P. Sustainable plant protein: an up-to-date overview of sources, extraction techniques and utilization. Sustainable Food Technology 1, 466–483 (2023).
  18. Plant-based protein processing | Alfa Laval. https://www.alfalaval.com/industries/food-dairy-beverage/food-processing/protein-processing/plant-based-protein-processing/.
  19. How Plant-Based Meat and Seafood Are Processed - IFT.org. https://www.ift.org/news-and-publications/food-technology-magazine/issues/2019/october/columns/processing-how-plant-based-meat-and-seafood-are-processed.
  20. The science of plant-based meat | GFI APAC. https://gfi-apac.org/science/the-science-of-plant-based-meat/.
  21. How Insect Protein can Revolutionize the Food Industry. https://mindthegraph.com/blog/insect-protein/.
  22. Yen, A. L. Edible insects: Traditional knowledge or western phobia? Entomol Res 39, 289–298 (2009).
  23. Kim, T. K., Yong, H. I., Kim, Y. B., Kim, H. W. & Choi, Y. S. Edible Insects as a Protein Source: A Review of Public Perception, Processing Technology, and Research Trends. Food Sci Anim Resour 39, 521 (2019).
  24. 7 Insect-Based Food Startups on the Rise in 2023 | Moneywise. https://moneywise.com/investing/insect-based-food-startups.
  25. The Growing Animal Feed Insect Protein Market. https://nutrinews.com/en/the-growing-animal-feed-insect-protein-market-opportunities-and-challenges/.
  26. Taveira, I. C., Nogueira, K. M. V., Oliveira, D. L. G. de & Silva, R. do N. Fermentation: Humanity’s Oldest Biotechnological Tool. Front Young Minds 9, (2021).
  27. Fermentation for alternative proteins 101 | Resource guide | GFI. https://gfi.org/fermentation/.
  28. Fermentation for alternative proteins 101 | Resource guide | GFI. https://gfi.org/fermentation/.
  29. The science of fermentation (2023) | GFI. https://gfi.org/science/the-science-of-fermentation/.
  30. What is cultivated meat? | McKinsey. https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-cultivated-meat.
  31. The science of cultivated meat | GFI. https://gfi.org/science/the-science-of-cultivated-meat/.
  32. Will Europe Follow Singapore in Approving Cultured Meat? https://www.labiotech.eu/trends-news/cultured-meat-eat-just/.
  33. Che Sorpresa! Italy U-Turns on Cultivated Meat Ban – For Now. https://www.greenqueen.com.hk/italy-cultivated-meat-ban-lab-grown-food-cultured-protein-eu-tris-notification-francesco-lollobrigida/.
  34. Is overhype dooming the cultivated meat industry? https://www.fastcompany.com/90966338/hype-built-the-cultivated-meat-industry-now-it-could-end-it.
  35. Meat substitutes need to get a lot cheaper. https://www.sustainabilitybynumbers.com/p/meat-substitutes-price.
  36. What is cultivated meat? | McKinsey. https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-cultivated-meat.
  37. The market for alternative protein: Pea protein, cultured meat, and more | McKinsey. https://www.mckinsey.com/industries/agriculture/our-insights/alternative-proteins-the-race-for-market-share-is-on.
+
+ + +
\ No newline at end of file
diff --git a/_site/posts/biotech/new_post_1/images/Bioreactor.jpeg b/_site/posts/biotech/new_post_1/images/Bioreactor.jpeg
new file mode 100644
index 0000000..663b7cf
Binary files /dev/null and b/_site/posts/biotech/new_post_1/images/Bioreactor.jpeg differ
diff --git a/_site/posts/biotech/new_post_1/images/africa_fish.png b/_site/posts/biotech/new_post_1/images/africa_fish.png
new file mode 100644
index 0000000..7fc1e59
Binary files /dev/null and b/_site/posts/biotech/new_post_1/images/africa_fish.png differ
diff --git a/_site/posts/biotech/new_post_1/images/biotech_cover.png b/_site/posts/biotech/new_post_1/images/biotech_cover.png
new file mode 100644
index 0000000..4f97857
Binary files /dev/null and b/_site/posts/biotech/new_post_1/images/biotech_cover.png differ
diff --git a/_site/posts/biotech/new_post_1/images/continent_meat_1.png b/_site/posts/biotech/new_post_1/images/continent_meat_1.png
new file mode 100644
index 0000000..73e2de8
Binary files /dev/null and b/_site/posts/biotech/new_post_1/images/continent_meat_1.png differ
diff --git a/_site/posts/biotech/new_post_1/images/undernourished_income.png b/_site/posts/biotech/new_post_1/images/undernourished_income.png
new file mode 100644
index 0000000..ec2abb4
Binary files /dev/null and b/_site/posts/biotech/new_post_1/images/undernourished_income.png differ
diff --git a/_site/posts/biotech/new_post_1/images/undernourished_region.png b/_site/posts/biotech/new_post_1/images/undernourished_region.png
new file mode 100644
index 0000000..b45f843
Binary files /dev/null and b/_site/posts/biotech/new_post_1/images/undernourished_region.png differ
diff --git a/_site/posts/biotech/new_post_1/post_1.html b/_site/posts/biotech/new_post_1/post_1.html
new file mode 100644
index 0000000..e6e1b09
--- /dev/null
+++ b/_site/posts/biotech/new_post_1/post_1.html
@@ -0,0 +1,755 @@
+William Okech - Biotechnology for the Global South
+ + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

Biotechnology for the Global South

+

Developing alternative proteins to reduce childhood hunger and improve food production in low-income countries

+
+
RStudio
+
R
+
Biotechnology
+
Blog
+
Data Visualization
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

October 27, 2023

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+
+

Futuristic Bioreactor

+
Bioreactor image created using “Bing Image Creator” with the prompt keywords “bioreactor, computer screen, bubbling liquids, cabinet, with a dark purple, sky-blue, neon lights, carpetpunk, backlit photography, and trillwave theme.”
+
+
+
+

Key Points

+
  • Hunger is a significant problem that affects close to one-tenth (1/10) of the world’s population.
  • Low-income countries, particularly in Africa, bear the brunt of the global childhood hunger crisis.
  • Alternative protein technologies have the potential to transform the way humans produce and consume proteins.
  • Increased consumer education, reduced production costs, and innovative technologies are key to the widespread acceptance and consumption of alternative proteins.
+
+
+

Introduction

+

In 2022, between 691 and 783 million people were hungry 1. This alarming figure indicates that almost one-tenth (1/10) of the world’s population did not consume enough calories to maintain a healthy and active lifestyle. Hunger (also referred to as undernutrition) is “an uncomfortable or painful physical sensation caused by insufficient consumption of dietary energy.” 2 The United Nations Children’s Fund (UNICEF) estimates that, among children under 5, 148.1 million suffered from stunting (low height-for-age) and 45 million from wasting (low weight-for-height), which placed them at increased risk of physical and/or mental disabilities 3. Moreover, in 2019, protein-energy undernutrition (an energy deficit resulting from deficiencies of many macronutrients, primarily proteins) contributed to the deaths of 100,000 children (aged 0–14 years), approximately two-thirds of them from Africa 4 5. It is very disheartening that a large proportion of the human population still faces the dual challenges of hunger and food insecurity, despite the continual increases in global agricultural productivity (resulting from the expansion of agricultural land area, better-yielding crops, and improved animal production methods) witnessed over the past century.

+

The problem of hunger/undernourishment affects the world in an income- and region-dependent manner. Between 2000 and 2020, low-income countries had the highest share of undernourished people (25%–35% of their populations), which was significantly above the world average (5%–15%) (Figure 1).

+
+
+

+
Figure 1: Share of the population that is undernourished (grouped by income)
+
+
+

Additionally, we observe that the two regions with the highest share of undernourished people were Sub-Saharan Africa (20.9%) and South Asia (15.9%) (Figure 2).

+
+
+

+
Figure 2: Share of the population that is undernourished (grouped by region)
+
+
+

To reduce the share of the population that is undernourished, various countries and development institutions have introduced interventions and technologies that can boost crop yields and enhance livestock production. These interventions include irrigation, fertilizers, improved seed, better insect and pest control strategies, and gene editing. However, in Sub-Saharan Africa (which has the greatest burden of undernourishment and the lowest crop yields), only 6% of cultivated land was irrigated, and the rate of fertilizer application in 2018 was approximately 17 kg/hectare, significantly below the world average of 135 kg/hectare 6. Additionally, improved livestock rearing methods (such as selective breeding and improved nutrition) and fish and seafood harvesting techniques (such as selective harvesting and aquaculture systems) have increased global production. Sadly, the continent of Africa has seen very little improvement in its livestock, fish, and seafood production capacity (Figure 3), and this may leave its fast-growing population susceptible to protein deficiencies if not addressed in a timely manner.

+
+
+
+
+
+
+

+
(a) Global Meat Production
+
+
+
+
+
+
+

+
(b) Global Fish Production
+
+
+
+
+

Figure 3: Global Meat and Fish Production

+
+
+

It is widely believed that promoting technologies that can improve crop yields and enhance livestock production in low-income countries is the best method to boost food production and subsequently reduce undernourishment. However, many of these interventions have been detrimental to the climate and local environment. With regard to enhancing crop production, some of these effects include:

+
  1. Land degradation and deforestation 7,
  2. Poisoning of fresh water/marine ecosystems by chemical runoff, and,
  3. Depletion of fresh water sources resulting from overconsumption.
+

Moreover, boosting output in the livestock sector has contributed to a number of major environmental challenges and resource conflicts. These include:

+
  1. Overgrazing, soil erosion, and deforestation 8,9,
  2. Contributing up to 15% of human-induced greenhouse gas (GHG) emissions 10,
  3. Conflict between pastoralists/farmers over grazing land/water 11, and
  4. Increased prevalence of antimicrobial resistance in livestock resulting from antibiotic misuse/overuse 12.
+

Overall, these findings suggest that enhancing livestock production and boosting crop yields to reduce undernourishment may not be the panacea we envision and may cause more long-term harm than good. Additionally, with more extreme weather events (such as heat waves, floods, and droughts) resulting from climate change 13, volatility in global crop and food prices, and rapidly increasing world populations, there is a major need to develop and adopt alternative protein sources that can reduce childhood hunger and increase food production in low-income countries. In this essay, I will examine new and innovative alternative protein production technologies that can aid in generating foods with sufficient dietary energy and nutrients to meet human consumption requirements while reducing dependence on animal-based proteins.

+
+
+

What are alternative proteins?

+

Alternative proteins are plant-based and food technology alternatives to animal-based proteins 14. Proteins are large, complex molecules made up of smaller units called amino acids, and they are a key component of quality nutrition that promotes normal growth and maintenance 15. A significant advantage of alternative protein production is the reduced impact on the environment resulting from a decreased dependence on livestock-based protein production. This reduced impact is seen in the decreased greenhouse gas emissions and environmental pollution as well as the decline in the amounts of land and water required for livestock. Major sources of alternative proteins include plant proteins, insects, cultured meat, and fermentation-derived proteins 16. Generally, plant, insect, and fermentation-derived proteins are commercially available, while cultivated meats are still in the research and development phase 17.

+
+
+

Plant proteins

+

Plant proteins are harnessed directly from protein-rich seeds, and the main sources include leguminous (such as soy and pea), cereal (such as wheat and corn), and oilseed (such as peanut and flaxseed) proteins 18. The three major processing steps are protein extraction (centrifugation), protein purification (precipitation and ultrafiltration), and heat treatment (pasteurization) 19. Using the isolated proteins, specific products such as plant-based meats can be developed. To create these meats, the proteins are mixed with fibers and fats and structured using heat and agitation. Lastly, color, flavor, and aroma components are added to make the product more palatable.

+
+

Notable Companies

+
  1. Fry Family Food (South Africa)
  2. Moolec Science (Luxembourg)
  3. Beyond Meat (Los Angeles, California, USA)
  4. Impossible Foods (Redwood City, California, USA)
  5. New Wave Foods (Stamford, Connecticut, USA)
  6. Eat Just (Alameda, California, USA)
+
+
+

Challenges to be addressed

+
  1. Develop crops optimized for plant-based meat that produce higher quantities of high-quality protein,
  2. Improve protein extraction and processing methods,
  3. Confirm that the taste, texture, and nutritional value are similar to conventional meats, and,
  4. Ensure that the cost of production is competitive, and the process is energy-efficient 20.
+
+
+
+

Insect proteins

+

Proteins derived from insects are referred to as insect proteins. Insects are rich in essential nutrients such as amino acids, vitamins, and minerals. Insect-derived proteins have a dual role, as they can be eaten directly by humans or used as animal feed. Significant advantages of insect-derived proteins are their negligible environmental footprint, low cost of production, and absence of disease-causing pathogens (post-processing) 21. Many communities across the world have traditionally consumed insects, and it is estimated that approximately 2,000 insect species are consumed in at least 113 countries 22. However, the reluctance to eat insects in many high-income countries and the abundance of other protein sources have prevented widespread acceptance. In contrast, insect-based proteins have shown great promise in the animal feed industry. Both black soldier fly and housefly larvae have been used to replace fish meal and broiler feed, significantly reducing costs without compromising final product quality 23.

+
+

Notable Companies

+
  1. Next Protein (France/Tunisia)
  2. Biobuu (Tanzania)
  3. Inseco (Cape Town, South Africa)
  4. InsectiPro (Limuru, Kenya)
  5. Entocycle (United Kingdom)
  6. Ecodudu (Nairobi, Kenya)
  7. Ynsect (Paris, France)
  8. Protix (Dongen, Netherlands)
  9. All Things Bugs (Oklahoma City, Oklahoma, USA)
+
+
+

Challenges to be addressed

+
  1. Develop tools for product scale-up,
  2. Lower the production costs, and,
  3. Change negative consumer attitudes towards insect-based foods 24.
+
+
+
+

Fermentation-derived proteins

+

Fermentation involves the transformation of sugars into new products via chemical reactions carried out by microorganisms. This process has been referred to as “humanity’s oldest biotechnological tool” because humans have long used it to create foods, medicines, and fuels 25. The three main categories of fermentation are traditional, biomass, and precision fermentation 26.

+
  1. Traditional fermentation uses intact live microorganisms and microbial anaerobic digestion to process plant-based foods. This results in a change in the flavor and function of plant-based foods and ingredients.
  2. Biomass fermentation uses the microorganisms that reproduce during the fermentation process as ingredients. The microorganisms naturally have a high protein content, and allowing them to reproduce efficiently makes large amounts of protein-rich food.
  3. Precision fermentation uses programmed microorganisms as “cellular production factories” to develop proteins, fats, and other nutrients 27.
+
+

Notable Companies

+
  1. Essential Impact (East Africa)
  2. De Novo Foodlabs (Cape Town, South Africa)
  3. MycoTechnology (Aurora, Colorado, USA)
  4. Quorn (Stokesley, UK)
  5. Perfect Day (Berkeley, California, USA)
+
+
+

Challenges to be addressed

+
  1. Identify the correct molecules to manufacture in a cost-effective manner,
  2. Develop the appropriate microbial strains for the relevant products,
  3. Determine the appropriate feedstocks,
  4. Design low-cost bioreactors and systems for scaling-up processes, and,
  5. Improve end-product formulation to allow for better taste/texture 28.
+
+
+
+

Animal proteins from cultivated meat

+

The cultivated meat industry develops animal proteins that are grown directly from animal cells. Here, tissue-engineering techniques commonly used in regenerative medicine aid in product development 29. Cells obtained from an animal are placed in a bioreactor to replicate. When they reach the optimal density, they are harvested via centrifugation, and the resulting muscle and fat tissue is formed into a recognizable meat structure. The advantages of producing meat in this way include reduced contamination, decreased antibiotic use, and a lower environmental footprint 30.

+
+

Notable Companies

+
  1. WildBio (formerly Mogale Meat; Pretoria, South Africa)
  2. Newform Foods (formerly Mzansi Meat; Cape Town, South Africa)
  3. Mosa Meat (Maastricht, Netherlands)
  4. Bluu Seafood (Berlin, Germany)
  5. Eat Just (Singapore)
  6. Clear Meat (Delhi NCR, India)
  7. Sea-Stematic (Cape Town, South Africa)
+
+
+

Challenges to be addressed

+

Even though many companies have entered the cultivated meat space, few have received the regulatory approval required to sell their products, and some countries have temporarily halted development 31. Other challenges include:

+
  1. Insufficient bioreactor capacity,
  2. High cost of growth media and factors required for cultivation,
  3. Lack of products in the market despite large investments 32, and,
  4. High final product cost 33 and a major need for consumer education 34.
+
+
+
+

What factors will influence the adoption of alternative proteins in low-income countries?

+

The insect-based protein market has the potential to grow faster than the other three alternative protein segments (plant-based, fermentation-based, and cultivated meat) in low-income countries. This is because of fewer barriers to entry and lower setup costs. Therefore, to enhance the adoption of the other alternative protein segments in low-income countries, there is a need to build biomanufacturing capacity, increase R&D funding, and develop a strong workforce by recruiting more students and researchers to the field. Additionally, it would be important to implement national-level regulations and policies that support the sector. On an individual level, several factors will affect the large-scale adoption of alternative proteins in low-income countries 35. These include:

+
  1. Cost (dollars per kilogram of 100% protein), which will need to be similar to or lower than that of conventional animal-derived proteins,
  2. The protein digestibility-corrected amino acid score (PDCAAS), which rates protein quality by how well its amino acid content meets human amino acid requirements and how well humans can digest it,
  3. The economic impact on agricultural workers in the livestock and fishing industry, and,
  4. Consumer adoption 36 (which is dependent on perception, taste, texture, safety, and convenience).
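As a rough illustration of the first two factors in the list above, both can be written as simple formulas. The PDCAAS expression follows the commonly used definition (the limiting amino acid score multiplied by true fecal digestibility, capped at 1). The symbols and the numbers in the worked lines are hypothetical values that I chose for illustration; they are not measurements for any product or company mentioned in this essay.

```latex
% Cost normalised to pure protein.
% Assumed symbols: p = price of the product (USD per kg),
%                  f = protein mass fraction of the product (between 0 and 1).
\[
  C_{\text{protein}} = \frac{p}{f}
\]
% Hypothetical example: a product priced at 4 USD/kg that is 20% protein by mass.
\[
  C_{\text{protein}} = \frac{4}{0.20} = 20 \ \text{USD per kg of 100\% protein}
\]

% PDCAAS.
% Assumed symbols: a_test = mg of the limiting amino acid per g of the test protein,
%                  a_ref  = mg of the same amino acid per g of the reference requirement pattern,
%                  D      = true fecal protein digestibility (between 0 and 1).
\[
  \text{PDCAAS} = \min\!\left(1,\ \frac{a_{\text{test}}}{a_{\text{ref}}} \times D\right)
\]
% Hypothetical example: amino acid ratio of 0.80 and digestibility of 0.90.
\[
  \text{PDCAAS} = \min\left(1,\ 0.80 \times 0.90\right) = 0.72
\]
```

Normalising cost by protein content lets products with very different protein fractions be compared on the same basis, and the cap at 1 in the PDCAAS means that a protein exceeding the reference pattern is not scored higher than one that exactly meets it.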
+
+
+

Conclusion

+

In summary, I hope that I have convinced you that there is an urgent need to address the crisis of hunger and undernourishment worldwide. Second, this essay should have demonstrated that optimizing conventional agricultural practices can also harm the environment. Third, you should now have a basic understanding of alternative proteins and their potential to address undernutrition. Lastly, to tackle the problem of hunger and undernourishment, it is imperative for society to embrace novel alternative protein production technologies that can enhance food production while minimizing the environmental impact and contribution to climate change.

+ + +
+ + +

Footnotes

+ +
  1. Hunger | FAO | Food and Agriculture Organization of the United Nations. https://www.fao.org/hunger/en/.
  2. I am hungry. What does it mean? https://unric.org/en/i-am-hungry-what-does-it-mean/.
  3. Malnutrition. https://www.who.int/health-topics/malnutrition#tab=tab_1.
  4. Deaths from protein-energy malnutrition, by age, World, 1990 to 2019. https://ourworldindata.org/grapher/malnutrition-deaths-by-age.
  5. Protein-Energy Undernutrition (PEU) - Nutritional Disorders - MSD Manual Professional Edition. https://www.msdmanuals.com/professional/nutritional-disorders/undernutrition/protein-energy-undernutrition-peu.
  6. Africa Fertilizer Map 2020 – AF-AP Partnership. https://afap-partnership.org/news/africa-fertilizer-map-2020/.
  7. Impact of Sustainable Agriculture and Farming Practices. https://www.worldwildlife.org/industries/sustainable-agriculture/.
  8. How Industrialized Meat Production Causes Land Degradation. https://populationeducation.org/industrialized-meat-production-and-land-degradation-3-reasons-to-shift-to-a-plant-based-diet/.
  9. Feltran-Barbieri, R. & Féres, J. G. Degraded pastures in Brazil: improving livestock production and forest restoration. R Soc Open Sci 8, (2021).
  10. Moving Towards Sustainability: The Livestock Sector and the World Bank. https://www.worldbank.org/en/topic/agriculture/brief/moving-towards-sustainability-the-livestock-sector-and-the-world-bank.
  11. Pastoral conflict in Kenya – ACCORD. https://www.accord.org.za/ajcr-issues/pastoral-conflict-in-kenya/.
  12. Antimicrobial resistance and agriculture - OECD. https://www.oecd.org/agriculture/topics/antimicrobial-resistance-and-agriculture/.
  13. Extreme Weather | Facts – Climate Change: Vital Signs of the Planet. https://climate.nasa.gov/extreme-weather/.
  14. Alternative proteins. https://sustainablecampus.unimelb.edu.au/sustainable-research/case-studies/alternative-proteins.
  15. Protein. https://www.genome.gov/genetics-glossary/Protein.
  16. The market for alternative protein: Pea protein, cultured meat, and more | McKinsey. https://www.mckinsey.com/industries/agriculture/our-insights/alternative-proteins-the-race-for-market-share-is-on.
  17. Defining alternative protein | GFI. https://gfi.org/defining-alternative-protein/.
  18. Chandran, A. S., Suri, S. & Choudhary, P. Sustainable plant protein: an up-to-date overview of sources, extraction techniques and utilization. Sustainable Food Technology 1, 466–483 (2023).
  19. Plant-based protein processing | Alfa Laval. https://www.alfalaval.com/industries/food-dairy-beverage/food-processing/protein-processing/plant-based-protein-processing/.
  20. The science of plant-based meat | GFI APAC. https://gfi-apac.org/science/the-science-of-plant-based-meat/.
  21. How Insect Protein can Revolutionize the Food Industry. https://mindthegraph.com/blog/insect-protein/.
  22. Yen, A. L. Edible insects: Traditional knowledge or western phobia? Entomol Res 39, 289–298 (2009).
  23. Kim, T. K., Yong, H. I., Kim, Y. B., Kim, H. W. & Choi, Y. S. Edible Insects as a Protein Source: A Review of Public Perception, Processing Technology, and Research Trends. Food Sci Anim Resour 39, 521 (2019).
  24. The Growing Animal Feed Insect Protein Market. https://nutrinews.com/en/the-growing-animal-feed-insect-protein-market-opportunities-and-challenges/.
  25. Taveira, I. C., Nogueira, K. M. V., Oliveira, D. L. G. de & Silva, R. do N. Fermentation: Humanity’s Oldest Biotechnological Tool. Front Young Minds 9, (2021).
  26. Fermentation for alternative proteins 101 | Resource guide | GFI. https://gfi.org/fermentation/.
  27. Fermentation for alternative proteins 101 | Resource guide | GFI. https://gfi.org/fermentation/.
  28. The science of fermentation (2023) | GFI. https://gfi.org/science/the-science-of-fermentation/.
  29. What is cultivated meat? | McKinsey. https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-cultivated-meat.
  30. The science of cultivated meat | GFI. https://gfi.org/science/the-science-of-cultivated-meat/.
  31. Che Sorpresa! Italy U-Turns on Cultivated Meat Ban – For Now. https://www.greenqueen.com.hk/italy-cultivated-meat-ban-lab-grown-food-cultured-protein-eu-tris-notification-francesco-lollobrigida/.
  32. Is overhype dooming the cultivated meat industry? https://www.fastcompany.com/90966338/hype-built-the-cultivated-meat-industry-now-it-could-end-it.
  33. Meat substitutes need to get a lot cheaper. https://www.sustainabilitybynumbers.com/p/meat-substitutes-price.
  34. What is cultivated meat? | McKinsey. https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-cultivated-meat.
  35. The market for alternative protein: Pea protein, cultured meat, and more | McKinsey. https://www.mckinsey.com/industries/agriculture/our-insights/alternative-proteins-the-race-for-market-share-is-on.
  36. Meat substitutes need to get a lot cheaper. https://www.sustainabilitybynumbers.com/p/meat-substitutes-price.
+
+ + +
\ No newline at end of file
diff --git a/_site/posts/code_along_with_me/danger_disturb/danger_disturb.html b/_site/posts/code_along_with_me/danger_disturb/danger_disturb.html
new file mode 100644
index 0000000..96578d4
--- /dev/null
+++ b/_site/posts/code_along_with_me/danger_disturb/danger_disturb.html
@@ -0,0 +1,1199 @@
+William Okech - Code Along With Me (Episode 1)
+
+ +
+ +
+
+
+

Code Along With Me (Episode 1)

+

An assessment of the livestock numbers in the six counties declared to be ‘disturbed and dangerous’ in Kenya

+
+
RStudio
+
R
+
Tutorial
+
Blog
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

November 28, 2023

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+
+

Pastoral livestock running through a dry savanna followed by a traditional herder on horseback

+
+Image created using “Bing Image Creator” with the prompt keywords “cows, sheep, indigenous cattle, running, dry savanna, river bed, traditional herdsman, nature photography, --ar 5:4 --style raw”
+
+
+
+

Introduction

+

In February 2023, the government of Kenya described six counties as “disturbed” and “dangerous.” In the preceding six months, over 100 civilians and 16 police officers had lost their lives as criminals engaged in banditry terrorized rural villages. Because livestock theft is one of the main causes of the conflict, my goal is to perform an exploratory data analysis of livestock numbers from the Kenya Population and Housing Census (2019) report.

+

References

+
    +
  1. Nation Media News Brief
  2. +
  3. Citizen TV Summary
  4. +
+
+
+

Section 1: Load all the required libraries

+
+
+Code +
library(tidyverse) # a collection of packages used to model, transform, and visualize data
+library(rKenyaCensus) # tidy datasets obtained from the Kenya Population and Housing Census results
+library(patchwork) # combine separate ggplots into the same graphic
+library(janitor) # initial data exploration and cleaning for a new data set
+library(ggrepel)# repel overlapping text labels
+library(ggthemes) # Extra Themes, Scales and Geoms for 'ggplot2'
+library(scales) # tools to override the default breaks, labels, transformations and palettes
+# install.packages("treemapify")
+library(treemapify) # allows the creation of treemaps in ggplot2
+
+library(sf) # simple features, a method to encode spatial vector data
+#install.packages("devtools")
+library(devtools) # helps to install packages not on CRAN
+#devtools::install_github("yutannihilation/ggsflabel")
+library(ggsflabel) # place labels on maps
+
+library(knitr) # a tool for dynamic report generation in R
+#install.packages("kableExtra")
+library(kableExtra) # build common complex tables and manipulate table styles
+
+
+

Note: If package downloads time out during installation, check the current limit with getOption("timeout") and use options(timeout = ) to increase it.
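
The timeout check in the note can be sketched as follows. This is a minimal illustration: the 300-second value is an assumption chosen for the example and does not come from the original post, so adjust it to your own connection.

```r
# Inspect the current download timeout (in seconds)
old_timeout <- getOption("timeout")
print(old_timeout)

# Raise the limit without ever lowering a value that is already higher.
# NOTE: 300 is an illustrative value (an assumption), not a recommendation
# taken from the post.
options(timeout = max(300, old_timeout))

new_timeout <- getOption("timeout")
print(new_timeout)
```

The change only lasts for the current R session, so it needs to be run again (or placed in a startup file) before installing large packages in a new session.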

+
+
+

Section 2: Create a map of the “dangerous and disturbed” counties

+

The rKenyaCensus package includes a built-in county boundaries dataset to facilitate mapping of the various indicators in the Census. The required shapefile for this analysis is KenyaCounties_SHP.

+
+

a) Sample plot of the map of Kenya

+
+
+Code +
# Load the shapefile
+kenya_counties_sf <- st_as_sf(KenyaCounties_SHP)
+
+# Plot the map of Kenya
+p0 <- ggplot(kenya_counties_sf) + 
+  geom_sf(fill = "bisque2", linewidth = 0.6, color = "black") + 
+  theme_void()
+p0
+
+
+

+
+
+
+
+

b) Highlight the dangerous and disturbed counties in Kenya

+
+
+Code +
# First, remove the "/" from the county names
+
+kenya_counties_sf$County <- gsub("/", 
+                                 " ", 
+                                 kenya_counties_sf$County)
+
+# Select the six counties to highlight
+highlight_counties <- c("TURKANA", "WEST POKOT", "ELGEYO MARAKWET", "BARINGO", "LAIKIPIA", "SAMBURU")
+
+# Filter the counties dataset to only include the highlighted counties
+highlighted <- kenya_counties_sf %>% filter(County %in% highlight_counties)
+
+# Plot the highlighted counties in the map
+p1 <- ggplot() + 
+  geom_sf(data = kenya_counties_sf, fill = "bisque2", linewidth = 0.6, color = "black") + 
+  geom_sf(data  = highlighted, fill = "chocolate4", linewidth = 0.8, color = "black") +
+  theme_void()
+p1
+
+
+

+
+
+
+
+

c) Plot only the required counties

+
+
+Code +
p2 <- ggplot(data = highlighted) +
+  geom_sf(aes(fill = County), linewidth = 1, show.legend = FALSE) +
+  geom_label_repel(aes(label = County, geometry = geometry), size = 3,
+                      stat = "sf_coordinates",
+                      force=10, # force of repulsion between overlapping text labels
+                      seed = 1,
+                      segment.size = 0.75,
+                      min.segment.length=0) +
+  scale_fill_brewer(palette = "OrRd") +
+  labs(title = "",
+       caption = "") +
+  theme_void() + # apply the complete theme first so the settings below are not overwritten
+  theme(plot.title = element_text(family = "Helvetica", size = 10, hjust = 0.5),
+        legend.title = element_blank(),
+        legend.position = "none",
+        plot.caption = element_text(family = "Helvetica", size = 12))
+p2
+
+
+

+
+
+Code +
# Notes: geom_label_repel() with geometry and stat defined can be used as an 
+# alternative to geom_sf_label_repel()
+
+
+
+
+

d) Combine the plots using patchwork to clearly highlight the counties of interest

+
+
+Code +
p1 + 
+  p2 + 
+  plot_annotation(title = "",
+                  subtitle = "",
+                  caption = "",
+                  theme = theme(plot.title = element_text(family="Helvetica", face="bold", size = 25),
+                                plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
+                                plot.caption = element_text(family = "Helvetica",size = 12, face = "bold")))
+
+
+

+
+
+
+
+
+

Section 3: Load the livestock data from the census report and generate the dataframes required for analysis.

+
+

a) View the data available in the data catalogue

+
+
+Code +
data("DataCatalogue")
+
+
+
+
+

b) Load the livestock data

+

Here, pastoral livestock are defined as sheep, goats, and indigenous cattle.

+
+
+Code +
# Select the livestock data from the census report
+df_livestock <- V4_T2.24
+livestock <- df_livestock[2:393, ]
+livestock <- livestock %>%
+  clean_names()
+
+# Remove the "/" from the county names in the dataset
+livestock$county <- gsub("/", " ", livestock$county)
+livestock$sub_county <- gsub("/", " ", livestock$sub_county)
+
+# Select the variables of interest from the dataset
+# These include the county, subcounty, administrative level (admin_area),
+# number of farming households, sheep, goats, and indigenous cattle.
+
+# New variables listed below include:
+# pasto_livestock is the total number of sheep, goats, and indigenous cattle
+# ind_cattle_household is the number of indigenous cattle per household
+# goats_household is the number of goats per household
+# sheep_household is the number of sheep per household
+# pasto_livestock_household is the number of pastoral livestock per household
+
+livestock_select <- livestock %>%
+  select(county, sub_county, admin_area, farming, sheep, goats, indigenous_cattle) %>%
+  mutate(pasto_livestock = sheep + goats + indigenous_cattle) %>% 
+  mutate(ind_cattle_household = round(indigenous_cattle/farming)) %>%
+  mutate(goats_household = round(goats/farming)) %>%
+  mutate(sheep_household = round(sheep/farming)) %>%
+  mutate(pasto_livestock_household = round(pasto_livestock/farming))
+
+
+
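
The per-household ratios above divide by the number of farming households, so a row with zero households would produce `Inf` or `NaN`. The sketch below shows one possible guard in base R. The data frame and its numbers are hypothetical and are used only to illustrate the idea; they are not census values.

```r
# HYPOTHETICAL toy data (not census values), used only to show the guard
toy <- data.frame(
  sub_county      = c("A", "B", "C"),
  farming         = c(1200, 0, 350),       # "B" has no farming households
  pasto_livestock = c(54000, 800, 44100)
)

# Compute the ratio only where the denominator is positive;
# otherwise record a missing value instead of Inf
toy$pasto_livestock_household <- ifelse(
  toy$farming > 0,
  round(toy$pasto_livestock / toy$farming),
  NA_real_
)

print(toy)
```

Whether such a guard is needed depends on the data; it is a cheap check to run before plotting the ratios.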
+
+

c) Filter data for the selected “disturbed and dangerous” counties

+
+
+Code +
# Select the data for the "dangerous and disturbed" counties
+dan_dist <- c("TURKANA", "WEST POKOT", "ELGEYO MARAKWET", "BARINGO", "LAIKIPIA", "SAMBURU")
+
+livestock_select_county <- livestock_select %>%
+  filter(admin_area == "County") %>%
+  filter(county %in% dan_dist)
+
+# Select subcounty data for the "disturbed and dangerous" counties
+livestock_select_subcounty <- livestock_select %>%
+  filter(admin_area == "SubCounty") %>%
+  filter(county %in% dan_dist)
+
+# Create an area dataset for the "dangerous and disturbed" counties
+df_land_area <- V1_T2.7
+land_area <- df_land_area[2:396,]
+land_area <- land_area %>%
+  clean_names()
+
+
+
+
+Code +
# Create a function that removes "/" and " County" from the names and converts them to UPPERCASE
+
+clean_county_names <- function(dataframe, column_name) {
+  dataframe[[column_name]] <- toupper(gsub("/", " ", gsub(" County", "", dataframe[[column_name]])))
+  return(dataframe)
+}
+
+land_area <- clean_county_names(land_area, 'county')
+land_area <- clean_county_names(land_area, 'sub_county')
+
+# The code above does the processes listed below:
+
+# land_area$county <- gsub("/", " ", land_area$county)
+# land_area$county <- gsub(" County", "", land_area$county)
+# land_area$county <- toupper(land_area$county)
+# land_area$sub_county <- toupper(land_area$sub_county)
+
+
+
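
To see what `clean_county_names()` does, it can be run on a small data frame. The function body is copied from the code above; the input strings are hypothetical examples of the kind of labels being cleaned and are not claimed to be the exact values stored in the census tables.

```r
# Function copied from the post
clean_county_names <- function(dataframe, column_name) {
  dataframe[[column_name]] <- toupper(gsub("/", " ", gsub(" County", "", dataframe[[column_name]])))
  return(dataframe)
}

# HYPOTHETICAL input labels, for illustration only
toy <- data.frame(
  county = c("Elgeyo/Marakwet County", "Turkana County", "West Pokot"),
  stringsAsFactors = FALSE
)

toy <- clean_county_names(toy, "county")
print(toy$county)
```

After cleaning, the labels have the same uppercase, slash-free form as the names in the `dan_dist` vector, which is what allows the later `filter()` and join steps to match rows.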
+
+Code +
# Obtain the area data for "disturbed and dangerous" counties
+land_area_county <- land_area %>%
+  filter(admin_area == "County") %>%
+  select(county, land_area_in_sq_km) %>%
+  filter(county %in% dan_dist)
+
+# Get the subcounty area data for "disturbed and dangerous" counties
+land_area_subcounty <- land_area %>%
+  filter(admin_area == "SubCounty") %>%
+  select(county, sub_county, land_area_in_sq_km) %>%
+  filter(county %in% dan_dist) %>%
+  select(-county)
+
+
+
+
+

d) Create the final datasets to be used for analysis. Use inner_join() and create new variables.

+
+
+Code +
# Create a county dataset with area and livestock numbers for the disturbed and dangerous regions
+
+livestock_area_county <- inner_join(livestock_select_county, land_area_county, by = "county")
+
+# New variables listed below include:
+# ind_cattle_area is the number of indigenous cattle per area_in_sq_km
+# goats_area is the number of goats per area_in_sq_km
+# sheep_area is the number of sheep per area_in_sq_km
+# pasto_livestock_area is the number of pastoral livestock per area_in_sq_km
+
+livestock_area_county <- livestock_area_county %>%
+  mutate(ind_cattle_area = round(indigenous_cattle/land_area_in_sq_km),
+         sheep_area = round(sheep/land_area_in_sq_km),
+         goats_area = round(goats/land_area_in_sq_km),
+         pasto_livestock_area = round(pasto_livestock/land_area_in_sq_km))
+
+# Create a subcounty dataset with area and livestock numbers
+# for the disturbed and dangerous regions
+
+livestock_area_subcounty <- inner_join(livestock_select_subcounty, land_area_subcounty, by = "sub_county")
+
+# New variables listed below include:
+# ind_cattle_area is the number of indigenous cattle per area_in_sq_km
+# goats_area is the number of goats per area_in_sq_km
+# sheep_area is the number of sheep per area_in_sq_km
+# pasto_livestock_area is the number of pastoral livestock per area_in_sq_km
+
+livestock_area_subcounty <- livestock_area_subcounty %>%
+  mutate(ind_cattle_area = round(indigenous_cattle/land_area_in_sq_km),
+         sheep_area = round(sheep/land_area_in_sq_km),
+         goats_area = round(goats/land_area_in_sq_km),
+         pasto_livestock_area = round(pasto_livestock/land_area_in_sq_km))
+
+
+
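
Because `inner_join()` keeps only rows whose keys match in both tables, a name that was not cleaned consistently is dropped without any warning. The sketch below shows one way to check for this with `anti_join()`. The land areas are the county figures reported in this post; the livestock counts are hypothetical, and one key is deliberately left uncleaned.

```r
library(dplyr)

# Livestock counts are HYPOTHETICAL; the last key is left uncleaned on purpose
toy_livestock <- data.frame(
  county          = c("TURKANA", "SAMBURU", "ELGEYO/MARAKWET"),
  pasto_livestock = c(1000000, 500000, 100000)
)

# County land areas (sq. km) as reported in this post
toy_area <- data.frame(
  county             = c("TURKANA", "SAMBURU", "ELGEYO MARAKWET"),
  land_area_in_sq_km = c(68232.9, 21065.1, 3032.0)
)

# Join and compute the density, mirroring the code above
joined <- inner_join(toy_livestock, toy_area, by = "county") %>%
  mutate(pasto_livestock_area = round(pasto_livestock / land_area_in_sq_km))

# Rows of the livestock table that found no partner and were dropped
unmatched <- anti_join(toy_livestock, toy_area, by = "county")

print(joined)
print(unmatched)
```

Here only two of the three rows survive the join, and `unmatched` lists the row whose name still contains "/". Running the same check on the real tables is a quick way to confirm that the name-cleaning steps worked.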
+
+
+

Section 4: Create a table with land area (sq. km) for the six counties

+

These six counties cover approximately one-fifth (1/5) of Kenya’s land area.

+
+
+Code +
livestock_area_county %>%
+  select(county, land_area_in_sq_km) %>%
+  mutate(county = str_to_title(county)) %>%
+  arrange(desc(land_area_in_sq_km)) %>%
+  adorn_totals("row") %>%
+  rename("County" = "county",
+         "Land Area (sq. km)" = "land_area_in_sq_km") %>%
+  kbl(align = "c") %>%
+  kable_classic() %>% 
+  row_spec(row = 0, font_size = 21, color = "white", background = "#000000") %>%
+  row_spec(row = c(1:7), font_size = 15) %>%
+  row_spec(row = 6, extra_css = "border-bottom: 1px solid;") %>%
+  row_spec(row = 7, bold = TRUE)
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
County             Land Area (sq. km)
Turkana            68232.9
Samburu            21065.1
Baringo            10976.4
Laikipia           9532.2
West Pokot         9123.2
Elgeyo Marakwet    3032.0
Total              121961.8
+ + +
+
+
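
The "one-fifth" figure can be checked with a short calculation. The six county areas come from the table in this section. The national total is an assumption: a rounded value of about 580,000 sq. km that does not appear in the original post.

```r
# Land areas (sq. km) of the six counties, copied from the table in this section
land_area <- c(
  "Turkana"         = 68232.9,
  "Samburu"         = 21065.1,
  "Baringo"         = 10976.4,
  "Laikipia"        = 9532.2,
  "West Pokot"      = 9123.2,
  "Elgeyo Marakwet" = 3032.0
)

six_county_total <- sum(land_area)

# ASSUMPTION: Kenya's total land area taken as roughly 580,000 sq. km,
# a rounded figure that is not from the original post
kenya_total <- 580000

share <- six_county_total / kenya_total
print(round(100 * share, 1))  # share of the national land area, in percent
```

Under that assumption the six counties account for about 21% of the country, which is consistent with the one-fifth estimate.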
+
+

Section 5: Perform an exploratory data analysis to gain key insights about the data

+
+

a) Number of Farming Households at the County Level

+
+
+Code +
lac_1 <- livestock_area_county %>%
+  ggplot() + 
+  geom_col(aes(x= reorder(county, farming), y = farming, fill = county)) +
+  geom_text(aes(x = county, y = 0, label = county),
+            hjust = 0, nudge_y = 0.25) +
+  geom_text(aes(x = county, y = farming, label = farming),
+            hjust = 1, nudge_y = 15000) +
+  scale_fill_brewer(palette = "OrRd") +
+  coord_flip() + 
+  labs(x = "County",
+       y = "Number of Farming Households",
+       title = "",
+       subtitle = "",
+       caption = "") +
+  theme_minimal() +
+  scale_y_continuous(labels = comma) +
+  theme(axis.title.x =element_blank(), 
+        axis.title.y =element_blank(),
+        axis.text.x = element_blank(),
+        axis.text.y = element_blank(),
+        plot.title = element_text(family="Helvetica", face="bold", size = 20),
+        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
+        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
+        panel.grid.major = element_blank(),
+        panel.grid.minor = element_blank(),
+        legend.title = element_blank(),
+        legend.position = "none",
+        plot.margin = unit(c(1, 1, 1, 1), "cm"))
+
+# Create a patchwork plot with the map and the bar graph
+
+lac_1 + p2
+
+
+

+
+
+
+
+

b) Number of Farming Households at the Subcounty Level

+
+
+Code +
livestock_area_subcounty %>%
+  ggplot() + 
+  geom_col(aes(x = reorder(sub_county, farming), y = farming, fill = county), width = 0.95) +
+  geom_text(aes(x = sub_county, y = farming, label = farming),
+            hjust = 1, nudge_y = 2500, size = 4) +
+  scale_fill_brewer(palette = "OrRd") +
+  coord_flip() +
+  guides(fill = guide_legend(nrow = 1)) +
+  labs(x = "Subcounty",
+       y = "Number of Farming Households",
+       fill = "County",
+       title = "",
+       subtitle = "",
+       caption = "") +
+  theme_minimal() +
+  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +
+  theme(axis.title.x =element_blank(),
+        axis.title.y =element_blank(),
+        axis.text.x = element_text(size = 12),
+        axis.text.y = element_text(size = 12),
+        plot.title = element_text(family="Helvetica", face="bold", size = 20),
+        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
+        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
+        plot.margin = unit(c(1, 1, 1, 1), "cm"),
+        panel.grid.major = element_blank(),
+        panel.grid.minor = element_blank(),
+        legend.title = element_text(size = 10),
+        legend.text=element_text(size=8),
+        legend.position = "top")
+
+
+

+
+
+
+
+

c) Number of pastoral livestock per county

+
+

Treemap

+
+
+Code +
ggplot(livestock_area_county, aes(area = pasto_livestock, fill = county, label = comma(pasto_livestock)
+                              )) +
+  geom_treemap() +
+  geom_treemap_text(colour = "black",
+                    place = "centre",
+                    size = 24) +
+  scale_fill_brewer(palette = "OrRd") +
+  labs(x = "",
+       y = "",
+       title = "",
+       caption = "") +
+  theme_minimal() +
+  theme(axis.title.x =element_text(size = 20),
+        axis.title.y =element_text(size = 20),
+        axis.text.x = element_text(size = 15),
+        axis.text.y = element_text(size = 15),
+        plot.title = element_text(family="Helvetica", face="bold", size = 28),
+        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
+        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
+        legend.title = element_blank(),
+        legend.text=element_text(size=12),
+        legend.position = "bottom") 
+
+
+

+
+
+
+
+
+

d) Number of pastoral livestock per subcounty

+
+
+Code +
livestock_area_subcounty %>%
+  ggplot() + 
+  geom_col(aes(x= reorder(sub_county, pasto_livestock), y = pasto_livestock, fill = county), width = 0.95) + 
+  geom_text(aes(x = sub_county, y = pasto_livestock, label = pasto_livestock),
+            hjust = 1, nudge_y = 125000) +
+  scale_fill_brewer(palette = "OrRd") +
+  coord_flip() + 
+  guides(fill = guide_legend(nrow = 1)) +
+  labs(x = "Subcounty",
+       y = "Number of Pastoral Livestock",
+       fill = "County",
+       title = "",
+       caption = "") +
+  theme_minimal() +
+  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +
+  theme(axis.title.x =element_blank(),
+        axis.title.y =element_blank(),
+        axis.text.x = element_text(size = 12),
+        axis.text.y = element_text(size = 12),
+        plot.title = element_text(family="Helvetica", face="bold", size = 20),
+        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
+        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
+        plot.margin = unit(c(1, 1, 1, 1), "cm"),
+        panel.grid.major = element_blank(),
+        panel.grid.minor = element_blank(),
+        legend.title = element_text(size = 10),
+        legend.text=element_text(size=8),
+        legend.position = "top")
+
+
+

+
+
+
+
+

e) Number of pastoral livestock per household at the county level

+
+
+Code +
lac_2 <- livestock_area_county %>%
+  ggplot() + 
+  geom_col(aes(x= reorder(county, pasto_livestock_household), y = pasto_livestock_household, fill = county)) + 
+  geom_text(aes(x = county, y = pasto_livestock_household, label = pasto_livestock_household),
+            hjust = 1, nudge_y = 5) +
+  scale_fill_brewer(palette = "OrRd") +
+  coord_flip() + 
+  labs(x = "County",
+       y = "Number of Pastoral Livestock per household",
+       title = "",
+       caption = "") +
+  theme_minimal() +
+  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +
+  theme(axis.title.x =element_blank(), 
+        axis.title.y =element_blank(),
+        axis.text.x = element_text(size = 12),
+        axis.text.y = element_text(size = 12),
+        plot.title = element_text(family="Helvetica", face="bold", size = 20),
+        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
+        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
+        panel.grid.major = element_blank(),
+        panel.grid.minor = element_blank(),
+        legend.title = element_blank(),
+        legend.position = "none",
+        plot.margin = unit(c(1, 1, 1, 1), "cm"))
+
+# Create a patchwork plot with the map and the bar graph
+
+lac_2 + p2
+
+
+

+
+
+
+
+

f) Number of pastoral livestock per household at the subcounty level

+
+
+Code +
livestock_area_subcounty %>%
+  ggplot() + 
+  geom_col(aes(x= reorder(sub_county, pasto_livestock_household), y = pasto_livestock_household, fill = county), width = 0.95) + 
+  scale_fill_brewer(palette = "OrRd") +
+  geom_text(aes(x = sub_county, y = pasto_livestock_household, label = pasto_livestock_household),
+            hjust = 1, nudge_y = 5) +
+  coord_flip() + 
+  guides(fill = guide_legend(nrow = 1)) +
+  labs(x = "Subcounty",
+       y = "Number of Pastoral Livestock per household",
+       fill = "County",
+       title = "",
+       caption = "") +
+  theme_minimal() +
+  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +
+  theme(axis.title.x =element_blank(),
+        axis.title.y =element_blank(),
+        axis.text.x = element_text(size = 12),
+        axis.text.y = element_text(size = 12),
+        plot.title = element_text(family="Helvetica", face="bold", size = 20),
+        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
+        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
+        plot.margin = unit(c(1, 1, 1, 1), "cm"),
+        panel.grid.major = element_blank(),
+        panel.grid.minor = element_blank(),
+        legend.title = element_text(size = 10),
+        legend.text=element_text(size=8),
+        legend.position = "top")
+
+
+

+
+
+
+
+
+

Section 6: Conclusion

+

In this post, I have assessed the pastoral livestock (indigenous cattle, sheep, and goats) populations in the Kenyan counties described as “disturbed and dangerous.” Major contributors to this classification are banditry, livestock theft, and the limited pasture and water available to livestock owners. To get a better sense of the number of households engaged in farming and the pastoral livestock populations in these counties, I performed an exploratory data analysis and visualized my results.

+

Key findings from the study were that:

1. Turkana and Samburu had some of the highest numbers of pastoral livestock, yet they had the fewest households engaged in farming. As a result, these two counties had the highest average ratios of pastoral livestock to farming households in the region (Turkana: 55 per household; Samburu: 36 per household).

2. The three subcounties with the highest average ratios of pastoral livestock to farming households were all in Turkana: Turkana East (126), Kibish (96), and Turkana West (56). Surprisingly, subcounties such as Keiyo North (4), Koibatek (4), and Nyahururu (4) had very low ratios despite having relatively high numbers of farming households. This may have resulted from the unsuitability of the land for grazing animals, small average land sizes per farming household, a switch to exotic dairy and beef livestock, or simply a preference for crop farming over livestock farming.

+

In the next analysis, I will assess the livestock numbers per unit area (square kilometer), the numbers of indigenous cattle, sheep, and goats in every county and subcounty, and the other animal husbandry and crop farming activities that take place in the region. The reader is encouraged to use this code and the rKenyaCensus data package to produce their own analyses and share them with the world.

+ + +
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-13-1.png b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-13-1.png new file mode 100644 index 0000000..65c959e Binary files /dev/null and b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-13-1.png differ diff --git a/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-14-1.png b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-14-1.png new file mode 100644 index 0000000..641ac7a Binary files /dev/null and b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-14-1.png differ diff --git a/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-15-1.png b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-15-1.png new file mode 100644 index 0000000..159506c Binary files /dev/null and b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-15-1.png differ diff --git a/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-16-1.png b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-16-1.png new file mode 100644 index 0000000..61a5dc2 Binary files /dev/null and b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-16-1.png differ diff --git a/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-17-1.png b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-17-1.png new file mode 100644 index 0000000..d74d10f Binary files /dev/null and 
b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-17-1.png differ diff --git a/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-18-1.png b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-18-1.png new file mode 100644 index 0000000..db9b123 Binary files /dev/null and b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-18-1.png differ diff --git a/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-2-1.png b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-2-1.png new file mode 100644 index 0000000..aba528e Binary files /dev/null and b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-2-1.png differ diff --git a/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-3-1.png b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-3-1.png new file mode 100644 index 0000000..ea83c2e Binary files /dev/null and b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-3-1.png differ diff --git a/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-4-1.png b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-4-1.png new file mode 100644 index 0000000..90a5d22 Binary files /dev/null and b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-4-1.png differ diff --git a/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-5-1.png b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-5-1.png new file mode 100644 index 0000000..714f92c Binary files /dev/null and 
b/_site/posts/code_along_with_me/danger_disturb/danger_disturb_files/figure-html/unnamed-chunk-5-1.png differ diff --git a/_site/posts/code_along_with_me/danger_disturb/images/code_along_with_me_cover.png b/_site/posts/code_along_with_me/danger_disturb/images/code_along_with_me_cover.png new file mode 100644 index 0000000..e40dae7 Binary files /dev/null and b/_site/posts/code_along_with_me/danger_disturb/images/code_along_with_me_cover.png differ diff --git a/_site/posts/code_along_with_me/danger_disturb/images/running_livestock.jpeg b/_site/posts/code_along_with_me/danger_disturb/images/running_livestock.jpeg new file mode 100644 index 0000000..b3530f6 Binary files /dev/null and b/_site/posts/code_along_with_me/danger_disturb/images/running_livestock.jpeg differ diff --git a/_site/posts/code_along_with_me/new_post_1/danger_disturb.html b/_site/posts/code_along_with_me/new_post_1/danger_disturb.html new file mode 100644 index 0000000..e85b779 --- /dev/null +++ b/_site/posts/code_along_with_me/new_post_1/danger_disturb.html @@ -0,0 +1,1199 @@ + + + + + + + + + + + +William Okech - Code Along With Me (Episode 1) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

Code Along With Me (Episode 1)

+

An assessment of the livestock numbers in the six counties declared to be ‘disturbed and dangerous’ in Kenya

+
+
RStudio
+
R
+
Tutorial
+
Blog
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

November 28, 2023

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+
+

Pastoral livestock running through a dry savanna followed by a traditional herder on horseback

+
+Image created using “Bing Image Creator” with the prompt keywords “cows, sheep, indigenous cattle, running, dry savanna, river bed, traditional herdsman, nature photography, --ar 5:4 --style raw”
+
+
+
+

Introduction

+

In February 2023, the government of Kenya described six counties as “disturbed” and “dangerous.” In the preceding six months, over 100 civilians and 16 police officers had lost their lives as criminals engaged in banditry terrorized rural villages. Because livestock theft is one of the main causes of the conflict, my goal is to perform an exploratory data analysis of livestock numbers from the Kenya Population and Housing Census (2019) report.

+

References

+
    +
  1. Nation Media News Brief
  2. +
  3. Citizen TV Summary
  4. +
+
+
+

Section 1: Load all the required libraries

+
+
+Code +
library(tidyverse) # a collection of packages used to model, transform, and visualize data
+library(rKenyaCensus) # tidy datasets obtained from the Kenya Population and Housing Census results
+library(patchwork) # combine separate ggplots into the same graphic
+library(janitor) # initial data exploration and cleaning for a new data set
+library(ggrepel)# repel overlapping text labels
+library(ggthemes) # Extra Themes, Scales and Geoms for 'ggplot2'
+library(scales) # tools to override the default breaks, labels, transformations and palettes
+# install.packages("treemapify")
+library(treemapify) # allows the creation of treemaps in ggplot2
+
+library(sf) # simple features, a method to encode spatial vector data
+#install.packages("devtools")
+library(devtools) # helps to install packages not on CRAN
+#devtools::install_github("yutannihilation/ggsflabel")
+library(ggsflabel) # place labels on maps
+
+library(knitr) # a tool for dynamic report generation in R
+#install.packages("kableExtra")
+library(kableExtra) # build common complex tables and manipulate table styles
+
+
+

Note: If package downloads time out during installation, check the current limit with getOption("timeout") and use options(timeout = ) to increase it.

+
+
+

Section 2: Create a map of the “dangerous and disturbed” counties

+

The rKenyaCensus package includes a built-in county boundaries dataset to facilitate mapping of the various indicators in the Census. The required shapefile for this analysis is KenyaCounties_SHP.

+
+

a) Sample plot of the map of Kenya

+
+
+Code +
# Load the shapefile
+kenya_counties_sf <- st_as_sf(KenyaCounties_SHP)
+
+# Plot the map of Kenya
+p0 <- ggplot(kenya_counties_sf) + 
+  geom_sf(fill = "bisque2", linewidth = 0.6, color = "black") + 
+  theme_void()
+p0
+
+
+

+
+
+
+
+

b) Highlight the dangerous and disturbed counties in Kenya

+
+
+Code +
# First, remove the "/" from the county names
+
+kenya_counties_sf$County <- gsub("/", 
+                                 " ", 
+                                 kenya_counties_sf$County)
+
+# Select the six counties to highlight
+highlight_counties <- c("TURKANA", "WEST POKOT", "ELGEYO MARAKWET", "BARINGO", "LAIKIPIA", "SAMBURU")
+
+# Filter the counties dataset to only include the highlighted counties
+highlighted <- kenya_counties_sf %>% filter(County %in% highlight_counties)
+
+# Plot the highlighted counties in the map
+p1 <- ggplot() + 
+  geom_sf(data = kenya_counties_sf, fill = "bisque2", linewidth = 0.6, color = "black") + 
+  geom_sf(data  = highlighted, fill = "chocolate4", linewidth = 0.8, color = "black") +
+  theme_void()
+p1
+
+
+

+
+
+
+
+

c) Plot only the required counties

+
+
+Code +
p2 <- ggplot(data = highlighted) +
+  geom_sf(aes(fill = County), linewidth = 1, show.legend = FALSE) +
+  geom_label_repel(aes(label = County, geometry = geometry), size = 3,
+                      stat = "sf_coordinates",
+                      force=10, # force of repulsion between overlapping text labels
+                      seed = 1,
+                      segment.size = 0.75,
+                      min.segment.length=0) +
+  scale_fill_brewer(palette = "OrRd") +
+  labs(title = "",
+       caption = "") +
+  theme_void() + # apply the complete theme first so the settings below are not overwritten
+  theme(plot.title = element_text(family = "Helvetica", size = 10, hjust = 0.5),
+        legend.title = element_blank(),
+        legend.position = "none",
+        plot.caption = element_text(family = "Helvetica", size = 12))
+p2
+
+
+

+
+
+Code +
# Notes: geom_label_repel() with geometry and stat defined can be used as an 
+# alternative to geom_sf_label_repel()
+
+
+
+
+

d) Combine the plots using patchwork to clearly highlight the counties of interest

+
+
+Code +
p1 + 
+  p2 + 
+  plot_annotation(title = "",
+                  subtitle = "",
+                  caption = "",
+                  theme = theme(plot.title = element_text(family="Helvetica", face="bold", size = 25),
+                                plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
+                                plot.caption = element_text(family = "Helvetica",size = 12, face = "bold")))
+
+
+

+
+
+
+
+
+

Section 3: Load the livestock data from the census report and generate the dataframes required for analysis.

+
+

a) View the data available in the data catalogue

+
+
+Code +
data("DataCatalogue")
+
+
+
+
+

b) Load the livestock data

+

Here, pastoral livestock are defined as sheep, goats, and indigenous cattle.

+
+
+Code +
# Select the livestock data from the census report
+df_livestock <- V4_T2.24
+livestock <- df_livestock[2:393, ]
+livestock <- livestock %>%
+  clean_names()
+
+# Remove the "/" from the county names in the dataset
+livestock$county <- gsub("/", " ", livestock$county)
+livestock$sub_county <- gsub("/", " ", livestock$sub_county)
+
+# Select the variables of interest from the dataset
+# These include the county, subcounty, land area, number of farming households, 
+# sheep, goats, and indigenous cattle.
+
+# New variables listed below include:
+# pasto_livestock is the total number of sheep, goats, and indigenous cattle
+# ind_cattle_household is the number of indigenous cattle per household
+# goats_household is the number of goats per household
+# sheep_household is the number of sheep per household
+# pasto_livestock_household is the number of pastoral livestock per household
+
+livestock_select <- livestock %>%
+  select(county, sub_county, admin_area, farming, sheep, goats, indigenous_cattle) %>%
+  mutate(pasto_livestock = sheep + goats + indigenous_cattle) %>% 
+  mutate(ind_cattle_household = round(indigenous_cattle/farming)) %>%
+  mutate(goats_household = round(goats/farming)) %>%
+  mutate(sheep_household = round(sheep/farming)) %>%
+  mutate(pasto_livestock_household = round(pasto_livestock/farming))
+
+
+
+
+
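The per-household ratios above divide by `farming`, so a row with zero or missing farming households would yield `Inf` or `NA`. The following is a minimal sketch of one way to guard against that, assuming a toy data frame with made-up values (`toy` and its numbers are hypothetical, not census figures):

```r
library(dplyr)

# Hypothetical toy data: one normal row, one with zero households, one missing
toy <- data.frame(
  sub_county = c("ALPHA", "BETA", "GAMMA"),
  farming    = c(100, 0, NA),
  goats      = c(260, 40, 10)
)

# Only compute the ratio when the denominator is present and positive;
# otherwise record NA instead of Inf
toy_ratio <- toy %>%
  mutate(goats_household = if_else(!is.na(farming) & farming > 0,
                                   round(goats / farming),
                                   NA_real_))

toy_ratio
```

Whether such rows exist in the census table is not stated in the post; the guard simply makes the behaviour explicit if they do.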

c) Filter data for the selected “disturbed and dangerous” counties

+
+
+Code +
# Select the data for the "dangerous and disturbed" counties
+dan_dist <- c("TURKANA", "WEST POKOT", "ELGEYO MARAKWET", "BARINGO", "LAIKIPIA", "SAMBURU")
+
+livestock_select_county <- livestock_select %>%
+  filter(admin_area == "County") %>%
+  filter(county %in% dan_dist)
+
+# Select subcounty data for the "disturbed and dangerous" counties
+livestock_select_subcounty <- livestock_select %>%
+  filter(admin_area == "SubCounty") %>%
+  filter(county %in% dan_dist)
+
+# Create an area dataset for the "dangerous and disturbed" counties
+df_land_area <- V1_T2.7
+land_area <- df_land_area[2:396,]
+land_area <- land_area %>%
+  clean_names()
+
+
+
+
+Code +
# Create a function to remove the "/", " County" from label, and change the label to UPPERCASE
+
+clean_county_names <- function(dataframe, column_name) {
+  dataframe[[column_name]] <- toupper(gsub("/", " ", gsub(" County", "", dataframe[[column_name]])))
+  return(dataframe)
+}
+
+land_area <- clean_county_names(land_area, 'county')
+land_area <- clean_county_names(land_area, 'sub_county')
+
+# The code above does the processes listed below:
+
+# land_area$county <- gsub("/", " ", land_area$county)
+# land_area$county <- gsub(" County", "", land_area$county)
+# land_area$county <- toupper(land_area$county)
+# land_area$sub_county <- toupper(land_area$sub_county)
+
+
+
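As a quick check of what `clean_county_names()` does, the sketch below applies it to a small hypothetical data frame. The input strings are illustrative assumptions about the label format and are not copied from the census table:

```r
# Same helper as defined above, repeated so the example is self-contained
clean_county_names <- function(dataframe, column_name) {
  dataframe[[column_name]] <- toupper(gsub("/", " ", gsub(" County", "", dataframe[[column_name]])))
  return(dataframe)
}

# Hypothetical input labels
toy <- data.frame(county = c("Elgeyo/Marakwet County", "Turkana County"),
                  stringsAsFactors = FALSE)

cleaned <- clean_county_names(toy, "county")
cleaned$county
# "ELGEYO MARAKWET" "TURKANA"
```

Passing the column name as a string and indexing with `[[ ]]` lets the same helper be reused for both `county` and `sub_county`.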
+
+Code +
# Obtain the area data for "disturbed and dangerous" counties
+land_area_county <- land_area %>%
+  filter(admin_area == "County") %>%
+  select(county, land_area_in_sq_km) %>%
+  filter(county %in% dan_dist)
+
+# Get the subcounty area data for "disturbed and dangerous" counties
+land_area_subcounty <- land_area %>%
+  filter(admin_area == "SubCounty") %>%
+  select(county, sub_county, land_area_in_sq_km) %>%
+  filter(county %in% dan_dist) %>%
+  select(-county)
+
+
+
+
+

d) Create the final datasets to be used for analysis. Use inner_join() and create new variables.

+
+
+Code +
# Create a county dataset with area and livestock numbers for the disturbed and dangerous regions
+
+livestock_area_county <- inner_join(livestock_select_county, land_area_county, by = "county")
+
+# New variables listed below include:
+# ind_cattle_area is the number of indigenous cattle per area_in_sq_km
+# goats_area is the number of goats per area_in_sq_km
+# sheep_area is the number of sheep per area_in_sq_km
+# pasto_livestock_area is the number of pastoral livestock per area_in_sq_km
+
+livestock_area_county <- livestock_area_county %>%
+  mutate(ind_cattle_area = round(indigenous_cattle/land_area_in_sq_km),
+         sheep_area = round(sheep/land_area_in_sq_km),
+         goats_area = round(goats/land_area_in_sq_km),
+         pasto_livestock_area = round(pasto_livestock/land_area_in_sq_km))
+
+# Create a subcounty dataset with area and livestock numbers
+# for the disturbed and dangerous regions
+
+livestock_area_subcounty <- inner_join(livestock_select_subcounty, land_area_subcounty, by = "sub_county")
+
+# New variables listed below include:
+# ind_cattle_area is the number of indigenous cattle per area_in_sq_km
+# goats_area is the number of goats per area_in_sq_km
+# sheep_area is the number of sheep per area_in_sq_km
+# pasto_livestock_area is the number of pastoral livestock per area_in_sq_km
+
+livestock_area_subcounty <- livestock_area_subcounty %>%
+  mutate(ind_cattle_area = round(indigenous_cattle/land_area_in_sq_km),
+         sheep_area = round(sheep/land_area_in_sq_km),
+         goats_area = round(goats/land_area_in_sq_km),
+         pasto_livestock_area = round(pasto_livestock/land_area_in_sq_km))
+
+
+
+
+
+
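Because `inner_join()` silently drops rows whose keys do not match, it can be worth checking for unmatched names before joining. The following is a minimal sketch using `anti_join()` on hypothetical toy data (the data frames, names, and values below are made up for illustration):

```r
library(dplyr)

# Hypothetical toy tables; "GAMMA" is spelled differently in the second one
livestock_toy <- data.frame(sub_county = c("ALPHA", "BETA", "GAMMA"),
                            goats = c(10, 20, 30))
area_toy <- data.frame(sub_county = c("ALPHA", "BETA", "GAMMA TOWN"),
                       land_area_in_sq_km = c(100, 200, 300))

# Rows of livestock_toy with no matching sub_county in area_toy
unmatched <- anti_join(livestock_toy, area_toy, by = "sub_county")

# The inner join keeps only the rows that match in both tables
joined <- inner_join(livestock_toy, area_toy, by = "sub_county")

nrow(unmatched)  # 1
nrow(joined)     # 2
```

If `unmatched` is not empty, the name-cleaning step probably needs another rule before the join.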

Section 4: Create a table with land area (sq. km) for the six counties

+

These six counties cover approximately one-fifth (1/5) of Kenya's land area.

+
+
+Code +
livestock_area_county %>%
+  select(county, land_area_in_sq_km) %>%
+  mutate(county = str_to_title(county)) %>%
+  arrange(desc(land_area_in_sq_km)) %>%
+  adorn_totals("row") %>%
+  rename("County" = "county",
+         "Land Area (sq. km)" = "land_area_in_sq_km") %>%
+  kbl(align = "c") %>%
+  kable_classic() %>% 
+  row_spec(row = 0, font_size = 21, color = "white", background = "#000000") %>%
+  row_spec(row = c(1:7), font_size = 15) %>%
+  row_spec(row = 6, extra_css = "border-bottom: 1px solid;") %>%
+  row_spec(row = 7, bold = T)
+
+
| County | Land Area (sq. km) |
|:---:|:---:|
| Turkana | 68232.9 |
| Samburu | 21065.1 |
| Baringo | 10976.4 |
| Laikipia | 9532.2 |
| West Pokot | 9123.2 |
| Elgeyo Marakwet | 3032.0 |
| **Total** | **121961.8** |
+
+
+
+
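The "one-fifth" figure can be checked with a short calculation. This sketch sums the six county areas from the table above and divides by an assumed national area of roughly 580,000 sq. km; that national figure is an approximation introduced here for illustration and is not taken from the post:

```r
# County land areas (sq. km) from the table above
county_area <- c(Turkana = 68232.9, Samburu = 21065.1, Baringo = 10976.4,
                 Laikipia = 9532.2, `West Pokot` = 9123.2,
                 `Elgeyo Marakwet` = 3032.0)

six_county_total <- sum(county_area)

# ASSUMPTION: approximate total area of Kenya, used only for this check
kenya_area_sq_km <- 580000

share <- six_county_total / kenya_area_sq_km
round(share * 100, 1)  # about 21 percent
```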

Section 5: Perform an exploratory data analysis to gain key insights about the data

+
+

a) Number of Farming Households at the County Level

+
+
+Code +
lac_1 <- livestock_area_county %>%
+  ggplot() + 
+  geom_col(aes(x= reorder(county, farming), y = farming, fill = county)) +
+  geom_text(aes(x = county, y = 0, label = county),
+            hjust = 0, nudge_y = 0.25) +
+  geom_text(aes(x = county, y = farming, label = farming),
+            hjust = 1, nudge_y = 15000) +
+  scale_fill_brewer(palette = "OrRd") +
+  coord_flip() + 
+  labs(x = "County",
+       y = "Number of Farming Households",
+       title = "",
+       subtitle = "",
+       caption = "") +
+  theme_minimal() +
+  scale_y_continuous(labels = comma) +
+  theme(axis.title.x =element_blank(), 
+        axis.title.y =element_blank(),
+        axis.text.x = element_blank(),
+        axis.text.y = element_blank(),
+        plot.title = element_text(family="Helvetica", face="bold", size = 20),
+        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
+        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
+        panel.grid.major = element_blank(),
+        panel.grid.minor = element_blank(),
+        legend.title = element_blank(),
+        legend.position = "none",
+        plot.margin = unit(c(1, 1, 1, 1), "cm"))
+
+# Create a patchwork plot with the map and the bar graph
+
+lac_1 + p2
+
+
+

+
+
+
+
+
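The bar labels in the plot above are printed as raw numbers. If thousands separators are preferred, one option is to wrap the label in `comma()` from the scales package, which the post already loads. The following is a minimal sketch on hypothetical data (the `toy` data frame and its values are made up):

```r
library(ggplot2)
library(scales)

# Hypothetical toy data, not census figures
toy <- data.frame(county = c("A", "B", "C"),
                  farming = c(125000, 48000, 9500))

p <- ggplot(toy, aes(x = reorder(county, farming), y = farming)) +
  geom_col(fill = "chocolate4") +
  # comma() formats 125000 as "125,000" for the bar label
  geom_text(aes(label = comma(farming)), hjust = -0.1) +
  coord_flip() +
  # leave some room on the right so the labels are not clipped
  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.15))) +
  theme_minimal()

p
```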

b) Number of Farming Households at the Subcounty Level

+
+
+Code +
livestock_area_subcounty %>%
+  ggplot() + 
+  geom_col(aes(x= reorder(sub_county, farming), y = farming, fill = county), width = 0.95) + 
+  geom_text(aes(x = sub_county, y = farming, label = farming),
+            hjust = 1, nudge_y = 2500, size = 4) +
+  scale_fill_brewer(palette = "OrRd") +
+  coord_flip() +
+  guides(fill = guide_legend(nrow = 1)) +
+  labs(x = "Subcounty",
+       y = "Number of Farming Households",
+       fill = "County",
+       title = "",
+       subtitle = "",
+       caption = "") +
+  theme_minimal() +
+  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +
+  theme(axis.title.x =element_blank(),
+        axis.title.y =element_blank(),
+        axis.text.x = element_text(size = 12),
+        axis.text.y = element_text(size = 12),
+        plot.title = element_text(family="Helvetica", face="bold", size = 20),
+        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
+        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
+        plot.margin = unit(c(1, 1, 1, 1), "cm"),
+        panel.grid.major = element_blank(),
+        panel.grid.minor = element_blank(),
+        legend.title = element_text(size = 10),
+        legend.text=element_text(size=8),
+        legend.position = "top")
+
+
+

+
+
+
+
+

c) Number of pastoral livestock per county

+
+

Treemap

+
+
+Code +
ggplot(livestock_area_county, aes(area = pasto_livestock, fill = county, label = comma(pasto_livestock)
+                              )) +
+  geom_treemap() +
+  geom_treemap_text(colour = "black",
+                    place = "centre",
+                    size = 24) +
+  scale_fill_brewer(palette = "OrRd") +
+  labs(x = "",
+       y = "",
+       title = "",
+       caption = "") +
+  theme_minimal() +
+  theme(axis.title.x =element_text(size = 20),
+        axis.title.y =element_text(size = 20),
+        axis.text.x = element_text(size = 15),
+        axis.text.y = element_text(size = 15),
+        plot.title = element_text(family="Helvetica", face="bold", size = 28),
+        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
+        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
+        legend.title = element_blank(),
+        legend.text=element_text(size=12),
+        legend.position = "bottom") 
+
+
+

+
+
+
+
+
+

d) Number of pastoral livestock per subcounty

+
+
+Code +
livestock_area_subcounty %>%
+  ggplot() + 
+  geom_col(aes(x= reorder(sub_county, pasto_livestock), y = pasto_livestock, fill = county), width = 0.95) + 
+  geom_text(aes(x = sub_county, y = pasto_livestock, label = pasto_livestock),
+            hjust = 1, nudge_y = 125000) +
+  scale_fill_brewer(palette = "OrRd") +
+  coord_flip() + 
+  guides(fill = guide_legend(nrow = 1)) +
+  labs(x = "Subcounty",
+       y = "Number of Pastoral Livestock",
+       fill = "County",
+       title = "",
+       caption = "") +
+  theme_minimal() +
+  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +
+  theme(axis.title.x =element_blank(),
+        axis.title.y =element_blank(),
+        axis.text.x = element_text(size = 12),
+        axis.text.y = element_text(size = 12),
+        plot.title = element_text(family="Helvetica", face="bold", size = 20),
+        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
+        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
+        plot.margin = unit(c(1, 1, 1, 1), "cm"),
+        panel.grid.major = element_blank(),
+        panel.grid.minor = element_blank(),
+        legend.title = element_text(size = 10),
+        legend.text=element_text(size=8),
+        legend.position = "top")
+
+
+

+
+
+
+
+

e) Number of pastoral livestock per household at the county level

+
+
+Code +
lac_2 <- livestock_area_county %>%
+  ggplot() + 
+  geom_col(aes(x= reorder(county, pasto_livestock_household), y = pasto_livestock_household, fill = county)) + 
+  geom_text(aes(x = county, y = pasto_livestock_household, label = pasto_livestock_household),
+            hjust = 1, nudge_y = 5) +
+  scale_fill_brewer(palette = "OrRd") +
+  coord_flip() + 
+  labs(x = "County",
+       y = "Number of Pastoral Livestock per household",
+       title = "",
+       caption = "") +
+  theme_minimal() +
+  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +
+  theme(axis.title.x =element_blank(), 
+        axis.title.y =element_blank(),
+        axis.text.x = element_text(size = 12),
+        axis.text.y = element_text(size = 12),
+        plot.title = element_text(family="Helvetica", face="bold", size = 20),
+        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
+        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
+        panel.grid.major = element_blank(),
+        panel.grid.minor = element_blank(),
+        legend.title = element_blank(),
+        legend.position = "none",
+        plot.margin = unit(c(1, 1, 1, 1), "cm"))
+
+# Create a patchwork plot with the map and the bar graph
+
+lac_2 + p2
+
+
+

+
+
+
+
+

f) Number of pastoral livestock per household at the subcounty level

+
+
+Code +
livestock_area_subcounty %>%
+  ggplot() + 
+  geom_col(aes(x= reorder(sub_county, pasto_livestock_household), y = pasto_livestock_household, fill = county), width = 0.95) + 
+  scale_fill_brewer(palette = "OrRd") +
+  geom_text(aes(x = sub_county, y = pasto_livestock_household, label = pasto_livestock_household),
+            hjust = 1, nudge_y = 5) +
+  coord_flip() + 
+  guides(fill = guide_legend(nrow = 1)) +
+  labs(x = "Subcounty",
+       y = "Number of Pastoral Livestock per household",
+       fill = "County",
+       title = "",
+       caption = "") +
+  theme_minimal() +
+  scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.01))) +
+  theme(axis.title.x =element_blank(),
+        axis.title.y =element_blank(),
+        axis.text.x = element_text(size = 12),
+        axis.text.y = element_text(size = 12),
+        plot.title = element_text(family="Helvetica", face="bold", size = 20),
+        plot.subtitle = element_text(family="Helvetica", face="bold", size = 15),
+        plot.caption = element_text(family = "Helvetica",size = 12, face = "bold"),
+        plot.margin = unit(c(1, 1, 1, 1), "cm"),
+        panel.grid.major = element_blank(),
+        panel.grid.minor = element_blank(),
+        legend.title = element_text(size = 10),
+        legend.text=element_text(size=8),
+        legend.position = "top")
+
+
+

+
+
+
+
+
+

Section 6: Conclusion

+

In this post, I have assessed the pastoral livestock (indigenous cattle, sheep, and goats) populations in the Kenyan counties described as “disturbed and dangerous.” Major contributors to this classification are banditry, livestock theft, and the limited amounts of pasture and water available to livestock owners. To get a better sense of the number of households engaged in farming and the pastoral livestock populations in these counties, I performed an exploratory data analysis and visualized my results.

+

Key findings from the study were that:

1. Turkana and Samburu had some of the highest numbers of pastoral livestock, yet they had the fewest households engaged in farming. As a result, these two counties had the highest average ratios of pastoral livestock to farming households in the region (Turkana: 55 per household; Samburu: 36 per household).
2. The three subcounties with the highest average ratios of pastoral livestock to farming households were all in Turkana: Turkana East (126), Kibish (96), and Turkana West (56). Surprisingly, subcounties such as Keiyo North (4), Koibatek (4), and Nyahururu (4) had very low ratios of pastoral livestock to farming households despite having relatively high numbers of farming households. This may have resulted from the unsuitability of the land for grazing animals, small average land sizes per farming household, a switch to exotic dairy and beef livestock, or simply a preference for crop farming over livestock farming.

+

In the next analysis, I will assess the livestock numbers per area (square kilometers), the numbers of indigenous cattle, sheep, and goats in every county and subcounty, as well as the other animal husbandry and/or crop farming activities that take place in the region. The reader is encouraged to use this code and data package (rKenyaCensus) to come up with their own analyses to share with the world.

+ + +
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-13-1.png b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-13-1.png new file mode 100644 index 0000000..65c959e Binary files /dev/null and b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-13-1.png differ diff --git a/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-14-1.png b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-14-1.png new file mode 100644 index 0000000..641ac7a Binary files /dev/null and b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-14-1.png differ diff --git a/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-15-1.png b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-15-1.png new file mode 100644 index 0000000..159506c Binary files /dev/null and b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-15-1.png differ diff --git a/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-16-1.png b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-16-1.png new file mode 100644 index 0000000..61a5dc2 Binary files /dev/null and b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-16-1.png differ diff --git a/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-17-1.png b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-17-1.png new file mode 100644 index 0000000..d74d10f Binary files /dev/null and b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-17-1.png differ diff --git 
a/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-18-1.png b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-18-1.png new file mode 100644 index 0000000..db9b123 Binary files /dev/null and b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-18-1.png differ diff --git a/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-2-1.png b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-2-1.png new file mode 100644 index 0000000..aba528e Binary files /dev/null and b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-2-1.png differ diff --git a/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-3-1.png b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-3-1.png new file mode 100644 index 0000000..ea83c2e Binary files /dev/null and b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-3-1.png differ diff --git a/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-4-1.png b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-4-1.png new file mode 100644 index 0000000..90a5d22 Binary files /dev/null and b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-4-1.png differ diff --git a/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-5-1.png b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-5-1.png new file mode 100644 index 0000000..714f92c Binary files /dev/null and b/_site/posts/code_along_with_me/new_post_1/danger_disturb_files/figure-html/unnamed-chunk-5-1.png differ diff --git 
a/_site/posts/data_stories/asbestos_roof_kenya/asbestos_roof_kenya.html b/_site/posts/data_stories/asbestos_roof_kenya/asbestos_roof_kenya.html new file mode 100644 index 0000000..22abf36 --- /dev/null +++ b/_site/posts/data_stories/asbestos_roof_kenya/asbestos_roof_kenya.html @@ -0,0 +1,572 @@ + + + + + + + + + + + +William Okech - A roof over your head + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

A roof over your head

+

The prevalence of asbestos-based roofing in Kenya and its potential effects

+
+
RStudio
+
R
+
Data Stories
+
Blog
+
Kenya Census
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

December 1, 2022

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+

Introduction

+

An aerial view of most settlements in Kenya will demonstrate that many residential home roofs are constructed using iron sheets.

+

Indeed, this is confirmed by the Kenya Population and Housing Census (2019) report 1 2, where we see that 4 out of every 5 households (total number = 12,043,016) in Kenya are roofed using iron sheets (Figure 1). Overall, the top 5 roofing materials are iron sheets (80.3%), concrete (8.2%), grass/twigs (5.1%), makuti (sun-dried coconut palm leaves; 1.6%), and asbestos (1.4%). Despite the widespread use of iron sheets, it is surprising to note that 1.4% (2.2% urban and 0.9% rural) of residential household roofs (approximately 170,000 households) are covered with asbestos-based roofing materials (NB: this figure does not include public buildings such as educational institutions and government facilities).
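The "approximately 170,000" figure follows directly from the two numbers quoted above. A short sketch of the arithmetic, using only the household total and the 1.4% share given in the text (the variable names are illustrative):

```r
# Figures quoted in the paragraph above
total_households <- 12043016
asbestos_share   <- 0.014   # 1.4% of households

asbestos_households <- total_households * asbestos_share

round(asbestos_households)      # 168602
signif(asbestos_households, 2)  # 170000, i.e. "approximately 170,000"
```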

+
+
+

+
+
+

Figure 1: Roof types in Kenya (visualizations generated in RStudio)

+
+
+

Asbestos and its potential risks

+

Asbestos refers to a class of six minerals that naturally form a bundle of fibers. These fibers have many properties that make them attractive, including a lack of electrical conductivity, and chemical, heat, and fire resistance. Historically, asbestos has been used for various commercial and industrial applications, including roofing shingles, automobile brakes, and textured paints for walls and ceilings 3. However, using asbestos for products that come into regular contact with humans is quite problematic. Why? Asbestos is a known human carcinogen, and the primary risk factor for most mesotheliomas is asbestos exposure 4 5. Furthermore, asbestos exposure (depending on the frequency, amount, and type) can cause asbestosis, pleural disease, and cancer. If asbestos-based materials remain intact, there is minimal risk to the user, but if materials are damaged via natural degradation or during home demolition and remodeling, tiny asbestos fibers will be released into the air 6 7. In Kenya, Legal Notice No. 121 of the Environmental Management and Coordination (Waste Management) Regulations (2006)8 states that waste containing asbestos is classified as hazardous. Why should Kenyans be concerned about this? In the 2013/2014 financial year, Kenya spent approximately one-tenth of its total health budget on asbestos-related cancers 9 10 11.

+

+
+
+

Where do we find high numbers of asbestos-based roofs in Kenya?

+

As previously stated, 1.4% of households in Kenya have asbestos-based roofs. Figure 2 demonstrates the percentage of households with asbestos-based roofs in every county in Kenya. Interestingly, 4 (Nairobi, Kajiado, Machakos, and Kiambu) out of the top 6 counties (from a total of 47) fall within the Nairobi Metropolitan region.

+
+
+

+
+
+

Figure 2: Percentage (%) of households with asbestos-based roofs distributed by county (visualizations generated using RStudio)

+

Next, I investigated the subcounties with the highest number of households with asbestos-based roofs. The top 5 subcounties are located within Nairobi county, with Embakasi subcounty taking the lead with just over 8,000 households.

+
+
+

+
+
+

Figure 3: The top ten subcounties with the highest number of households that have asbestos-based roofs (visualizations generated using RStudio)

+
+
+

Conclusion

+

Overall, this study demonstrates that a notable proportion of Kenyan households use asbestos-based roofing materials, with the Nairobi Metropolitan region accounting for the largest number of households. It is widely acknowledged that asbestos is harmful to our health, and asbestos-related diseases impose a significant burden on the economy. However, the impact of these roofs on the health of residents may not be fully apparent, as asbestos exposure may also occur in other settings such as educational facilities and government institutions. To lessen the impact of asbestos exposure, it would be beneficial for local/county governments to educate residents about the dangers of asbestos and facilitate the complex and costly removal of asbestos-based roofing materials.

+ + +
+ + +

Footnotes

+ +
  1. Kenya National Bureau of Statistics. The 2019 Kenya Population and Housing Census. Volume I: Population by County and Sub-County and Volume III: Distribution of Population by Age and Sex.
  2. Shelmith Kariuki (2020). rKenyaCensus: 2019 Kenya Population and Housing Census Results. R package version 0.0.2.
  3. Learn About Asbestos (no date) EPA. Environmental Protection Agency. Available at: https://www.epa.gov/asbestos/learn-about-asbestos (Accessed: December 1, 2022).
  4. Asbestos exposure and cancer risk fact sheet (no date) National Cancer Institute. Available at: https://www.cancer.gov/about-cancer/causes-prevention/risk/substances/asbestos/asbestos-fact-sheet (Accessed: December 1, 2022).
  5. Asbestos (no date) World Health Organization. World Health Organization. Available at: https://www.iarc.who.int/risk-factor/asbestos/ (Accessed: December 1, 2022).
  6. Learn About Asbestos (no date) EPA. Environmental Protection Agency. Available at: https://www.epa.gov/asbestos/learn-about-asbestos (Accessed: December 1, 2022).
  7. Asbestos and your health (2016) Centers for Disease Control and Prevention. Centers for Disease Control and Prevention. Available at: https://www.atsdr.cdc.gov/asbestos/index.html (Accessed: December 1, 2022).
  8. Act Title: ENVIRONMENTAL MANAGEMENT AND CO-ORDINATION (no date) No. 8 of 1999. Available at: http://kenyalaw.org:8181/exist/kenyalex/sublegview.xql?subleg=No.+8+of+1999 (Accessed: December 1, 2022).
  9. Okoth, D. (2013) Slow transition from use of asbestos raises concern as cancer cases rise, The Standard. Available at: https://www.standardmedia.co.ke/lifestyle/article/2000096118/slow-transition-from-use-of-asbestos-raises-concern-as-cancer-cases-rise (Accessed: December 1, 2022).
  10. GCR, S. (2016) Kenya faces cancer epidemic caused by asbestos roofs, Global Construction Review. Available at: https://www.globalconstructionreview.com/kenya-faces-cancer-epid7emic-caus7ed-asbe7stos/ (Accessed: December 1, 2022).
  11. Irungu, S. (2020) Exposure to the noxious asbestos needs to be alleviated with a lot of care, Kenya Climate Innovation Center (KCIC). Available at: https://www.kenyacic.org/2019/11/exposure-to-the-noxious-asbestos-needs-to-be-alleviated-with-a-lot-of-care/ (Accessed: December 1, 2022).
+
+ + +
+ + + + + \ No newline at end of file diff --git a/posts/series_2/new_post_1/data_story_headline.png b/_site/posts/data_stories/asbestos_roof_kenya/data_story_headline.png similarity index 100% rename from posts/series_2/new_post_1/data_story_headline.png rename to _site/posts/data_stories/asbestos_roof_kenya/data_story_headline.png diff --git a/posts/series_2/new_post_2/images/all_counties_asbestos_barplot_map.png b/_site/posts/data_stories/asbestos_roof_kenya/images/all_counties_asbestos_barplot_map.png similarity index 100% rename from posts/series_2/new_post_2/images/all_counties_asbestos_barplot_map.png rename to _site/posts/data_stories/asbestos_roof_kenya/images/all_counties_asbestos_barplot_map.png diff --git a/posts/series_2/new_post_2/images/national_treemap.png b/_site/posts/data_stories/asbestos_roof_kenya/images/national_treemap.png similarity index 100% rename from posts/series_2/new_post_2/images/national_treemap.png rename to _site/posts/data_stories/asbestos_roof_kenya/images/national_treemap.png diff --git a/posts/series_2/new_post_2/images/top_households_asbestos_raw.png b/_site/posts/data_stories/asbestos_roof_kenya/images/top_households_asbestos_raw.png similarity index 100% rename from posts/series_2/new_post_2/images/top_households_asbestos_raw.png rename to _site/posts/data_stories/asbestos_roof_kenya/images/top_households_asbestos_raw.png diff --git a/_site/posts/data_stories/cas_kenya_viz/cas_kenya_viz.html b/_site/posts/data_stories/cas_kenya_viz/cas_kenya_viz.html new file mode 100644 index 0000000..d353a2b --- /dev/null +++ b/_site/posts/data_stories/cas_kenya_viz/cas_kenya_viz.html @@ -0,0 +1,603 @@ + + + + + + + + + + + +William Okech - Data Visualization for Government + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

Data Visualization for Government

+

Visualizing HR data from the Chief Administrative Secretary recruitment process in Kenya

+
+
RStudio
+
R
+
Data Stories
+
Blog
+
Data Visualization
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

April 13, 2023

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+

Introduction

+

In January 2018, former President Uhuru Kenyatta announced the creation of the Chief Administrative Secretary (CAS) position. The primary role of the appointees would be to “help the Cabinet Secretary to better coordinate the running of the affairs of the respective ministries 1.” Though highly controversial at the time (as civil society groups 2 challenged the legality of the position), the High Court (in February 2023) eventually ruled that the decision to create the new position was lawful 3.

+

The Public Service Commission (PSC) of Kenya is responsible for government recruitment, and Article 234 of the Constitution of Kenya states that part of its mandate includes the “establishment of public offices” and “appointment of persons to those offices.” The most recent call for applications to the CAS position was circulated in October 2022 4. Here, the PSC listed the roles, responsibilities, and requirements for all applicants. To ensure transparency in the recruitment process, the PSC provided a list of all the applicants 5, the shortlisted applicants 5, and a schedule of the interview times 5. After the interviews, the names of successful candidates were forwarded to the President for appointment. On March 16th, 2023, President William Ruto appointed fifty (50) individuals to the CAS position, which was more than double the number of positions created by the PSC. The legality of these appointments has already been challenged in court, and we await the final verdict 6 7.

+

Controversy aside, the goal of this article is to provide a data-driven assessment of publicly available recruitment information (specifically gender, disability status, and county of origin) to determine whether principles of diversity, equity, and inclusion (DEI) were promoted during the hiring process.

+
+
+

Summary of Findings

+
    +
  1. Only 26% of the CAS appointments went to women, even though 33% of applicants and 36% of those shortlisted were female. This may not be in line with the “two-thirds gender rule” outlined in Article 27(8) of the Constitution of Kenya (2010) which states that “not more than two-thirds of the members of elective or appointive bodies shall be of the same gender.”
  2. +
+
+
+

+
+
+
    +
  1. Two percent (2%) of the nominees were Persons with Disabilities (PWDs), which is representative of the percentage of Kenyans (at the national level) living with a disability (2.2%) 8. However, institutions such as the National Gender and Equality Commission should develop programs to increase the number of applicants from marginalized groups.
  2. +
+
+
+

+
+
+
    +
  1. In total, there were 133 applicants from Busia County (2.5%). Despite this, it was the only county that did not have a representative on the shortlist. To remedy this, the PSC may need to put in place affirmative action policies that ensure that at least one applicant from each county makes it to the shortlist.
  2. +
+
+
+

+
+
+
    +
  1. Nine counties are not represented in the final list of nominees.
  2. +
+
+
+

+
+
+
    +
  1. Ten counties have more than one CAS nominee.
  2. +
+
+
+

+
+
+
+
+

Conclusion

+

Overall, this article provided a set of data visualizations to help understand the demographics of applicants for the CAS position in Kenya. Specifically, the gender, disability status, and county of origin were assessed at the applicant, shortlisting, and nomination stages. This study had several key findings:

+
    +
  1. Fewer than one-third of the nominees were women, even though 36% of the shortlisted applicants were female.
  2. +
  3. Only one nominee was a person with a disability (PWD).
  4. +
  5. One county (Busia) failed to have any applicants make it to the shortlist of 240 and nine counties in total had zero nominees out of 50.
  6. +
+
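The comparison behind the first finding can be sketched in a few lines of R. This is a minimal illustration rather than the analysis used for the article: it takes the 26% share of women among appointees reported above, assumes only two recorded gender categories, and reads the "not more than two-thirds" wording as a simple threshold on the larger share. The variable names are hypothetical.

```r
# Share of women among the CAS appointees, as reported in the summary of findings
women_share_appointed <- 0.26

# Assumption: only two gender categories are recorded, so the shares sum to 1
men_share_appointed <- 1 - women_share_appointed

# Largest share held by a single gender
largest_share <- max(women_share_appointed, men_share_appointed)

# TRUE when one gender holds more than two-thirds of the appointments
exceeds_two_thirds <- largest_share > 2 / 3
print(exceeds_two_thirds)
```

The applicant and shortlist percentages are rounded in the text, so running the same check on them would be sensitive to rounding near the two-thirds threshold.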

Future work will assess the data at a more granular level and determine the counties where there were low numbers of total applicants, low numbers of women at the application and shortlisting stage, and also determine which counties did not have PWD applicants. Additionally, it would be helpful if the PSC could provide age and education level data to help perform a more thorough analysis of the recruitment process. Lastly, it is commendable that the PSC makes this information publicly available, and I hope that further analysis of the data can result in initiatives to diversify the candidate pool at the applicant and shortlisting stages.

+
+
+

Fun Fact

+

The most common applicant first names (male and female).

+
+
+

+
+
+ + +
+ + +

Footnotes

+ +
    +
  1. PSCU (2018) Uhuru Kenyatta’s Full Statement On New Cabinet. Available at: https://www.citizen.digital/news/uhuru-kenyattas-full-statement-on-new-cabinet-189331 (Accessed: April 2, 2023).↩︎

  2. +
  3. Betty Njeru (2018) Activist Omtatah moves to court challenging the creation of new cabinet positions. Available at: https://www.standardmedia.co.ke/kenya/article/2001267705/okiya-omtatah-moves-to-court-challenging-new-cabinet-positions (Accessed: April 6, 2023).↩︎

  4. +
  5. Correspondent (2023) Labour Court Clears Way For Appointment Of Chief Administrative Secretaries. Available at: https://www.capitalfm.co.ke/news/2023/02/labour-court-clears-the-way-for-appointment-of-chief-administrative-secretaries/ (Accessed: April 10, 2023).↩︎

  6. +
  7. PSC (2023) CALL FOR APPLICATIONS TO THE POSITION OF CHIEF ADMINISTRATIVE SECRETARY IN THE PUBLIC SERVICE. Available at: https://www.publicservice.go.ke/index.php/media-center/2/202-call-for-applications-to-the-position-of-chief-administrative-secretary-in-the-public-service (Accessed: April 10, 2023).↩︎

  8. +
  9. PSC (2023) Shortlisted Candidates. Available at: https://www.publicservice.go.ke/index.php/recruitment/shortlisted-candidates (Accessed: April 10, 2023).↩︎

  10. +
  11. Susan Muhindi (2023) Ruto’s 50 CAS nomination list challenged in court. Available at: https://www.the-star.co.ke/news/2023-03-20-rutos-50-cas-nomination-list-challenged-in-court/ (Accessed: April 2, 2023).↩︎

  12. +
  13. Emmanuel Wanjala (2023). Available at: https://www.the-star.co.ke/news/2023-03-17-ruto-has-created-27-illegal-cas-posts-lsks-eric-theuri/ (Accessed: April 5, 2023).↩︎

  14. +
    1. +
    2. Shelmith Kariuki (2020). rKenyaCensus: 2019 Kenya Population and Housing Census Results. R package version 0.0.2.
    3. +
    +↩︎
  15. +
+
+ + +
+ + + + + \ No newline at end of file diff --git a/posts/series_2/new_post_2/data_story_headline.png b/_site/posts/data_stories/cas_kenya_viz/data_story_headline.png similarity index 100% rename from posts/series_2/new_post_2/data_story_headline.png rename to _site/posts/data_stories/cas_kenya_viz/data_story_headline.png diff --git a/posts/series_2/new_post_3/images/all_cas_first_name.png b/_site/posts/data_stories/cas_kenya_viz/images/all_cas_first_name.png similarity index 100% rename from posts/series_2/new_post_3/images/all_cas_first_name.png rename to _site/posts/data_stories/cas_kenya_viz/images/all_cas_first_name.png diff --git a/posts/series_2/new_post_3/images/app_stl_nom_cas_gender.jpg b/_site/posts/data_stories/cas_kenya_viz/images/app_stl_nom_cas_gender.jpg similarity index 100% rename from posts/series_2/new_post_3/images/app_stl_nom_cas_gender.jpg rename to _site/posts/data_stories/cas_kenya_viz/images/app_stl_nom_cas_gender.jpg diff --git a/posts/series_2/new_post_3/images/nom_county_equal_0.jpg b/_site/posts/data_stories/cas_kenya_viz/images/nom_county_equal_0.jpg similarity index 100% rename from posts/series_2/new_post_3/images/nom_county_equal_0.jpg rename to _site/posts/data_stories/cas_kenya_viz/images/nom_county_equal_0.jpg diff --git a/posts/series_2/new_post_3/images/nom_county_over_1.jpg b/_site/posts/data_stories/cas_kenya_viz/images/nom_county_over_1.jpg similarity index 100% rename from posts/series_2/new_post_3/images/nom_county_over_1.jpg rename to _site/posts/data_stories/cas_kenya_viz/images/nom_county_over_1.jpg diff --git a/posts/series_2/new_post_3/images/pwd.jpg b/_site/posts/data_stories/cas_kenya_viz/images/pwd.jpg similarity index 100% rename from posts/series_2/new_post_3/images/pwd.jpg rename to _site/posts/data_stories/cas_kenya_viz/images/pwd.jpg diff --git a/posts/series_2/new_post_3/images/stl_cas_county.jpg b/_site/posts/data_stories/cas_kenya_viz/images/stl_cas_county.jpg similarity index 100% rename from 
posts/series_2/new_post_3/images/stl_cas_county.jpg rename to _site/posts/data_stories/cas_kenya_viz/images/stl_cas_county.jpg diff --git a/posts/series_2/new_post_3/data_story_headline.png b/_site/posts/data_stories/kenya_gender_dist/data_story_headline.png similarity index 100% rename from posts/series_2/new_post_3/data_story_headline.png rename to _site/posts/data_stories/kenya_gender_dist/data_story_headline.png diff --git a/posts/series_2/new_post_1/images/age_sex_ratio_2.png b/_site/posts/data_stories/kenya_gender_dist/images/age_sex_ratio_2.png similarity index 100% rename from posts/series_2/new_post_1/images/age_sex_ratio_2.png rename to _site/posts/data_stories/kenya_gender_dist/images/age_sex_ratio_2.png diff --git a/posts/series_2/new_post_1/images/barplot_map.png b/_site/posts/data_stories/kenya_gender_dist/images/barplot_map.png similarity index 100% rename from posts/series_2/new_post_1/images/barplot_map.png rename to _site/posts/data_stories/kenya_gender_dist/images/barplot_map.png diff --git a/posts/series_2/new_post_1/images/sex_ratio.png b/_site/posts/data_stories/kenya_gender_dist/images/sex_ratio.png similarity index 100% rename from posts/series_2/new_post_1/images/sex_ratio.png rename to _site/posts/data_stories/kenya_gender_dist/images/sex_ratio.png diff --git a/posts/series_2/new_post_1/images/top_bottom_plot.png b/_site/posts/data_stories/kenya_gender_dist/images/top_bottom_plot.png similarity index 100% rename from posts/series_2/new_post_1/images/top_bottom_plot.png rename to _site/posts/data_stories/kenya_gender_dist/images/top_bottom_plot.png diff --git a/_site/posts/data_stories/kenya_gender_dist/kenya_gender_dist.html b/_site/posts/data_stories/kenya_gender_dist/kenya_gender_dist.html new file mode 100644 index 0000000..b33ca5c --- /dev/null +++ b/_site/posts/data_stories/kenya_gender_dist/kenya_gender_dist.html @@ -0,0 +1,551 @@ + + + + + + + + + + + +William Okech - Exploring gender distributions in Kenya: Are there more women 
than men? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

Exploring gender distributions in Kenya: Are there more women than men?

+

Insights from the Kenya Population and Housing Census (2019)

+
+
RStudio
+
R
+
Data Stories
+
Blog
+
Kenya Census
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

June 30, 2022

+
+
+ + +
+ + +
+ + + + +
+ + + + +

To answer this question, I reviewed the Kenya Population and Housing Census (2019) 1 2 report, which provides data on population by sex and age at the county and subcounty levels. This analysis was inspired by Rose Mintzer-Sweeney’s article “Sex and the Census,” published on the Datawrapper website 3.

+

Various biological, cultural, public health, and economic factors can influence the global human sex ratio. For instance, at birth, the human sex ratio is “male-biased,” with approximately 105 males born per 100 females. However, with increasing age, the susceptibility to infectious diseases, sex-selective abortions, and higher life expectancies for women can cause fluctuations in the human sex ratio 4. The total Kenyan population in 2019 (according to the census) was 47,564,296. When I compared the number of males to females at the national level (Figure 1), I found that there were 98 males for every 100 females in the country 5.
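As a rough sketch of the arithmetic behind these figures, the sex ratio used throughout this post is the number of males per 100 females. The helper name `sex_ratio` and the counts below are hypothetical placeholders chosen for illustration; they are not census values.

```r
# Sex ratio expressed as the number of males per 100 females
sex_ratio <- function(males, females) {
  stopifnot(all(females > 0))
  100 * males / females
}

# Hypothetical counts for illustration only (not census figures)
example <- data.frame(
  region  = c("A", "B"),
  males   = c(49000, 60000),
  females = c(50000, 50000)
)

# A ratio below 100 means more females; above 100 means more males
example$ratio <- sex_ratio(example$males, example$females)
print(example)
```

The same function can be applied to national, county, or subcounty counts, since it only needs a pair of male and female totals.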

+

+

Figure 1: At the national level, there are more females compared with males

+

Knowing there were more females than males, I sought to determine whether these differences persisted across all age groups (Figure 2).

+

+

Figure 2: There are more males than females between 0–19 yrs and between 36–58 yrs.

+

As expected, I observed a higher number of males compared with females between 0 and 18 years. One reason could be the higher male-to-female ratio seen at birth globally 6. Between the ages of 19 and 34 years, the male-to-female ratio decreases rapidly, while from 35 to 56 years, the ratio increases rapidly. The cause of this fluctuation is not apparent, but various public health factors may be responsible for the shifts observed within these age groups. Finally, the number of males compared with females steadily decreases after age 60. One reason for this could be the prevalence of medical conditions that disproportionately affect men. Additionally, the decrease in the number of males to females could result from increases in life expectancy favoring women, as demonstrated by the Economic Survey 2022, which shows that the male life expectancy is 60.6 years vs. 66.5 years for females 7.

+

Focusing only on the national human sex ratio might lead us to assume that the male-to-female ratio is the same across all the regions in Kenya. However, Kenya has 47 diverse counties with different population densities, climatic conditions, economic opportunities, and levels of development. Not surprisingly, we find (Figure 3) that there is a wide range of human sex ratios (90–120 males per 100 females) across the different counties (administrative units).

+

+

Figure 3: Counties in the West of Kenya have higher female-to-male ratios, while counties in the North-East have higher male-to-female ratios.

+

The highest sex ratio is found in Garissa County (120 males per 100 females), and the lowest is observed in Siaya County (90 males per 100 females). Many counties with low sex ratios (more females) are located in the west of Kenya, while counties with high sex ratios (more males) are found in the north of Kenya. According to the Economic Survey (2022) 7, male life expectancy in the west of Kenya is the lowest in the country. Homa Bay and Migori recorded a male life expectancy of 50.5 years, which was approximately 10 years lower than that of females in the respective counties. By contrast, male life expectancy is only 3 to 5 years lower than that of females in some of the counties in the north of Kenya.

+

Within each of Kenya’s 47 counties are smaller administrative units known as subcounties. For the final analysis, I thought it would be interesting to see whether the patterns observed at the county level were consistent across the various subcounties.

+

+

Figure 4: Balambala subcounty has the highest male-to-female ratio, while Mandera Central has the lowest.

+

Having just observed that counties in the north of Kenya had the highest number of males per 100 females, I was surprised to find that Mandera Central (Mandera County) and Tarbaj (Wajir County) subcounties in the north were among the subcounties with the lowest number of males per 100 females (Figure 4). Why females tend to concentrate within specific regions in these two counties may be an interesting aspect to investigate in future studies.

+

Overall, many factors may affect the human sex ratio at the county and subcounty levels and cause the differences in the human sex ratio seen with age. High rural-urban migration, public health factors (including the prevalence of various communicable and non-communicable diseases), climate, and a location’s primary source of employment may skew the number of males to females in certain subcounties. Therefore, future investigations should focus on the causes of these variations in the human sex ratio and the implications for administrative planning at the national, county, and subcounty levels.

+
+

+ + +
+ + +

Footnotes

+ +
    +
  1. Kenya National Bureau of Statistics. The 2019 Kenya Population and Housing Census. Volume I: Population by County and SubCounty and Volume III: Distribution of Population by Age and Sex.↩︎

  2. +
  3. Shelmith Kariuki (2020). rKenyaCensus: 2019 Kenya Population and Housing Census Results. R package version 0.0.2.↩︎

  4. +
  5. Rose Mintzer-Sweeney’s article: https://blog.datawrapper.de/gender-ratio-american-history/↩︎

  6. +
  7. Hannah Ritchie and Max Roser (2019) - “Gender Ratio.” Published online at OurWorldInData.org. Retrieved from: ‘https://ourworldindata.org/gender-ratio’ [Online Resource]↩︎

  8. +
  9. Additionally, there were also 1,524 individuals classified as intersex, but their low numbers prevented their inclusion in the analysis.↩︎

  10. +
  11. Hannah Ritchie and Max Roser (2019) - “Gender Ratio.” Published online at OurWorldInData.org. Retrieved from: ‘https://ourworldindata.org/gender-ratio’ [Online Resource]↩︎

  12. +
  13. Kenya National Bureau of Statistics. The Economic Survey 2022↩︎

  14. +
+
+ + +
+ + + + + \ No newline at end of file diff --git a/_site/posts/data_stories/new_post_1/data_story_headline.png b/_site/posts/data_stories/new_post_1/data_story_headline.png new file mode 100644 index 0000000..56736c7 Binary files /dev/null and b/_site/posts/data_stories/new_post_1/data_story_headline.png differ diff --git a/_site/posts/data_stories/new_post_1/images/age_sex_ratio_2.png b/_site/posts/data_stories/new_post_1/images/age_sex_ratio_2.png new file mode 100644 index 0000000..3088314 Binary files /dev/null and b/_site/posts/data_stories/new_post_1/images/age_sex_ratio_2.png differ diff --git a/_site/posts/data_stories/new_post_1/images/barplot_map.png b/_site/posts/data_stories/new_post_1/images/barplot_map.png new file mode 100644 index 0000000..750d559 Binary files /dev/null and b/_site/posts/data_stories/new_post_1/images/barplot_map.png differ diff --git a/_site/posts/data_stories/new_post_1/images/sex_ratio.png b/_site/posts/data_stories/new_post_1/images/sex_ratio.png new file mode 100644 index 0000000..a78b0f0 Binary files /dev/null and b/_site/posts/data_stories/new_post_1/images/sex_ratio.png differ diff --git a/_site/posts/data_stories/new_post_1/images/top_bottom_plot.png b/_site/posts/data_stories/new_post_1/images/top_bottom_plot.png new file mode 100644 index 0000000..9f3a6f8 Binary files /dev/null and b/_site/posts/data_stories/new_post_1/images/top_bottom_plot.png differ diff --git a/_site/posts/data_stories/new_post_1/post_1.html b/_site/posts/data_stories/new_post_1/post_1.html new file mode 100644 index 0000000..b33ca5c --- /dev/null +++ b/_site/posts/data_stories/new_post_1/post_1.html @@ -0,0 +1,551 @@ + + + + + + + + + + + +William Okech - Exploring gender distributions in Kenya: Are there more women than men? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

Exploring gender distributions in Kenya: Are there more women than men?

+

Insights from the Kenya Population and Housing Census (2019)

+
+
RStudio
+
R
+
Data Stories
+
Blog
+
Kenya Census
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

June 30, 2022

+
+
+ + +
+ + +
+ + + + +
+ + + + +

To answer this question, I reviewed the Kenya Population and Housing Census (2019) 1 2 report, which provides data on population by sex and age at the county and subcounty levels. This analysis was inspired by Rose Mintzer-Sweeney’s article “Sex and the Census,” published on the Datawrapper website 3.

+

Various biological, cultural, public health, and economic factors can influence the global human sex ratio. For instance, at birth, the human sex ratio is “male-biased,” with approximately 105 males born per 100 females. However, with increasing age, the susceptibility to infectious diseases, sex-selective abortions, and higher life expectancies for women can cause fluctuations in the human sex ratio 4. The total Kenyan population in 2019 (according to the census) was 47,564,296. When I compared the number of males to females at the national level (Figure 1), I found that there were 98 males for every 100 females in the country 5.

+

+

Figure 1: At the national level, there are more females compared with males

+

Knowing there were more females than males, I sought to determine whether these differences persisted across all age groups (Figure 2).

+

+

Figure 2: There are more males than females between 0–19 yrs and between 36–58 yrs.

+

As expected, I observed a higher number of males compared with females between 0 and 18 years. One reason could be the higher male-to-female ratio seen at birth globally 6. Between the ages of 19 and 34 years, the male-to-female ratio decreases rapidly, while from 35 to 56 years, the ratio increases rapidly. The cause of this fluctuation is not apparent, but various public health factors may be responsible for the shifts observed within these age groups. Finally, the number of males compared with females steadily decreases after age 60. One reason for this could be the prevalence of medical conditions that disproportionately affect men. Additionally, the decrease in the number of males to females could result from increases in life expectancy favoring women, as demonstrated by the Economic Survey 2022, which shows that the male life expectancy is 60.6 years vs. 66.5 years for females 7.

+

Focusing only on the national human sex ratio might lead us to assume that the male-to-female ratio is the same across all the regions in Kenya. However, Kenya has 47 diverse counties with different population densities, climatic conditions, economic opportunities, and levels of development. Not surprisingly, we find (Figure 3) that there is a wide range of human sex ratios (90–120 males per 100 females) across the different counties (administrative units).

+

+

Figure 3: Counties in the West of Kenya have higher female-to-male ratios, while counties in the North-East have higher male-to-female ratios.

+

The highest sex ratio is found in Garissa County (120 males per 100 females), and the lowest is observed in Siaya County (90 males per 100 females). Many counties with low sex ratios (more females) are located in the west of Kenya, while counties with high sex ratios (more males) are found in the north of Kenya. According to the Economic Survey (2022) 7, male life expectancy in the west of Kenya is the lowest in the country. Homa Bay and Migori recorded a male life expectancy of 50.5 years, which was approximately 10 years lower than that of females in the respective counties. By contrast, male life expectancy is only 3 to 5 years lower than that of females in some of the counties in the north of Kenya.

+

Within each of Kenya’s 47 counties are smaller administrative units known as subcounties. For the final analysis, I thought it would be interesting to see whether the patterns observed at the county level were consistent across the various subcounties.

+

+

Figure 4: Balambala subcounty has the highest male-to-female ratio, while Mandera Central has the lowest.

+

Having just observed that counties in the north of Kenya had the highest number of males per 100 females, I was surprised to find that Mandera Central (Mandera County) and Tarbaj (Wajir County) subcounties in the north were among the subcounties with the lowest number of males per 100 females (Figure 4). Why females tend to concentrate within specific regions in these two counties may be an interesting aspect to investigate in future studies.

+

Overall, many factors may affect the human sex ratio at the county and subcounty levels and cause the differences in the human sex ratio seen with age. High rural-urban migration, public health factors (including the prevalence of various communicable and non-communicable diseases), climate, and a location’s primary source of employment may skew the number of males to females in certain subcounties. Therefore, future investigations should focus on the causes of these variations in the human sex ratio and the implications for administrative planning at the national, county, and subcounty levels.

+
+

+ + +
+ + +

Footnotes

+ +
    +
  1. Kenya National Bureau of Statistics. The 2019 Kenya Population and Housing Census. Volume I: Population by County and SubCounty and Volume III: Distribution of Population by Age and Sex.↩︎

  2. +
  3. Shelmith Kariuki (2020). rKenyaCensus: 2019 Kenya Population and Housing Census Results. R package version 0.0.2.↩︎

  4. +
  5. Rose Mintzer-Sweeney’s article: https://blog.datawrapper.de/gender-ratio-american-history/↩︎

  6. +
  7. Hannah Ritchie and Max Roser (2019) - “Gender Ratio.” Published online at OurWorldInData.org. Retrieved from: ‘https://ourworldindata.org/gender-ratio’ [Online Resource]↩︎

  8. +
  9. Additionally, there were also 1,524 individuals classified as intersex, but their low numbers prevented their inclusion in the analysis.↩︎

  10. +
  11. Hannah Ritchie and Max Roser (2019) - “Gender Ratio.” Published online at OurWorldInData.org. Retrieved from: ‘https://ourworldindata.org/gender-ratio’ [Online Resource]↩︎

  12. +
  13. Kenya National Bureau of Statistics. The Economic Survey 2022↩︎

  14. +
+
+ + +
+ + + + + \ No newline at end of file diff --git a/_site/posts/data_stories/new_post_2/data_story_headline.png b/_site/posts/data_stories/new_post_2/data_story_headline.png new file mode 100644 index 0000000..56736c7 Binary files /dev/null and b/_site/posts/data_stories/new_post_2/data_story_headline.png differ diff --git a/_site/posts/data_stories/new_post_2/images/all_counties_asbestos_barplot_map.png b/_site/posts/data_stories/new_post_2/images/all_counties_asbestos_barplot_map.png new file mode 100644 index 0000000..558316b Binary files /dev/null and b/_site/posts/data_stories/new_post_2/images/all_counties_asbestos_barplot_map.png differ diff --git a/_site/posts/data_stories/new_post_2/images/national_treemap.png b/_site/posts/data_stories/new_post_2/images/national_treemap.png new file mode 100644 index 0000000..31e700d Binary files /dev/null and b/_site/posts/data_stories/new_post_2/images/national_treemap.png differ diff --git a/_site/posts/data_stories/new_post_2/images/top_households_asbestos_raw.png b/_site/posts/data_stories/new_post_2/images/top_households_asbestos_raw.png new file mode 100644 index 0000000..3591aa2 Binary files /dev/null and b/_site/posts/data_stories/new_post_2/images/top_households_asbestos_raw.png differ diff --git a/_site/posts/data_stories/new_post_2/post_2.html b/_site/posts/data_stories/new_post_2/post_2.html new file mode 100644 index 0000000..22abf36 --- /dev/null +++ b/_site/posts/data_stories/new_post_2/post_2.html @@ -0,0 +1,572 @@ + + + + + + + + + + + +William Okech - A roof over your head + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

A roof over your head

+

The prevalence of asbestos-based roofing in Kenya and its potential effects

+
+
RStudio
+
R
+
Data Stories
+
Blog
+
Kenya Census
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

December 1, 2022

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+

Introduction

+

An aerial view of most settlements in Kenya will demonstrate that many residential home roofs are constructed using iron sheets.

+

Indeed, this is confirmed by the Kenya Population and Housing Census (2019) report 1 2, which shows that 4 out of every 5 households (total number = 12,043,016) in Kenya are roofed using iron sheets (Figure 1). Overall, the top 5 roofing materials are iron sheets (80.3%), concrete (8.2%), grass/twigs (5.1%), makuti (sun-dried coconut palm leaves; 1.6%), and asbestos (1.4%). Despite the widespread use of iron sheets, it is surprising to note that 1.4% (2.2% urban and 0.9% rural) of residential household roofs (which is approximately 170,000) are covered with asbestos-based roofing materials (NB: this figure does not include public buildings such as educational institutions and government facilities).
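The "approximately 170,000" figure follows directly from the two numbers quoted above. A minimal sketch of that arithmetic in base R is shown here; the variable names are hypothetical, and the exact count in the census tables may differ slightly because the 1.4% share is itself rounded.

```r
# Figures as stated in the text
total_households <- 12043016  # all households in the 2019 census
asbestos_share   <- 0.014     # 1.4% of households with asbestos-based roofs

# Estimated number of households with asbestos-based roofs
asbestos_households <- total_households * asbestos_share

# Round to the nearest ten thousand, matching the approximation in the text
asbestos_rounded <- round(asbestos_households, -4)
print(asbestos_rounded)
```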

+
+
+

+
+
+

Figure 1: Roof types in Kenya (visualizations generated in RStudio)

+
+
+

Asbestos and its potential risks

+

Asbestos refers to a class of six minerals that naturally form a bundle of fibers. These fibers have many properties that make them attractive, including a lack of electrical conductivity, and chemical, heat, and fire resistance. Historically, asbestos has been used for various commercial and industrial applications, including roofing shingles, automobile brakes, and textured paints for walls and ceilings 3. However, using asbestos for products that come into regular contact with humans is quite problematic. Why? Asbestos is a known human carcinogen, and the primary risk factor for most mesotheliomas is asbestos exposure 4 5. Furthermore, asbestos exposure (depending on the frequency, amount, and type) can cause asbestosis, pleural disease, and cancer. If asbestos-based materials remain intact, there is minimal risk to the user, but if materials are damaged via natural degradation or during home demolition and remodeling, tiny asbestos fibers will be released into the air 6 7. In Kenya, Legal Notice No. 121 of the Environmental Management and Coordination (Waste Management) Regulations (2006)8 states that waste containing asbestos is classified as hazardous. Why should Kenyans be concerned about this? In the 2013/2014 financial year, Kenya spent approximately one-tenth of its total health budget on asbestos-related cancers 9 10 11.

+


+
+
+

Where do we find high numbers of asbestos-based roofs in Kenya?

+

As previously stated, 1.4% of households in Kenya have asbestos-based roofs. Figure 2 demonstrates the percentage of households with asbestos-based roofs in every county in Kenya. Interestingly, 4 (Nairobi, Kajiado, Machakos, and Kiambu) out of the top 6 counties (from a total of 47) fall within the Nairobi Metropolitan region.

+
+
+

+
+
+

Figure 2: Percentage(%) of households with asbestos-based roofs distributed by county (visualizations generated using RStudio)

+

Next, I investigated the subcounties with the highest number of households with asbestos-based roofs. The top 5 subcounties are located within Nairobi county, with Embakasi subcounty taking the lead with just over 8,000 households.

+
+
+

+
+
+

Figure 3: The top ten subcounties with the highest number of households that have asbestos-based roofs (visualizations generated using RStudio)

+
+
+

Conclusion

+

Overall, this study demonstrates that a notable proportion of Kenyan households use asbestos-based roofing materials, with the Nairobi Metropolitan region accounting for the largest number of households. It is widely acknowledged that asbestos is harmful to our health, and asbestos-related diseases impose a significant burden on the economy. However, the impact of these roofs on the health of residents may not be fully apparent, as asbestos exposure may also occur in other settings such as educational facilities and government institutions. To lessen the impact of asbestos exposure, it would be beneficial for local/county governments to educate residents about the dangers of asbestos and facilitate the complex and costly removal of asbestos-based roofing materials.

+ + +
+ + +

Footnotes

+ +
    +
  1. Kenya National Bureau of Statistics. The 2019 Kenya Population and Housing Census. Volume I: Population by County and Sub-County and Volume III: Distribution of Population by Age and Sex.↩︎

  2. +
  3. Shelmith Kariuki (2020). rKenyaCensus: 2019 Kenya Population and Housing Census Results. R package version 0.0.2.↩︎

  4. +
  5. Learn About Asbestos (no date) EPA. Environmental Protection Agency. Available at: https://www.epa.gov/asbestos/learn-about-asbestos (Accessed: December 1, 2022).↩︎

  6. +
  7. Asbestos exposure and cancer risk fact sheet (no date) National Cancer Institute. Available at: https://www.cancer.gov/about-cancer/causes-prevention/risk/substances/asbestos/asbestos-fact-sheet (Accessed: December 1, 2022).↩︎

  8. +
  9. Asbestos (no date) World Health Organization. World Health Organization. Available at: https://www.iarc.who.int/risk-factor/asbestos/ (Accessed: December 1, 2022).↩︎

  10. +
  11. Learn About Asbestos (no date) EPA. Environmental Protection Agency. Available at: https://www.epa.gov/asbestos/learn-about-asbestos (Accessed: December 1, 2022).↩︎

  12. +
  13. Asbestos and your health (2016) Centers for Disease Control and Prevention. Centers for Disease Control and Prevention. Available at: https://www.atsdr.cdc.gov/asbestos/index.html (Accessed: December 1, 2022).↩︎

  14. +
  15. Act Title: ENVIRONMENTAL MANAGEMENT AND CO-ORDINATION (no date) No. 8 of 1999. Available at: http://kenyalaw.org:8181/exist/kenyalex/sublegview.xql?subleg=No.+8+of+1999 (Accessed: December 1, 2022).↩︎

  16. +
  17. Okoth, D. (2013) Slow transition from use of asbestos raises concern as cancer cases rise, The Standard. Available at: https://www.standardmedia.co.ke/lifestyle/article/2000096118/slow-transition-from-use-of-asbestos-raises-concern-as-cancer-cases-rise (Accessed: December 1, 2022).↩︎

  18. +
  19. GCR, S. (2016) Kenya faces cancer epidemic caused by asbestos roofs, Global Construction Review. Available at: https://www.globalconstructionreview.com/kenya-faces-cancer-epid7emic-caus7ed-asbe7stos/ (Accessed: December 1, 2022).↩︎

  20. +
  21. Irungu, S. (2020) Exposure to the noxious asbestos needs to be alleviated with a lot of care, Kenya Climate Innovation Center (KCIC). Available at: https://www.kenyacic.org/2019/11/exposure-to-the-noxious-asbestos-needs-to-be-alleviated-with-a-lot-of-care/ (Accessed: December 1, 2022).↩︎

  22. +
+
+ + +
+ + + + + \ No newline at end of file diff --git a/_site/posts/data_stories/new_post_3/data_story_headline.png b/_site/posts/data_stories/new_post_3/data_story_headline.png new file mode 100644 index 0000000..56736c7 Binary files /dev/null and b/_site/posts/data_stories/new_post_3/data_story_headline.png differ diff --git a/_site/posts/data_stories/new_post_3/images/all_cas_first_name.png b/_site/posts/data_stories/new_post_3/images/all_cas_first_name.png new file mode 100644 index 0000000..d40e444 Binary files /dev/null and b/_site/posts/data_stories/new_post_3/images/all_cas_first_name.png differ diff --git a/_site/posts/data_stories/new_post_3/images/app_stl_nom_cas_gender.jpg b/_site/posts/data_stories/new_post_3/images/app_stl_nom_cas_gender.jpg new file mode 100644 index 0000000..451c50a Binary files /dev/null and b/_site/posts/data_stories/new_post_3/images/app_stl_nom_cas_gender.jpg differ diff --git a/_site/posts/data_stories/new_post_3/images/nom_county_equal_0.jpg b/_site/posts/data_stories/new_post_3/images/nom_county_equal_0.jpg new file mode 100644 index 0000000..d19ca92 Binary files /dev/null and b/_site/posts/data_stories/new_post_3/images/nom_county_equal_0.jpg differ diff --git a/_site/posts/data_stories/new_post_3/images/nom_county_over_1.jpg b/_site/posts/data_stories/new_post_3/images/nom_county_over_1.jpg new file mode 100644 index 0000000..75ac23a Binary files /dev/null and b/_site/posts/data_stories/new_post_3/images/nom_county_over_1.jpg differ diff --git a/_site/posts/data_stories/new_post_3/images/pwd.jpg b/_site/posts/data_stories/new_post_3/images/pwd.jpg new file mode 100644 index 0000000..7c190b2 Binary files /dev/null and b/_site/posts/data_stories/new_post_3/images/pwd.jpg differ diff --git a/_site/posts/data_stories/new_post_3/images/stl_cas_county.jpg b/_site/posts/data_stories/new_post_3/images/stl_cas_county.jpg new file mode 100644 index 0000000..e6cb0a5 Binary files /dev/null and 
b/_site/posts/data_stories/new_post_3/images/stl_cas_county.jpg differ diff --git a/_site/posts/data_stories/new_post_3/post_3.html b/_site/posts/data_stories/new_post_3/post_3.html new file mode 100644 index 0000000..d353a2b --- /dev/null +++ b/_site/posts/data_stories/new_post_3/post_3.html @@ -0,0 +1,603 @@ + + + + + + + + + + + +William Okech - Data Visualization for Government + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

Data Visualization for Government

+

Visualizing HR data from the Chief Administrative Secretary recruitment process in Kenya

+
+
RStudio
+
R
+
Data Stories
+
Blog
+
Data Visualization
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

April 13, 2023

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+

Introduction

+

In January 2018, former President Uhuru Kenyatta announced the creation of the Chief Administrative Secretary (CAS) position. The primary role of the appointees would be to “help the Cabinet Secretary to better coordinate the running of the affairs of the respective ministries 1.” Though highly controversial at the time (as civil society groups 2 challenged the legality of the position), the High Court (in February 2023) eventually ruled that the decision to create the new position was lawful 3.

+

The Public Service Commission (PSC) of Kenya is responsible for government recruitment, and Section 234 of the Constitution of Kenya states that part of its mandate includes the “establishment of public offices” and “appointment of persons to those offices.” The most recent call for applications to the CAS position was circulated in October 2022 4. Here, the PSC listed the roles, responsibilities, and requirements for all applicants. To ensure transparency in the recruitment process, the PSC provided a list of all the applicants 5, the shortlisted applicants [5], and a schedule of the interview times [5]. After the interview, the names of successful candidates were forwarded to the President for appointment. On March 16th, 2023, President William Ruto appointed fifty (50) individuals to the CAS position, which was more than double the number of positions created by the PSC. The legality of these appointments has already been challenged in court and we await the final verdict 6 7.

+

Controversy aside, the goal of this article is to provide a data-driven assessment of publicly available recruitment information (specifically gender, disability status, and county of origin) to determine whether principles of diversity, equity, and inclusion (DEI) were promoted during the hiring process.

+
+
+

Summary of Findings

+
    +
  1. Only 26% of the CAS appointments went to women, even though 33% of applicants and 36% of those shortlisted were female. This may not be in line with the “two-thirds gender rule” outlined in Article 27(8) of the Constitution of Kenya (2010) which states that “not more than two-thirds of the members of elective or appointive bodies shall be of the same gender.”
  2. +
+
+
+

+
+
+
    +
  1. Two percent (2%) of the nominees were Persons with Disabilities (PWDs), which is representative of the percentage of Kenyans (at the national level) living with a disability (2.2%) 8. However, institutions such as the National Gender and Equality Commission should develop programs to increase the number of applicants from marginalized groups.
  2. +
+
+
+

+
+
+
    +
  1. In total, there were 133 applicants from Busia County (2.5%). Despite this, it was the only county that did not have a representative on the shortlist. To remedy this, the PSC may need to put in place affirmative action policies that ensure that at least one applicant from each county makes it to the shortlist.
  2. +
+
+
+

+
+
+
    +
  1. Nine counties are not represented in the final list of nominees.
  2. +
+
+
+

+
+
+
    +
  1. Ten counties have more than one CAS nominee.
  2. +
+
+
+

+
+
+
+
+

Conclusion

+

Overall, this article provided a set of data visualizations to help understand the demographics of applicants for the CAS position in Kenya. Specifically, the gender, disability status, and county of origin were assessed at the applicant, shortlisting, and nomination stages. This study had several key findings:

+
    +
  1. Less than one-third of the nominees were women, even though 36% of the shortlisted candidates were female.
  2. +
  3. Only one nominee was a person with a disability (PWD).
  4. +
  5. One county (Busia) failed to have any applicants make it to the shortlist of 240 and nine counties in total had zero nominees out of 50.
  6. +
+

Future work will assess the data at a more granular level and determine the counties where there were low numbers of total applicants, low numbers of women at the application and shortlisting stage, and also determine which counties did not have PWD applicants. Additionally, it would be helpful if the PSC could provide age and education level data to help perform a more thorough analysis of the recruitment process. Lastly, it is commendable that the PSC makes this information publicly available, and I hope that further analysis of the data can result in initiatives to diversify the candidate pool at the applicant and shortlisting stages.

+
+
+

Fun Fact

+

The most common applicant first names (male and female).

+
+
+

+
+
+ + +
+ + +

Footnotes

+ +
    +
  1. PSCU (2018) Uhuru Kenyatta’s Full Statement On New Cabinet. Available at: https://www.citizen.digital/news/uhuru-kenyattas-full-statement-on-new-cabinet-189331 (Accessed: April 2, 2023).↩︎

  2. +
  3. Betty Njeru (2018) Activist Omtatah moves to court challenging the creation of new cabinet positions. Available at: https://www.standardmedia.co.ke/kenya/article/2001267705/okiya-omtatah-moves-to-court-challenging-new-cabinet-positions (Accessed: April 6, 2023).↩︎

  4. +
  5. Correspondent (2023) Labour Court Clears Way For Appointment Of Chief Administrative Secretaries. Available at: https://www.capitalfm.co.ke/news/2023/02/labour-court-clears-the-way-for-appointment-of-chief-administrative-secretaries/ (Accessed: April 10, 2023).↩︎

  6. +
  7. PSC (2023) CALL FOR APPLICATIONS TO THE POSITION OF CHIEF ADMINISTRATIVE SECRETARY IN THE PUBLIC SERVICE. Available at: https://www.publicservice.go.ke/index.php/media-center/2/202-call-for-applications-to-the-position-of-chief-administrative-secretary-in-the-public-service (Accessed: April 10, 2023).↩︎

  8. +
  9. PSC (2023) Shortlisted Candidates. Available at: https://www.publicservice.go.ke/index.php/recruitment/shortlisted-candidates (Accessed: April 10, 2023).↩︎

  10. +
  11. Susan Muhindi (2023) Ruto’s 50 CAS nomination list challenged in court. Available at: https://www.the-star.co.ke/news/2023-03-20-rutos-50-cas-nomination-list-challenged-in-court/ (Accessed: April 2, 2023).↩︎

  12. +
  13. Emmanuel Wanjala (2023). Available at: https://www.the-star.co.ke/news/2023-03-17-ruto-has-created-27-illegal-cas-posts-lsks-eric-theuri/ (Accessed: April 5, 2023).↩︎

  14. +
    1. +
    2. Shelmith Kariuki (2020). rKenyaCensus: 2019 Kenya Population and Housing Census Results. R package version 0.0.2.
    3. +
    +↩︎
  15. +
+
+ + +
+ + + + + \ No newline at end of file diff --git a/_site/posts/r_rstudio/new_post_1/post_1.html b/_site/posts/r_rstudio/new_post_1/post_1.html new file mode 100644 index 0000000..44bea3d --- /dev/null +++ b/_site/posts/r_rstudio/new_post_1/post_1.html @@ -0,0 +1,600 @@ + + + + + + + + + + + +William Okech - Getting Started with R and RStudio + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

Getting Started with R and RStudio

+

Software Installation

+
+
RStudio
+
R
+
Tutorial
+
Blog
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

June 8, 2022

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+

Welcome!

+

This first post introduces the R programming language and the RStudio software.

+
+

Introduction

+

This blog aims to introduce new R/RStudio users to the fundamentals of R and lay the groundwork for more in-depth statistical analysis, data visualization, and reporting methods. I hope to present the topics in a straightforward manner so that anyone new to programming is not intimidated.

+
+
+

What is R?

+

R is a programming language and open-source (freely available) software invented by Ross Ihaka and Robert Gentleman in 1993 (published as open-source in 1995) when they were based at the University of Auckland. Fun fact: R represents the first letter of the first names of the creators. The software is utilized by individuals working for various organizations ranging from academic institutions and healthcare organizations to financial services and information technology companies. In May 2022, the TIOBE index (a measure of programming language popularity) demonstrated that R was the 13th most popular programming language. R’s popularity may result from its highly extensible nature that allows users to perform statistical data analysis, generate visualizations, and report findings.

+
+

What are the benefits of using R?

+

As mentioned in the previous section, R is open-source software that is highly extensible. Thousands of extensions (also known as packages) can be installed, which expands what R can do. The main advantages of R include: (1) a large community of users and developers that can provide learning support and assist with technical challenges; (2) the ability to perform reproducible research; (3) its cross-platform nature, which means that it can be used on Linux, Windows, and Mac operating systems; and (4) the ability to generate high-quality graphics from datasets of varying dimensions.

+
+
+

I’m looking for R. Where can I find it?

+

To install R on your personal computer, visit The R Project for Statistical Computing’s Comprehensive R Archive Network (CRAN), download the most recent version, and install it according to the website’s instructions. Once R is installed, you can experiment with some of its features.

+
+
+

+
+
+

Figure 1: The standard R interface (Windows)

+

When you open R, you will notice that it has a basic graphical user interface (GUI), and the console displays a command-line interface (CLI), where commands are executed one at a time. This may be intimidating for new users who are not comfortable working at the command line; for them, R can be used through an application called RStudio.

+
+
+
+

What is RStudio and how does it differ from R?

+

RStudio is an integrated development environment (IDE) for R that was developed by JJ Allaire. This software contains tools that make programming in R easier. RStudio extends R’s capabilities by making it easier to import data, write scripts, and generate visualizations and reports. The RStudio IDE is available for download from the RStudio website.

+
+
+

+
+
+

Figure 2: RStudio interface with four main panes (Windows)

+

Once installed, RStudio’s basic layout shows four panes: script (text editor), console, environment/history, and navigation. The script pane (upper-left) allows one to write, open, edit, and execute longer programs than is practical in the standalone R software. The console pane (bottom-left) displays the script’s output and offers a command-line interface for typing code that is immediately executed. The environment pane (upper-right) displays information about the created objects, the history of executed code, and any external connections. Finally, the navigation pane (bottom-right) shows multiple tabs. Its primary tabs include the “Plots” tab, which shows graphics created by code, the “Packages” tab, which lists installed packages and allows new ones to be installed, and the “Help” tab, which allows one to search the R documentation.

+
+

What are the primary benefits of RStudio?

+

RStudio allows one to create projects (a collection of related files stored within a working directory). Additionally, RStudio can be customized using options available under the “Tools” tab. Lastly, RStudio has Git integration that allows for version control where you can back up your code at different timepoints and effortlessly transfer code between computers.1

+
+
+
+

Conclusion

+

Hopefully, this was a helpful introduction to R and RStudio. In subsequent blog posts, we will focus on:

+
    +
  1. Part 1: Simple arithmetic,
  2. +
  3. Part 2: Variables,
  4. +
  5. Part 3: Data types,
  6. +
  7. Part 4: Operators,
  8. +
  9. Part 5: Vectors,
  10. +
  11. Part 6: Missing data
  12. +
  13. Part 7: Data Structures
  14. +
+ + +
+
+ + +

Footnotes

+ +
    +
  1. Summary of the benefits of R and RStudio obtained from Lander, J. P. (2014). R for everyone: Advanced analytics and graphics. Addison-Wesley.↩︎

  2. +
+
+ + +
+ + + + + \ No newline at end of file diff --git a/posts/series_1/new_post_1/r_and_rstudio.png b/_site/posts/r_rstudio/new_post_1/r_and_rstudio.png similarity index 100% rename from posts/series_1/new_post_1/r_and_rstudio.png rename to _site/posts/r_rstudio/new_post_1/r_and_rstudio.png diff --git a/posts/series_1/new_post_1/r_interface.png b/_site/posts/r_rstudio/new_post_1/r_interface.png similarity index 100% rename from posts/series_1/new_post_1/r_interface.png rename to _site/posts/r_rstudio/new_post_1/r_interface.png diff --git a/posts/series_1/new_post_1/rstudio_interface.png b/_site/posts/r_rstudio/new_post_1/rstudio_interface.png similarity index 100% rename from posts/series_1/new_post_1/rstudio_interface.png rename to _site/posts/r_rstudio/new_post_1/rstudio_interface.png diff --git a/_site/posts/r_rstudio/new_post_2/post_2.html b/_site/posts/r_rstudio/new_post_2/post_2.html new file mode 100644 index 0000000..d03448a --- /dev/null +++ b/_site/posts/r_rstudio/new_post_2/post_2.html @@ -0,0 +1,715 @@ + + + + + + + + + + + +William Okech - The Basics of R and RStudio + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

The Basics of R and RStudio

+

Part 1: Simple Arithmetic

+
+
RStudio
+
R
+
Tutorial
+
Blog
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

June 15, 2022

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+

Introduction

+

This is the first in a series of blog posts looking at the basics of R and RStudio. These programs allow us to perform various basic and complex calculations.

+

To get started, first, we will open R or RStudio. In R, go to the console, and in RStudio, head to the console pane. Next, type in a basic arithmetic calculation such as “1 + 1” after the angle bracket (>) and hit “Enter.”

+

An example of a basic calculation:

+
+
1+1
+
+
[1] 2
+
+
+

The output will be observed next to the square bracket containing the number 1 ([1]).

+
+
+

+
+
+

Additionally, to include comments in a code block, we use the hash (#) symbol. Anything written after the hash symbol on the same line is treated as a comment and is not run.

+
+
# A simple arithmetic calculation (this comment line is not run because of the hash symbol)
+1+1
+
+
[1] 2
+
+
+
+
+

Arithmetic operators available in R/RStudio

+

Various arithmetic operators (listed below) can be used in R/RStudio.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Arithmetic Operator | Description
+ | Addition
- | Subtraction
* | Multiplication
/ | Division
** or ^ | Exponentiation
%% | Modulus (remainder after division)
%/% | Integer division
+
+
+

Examples

+
+

Addition

+
+
10+30
+
+
[1] 40
+
+
+
+
+

Subtraction

+
+
30-24
+
+
[1] 6
+
+
+
+
+

Multiplication

+
+
20*4
+
+
[1] 80
+
+
+
+
+

Division

+
+
93/4
+
+
[1] 23.25
+
+
+
+
+

Exponentiation

+
+
3^6
+
+
[1] 729
+
+
+
+
+

Modulus (remainder after division)

+
+
94%%5
+
+
[1] 4
+
+
+
+
+

Integer Division

+
+
54%/%7
+
+
[1] 7
+
+
+
+
+

Slightly more complex arithmetic operations

+
+
5-1+(4*3)/16*3
+
+
[1] 6.25
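
To see why the expression above returns 6.25, it can help to evaluate it in the order R uses: parentheses first, then multiplication and division from left to right, and finally addition and subtraction from left to right. The sketch below breaks the calculation into intermediate variables (the names step_1 to step_4 are illustrative and not part of the original example).

```r
# Evaluate 5-1+(4*3)/16*3 one step at a time
step_1 <- 4 * 3           # parentheses first: 12
step_2 <- step_1 / 16     # division, left to right: 0.75
step_3 <- step_2 * 3      # multiplication: 2.25
step_4 <- 5 - 1 + step_3  # subtraction and addition, left to right: 6.25

# Compare the stepwise result with the original one-line expression
step_4 == 5-1+(4*3)/16*3
```

When in doubt about the order of evaluation, adding extra parentheses makes the intended order explicit.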
+
+
+ + +
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/posts/series_1/new_post_2/r_and_rstudio.png b/_site/posts/r_rstudio/new_post_2/r_and_rstudio.png similarity index 100% rename from posts/series_1/new_post_2/r_and_rstudio.png rename to _site/posts/r_rstudio/new_post_2/r_and_rstudio.png diff --git a/posts/series_1/new_post_2/r_console_1plus1.png b/_site/posts/r_rstudio/new_post_2/r_console_1plus1.png similarity index 100% rename from posts/series_1/new_post_2/r_console_1plus1.png rename to _site/posts/r_rstudio/new_post_2/r_console_1plus1.png diff --git a/posts/series_1/new_post_3/env_pane_1.png b/_site/posts/r_rstudio/new_post_3/env_pane_1.png similarity index 100% rename from posts/series_1/new_post_3/env_pane_1.png rename to _site/posts/r_rstudio/new_post_3/env_pane_1.png diff --git a/_site/posts/r_rstudio/new_post_3/post_3.html b/_site/posts/r_rstudio/new_post_3/post_3.html new file mode 100644 index 0000000..5c81690 --- /dev/null +++ b/_site/posts/r_rstudio/new_post_3/post_3.html @@ -0,0 +1,707 @@ + + + + + + + + + + + +William Okech - The Basics of R and RStudio + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

The Basics of R and RStudio

+

Part 2: Variables

+
+
RStudio
+
R
+
Tutorial
+
Blog
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

June 22, 2022

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+

Introduction

+

Variables are instrumental in programming because they are used as “containers” to store data values.

+

To assign a value to a variable, we can use <- or =. However, most R users prefer to use <-.

+
+
+

Variable assignment

+
+

1. Using <-

+
+
variable_1 <- 5
+variable_1
+
+
[1] 5
+
+
+
+
+

2. Using =

+
+
variable_2 = 10
+variable_2
+
+
[1] 10
+
+
+
+
+

3. Reverse the value and variable with ->

+
+
15 -> variable_3
+variable_3
+
+
[1] 15
+
+
+
+
+

4. Assign one value to two variables

+
+
variable_4 <- variable_5 <- 30
+variable_4
+
+
[1] 30
+
+
variable_5
+
+
[1] 30
+
+
+
+
+
+

Variable output

+

The output of the variable can then be obtained by:

+
    +
  1. Typing the variable name and then pressing “Enter,”
  2. +
  3. Typing “print” with the variable name in brackets, print(variable), and
  4. +
  5. Typing “View” with the variable name in brackets, View(variable).
  6. +
+

Both print() and View() are some of the many built-in functions1 available in R.

+

In RStudio, the list of variables that have been loaded can be viewed in the environment pane.

+
+
+

+
+
+

Figure 1: A screenshot of the environment pane with the stored variables.

+
+
print(variable_1)
+
+
[1] 5
+
+
+
+
View(variable_2)
+
+

The output of View() opens in a new tab in the script pane.

+
+
+

The assign() and rm() functions

+

In addition to using the assignment operators (<- and =), we can use the assign() function to assign a value to a variable.

+
+
assign("variable_6", 555)
+variable_6
+
+
[1] 555
+
+
+

To remove a variable, either delete it in the environment pane or use the rm() function.

+
+
variable_7 <- 159
+
+
+
rm(variable_7)
+
+

After running rm() look at the environment pane to confirm whether variable_7 has been removed.
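
The same check can also be done in code rather than by looking at the environment pane. The following is a minimal sketch that uses the base R function exists(), which takes the variable name as a character string.

```r
variable_7 <- 159
exists("variable_7")  # TRUE while the variable is defined

rm(variable_7)
exists("variable_7")  # FALSE once the variable has been removed
```

Note that exists() also searches loaded packages, so it is most useful for names that are unlikely to be defined elsewhere.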

+
+
+

Naming variables

+

At this point, you may be wondering what conventions are used for naming variables. First, variables need to have meaningful names such as current_temp, time_24_hr, or weight_lbs. However, we need to be mindful of the variable style guide which provides us with the appropriate rules for naming variables.

+

Some rules to keep in mind are:

+
    +
  1. R is case-sensitive (variable is not the same as Variable),
  2. +
  3. Reserved words (such as TRUE, FALSE, if, and else) cannot be used as variable names,
  4. +
  5. Appropriate variable names can contain letters, numbers, dots, and underscores. However, you cannot start with an underscore, number, or dot followed by a number.
  6. +
+
+
+

Valid and invalid names

+
+

Valid names:

+
    +
  • time_24_hr
  • +
  • .time24_hr
  • +
+
+
+

Invalid names:

+
    +
  • _24_hr.time
  • +
  • 24_hr_time
  • +
  • .24_hr_time
  • +
+ + +
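
The valid and invalid names listed above can be checked in code. One common approach, sketched here as an illustration rather than an official test, relies on the base R function make.names(), which rewrites any string into a syntactically valid name. If a string comes back unchanged, it was already valid. The vector name candidates and the extra entry "TRUE" are assumptions added for this example.

```r
# Candidate names taken from the lists above, plus one reserved word
candidates <- c("time_24_hr", ".time24_hr", "_24_hr.time",
                "24_hr_time", ".24_hr_time", "TRUE")

# A name is treated as valid if make.names() leaves it unchanged
is_valid <- make.names(candidates) == candidates

# Show each candidate next to the result of the check
data.frame(name = candidates, valid = is_valid)
```

Only the first two candidates pass the check; the others start with an underscore, a number, or a dot followed by a number, or are reserved words.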
+
+ + +

Footnotes

+ +
    +
  1. Functions are a collection of statements (organized and reusable code) that perform a specific task, and R has many built-in functions.↩︎

  2. +
+
+ + +
+ + + + + \ No newline at end of file diff --git a/posts/series_1/new_post_3/r_and_rstudio.png b/_site/posts/r_rstudio/new_post_3/r_and_rstudio.png similarity index 100% rename from posts/series_1/new_post_3/r_and_rstudio.png rename to _site/posts/r_rstudio/new_post_3/r_and_rstudio.png diff --git a/_site/posts/r_rstudio/new_post_4/post_4.html b/_site/posts/r_rstudio/new_post_4/post_4.html new file mode 100644 index 0000000..cdd1b73 --- /dev/null +++ b/_site/posts/r_rstudio/new_post_4/post_4.html @@ -0,0 +1,850 @@ + + + + + + + + + + + +William Okech - The Basics of R and RStudio + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

The Basics of R and RStudio

+

Part 3: Data Types

+
+
RStudio
+
R
+
Tutorial
+
Blog
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

June 23, 2022

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+

Introduction

+

R and RStudio utilize multiple data types to store different kinds of data.

+

The most common data types in R are listed below.

+ ++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Data Type | Description
Numeric | The most common data type. The values can be whole numbers or decimals (all real numbers).
Integer | Special case of numeric data without decimals.
Logical | Boolean data type with only two values (TRUE or FALSE).
Complex | Specifies complex numbers, which have an imaginary part (for example, 10 + 30i).
Character | Stores a character or string value. Character values can be enclosed in either single quotes ('abc') or double quotes ("abc"); R treats both forms the same way.
Factor | Represents a categorical variable, such as gender, with a fixed set of levels.
Raw | Specifies values as raw bytes. The built-in functions charToRaw() and rawToChar() convert between raw and character.
Dates | Specifies date variables. Date stores a date and POSIXct stores a date and time. Internally, the value is stored as the number of days (Date) or number of seconds (POSIXct) since 01/01/1970.
+
+
+

Data types

+
+

1. Numeric

+
+
89.98
+
+
[1] 89.98
+
+
55
+
+
[1] 55
+
+
+
+
+

2. Integer

+
+
5L
+
+
[1] 5
+
+
5768L
+
+
[1] 5768
+
+
+
+
+

3. Logical

+
+
TRUE
+
+
[1] TRUE
+
+
FALSE
+
+
[1] FALSE
+
+
+
+
+

4. Complex

+
+
10 + 30i
+
+
[1] 10+30i
+
+
287 + 34i
+
+
[1] 287+34i
+
+
+
+
+

5. Character or String

+
+
'abc'
+
+
[1] "abc"
+
+
"def"
+
+
[1] "def"
+
+
"I like learning R"
+
+
[1] "I like learning R"
+
+
+
+
+

6. Dates

+
+
"2022-06-23 14:39:21 EAT"
+
+
[1] "2022-06-23 14:39:21 EAT"
+
+
"2022-06-23"
+
+
[1] "2022-06-23"
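
The values printed above are character strings. To work with them as date objects, they can be converted with as.Date() and as.POSIXct(). The sketch below is a minimal illustration; the variable names and the choice of the UTC time zone are assumptions made for this example.

```r
# Convert a character string to a Date object
date_1 <- as.Date("2022-06-23")
class(date_1)       # "Date"
as.numeric(date_1)  # number of days since 1970-01-01

# Convert a character string to a date-time (POSIXct) object
date_time_1 <- as.POSIXct("2022-06-23 14:39:21", tz = "UTC")
class(date_time_1)       # "POSIXct" "POSIXt"
as.numeric(date_time_1)  # number of seconds since 1970-01-01 00:00:00 UTC
```

Setting tz explicitly keeps the result the same regardless of the time zone of the computer running the code.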
+
+
+
+
+
+

Examining various data types

+

Several functions exist to examine the features of the various data types. These include:

+
    +
  1. typeof() – what is the data type of the object (low-level)?
  2. +
  3. class() – what is the data type of the object (high-level)?
  4. +
  5. length() – how long is the object?
  6. +
  7. attributes() – any metadata available?
  8. +
+

Let’s look at how these functions work with a few examples.

+
+
a <- 45.84
+b <- 858L
+c <- TRUE
+d <- 89 + 34i
+e <- 'abc'
+
+
+

1. Examine the data type at a low-level with typeof()

+
+
typeof(a)
+
+
[1] "double"
+
+
typeof(b)
+
+
[1] "integer"
+
+
typeof(c)
+
+
[1] "logical"
+
+
typeof(d)
+
+
[1] "complex"
+
+
typeof(e)
+
+
[1] "character"
+
+
+
+
+

2. Examine the data type at a high-level with class()

+
+
class(a)
+
+
[1] "numeric"
+
+
class(b)
+
+
[1] "integer"
+
+
class(c)
+
+
[1] "logical"
+
+
class(d)
+
+
[1] "complex"
+
+
class(e)
+
+
[1] "character"
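
For the five variables above, typeof() and class() give almost the same answers. The difference between the low-level and high-level views is easier to see with factors and dates, as in the following sketch (the variable names grade and day are illustrative).

```r
# A factor is stored as integer codes but behaves as a categorical variable
grade <- factor(c("low", "high", "low"))
typeof(grade)  # "integer"
class(grade)   # "factor"

# A Date is stored as a number but behaves as a date
day <- as.Date("2022-06-23")
typeof(day)    # "double"
class(day)     # "Date"
```

In short, typeof() reports how R stores the value, while class() reports how R treats it.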
+
+
+
+
+

3. Use the is.____() functions to determine the data type

+

To test whether the variable is of a specific type, we can use the is.____() functions.

+

First, we test the variable a which is numeric.

+
+
is.numeric(a)
+
+
[1] TRUE
+
+
is.integer(a)
+
+
[1] FALSE
+
+
is.logical(a)
+
+
[1] FALSE
+
+
is.character(a)
+
+
[1] FALSE
+
+
+

Second, we test the variable c which is logical.

+
+
is.numeric(c)
+
+
[1] FALSE
+
+
is.integer(c)
+
+
[1] FALSE
+
+
is.logical(c)
+
+
[1] TRUE
+
+
is.character(c)
+
+
[1] FALSE
+
+
+
+
+
+

Converting between various data types

+

To convert between data types we can use the as.____() functions. These include as.Date(), as.numeric(), and as.factor(). Other helpful functions include factor(), which creates a factor and sets its levels, and nchar(), which returns the number of characters in a string.

+
+

Examples

+
+
as.integer(a)
+
+
[1] 45
+
+
as.logical(0)
+
+
[1] FALSE
+
+
as.logical(1)
+
+
[1] TRUE
+
+
nchar(e)
+
+
[1] 3
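
The other functions named in this section, as.numeric(), as.factor(), and factor(), can be sketched in the same way. The variable names below are illustrative and do not appear in the original examples.

```r
# Character to numeric
num_text <- "45.84"
as.numeric(num_text)

# Character to factor: levels are sorted alphabetically by default
sizes <- c("small", "large", "medium", "small")
as.factor(sizes)

# factor() lets us set the order of the levels explicitly
sizes_f <- factor(sizes, levels = c("small", "medium", "large"))
levels(sizes_f)

# nchar() counts the characters in each string
nchar(sizes)
```

Setting the levels explicitly is useful when the categories have a natural order that differs from alphabetical order.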
+
+
+ + +
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/posts/series_1/new_post_4/r_and_rstudio.png b/_site/posts/r_rstudio/new_post_4/r_and_rstudio.png similarity index 100% rename from posts/series_1/new_post_4/r_and_rstudio.png rename to _site/posts/r_rstudio/new_post_4/r_and_rstudio.png diff --git a/posts/series_1/new_post_5/r_and_rstudio.png b/_site/posts/r_rstudio/new_post_5/r_and_rstudio.png similarity index 100% rename from posts/series_1/new_post_5/r_and_rstudio.png rename to _site/posts/r_rstudio/new_post_5/r_and_rstudio.png diff --git a/posts/series_1/new_post_6/r_and_rstudio.png b/_site/posts/r_rstudio/new_post_6/r_and_rstudio.png similarity index 100% rename from posts/series_1/new_post_6/r_and_rstudio.png rename to _site/posts/r_rstudio/new_post_6/r_and_rstudio.png diff --git a/posts/series_1/new_post_7/r_and_rstudio.png b/_site/posts/r_rstudio/new_post_7/r_and_rstudio.png similarity index 100% rename from posts/series_1/new_post_7/r_and_rstudio.png rename to _site/posts/r_rstudio/new_post_7/r_and_rstudio.png diff --git a/posts/series_1/new_post_8/r_and_rstudio.png b/_site/posts/r_rstudio/new_post_8/r_and_rstudio.png similarity index 100% rename from posts/series_1/new_post_8/r_and_rstudio.png rename to _site/posts/r_rstudio/new_post_8/r_and_rstudio.png diff --git a/_site/posts/r_rstudio_basics/arithmetic/arithmetic.html b/_site/posts/r_rstudio_basics/arithmetic/arithmetic.html new file mode 100644 index 0000000..d03448a --- /dev/null +++ b/_site/posts/r_rstudio_basics/arithmetic/arithmetic.html @@ -0,0 +1,715 @@ + + + + + + + + + + + +William Okech - The Basics of R and RStudio + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

The Basics of R and RStudio

+

Part 1: Simple Arithmetic

+
+
RStudio
+
R
+
Tutorial
+
Blog
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

June 15, 2022

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+

Introduction

+

This is the first in a series of blog posts looking at the basics of R and RStudio. These programs allow us to perform various basic and complex calculations.

+

To get started, first, we will open R or RStudio. In R, go to the console, and in RStudio, head to the console pane. Next, type in a basic arithmetic calculation such as “1 + 1” after the angle bracket (>) and hit “Enter.”

+

An example of a basic calculation:

+
+
1+1
+
+
[1] 2
+
+
+

The output will be observed next to the square bracket containing the number 1 ([1]).

+
+
+

+
+
+

Additionally, to include comments in a code block, we use the hash (#) symbol. Anything written after the hash symbol on the same line is treated as a comment and is not run.

+
+
# A simple arithmetic calculation (this comment line is not run because of the hash symbol)
+1+1
+
+
[1] 2
+
+
+
+
+

Arithmetic operators available in R/RStudio

+

Various arithmetic operators (listed below) can be used in R/RStudio.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Arithmetic Operator | Description
+ | Addition
- | Subtraction
* | Multiplication
/ | Division
** or ^ | Exponentiation
%% | Modulus (remainder after division)
%/% | Integer division
+
+
+

Examples

+
+

Addition

+
+
10+30
+
+
[1] 40
+
+
+
+
+

Subtraction

+
+
30-24
+
+
[1] 6
+
+
+
+
+

Multiplication

+
+
20*4
+
+
[1] 80
+
+
+
+
+

Division

+
+
93/4
+
+
[1] 23.25
+
+
+
+
+

Exponentiation

+
+
3^6
+
+
[1] 729
+
+
+
+
+

Modulus (remainder after division)

+
+
94%%5
+
+
[1] 4
+
+
+
+
+

Integer Division

+
+
54%/%7
+
+
[1] 7
+
+
+
+
+

Slightly more complex arithmetic operations

+
+
5-1+(4*3)/16*3
+
+
[1] 6.25
+
+
+ + +
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/_site/posts/r_rstudio_basics/arithmetic/r_and_rstudio.png b/_site/posts/r_rstudio_basics/arithmetic/r_and_rstudio.png new file mode 100644 index 0000000..f3e3a34 Binary files /dev/null and b/_site/posts/r_rstudio_basics/arithmetic/r_and_rstudio.png differ diff --git a/_site/posts/r_rstudio_basics/arithmetic/r_console_1plus1.png b/_site/posts/r_rstudio_basics/arithmetic/r_console_1plus1.png new file mode 100644 index 0000000..b1ab85f Binary files /dev/null and b/_site/posts/r_rstudio_basics/arithmetic/r_console_1plus1.png differ diff --git a/_site/posts/r_rstudio_basics/data_structures/data_structures.html b/_site/posts/r_rstudio_basics/data_structures/data_structures.html new file mode 100644 index 0000000..bec5e15 --- /dev/null +++ b/_site/posts/r_rstudio_basics/data_structures/data_structures.html @@ -0,0 +1,1019 @@ + + + + + + + + + + + +William Okech - The Basics of R and RStudio + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

The Basics of R and RStudio

+

Part 7: Data Structures

+
+
RStudio
+
R
+
Tutorial
+
Blog
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

November 16, 2022

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+

Introduction

+

Data structures in R are tools for storing and organizing multiple values.

+

They help to organize stored data in a way that the data can be used more effectively. Data structures vary according to the number of dimensions and the data types (heterogeneous or homogeneous) contained. The primary data structures are:

+
  1. Vectors
  2. Lists
  3. Data frames
  4. Matrices
  5. Arrays
  6. Factors
+
+
+

Data structures

+
+

1. Vectors

+

Discussed in a previous post

+
+
+

2. Lists

+

Lists are objects (containers) that hold elements of the same or different types. They can contain strings, numbers, vectors, matrices, functions, or other lists. Lists are created with the list() function.

+
+

Examples

+
+
+

a. Three element list

+
+
list_1 <- list(10, 30, 50)
+
+
+
+

b. Single element list

+
+
list_2 <- list(c(10, 30, 50))
+
+
+
+

c. Three element list

+
+
list_3 <- list(1:3, c(50,40), 3:-5)
+
+
+
+

d. List with elements of different types

+
+
list_4 <- list(c("a", "b", "c"), 5:-1)
+
+
+
+

e. List which contains a list

+
+
list_5 <- list(c("a", "b", "c"), 5:-1, list_1)
+
+
+
+

f. Set names for the list elements

+
+
names(list_5)
+
+
NULL
+
+
names(list_5) <- c("character vector", "numeric vector", "list")
+names(list_5)
+
+
[1] "character vector" "numeric vector"   "list"            
+
+
+
+
+

g. Access elements

+
+
list_5[[1]]
+
+
[1] "a" "b" "c"
+
+
list_5[["character vector"]]
+
+
[1] "a" "b" "c"
+
+
+
+
+

h. Length of list

+
+
length(list_1)
+
+
[1] 3
+
+
length(list_5)
+
+
[1] 3
+
+
+
+
+
+
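Example g uses double brackets to access elements. As a small illustration of why that matters, the sketch below contrasts single brackets, double brackets, and the $ shorthand on a made-up list (the list and its names are assumptions for this example only).

```r
# Hypothetical list used only for this sketch
scores <- list(name = "R", versions = c(3, 4), stable = TRUE)

sub_list <- scores["versions"]    # single brackets return a smaller list
element  <- scores[["versions"]]  # double brackets return the element itself
same     <- scores$versions       # $ is shorthand for [[ ]] with a name

class(sub_list)
class(element)
identical(element, same)
```

Single brackets are useful when a list is needed as the result; double brackets or $ are the usual choice when the stored value itself is needed.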

3. Data frames

+

A data frame is one of the most common data objects used to store tabular data in R. Tabular data has rows representing observations and columns representing variables. Internally, a data frame is a list of equal-length vectors. Different columns can hold different types of data, but within each column, the elements must be of the same type. The most common data frame characteristics are listed below:

+

• Columns should have a name;

+

• Row names should be unique;

+

• Various data can be stored (such as numeric, factor, and character);

+

• The individual columns should contain the same number of data items.

+
+
+

Creation of data frames

+
+
level <- c("Low", "Mid", "High")
+language <- c("R", "RStudio", "Shiny")
+age <- c(25, 36, 47)
+
+df_1 <- data.frame(level, language, age)
+
+
+
+
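To see the "one type per column" rule in action, the sketch below rebuilds df_1 and inspects it. The extra column name, experienced, is an assumption added purely for illustration.

```r
level <- c("Low", "Mid", "High")
language <- c("R", "RStudio", "Shiny")
age <- c(25, 36, 47)
df_1 <- data.frame(level, language, age)

# Each column has a single type
sapply(df_1, class)

# A new column must have one value per row (here, 3 values)
df_1$experienced <- df_1$age > 30

# str() gives a compact summary of the columns and their types
str(df_1)
```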

Functions used to manipulate data frames

+
+

a. Number of rows

+
+
nrow(df_1)
+
+
[1] 3
+
+
+
+
+

b. Number of columns

+
+
ncol(df_1)
+
+
[1] 3
+
+
+
+
+

c. Dimensions

+
+
dim(df_1)
+
+
[1] 3 3
+
+
+
+
+

d. Class of data frame

+
+
class(df_1)
+
+
[1] "data.frame"
+
+
+
+
+

e. Column names

+
+
colnames(df_1)
+
+
[1] "level"    "language" "age"     
+
+
+
+
+

f. Row names

+
+
rownames(df_1)
+
+
[1] "1" "2" "3"
+
+
+
+
+

g. Top and bottom values

+
+
head(df_1, n=2)
+
+
  level language age
+1   Low        R  25
+2   Mid  RStudio  36
+
+
tail(df_1, n=2)
+
+
  level language age
+2   Mid  RStudio  36
+3  High    Shiny  47
+
+
+
+
+

h. Access columns

+
+
df_1$level
+
+
[1] "Low"  "Mid"  "High"
+
+
+
+
+

i. Access individual elements

+
+
df_1[3,2]
+
+
[1] "Shiny"
+
+
df_1[2, 1:2]
+
+
  level language
+2   Mid  RStudio
+
+
+
+
+

j. Access columns with index

+
+
df_1[, 3]
+
+
[1] 25 36 47
+
+
df_1[, c("language")]
+
+
[1] "R"       "RStudio" "Shiny"  
+
+
+
+
+

k. Access rows with index

+
+
df_1[2, ]
+
+
  level language age
+2   Mid  RStudio  36
+
+
+
+
+
+

4. Matrices

+

A matrix is a rectangular two-dimensional (2D) homogeneous data set containing rows and columns. All of its elements share a single data type (most often numeric) and are arranged in a fixed number of rows and columns. Matrices are generally used for various mathematical and statistical applications.

+
+

a. Creation of matrices

+
+
m1 <- matrix(1:9, nrow = 3, ncol = 3) 
+m2 <- matrix(21:29, nrow = 3, ncol = 3) 
+m3 <- matrix(1:12, nrow = 2, ncol = 6)
+
+
+
+
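By default, matrix() fills values column by column. A brief sketch of the difference between the default and the byrow argument (the object names are illustrative):

```r
by_col <- matrix(1:6, nrow = 2)                # default: fill column by column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)  # fill row by row instead

by_col
by_row

# The first rows differ because of the fill order
by_col[1, ]
by_row[1, ]
```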

b. Obtain the dimensions of the matrices

+
+
# m1
+nrow(m1)
+
+
[1] 3
+
+
ncol(m1)
+
+
[1] 3
+
+
dim(m1)
+
+
[1] 3 3
+
+
# m3
+nrow(m3)
+
+
[1] 2
+
+
ncol(m3)
+
+
[1] 6
+
+
dim(m3)
+
+
[1] 2 6
+
+
+
+
+

c. Arithmetic with matrices

+
+
m1+m2
+
+
     [,1] [,2] [,3]
+[1,]   22   28   34
+[2,]   24   30   36
+[3,]   26   32   38
+
+
m1-m2
+
+
     [,1] [,2] [,3]
+[1,]  -20  -20  -20
+[2,]  -20  -20  -20
+[3,]  -20  -20  -20
+
+
m1*m2
+
+
     [,1] [,2] [,3]
+[1,]   21   96  189
+[2,]   44  125  224
+[3,]   69  156  261
+
+
m1/m2
+
+
           [,1]      [,2]      [,3]
+[1,] 0.04761905 0.1666667 0.2592593
+[2,] 0.09090909 0.2000000 0.2857143
+[3,] 0.13043478 0.2307692 0.3103448
+
+
m1 == m2
+
+
      [,1]  [,2]  [,3]
+[1,] FALSE FALSE FALSE
+[2,] FALSE FALSE FALSE
+[3,] FALSE FALSE FALSE
+
+
+
+
+

d. Matrix multiplication

+
+
m5 <- matrix(1:10, nrow = 5)
+m6 <- matrix(43:34, nrow = 5)
+
+m5*m6
+
+
     [,1] [,2]
+[1,]   43  228
+[2,]   84  259
+[3,]  123  288
+[4,]  160  315
+[5,]  195  340
+
+
# m5%*%m6 will not work because of the dimensions.
+# the matrix m6 needs to be transposed.
+
+# Transpose
+m5%*%t(m6)
+
+
     [,1] [,2] [,3] [,4] [,5]
+[1,]  271  264  257  250  243
+[2,]  352  343  334  325  316
+[3,]  433  422  411  400  389
+[4,]  514  501  488  475  462
+[5,]  595  580  565  550  535
+
+
+
+
+
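Matrix multiplication with %*% requires the number of columns in the first matrix to equal the number of rows in the second. The sketch below checks that rule with a small helper; can_multiply() is a hypothetical name written for this example, not a built-in function.

```r
m5 <- matrix(1:10, nrow = 5)
m6 <- matrix(43:34, nrow = 5)

# Hypothetical helper: TRUE when x %*% y is dimensionally possible
can_multiply <- function(x, y) ncol(x) == nrow(y)

can_multiply(m5, m6)      # 2 columns against 5 rows
can_multiply(m5, t(m6))   # 2 columns against 2 rows

# The result has the rows of the first matrix and the columns of the second
dim(m5 %*% t(m6))
dim(t(m5) %*% m6)
```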

e. Generate an identity matrix

+
+
diag(5)
+
+
     [,1] [,2] [,3] [,4] [,5]
+[1,]    1    0    0    0    0
+[2,]    0    1    0    0    0
+[3,]    0    0    1    0    0
+[4,]    0    0    0    1    0
+[5,]    0    0    0    0    1
+
+
+
+
+

f. Column and row names

+
+
colnames(m5)
+
+
NULL
+
+
rownames(m6)
+
+
NULL
+
+
+
+
+
+
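Both calls return NULL because no names have been set yet. A minimal sketch of assigning names and then using them for indexing (the names chosen here are arbitrary):

```r
m5 <- matrix(1:10, nrow = 5)

colnames(m5) <- c("first", "second")
rownames(m5) <- paste0("row_", 1:5)

m5

# Names can be used in place of numeric indices
m5["row_2", "second"]
```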

5. Arrays

+

An array is a multidimensional vector that stores homogeneous data. It can be thought of as a stack of matrices and can store data in more than two dimensions (n-dimensional). A three-dimensional array is described as rows by columns by matrices. Example: an array with dimensions dim = c(2,3,3) has 2 rows, 3 columns, and 3 matrices.

+
+

a. Creating arrays

+
+
arr_1 <- array(1:12, dim = c(2,3,2))
+
+arr_1
+
+
, , 1
+
+     [,1] [,2] [,3]
+[1,]    1    3    5
+[2,]    2    4    6
+
+, , 2
+
+     [,1] [,2] [,3]
+[1,]    7    9   11
+[2,]    8   10   12
+
+
+
+
+

b. Filter array by index

+
+
arr_1[1, , ]
+
+
     [,1] [,2]
+[1,]    1    7
+[2,]    3    9
+[3,]    5   11
+
+
arr_1[1, ,1]
+
+
[1] 1 3 5
+
+
arr_1[, , 1]
+
+
     [,1] [,2] [,3]
+[1,]    1    3    5
+[2,]    2    4    6
+
+
+
+
+
+
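Supplying all three indices returns a single element. A short sketch using the same arr_1 as above:

```r
arr_1 <- array(1:12, dim = c(2, 3, 2))

dim(arr_1)        # rows, columns, matrices

arr_1[2, 3, 1]    # row 2, column 3 of the first matrix
arr_1[2, 3, 2]    # row 2, column 3 of the second matrix
```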

6. Factors

+

Factors are used to store categorical data, which can be supplied as integers or strings. A factor records the distinct categories as levels and stores each value as a reference to one of those levels. This form of data storage is useful for statistical modeling. Examples include TRUE or FALSE and male or female.

+
+
vector <- c("Male", "Female")
+factor_1 <- factor(vector)
+factor_1
+
+
[1] Male   Female
+Levels: Female Male
+
+
+

OR

+
+
factor_2 <- as.factor(vector)
+factor_2
+
+
[1] Male   Female
+Levels: Female Male
+
+
as.numeric(factor_2)
+
+
[1] 2 1
+
+
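The integer codes returned by as.numeric() follow the order of the levels, which is alphabetical by default. The sketch below, using made-up size categories, shows how the levels argument sets a custom order.

```r
# Hypothetical categories for illustration
sizes <- c("Mid", "Low", "High", "Low")

size_factor <- factor(sizes, levels = c("Low", "Mid", "High"))

levels(size_factor)       # the order given in the levels argument
as.integer(size_factor)   # codes refer to positions in levels()
table(size_factor)        # counts per category
```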
+ + +
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/_site/posts/r_rstudio_basics/data_structures/r_and_rstudio.png b/_site/posts/r_rstudio_basics/data_structures/r_and_rstudio.png new file mode 100644 index 0000000..f3e3a34 Binary files /dev/null and b/_site/posts/r_rstudio_basics/data_structures/r_and_rstudio.png differ diff --git a/_site/posts/r_rstudio_basics/data_types/data_types.html b/_site/posts/r_rstudio_basics/data_types/data_types.html new file mode 100644 index 0000000..cdd1b73 --- /dev/null +++ b/_site/posts/r_rstudio_basics/data_types/data_types.html @@ -0,0 +1,850 @@ + + + + + + + + + + + +William Okech - The Basics of R and RStudio + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

The Basics of R and RStudio

+

Part 3: Data Types

+
+
RStudio
+
R
+
Tutorial
+
Blog
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

June 23, 2022

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+

Introduction

+

R and RStudio utilize multiple data types to store different kinds of data.

+

The most common data types in R are listed below.

+ ++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Data Type    Description
Numeric      The most common data type. The values can be whole numbers or decimals (real numbers).
Integer      Special case of numeric data without decimals.
Logical      Boolean data type with only 2 values (TRUE or FALSE).
Complex      Specifies values with an imaginary component.
Character    Stores text (a character string). Strings can be enclosed in single quotes ('text') or double quotes ("text"); R treats both forms the same way.
Factor       Special type of variable that represents categorical data such as gender.
Raw          Specifies values as raw bytes. Built-in functions convert between raw and character (charToRaw() or rawToChar()).
Dates        Specifies the date variable. Date stores a date and POSIXct stores a date and time. Internally, the value is stored as the number of days (Date) or number of seconds (POSIXct) since 01/01/1970.
+
+
+

Data types

+
+

1. Numeric

+
+
89.98
+
+
[1] 89.98
+
+
55
+
+
[1] 55
+
+
+
+
+

2. Integer

+
+
5L
+
+
[1] 5
+
+
5768L
+
+
[1] 5768
+
+
+
+
+

3. Logical

+
+
TRUE
+
+
[1] TRUE
+
+
FALSE
+
+
[1] FALSE
+
+
+
+
+

4. Complex

+
+
10 + 30i
+
+
[1] 10+30i
+
+
287 + 34i
+
+
[1] 287+34i
+
+
+
+
+

5. Character or String

+
+
'abc'
+
+
[1] "abc"
+
+
"def"
+
+
[1] "def"
+
+
"I like learning R"
+
+
[1] "I like learning R"
+
+
+
+
+

6. Dates

+
+
"2022-06-23 14:39:21 EAT"
+
+
[1] "2022-06-23 14:39:21 EAT"
+
+
"2022-06-23"
+
+
[1] "2022-06-23"
+
+
+
+
+
+
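Typed on their own, the quoted values above are character strings. A minimal sketch of turning them into actual date objects with as.Date() and as.POSIXct() follows; the UTC time zone is an assumption made to keep the example portable.

```r
date_1 <- as.Date("2022-06-23")
class(date_1)

# Stored internally as days since 1970-01-01
as.numeric(date_1)

# Date arithmetic works in days
date_1 + 7

# Date-time values use POSIXct (time zone assumed to be UTC here)
datetime_1 <- as.POSIXct("2022-06-23 14:39:21", tz = "UTC")
class(datetime_1)
```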

Examining various data types

+

Several functions exist to examine the features of the various data types. These include:

+
  1. typeof() – what is the data type of the object (low-level)?
  2. class() – what is the data type of the object (high-level)?
  3. length() – how long is the object?
  4. attributes() – any metadata available?
+

Let’s look at how these functions work with a few examples

+
+
a <- 45.84
+b <- 858L
+c <- TRUE
+d <- 89 + 34i
+e <- 'abc'
+
+
+

1. Examine the data type at a low-level with typeof()

+
+
typeof(a)
+
+
[1] "double"
+
+
typeof(b)
+
+
[1] "integer"
+
+
typeof(c)
+
+
[1] "logical"
+
+
typeof(d)
+
+
[1] "complex"
+
+
typeof(e)
+
+
[1] "character"
+
+
+
+
+

2. Examine the data type at a high-level with class()

+
+
class(a)
+
+
[1] "numeric"
+
+
class(b)
+
+
[1] "integer"
+
+
class(c)
+
+
[1] "logical"
+
+
class(d)
+
+
[1] "complex"
+
+
class(e)
+
+
[1] "character"
+
+
+
+
+

3. Use the is.____() functions to determine the data type

+

To test whether the variable is of a specific type, we can use the is.____() functions.

+

First, we test the variable a which is numeric.

+
+
is.numeric(a)
+
+
[1] TRUE
+
+
is.integer(a)
+
+
[1] FALSE
+
+
is.logical(a)
+
+
[1] FALSE
+
+
is.character(a)
+
+
[1] FALSE
+
+
+

Second, we test the variable c which is logical.

+
+
is.numeric(c)
+
+
[1] FALSE
+
+
is.integer(c)
+
+
[1] FALSE
+
+
is.logical(c)
+
+
[1] TRUE
+
+
is.character(c)
+
+
[1] FALSE
+
+
+
+
+
+
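The list of helper functions earlier in this section also mentions length() and attributes(). A brief sketch of both is shown below; named_vec is a hypothetical object created only for this example.

```r
e <- "abc"

length(e)   # one element (a single string)
nchar(e)    # three characters inside that string

# A named vector carries metadata in its attributes
named_vec <- c(low = 1, high = 10)
attributes(named_vec)

# A plain string has no attributes
attributes(e)
```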

Converting between various data types

+

To convert between data types we can use the as.____() functions. These include as.Date(), as.numeric(), and as.factor(). Other helpful functions include factor(), which encodes the data as categories (levels), and nchar(), which returns the number of characters in a string.

+
+

Examples

+
+
as.integer(a)
+
+
[1] 45
+
+
as.logical(0)
+
+
[1] FALSE
+
+
as.logical(1)
+
+
[1] TRUE
+
+
nchar(e)
+
+
[1] 3
+
+
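Not every conversion succeeds. The sketch below shows a few more conversions, including one that cannot be carried out and therefore yields NA; the warning is suppressed here only to keep the output tidy.

```r
as.numeric("3.14")     # text that looks like a number converts cleanly
as.character(45.84)    # numbers can always become text

# as.integer() drops the decimal part rather than rounding
as.integer(45.84)
as.integer(-45.84)

# Text that is not a number becomes NA (normally with a warning)
bad <- suppressWarnings(as.numeric("abc"))
is.na(bad)
```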
+ + +
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/_site/posts/r_rstudio_basics/data_types/r_and_rstudio.png b/_site/posts/r_rstudio_basics/data_types/r_and_rstudio.png new file mode 100644 index 0000000..f3e3a34 Binary files /dev/null and b/_site/posts/r_rstudio_basics/data_types/r_and_rstudio.png differ diff --git a/_site/posts/r_rstudio_basics/missing_data/missing_data.html b/_site/posts/r_rstudio_basics/missing_data/missing_data.html new file mode 100644 index 0000000..4c25144 --- /dev/null +++ b/_site/posts/r_rstudio_basics/missing_data/missing_data.html @@ -0,0 +1,555 @@ + + + + + + + + + + + +William Okech - The Basics of R and RStudio + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

The Basics of R and RStudio

+

Part 6: Missing Data

+
+
RStudio
+
R
+
Tutorial
+
Blog
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

November 14, 2022

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+

Introduction

+

R has two types of missing data, NA and NULL.1

+
+
+

NA

+

R uses NA to represent missing data. The NA appears as another element of a vector. To test each element for missingness we use is.na(). Generally, we can use tools such as mi, mice, and Amelia (which will be discussed later) to deal with missing data. The deletion of this missing data may lead to bias or data loss, so we need to be very careful when handling it. In subsequent blog posts, we will look at the use of imputation to deal with missing data.

+
+
+
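A minimal sketch of how NA behaves inside a vector, using made-up temperature readings:

```r
# Hypothetical readings with one missing value
temps <- c(21.5, NA, 19.0, 23.5)

is.na(temps)        # tests each element for missingness
sum(is.na(temps))   # counts the missing values

mean(temps)                 # NA, because one value is unknown
mean(temps, na.rm = TRUE)   # drops the NA before averaging
```

Many summary functions accept na.rm = TRUE, which removes missing values for that calculation only and leaves the original vector unchanged.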

NULL

+

NULL represents nothingness or the “absence of anything”. 2

+

It does not mean missing but represents nothing. NULL cannot exist within a vector because it disappears.

+
+
+
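The difference between NULL and NA is easiest to see by placing each inside a vector. A short illustrative sketch:

```r
with_null <- c(1, NULL, 3)
with_na   <- c(1, NA, 3)

length(with_null)   # NULL vanishes, leaving two elements
length(with_na)     # NA keeps its place as a third element

is.null(NULL)       # test for NULL with is.null()
```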

Supplementary Reading

+
  1. An excellent post from the blog “Data Science by Design” on the role of missingness.
+ + +
+ + +

Footnotes

+ +
  1. Adapted from Lander, J. P. (2014) R for everyone: Advanced analytics and graphics. Addison-Wesley.↩︎
  2. Adapted from Lander, J. P. (2014) R for everyone: Advanced analytics and graphics. Addison-Wesley.↩︎
+
+ + +
+ + + + + \ No newline at end of file diff --git a/_site/posts/r_rstudio_basics/missing_data/r_and_rstudio.png b/_site/posts/r_rstudio_basics/missing_data/r_and_rstudio.png new file mode 100644 index 0000000..f3e3a34 Binary files /dev/null and b/_site/posts/r_rstudio_basics/missing_data/r_and_rstudio.png differ diff --git a/_site/posts/r_rstudio_basics/operators/operators.html b/_site/posts/r_rstudio_basics/operators/operators.html new file mode 100644 index 0000000..8653f11 --- /dev/null +++ b/_site/posts/r_rstudio_basics/operators/operators.html @@ -0,0 +1,823 @@ + + + + + + + + + + + +William Okech - The Basics of R and RStudio + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

The Basics of R and RStudio

+

Part 4: Operators

+
+
RStudio
+
R
+
Tutorial
+
Blog
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

November 9, 2022

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+

Introduction

+

R has many different types of operators that can perform different tasks.

+

Here we will focus on 5 major types of operators. The major types of operators are:

+
  1. Arithmetic,
  2. Relational,
  3. Logical,
  4. Assignment, and
  5. Miscellaneous.
+
+
+

1. Arithmetic Operators

+

Arithmetic operators are used to perform mathematical operations. These operators have been highlighted in Part 1 of the series.

+
+
+

2. Relational Operators

+

Relational operators are used to find the relationship between 2 variables and compare objects. The output of these comparisons is Boolean (TRUE or FALSE). The table below describes the most common relational operators.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Relational Operator    Description
<                      Less than
>                      Greater than
<=                     Less than or equal to
>=                     Greater than or equal to
==                     Equal to
!=                     Not equal to
+

Assign values to variables

+
+
x <- 227
+y <- 639
+
+
+

a. Less than

+
+
x < y
+
+
[1] TRUE
+
+
+
+
+

b. Greater than

+
+
x > y
+
+
[1] FALSE
+
+
+
+
+

c. Less than or equal to

+
+
x <= 300
+
+
[1] TRUE
+
+
+
+
+

d. Greater than or equal to

+
+
y >= 700
+
+
[1] FALSE
+
+
+
+
+

e. Equal to

+
+
y == 639
+
+
[1] TRUE
+
+
+
+
+

f. Not Equal to

+
+
x != 227
+
+
[1] FALSE
+
+
+
+
+
+
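Relational operators also work element by element on vectors, which makes them handy for counting and filtering. A small sketch with made-up values:

```r
# Hypothetical heights in centimetres
heights <- c(150, 172, 181)

heights > 170            # one TRUE/FALSE per element
sum(heights > 170)       # TRUE counts as 1, so this counts the matches
heights[heights > 170]   # keep only the elements that pass the test
```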

3. Logical Operators

+

Logical operators are used to combine multiple conditions. They work with basic data types such as logical, numeric, and complex, and return TRUE or FALSE values. Any nonzero number is treated as TRUE, and 0 is treated as FALSE. The table below describes the most common logical operators.

+ + + + + + + + + + + + + + + + + + + + + +
Logical Operator    Description
!                   Logical NOT
|                   Element-wise logical OR
&                   Element-wise logical AND
+

Assign vectors to variables

+
+
vector_1 <- c(0,2)
+vector_2 <- c(1,0)
+
+
+

a. Logical NOT

+
+
!vector_1
+
+
[1]  TRUE FALSE
+
+
!vector_2
+
+
[1] FALSE  TRUE
+
+
+
+
+

b. Element-wise Logical OR

+
+
vector_1 | vector_2
+
+
[1] TRUE TRUE
+
+
+
+
+

c. Element-wise Logical AND

+
+
vector_1 & vector_2
+
+
[1] FALSE FALSE
+
+
+
+
+
+
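The examples above rely on R treating numbers as logical values. The sketch below makes that conversion explicit and then combines two relational tests, which is the more common use of these operators; the value of x is taken from the relational operators section.

```r
# Any nonzero number converts to TRUE; only 0 converts to FALSE
as.logical(c(-3, 0, 0.5, 2))

x <- 227

x > 100 & x < 300   # TRUE only if both conditions hold
x < 100 | x > 300   # TRUE if at least one condition holds
!(x == 227)         # NOT flips the result
```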

4. Assignment Operators

+

These operators assign values to variables. A more comprehensive review can be obtained in Part 2 of the series.

+
+
+

5. Miscellaneous Operators

+

These are helpful operators that can perform a variety of functions. A few common miscellaneous operators are described below.

+ ++++ + + + + + + + + + + + + + + + + + + + + +
Miscellaneous Operator    Description
%*%                       Matrix multiplication (to be discussed in subsequent chapters)
%in%                      Does an element belong to a vector
:                         Generate a sequence
+
+

a. Sequence

+
+
a <- 1:8
+a
+
+
[1] 1 2 3 4 5 6 7 8
+
+
b <- 4:10
+b
+
+
[1]  4  5  6  7  8  9 10
+
+
+
+
+

b. Element in a vector

+
+
a %in% b
+
+
[1] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
+
+
9 %in% b
+
+
[1] TRUE
+
+
9 %in% a
+
+
[1] FALSE
+
+
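Because %in% returns one logical value per element, it can be used directly for filtering. A short sketch reusing the vectors a and b from above:

```r
a <- 1:8
b <- 4:10

a[a %in% b]      # elements of a that also appear in b
a[!(a %in% b)]   # elements of a that do not appear in b
```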
+ + +
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/_site/posts/r_rstudio_basics/operators/r_and_rstudio.png b/_site/posts/r_rstudio_basics/operators/r_and_rstudio.png new file mode 100644 index 0000000..f3e3a34 Binary files /dev/null and b/_site/posts/r_rstudio_basics/operators/r_and_rstudio.png differ diff --git a/_site/posts/r_rstudio_basics/software_install/r_and_rstudio.png b/_site/posts/r_rstudio_basics/software_install/r_and_rstudio.png new file mode 100644 index 0000000..f3e3a34 Binary files /dev/null and b/_site/posts/r_rstudio_basics/software_install/r_and_rstudio.png differ diff --git a/_site/posts/r_rstudio_basics/software_install/r_interface.png b/_site/posts/r_rstudio_basics/software_install/r_interface.png new file mode 100644 index 0000000..274e214 Binary files /dev/null and b/_site/posts/r_rstudio_basics/software_install/r_interface.png differ diff --git a/_site/posts/r_rstudio_basics/software_install/rstudio_interface.png b/_site/posts/r_rstudio_basics/software_install/rstudio_interface.png new file mode 100644 index 0000000..fbe05e8 Binary files /dev/null and b/_site/posts/r_rstudio_basics/software_install/rstudio_interface.png differ diff --git a/_site/posts/r_rstudio_basics/software_install/software_install.html b/_site/posts/r_rstudio_basics/software_install/software_install.html new file mode 100644 index 0000000..44bea3d --- /dev/null +++ b/_site/posts/r_rstudio_basics/software_install/software_install.html @@ -0,0 +1,600 @@ + + + + + + + + + + + +William Okech - Getting Started with R and RStudio + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

Getting Started with R and RStudio

+

Software Installation

+
+
RStudio
+
R
+
Tutorial
+
Blog
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

June 8, 2022

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+

Welcome!

+

In this first post, the reader will be introduced to the R programming language and RStudio software.

+
+

Introduction

+

This blog aims to introduce new R/RStudio users to the fundamentals of R and lay the groundwork for more in-depth statistical analysis, data visualization, and reporting methods. I hope to present the topics in a straightforward manner so that anyone new to programming is not intimidated.

+
+
+

What is R?

+

R is a programming language and open-source (freely available) software invented by Ross Ihaka and Robert Gentleman in 1993 (published as open-source in 1995) when they were based at the University of Auckland. Fun fact: R represents the first letter of the first names of the creators. The software is utilized by individuals working for various organizations ranging from academic institutions and healthcare organizations to financial services and information technology companies. In May 2022, the TIOBE index (a measure of programming language popularity) demonstrated that R was the 13th most popular programming language. R’s popularity may result from its highly extensible nature that allows users to perform statistical data analysis, generate visualizations, and report findings.

+
+

What are the benefits of using R?

+

As mentioned in the previous section, R is open-source software that is highly extensible. Thousands of extensions (also known as packages) can be installed, allowing one to increase the number of available applications. The main advantages of R include:
  1. A large community of users and developers that can provide learning support and assist with technical challenges,
  2. The ability to perform reproducible research,
  3. Its cross-platform nature, which means that it can be used on Linux, Windows, and Mac operating systems, and
  4. The ability to generate high-quality graphics from datasets of varying dimensions.

+
+
+

I’m looking for R. Where can I find it?

+

To install R on your personal computer, visit The R Project for Statistical Computing’s Comprehensive R Archive Network (CRAN), download the most recent version, and install it according to the website’s instructions. Once R is installed, you can experiment with some of its features.

+
+
+

+
+
+

Figure 1: The standard R interface (Windows)

+

When you open R, you will notice that it has a basic graphical user interface (GUI), and the console displays a command-line interface (CLI; where each command is executed one at a time). This may be intimidating for new users; however, there is a workaround for those who are not comfortable working at the command line. For those who are not experienced programmers, R can be used with an application called RStudio.

+
+
+
+

What is RStudio and how does it differ from R?

+

RStudio is an integrated development environment (IDE) for R that was developed by JJ Allaire. This software contains tools that make programming in R easier. RStudio extends R’s capabilities by making it easier to import data, write scripts, and generate visualizations and reports. The RStudio IDE is available for download from the RStudio website.

+
+
+

+
+
+

Figure 2: RStudio interface with four main panes (Windows)

+

Once installed, the basic layout of RStudio reveals that there is a script (text editor), console, navigation, and environment/history window pane. The script pane (text editor) in the upper-left allows one to write, open, edit, and execute more extended programs compared with using the standalone R software. The console pane (bottom-left) displays the script’s output and offers a command-line interface for typing code that is immediately executed. The environment pane (upper-right) displays information about the created objects, the history of executed code, and any external connections. Finally, the navigation pane (bottom-right) shows multiple tabs. Its primary tabs include the “Plot” tab, which shows graphics created by code, the “Packages” tab where the packages are installed, and the “Help” tab, which provides assistance for all things R and allows one to search the R documentation.

+
+

What are the primary benefits of RStudio?

+

RStudio allows one to create projects (a collection of related files stored within a working directory). Additionally, RStudio can be customized using options available under the “Tools” tab. Lastly, RStudio has Git integration that allows for version control where you can back up your code at different timepoints and effortlessly transfer code between computers.1

+
+
+
+

Conclusion

+

Hopefully, this was a helpful introduction to R and RStudio. In subsequent blog posts, we will focus on:

+
  1. Part 1: Simple arithmetic,
  2. Part 2: Variables,
  3. Part 3: Data types,
  4. Part 4: Operators,
  5. Part 5: Vectors,
  6. Part 6: Missing data, and
  7. Part 7: Data structures.
+ + +
+
+ + +

Footnotes

+ +
  1. Summary of the benefits of R and RStudio obtained from Lander, J. P. (2014). R for everyone: Advanced analytics and graphics. Addison-Wesley.↩︎
+
+ + +
+ + + + + \ No newline at end of file diff --git a/_site/posts/r_rstudio_basics/variables/env_pane_1.png b/_site/posts/r_rstudio_basics/variables/env_pane_1.png new file mode 100644 index 0000000..f4d782e Binary files /dev/null and b/_site/posts/r_rstudio_basics/variables/env_pane_1.png differ diff --git a/_site/posts/r_rstudio_basics/variables/post_3.html b/_site/posts/r_rstudio_basics/variables/post_3.html new file mode 100644 index 0000000..5c81690 --- /dev/null +++ b/_site/posts/r_rstudio_basics/variables/post_3.html @@ -0,0 +1,707 @@ + + + + + + + + + + + +William Okech - The Basics of R and RStudio + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

The Basics of R and RStudio

+

Part 2: Variables

+
+
RStudio
+
R
+
Tutorial
+
Blog
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

June 22, 2022

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+

Introduction

+

Variables are instrumental in programming because they are used as “containers” to store data values.

+

To assign a value to a variable, we can use <- or =. However, most R users prefer to use <-.

+
+
+

Variable assignment

+
+

1. Using <-

+
+
variable_1 <- 5
+variable_1
+
+
[1] 5
+
+
+
+
+

2. Using =

+
+
variable_2 = 10
+variable_2
+
+
[1] 10
+
+
+
+
+

3. Reverse the value and variable with ->

+
+
15 -> variable_3
+variable_3
+
+
[1] 15
+
+
+
+
+

4. Assign one value to two variables

+
+
variable_4 <- variable_5 <- 30
+variable_4
+
+
[1] 30
+
+
variable_5
+
+
[1] 30
+
+
+
+
+
+

Variable output

+

The output of the variable can then be obtained by:

+
  1. Typing the variable name and then pressing “Enter,”
  2. Typing “print” with the variable name in brackets, print(variable), and
  3. Typing “View” with the variable name in brackets, View(variable).
+

Both print() and View() are some of the many built-in functions1 available in R.

+

In RStudio, the list of variables that have been loaded can be viewed in the environment pane.

+
+
+

+
+
+

Figure 1: A screenshot of the environment pane with the stored variables.

+
+
print(variable_1)
+
+
[1] 5
+
+
+
+
View(variable_2)
+
+

Output of View() will be seen in the script pane

+
+
+

The assign() and rm() functions

+

In addition to using the assignment operators (<- and =), we can use the assign() function to assign a value to a variable.

+
+
assign("variable_6", 555)
+variable_6
+
+
[1] 555
+
+
+

To remove the assignment of the value to the variable, either delete the variable in the “environment pane” or use the rm() function.

+
+
variable_7 <- 159
+
+
+
rm(variable_7)
+
+

After running rm() look at the environment pane to confirm whether variable_7 has been removed.

+
+
+
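Besides looking at the environment pane, the removal can be confirmed in code. A minimal sketch using the built-in exists() function:

```r
variable_7 <- 159
exists("variable_7")   # the variable is present

rm(variable_7)
exists("variable_7")   # the variable is gone after rm()
```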

Naming variables

+

At this point, you may be wondering what conventions are used for naming variables. First, variables need to have meaningful names such as current_temp, time_24_hr, or weight_lbs. However, we need to be mindful of the variable style guide which provides us with the appropriate rules for naming variables.

+

Some rules to keep in mind are:

+
  1. R is case-sensitive (variable is not the same as Variable),
  2. Reserved words (such as TRUE, FALSE, if, or else) cannot be used as variable names,
  3. Appropriate variable names can contain letters, numbers, dots, and underscores. However, you cannot start with an underscore, number, or dot followed by a number.
+
+
+

Valid and invalid names

+
+

Valid names:

+
  • time_24_hr
  • .time24_hr
+
+
+

Invalid names:

+
  • _24_hr.time
  • 24_hr_time
  • .24_hr_time
+ + +
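One way to check a candidate name programmatically is with the built-in make.names() function, which rewrites invalid names into valid ones. If a name comes back unchanged, it was already valid. The sketch below is an illustrative check (the candidates and is_valid objects are assumptions for this example), not an official validation tool.

```r
candidates <- c("time_24_hr", ".time24_hr",
                "_24_hr.time", "24_hr_time", ".24_hr_time", "TRUE")

# make.names() leaves valid names untouched and repairs invalid ones
repaired <- make.names(candidates)
repaired

# A name is valid when no repair was needed
is_valid <- candidates == repaired
data.frame(candidates, is_valid)
```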
+
+ + +

Footnotes

+ +
  1. Functions are a collection of statements (organized and reusable code) that perform a specific task, and R has many built-in functions.↩︎
+
+ + +
+ + + + + \ No newline at end of file diff --git a/_site/posts/r_rstudio_basics/variables/r_and_rstudio.png b/_site/posts/r_rstudio_basics/variables/r_and_rstudio.png new file mode 100644 index 0000000..f3e3a34 Binary files /dev/null and b/_site/posts/r_rstudio_basics/variables/r_and_rstudio.png differ diff --git a/_site/posts/r_rstudio_basics/vectors/r_and_rstudio.png b/_site/posts/r_rstudio_basics/vectors/r_and_rstudio.png new file mode 100644 index 0000000..f3e3a34 Binary files /dev/null and b/_site/posts/r_rstudio_basics/vectors/r_and_rstudio.png differ diff --git a/_site/posts/r_rstudio_basics/vectors/vectors.html b/_site/posts/r_rstudio_basics/vectors/vectors.html new file mode 100644 index 0000000..e2adba8 --- /dev/null +++ b/_site/posts/r_rstudio_basics/vectors/vectors.html @@ -0,0 +1,920 @@ + + + + + + + + + + + +William Okech - The Basics of R and RStudio + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

The Basics of R and RStudio

+

Part 5: Vectors

+
+
RStudio
+
R
+
Tutorial
+
Blog
+
+
+
+ + +
+ +
+
Author
+
+

William Okech

+
+
+ +
+
Published
+
+

November 12, 2022

+
+
+ + +
+ + +
+ + + + +
+ + + + +
+

Introduction

+

A vector is a collection of elements of the same data type, and it is a basic data structure in R programming.

+

Atomic vectors cannot be of mixed data type. The most common way to create a vector is with c(), where “c” stands for combine. In R, vectors do not have dimensions; therefore, they cannot be defined by columns or rows. Vectors can be divided into atomic vectors and lists (discussed in Part 7). The atomic vectors include logical, character, and numeric (integer or double).

+
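Because an atomic vector holds a single data type, c() silently converts mixed inputs to the most flexible type present. A short sketch of this coercion (the object names are illustrative):

```r
mixed <- c(1, "a", TRUE)
mixed
typeof(mixed)            # everything has become character

typeof(c(1L, 2.5))       # integer and double combine to double
typeof(c(TRUE, 2L))      # logical and integer combine to integer
```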

Additionally, R is a vectorized language because mathematical operations are applied to each element of the vector without the need to loop through the vector. Examples of vectors are shown below:

+

• Numbers: c(2, 10, 16, -5)

+

• Characters: c("R", "RStudio", "Shiny", "Quarto")

+

Logicals: c(TRUE, FALSE, TRUE)

+
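To illustrate what "vectorized" means in practice, the sketch below doubles every element of a vector twice: once with a single vectorized expression and once with an explicit loop. Both give the same result; the loop is included only for comparison.

```r
values <- c(2, 10, 16, -5)

# Vectorized: the operation is applied to every element at once
doubled_vec <- values * 2

# Equivalent explicit loop
doubled_loop <- numeric(length(values))
for (i in seq_along(values)) {
  doubled_loop[i] <- values[i] * 2
}

doubled_vec
identical(doubled_vec, doubled_loop)
```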
+
+

Sequence Generation

+

To generate a vector with a sequence of consecutive numbers, we can use :, sequence(), or seq().

+
+

Generate a sequence using :

+
+
a <- 9:18
+a
+
+
 [1]  9 10 11 12 13 14 15 16 17 18
+
+
a_rev <- 18:9
+a_rev
+
+
 [1] 18 17 16 15 14 13 12 11 10  9
+
+
a_rev_minus <- 5:-3
+a_rev_minus
+
+
[1]  5  4  3  2  1  0 -1 -2 -3
+
+
+
+
+

Generate a sequence using sequence()

+
+
b <- sequence(7)
+b
+
+
[1] 1 2 3 4 5 6 7
+
+
c <- sequence(c(5,9))
+c
+
+
 [1] 1 2 3 4 5 1 2 3 4 5 6 7 8 9
+
+
+
+
+

Generate a sequence using seq()

+

The seq() function has four main arguments: seq(from, to, by, length.out), where “from” and “to” are the starting and ending elements of the sequence. Additionally, “by” is the difference between the elements, and “length.out” is the desired length of the sequence.

+
+
d <- seq(2,20,by=2)
+d
+
+
 [1]  2  4  6  8 10 12 14 16 18 20
+
+
f <- seq(2,20, length.out=5)
+f
+
+
[1]  2.0  6.5 11.0 15.5 20.0
+
+
h <- seq(20,2,by=-2)
+h
+
+
 [1] 20 18 16 14 12 10  8  6  4  2
+
+
j <- seq(20, 2, length.out=3)
+j
+
+
[1] 20 11  2
+
+
+
+
+
+

Repeating vectors

+

To create a repeating vector, we can use rep().

+
+
k <- rep(c(0,3,6), times = 3)
+k
+
+
[1] 0 3 6 0 3 6 0 3 6
+
+
l <- rep(2:6, each = 3)
+l
+
+
 [1] 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6
+
+
m <- rep(7:10, length.out = 20)
+m
+
+
 [1]  7  8  9 10  7  8  9 10  7  8  9 10  7  8  9 10  7  8  9 10
+
+
+
+
+

Vector Operations

+

Vectors of equal length can be operated on together. If one vector is shorter, it is recycled: its elements are repeated until it matches the length of the longer vector. When using vectors of unequal lengths, the length of the longer vector should ideally be a multiple of the length of the shorter vector.

+
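A minimal sketch of recycling, using small made-up vectors. The second part shows what happens when the lengths do not divide evenly; the warning R would normally print is suppressed here only to keep the output tidy.

```r
long_vec <- 1:6

# c(10, 20) is reused three times to match the six elements
long_vec + c(10, 20)

# The same result, with the recycling written out explicitly
long_vec + rep(c(10, 20), length.out = length(long_vec))

# Lengths 5 and 2: recycling stops partway through the shorter vector
uneven <- suppressWarnings(1:5 + c(10, 20))
uneven
```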
+

Basic Vector Operations

+
+
vec_1 <- 1:10
+
+vec_1*12 # multiplication
+
+
 [1]  12  24  36  48  60  72  84  96 108 120
+
+
vec_1+12 # addition
+
+
 [1] 13 14 15 16 17 18 19 20 21 22
+
+
vec_1-12 # subtraction
+
+
 [1] -11 -10  -9  -8  -7  -6  -5  -4  -3  -2
+
+
vec_1/3 # division
+
+
 [1] 0.3333333 0.6666667 1.0000000 1.3333333 1.6666667 2.0000000 2.3333333
+ [8] 2.6666667 3.0000000 3.3333333
+
+
vec_1^4 # power
+
+
 [1]     1    16    81   256   625  1296  2401  4096  6561 10000
+
+
sqrt(vec_1) # square root
+
+
 [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
+ [9] 3.000000 3.162278
+
+
+
+
+

Operations on vectors of equal length

+

Additionally, we can perform operations on two vectors of equal length.

+
    +
  1. Create two vectors
+
+
+
vec_3 <- 5:14
+vec_3
+
+
 [1]  5  6  7  8  9 10 11 12 13 14
+
+
vec_4 <- 12:3
+vec_4
+
+
 [1] 12 11 10  9  8  7  6  5  4  3
+
+
+
    +
  2. Perform various arithmetic operations
+
+
+
vec_3 + vec_4
+
+
 [1] 17 17 17 17 17 17 17 17 17 17
+
+
vec_3 - vec_4
+
+
 [1] -7 -5 -3 -1  1  3  5  7  9 11
+
+
vec_3 / vec_4
+
+
 [1] 0.4166667 0.5454545 0.7000000 0.8888889 1.1250000 1.4285714 1.8333333
+ [8] 2.4000000 3.2500000 4.6666667
+
+
vec_3 * vec_4
+
+
 [1] 60 66 70 72 72 70 66 60 52 42
+
+
vec_3 ^ vec_4
+
+
 [1] 244140625 362797056 282475249 134217728  43046721  10000000   1771561
+ [8]    248832     28561      2744
+
+
+
+
+
+

Functions that can be applied to vectors

+

The functions listed below can be applied to vectors:

+
    +
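  1. any(): returns TRUE if at least one element of a logical vector is TRUE
  2. all(): returns TRUE only if every element of a logical vector is TRUE
  3. nchar(): counts the characters in each element of a character vector
  4. length(): returns the number of elements in a vector
  5. typeof(): returns the internal storage type of a vector
+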
+
+

Examples

+
+
any(vec_3 > vec_4)
+
+
[1] TRUE
+
+
any(vec_3 < vec_4)
+
+
[1] TRUE
+
+
+
+
all(vec_3 > vec_4)
+
+
[1] FALSE
+
+
all(vec_3 < vec_4)
+
+
[1] FALSE
+
+
+
+
length(vec_3)
+
+
[1] 10
+
+
length(vec_4)
+
+
[1] 10
+
+
+
+
typeof(vec_3)
+
+
[1] "integer"
+
+
typeof(vec_4)
+
+
[1] "integer"
+
+
+

Determine the number of characters in each element of a character vector

+
+
vec_5 <- c("R", "RStudio", "Shiny", "Quarto")
+nchar(vec_5)
+
+
[1] 1 7 5 6
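
It is easy to confuse nchar() with length(). A small sketch of the difference, reusing vec_5 from above (redefined here so the block runs on its own):

```r
vec_5 <- c("R", "RStudio", "Shiny", "Quarto")

length(vec_5)      # number of elements: 4
nchar(vec_5)       # characters per element: 1 7 5 6
sum(nchar(vec_5))  # total characters across all elements: 19
```
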
+
+
+
+
+
+

Recycling of vectors

+
+
vec_3 + c(10, 20)
+
+
 [1] 15 26 17 28 19 30 21 32 23 34
+
+
vec_3 + c(10, 20, 30) # results in a warning because the length of the longer vector (10) is not a multiple of the length of the shorter one (3)
+
+
Warning in vec_3 + c(10, 20, 30): longer object length is not a multiple of
+shorter object length
+
+
+
 [1] 15 26 37 18 29 40 21 32 43 24
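
One way to see what recycling does is to stretch the shorter vector by hand with rep(). This is a sketch for illustration only, not a description of how R implements recycling internally; the names short and stretched are assumptions introduced here:

```r
vec_3 <- 5:14
short <- c(10, 20, 30)

# stretch the shorter vector by hand to the length of vec_3
stretched <- rep(short, length.out = length(vec_3))
stretched           # 10 20 30 10 20 30 10 20 30 10

# same values as vec_3 + c(10, 20, 30), with no warning because the lengths match
vec_3 + stretched   # 15 26 37 18 29 40 21 32 43 24
```
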
+
+
+
+
+

Accessing elements of a vector

+

To access the elements of a vector, we can use numeric-, character-, or logical-based indexing.

+
+

Examples

+
+

1. Name the elements of a vector with names().

+

Create the vector.

+
+
vec_name <- 1:5
+vec_name
+
+
[1] 1 2 3 4 5
+
+
+

Name the individual elements.

+
+
names(vec_name) <- c("a", "c", "e", "g", "i")
+vec_name
+
+
a c e g i 
+1 2 3 4 5 
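
Once the elements are named, they can be accessed with character-based indexing. A minimal sketch, rebuilding vec_name so that the block is self-contained:

```r
vec_name <- 1:5
names(vec_name) <- c("a", "c", "e", "g", "i")

vec_name["e"]           # a single element, selected by name
vec_name[c("a", "i")]   # several elements, selected by name
unname(vec_name["e"])   # drop the name and keep only the value: 3
```
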
+
+
+
+
+

2. Use the vector index to filter

+
+
vec_index <- 1:5
+vec_index
+
+
[1] 1 2 3 4 5
+
+
+
+
a) Logical vector as an index
+
+
vec_index[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
+
+
[1] 1 3 5
+
+
+
+
+
b) Select a range of elements by position
+
+
vec_index[1:3]
+
+
[1] 1 2 3
+
+
+
+
+
c) Access individual elements by position
+
+
vec_index[4]
+
+
[1] 4
+
+
vec_index[c(2,4)]
+
+
[1] 2 4
+
+
+
+
+
d) Modify a vector using indexing
+
+
vec_index
+
+
[1] 1 2 3 4 5
+
+
vec_index[5] <- 1000
+vec_index
+
+
[1]    1    2    3    4 1000
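
In practice, the logical vector used for indexing in (a) is rarely typed out by hand; it usually comes from a comparison. A hedged sketch, using the illustrative names vec_demo and keep (not part of the original tutorial):

```r
vec_demo <- c(1, 2, 3, 4, 1000)

# a comparison returns one TRUE/FALSE value per element
keep <- vec_demo > 2
keep             # FALSE FALSE TRUE TRUE TRUE

# the logical vector then selects the matching elements
vec_demo[keep]   # 3 4 1000

# conditions can be combined with & (and) or | (or)
vec_demo[vec_demo > 2 & vec_demo < 1000]   # 3 4
```
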
+
+
+ + +
+
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/posts/biotech/alt_protein/alt_protein_intro.qmd b/posts/biotech/alt_protein/alt_protein_intro.qmd new file mode 100644 index 0000000..271a6c4 --- /dev/null +++ b/posts/biotech/alt_protein/alt_protein_intro.qmd @@ -0,0 +1,244 @@ +--- +title: "Biotechnology for the Global South" +subtitle: "Developing alternative proteins to reduce childhood hunger and improve food production in low-income countries" +author: "William Okech" +date: "2023-10-27" +image: "images/biotech_cover.png" +categories: [RStudio, R, Biotechnology, Blog, Data Visualization] +format: html +toc: false +draft: false +warning: false +--- + +![*Bioreactor image created using "Bing Image Creator" with the prompt keywords "bioreactor, computer screen, bubbling liquids, cabinet, with a dark purple, sky-blue, neon lights, carpetpunk, backlit photography, and trillwave theme."*](images/Bioreactor.jpeg){fig-align="center" width="80%" fig-alt="Futuristic Bioreactor"} + +# Key Points + +- Hunger is a significant problem that affects close to one-tenth (1/10) of the world's population. +- Low-income countries, particularly in Africa, bear the brunt of the global childhood hunger crisis. +- Alternative protein technologies have the potential to transform the way humans produce and consume proteins. +- Increased consumer education, reduced production costs, and innovative technologies are key to the widespread acceptance and consumption of alternative proteins. + +# Introduction + +In the year 2022, between 691 and 783 million people were hungry [^1]. This alarming figure indicates that almost one-tenth (1/10) of the world's population (in 2022) did not consume enough calories to maintain a healthy and active lifestyle. Hunger (also referred to as undernutrition) is "an uncomfortable or painful physical sensation caused by insufficient consumption of dietary energy." 
[^2] The United Nations Children's Fund (UNICEF) estimates that for children under 5, 148.1 million suffered from stunting (low height-for-age) and 45 million from wasting (low weight-for-height), which placed them at increased risk for physical and/or mental disabilities [^3]. Moreover, in 2019, protein-energy undernutrition (an energy deficit resulting from the deficiencies of many macronutrients, primarily proteins) contributed to the deaths of 100,000 children (between 0--14 years), with approximately two-thirds from Africa [^4] [^5]. That a large proportion of the human population still faces the dual challenges of hunger and food insecurity is very disheartening. This is despite the continual increases in global agricultural productivity (resulting from the expansion of agricultural land area, better yielding crops, and improved animal production methods) that have been witnessed over the past century. + +[^1]: Hunger \| FAO \| Food and Agriculture Organization of the United Nations. https://www.fao.org/hunger/en/. + +[^2]: I am hungry. What does it mean? https://unric.org/en/i-am-hungry-what-does-it-mean/. + +[^3]: Malnutrition. https://www.who.int/health-topics/malnutrition#tab=tab_1. + +[^4]: Deaths from protein-energy malnutrition, by age, World, 1990 to 2019. https://ourworldindata.org/grapher/malnutrition-deaths-by-age. + +[^5]: Protein-Energy Undernutrition (PEU) - Nutritional Disorders - MSD Manual Professional Edition. https://www.msdmanuals.com/professional/nutritional-disorders/undernutrition/protein-energy-undernutrition-peu. + +The problem of hunger/undernourishment affects the world in an income-/region-dependent manner. Between the year 2000 and 2020, low-income countries had the highest share of their populations (25%--35%) that were undernourished, which was significantly above the world average (5%--15%) (@fig-undernourished-income). 
+ +::: {#fig-undernourished-income fig-align="center" width="80%" fig-alt="Graph depicting the share of the population that is undernourished (grouped by income)"} +![](images/undernourished_income.png) + +Share of the population that is undernourished (grouped by income) +::: + +Additionally, we observe that the two regions that had the highest share of the population that was undernourished were Sub-Saharan Africa (20.9%) and South Asia (15.9%) (@fig-undernourished-region). + +::: {#fig-undernourished-region fig-align="center" width="80%" fig-alt="Graph depicting the share of the population that is undernourished (grouped by region)"} +![](images/undernourished_region.png) + +Share of the population that is undernourished (grouped by region) +::: + +To reduce the share of the population that is undernourished, various countries and development institutions have introduced interventions and technologies that can boost crop yields and enhance livestock production. These interventions include irrigation, fertilizers, improved seed, better insect and pest control strategies, and gene-editing. However, in Sub-Saharan Africa (which has the greatest burden of undernourishment and lowest crop yields), only 6% of cultivated land was irrigated and the rate of fertilizer application was approximately 17 kg/hectare, in 2018, which was significantly below the world average of 135 kg/hectare [^6]. Additionally, improved livestock rearing methods (such as selective breeding and improved nutrition), and fish and seafood harvesting techniques (such as selective harvesting and aquaculture systems) have increased global production. Sadly, we note that the continent of Africa has seen very little improvement in its livestock, fish, and seafood production capacity (@fig-meat_fish), and this may leave its fast-growing population susceptible to protein deficiencies if not addressed in a timely manner. + +[^6]: Africa Fertilizer Map 2020 -- AF-AP Partnership. 
https://afap-partnership.org/news/africa-fertilizer-map-2020/. + +::: {#fig-meat_fish layout-ncol="2"} +![Global Meat Production](images/continent_meat_1.png){#fig-meat} + +![Global Fish Production](images/africa_fish.png){#fig-fish} + +Global Meat and Fish Production +::: + +It is widely believed that promoting technologies that can improve crop yields and enhance livestock production in low-income countries is the best method to boost food production and subsequently reduce undernourishment. However, many of these interventions have been detrimental to the climate and local environment. With regard to enhancing crop production, some of these effects include: + +1. Land degradation and deforestation [^7], +2. Poisoning of fresh water/marine ecosystems by chemical runoff, and, +3. Depletion of fresh water sources resulting from overconsumption. + +[^7]: Impact of Sustainable Agriculture and Farming Practices. https://www.worldwildlife.org/industries/sustainable-agriculture/. + +Moreover, boosting output in the livestock sector has contributed to a number of major environmental challenges and resource conflicts. These include: + +1. Overgrazing, soil erosion, and deforestation [^8],[^9], +2. Contributing up to 15% of human-induced greenhouse gas (GHG) emissions [^10], +3. Conflict between pastoralists/farmers over grazing land/water [^11], and +4. Increased prevalence of antimicrobial resistance in livestock resulting from antibiotic misuse/overuse [^12]. + +[^8]: How Industrialized Meat Production Causes Land Degradation. https://populationeducation.org/industrialized-meat-production-and-land-degradation-3-reasons-to-shift-to-a-plant-based-diet/. + +[^9]: Feltran-Barbieri, R. & Féres, J. G. Degraded pastures in Brazil: improving livestock production and forest restoration. R Soc Open Sci 8, (2021). + +[^10]: Moving Towards Sustainability: The Livestock Sector and the World Bank. 
https://www.worldbank.org/en/topic/agriculture/brief/moving-towards-sustainability-the-livestock-sector-and-the-world-bank. + +[^11]: Pastoral conflict in Kenya -- ACCORD. https://www.accord.org.za/ajcr-issues/pastoral-conflict-in-kenya/. + +[^12]: Antimicrobial resistance and agriculture - OECD. https://www.oecd.org/agriculture/topics/antimicrobial-resistance-and-agriculture/. + +Overall, these findings suggest that enhancing livestock production and boosting crop yields to reduce undernourishment may not be the panacea we envision and may cause more long-term harm than good. Additionally, with more extreme weather events (such as heat waves, floods, and droughts) resulting from climate change [^13], volatility in global crop and food prices, and rapidly increasing world populations, there is a major need to develop and adopt alternative protein sources that can reduce childhood hunger and increase food production in low-income countries. In this essay, I will examine new and innovative alternative protein production technologies that can aid in generating foods with sufficient dietary energy and nutrients to meet human consumption requirements while reducing dependence on animal-based proteins. + +[^13]: Extreme Weather \| Facts -- Climate Change: Vital Signs of the Planet. https://climate.nasa.gov/extreme-weather/. + +# What are alternative proteins? + +Alternative proteins are plant-based and food technology alternatives to animal-based proteins [^14]. Proteins are large, complex molecules made up of smaller units called amino acids, and they are a key component of quality nutrition that promotes normal growth and maintenance [^15]. A significant advantage of alternative protein production is the reduced impact on the environment resulting from a decreased dependence on livestock-based protein production. 
This reduced impact is seen in the decreased greenhouse gas emissions and environmental pollution as well as the decline in the amounts of land and water required for livestock. Major sources of alternative proteins include plant proteins, insects, cultured meat, and fermentation-derived proteins [^16]. Generally, plant, insect, and fermentation-derived proteins are commercially available, while cultivated meats are still in the research and development phase [^17]. + +[^14]: Alternative proteins. https://sustainablecampus.unimelb.edu.au/sustainable-research/case-studies/alternative-proteins. + +[^15]: Protein. https://www.genome.gov/genetics-glossary/Protein. + +[^16]: The market for alternative protein: Pea protein, cultured meat, and more \| McKinsey. https://www.mckinsey.com/industries/agriculture/our-insights/alternative-proteins-the-race-for-market-share-is-on. + +[^17]: Defining alternative protein \| GFI. https://gfi.org/defining-alternative-protein/. + +# Plant proteins + +Plant proteins are harnessed directly from protein-rich seeds, and the main sources include leguminous (such as soy and pea), cereal (such as wheat and corn), and oilseed (such as peanut and flaxseed) proteins [^18]. The three major processing steps include protein extraction (centrifugation), protein purification (precipitation and ultrafiltration), and heat treatment (pasteurization) [^19]. Using the isolated proteins, specific products such as plant-based meats can be developed. To create these meats, the proteins are mixed with fibers and fats, then structured using heat and agitation, and lastly color, flavor, and aroma components are added to make the product more palatable. + +[^18]: Chandran, A. S., Suri, S. & Choudhary, P. Sustainable plant protein: an up-to-date overview of sources, extraction techniques and utilization. Sustainable Food Technology 1, 466--483 (2023). + +[^19]: Plant-based protein processing \| Alfa Laval. 
https://www.alfalaval.com/industries/food-dairy-beverage/food-processing/protein-processing/plant-based-protein-processing/. + +## Notable Companies + +1. Fry Family Food (South Africa) +2. Moolec Science (Luxembourg) +3. Beyond Meat (Los Angeles, California, USA) +4. Impossible Foods (Redwood City, California, USA) +5. New Wave Foods (Stamford, Connecticut, USA) +6. Eat Just (Alameda, California, USA) + +## Challenges to be addressed + +1. Develop crops optimized for plant-based meat that produce higher quantities of high-quality protein, +2. Improve protein extraction and processing methods, +3. Confirm that the taste, texture, and nutritional value are similar to conventional meats, and, +4. Ensure that the cost of production is competitive, and the process is energy-efficient [^20]. + +[^20]: The science of plant-based meat \| GFI APAC. https://gfi-apac.org/science/the-science-of-plant-based-meat/. + +# Insect proteins + +Proteins derived from insects are referred to as insect proteins. Insects are rich in essential nutrients such as amino acids, vitamins, and minerals. Insect-derived proteins have a dual role, as they can be eaten directly by humans or used as animal feed. A significant advantage of insect-derived proteins is their negligible environmental footprint, low cost of production, and absence of disease-causing pathogens (post-processing) [^21]. Numerous people groups across the world have traditionally consumed insects, and it is estimated that approximately 2,000 insect species are consumed in at least 113 countries [^22]. However, the reluctance to eat insects in many high-income countries and the abundance of other protein sources has prevented widespread acceptance. In contrast, insect-based proteins have shown great promise in the animal feed industry. Both black soldier fly and housefly-larvae have been used to replace fish meal and broiler feed, significantly reducing costs while not compromising final product quality [^23]. 
+ +[^21]: How Insect Protein can Revolutionize the Food Industry. https://mindthegraph.com/blog/insect-protein/. + +[^22]: Yen, A. L. Edible insects: Traditional knowledge or western phobia? Entomol Res 39, 289--298 (2009). + +[^23]: Kim, T. K., Yong, H. I., Kim, Y. B., Kim, H. W. & Choi, Y. S. Edible Insects as a Protein Source: A Review of Public Perception, Processing Technology, and Research Trends. Food Sci Anim Resour 39, 521 (2019). + +## Notable Companies + +1. Next Protein (France/Tunisia) +2. Biobuu (Tanzania) +3. Inseco (Cape Town, South Africa) +4. InsectiPro (Limuru, Kenya) +5. Entocycle (United Kingdom)\ +6. Ecodudu (Nairobi, Kenya) +7. Ynsect (Paris, France) +8. Protix (Dongen, Netherlands) +9. All Things Bugs (Oklahoma City, Oklahoma, USA) + +## Challenges to be addressed + +1. Develop tools for product scale-up, +2. Lower the production costs, and, +3. Change negative consumer attitudes towards insect-based foods [^24]. + +[^24]: The Growing Animal Feed Insect Protein Market. https://nutrinews.com/en/the-growing-animal-feed-insect-protein-market-opportunities-and-challenges/. + +# Fermentation-derived proteins + +Fermentation involves the transformation of sugars into new products via chemical reactions carried out by microorganisms. This process has been referred to as "humanity's oldest biotechnological tool" because humans have previously used it to create foods, medicines, and fuels [^25]. The three main categories of fermentation include traditional, biomass, and precision fermentation [^26]. + +[^25]: Taveira, I. C., Nogueira, K. M. V., Oliveira, D. L. G. de & Silva, R. do N. Fermentation: Humanity's Oldest Biotechnological Tool. Front Young Minds 9, (2021). + +[^26]: Fermentation for alternative proteins 101 \| Resource guide \| GFI. https://gfi.org/fermentation/. + +1. Traditional fermentation uses intact live microorganisms and microbial anaerobic digestion to process plant-based foods. 
This results in a change in the flavor and function of plant-based foods and ingredients. +2. Biomass fermentation uses the microorganisms that reproduce during the fermentation process as ingredients. The microorganisms naturally have high-protein content, and allowing them to reproduce efficiently makes large amounts of protein-rich food. +3. Precision fermentation uses programmed microorganisms as "cellular production factories" to develop proteins, fats, and other nutrients [^27]. + +[^27]: Fermentation for alternative proteins 101 \| Resource guide \| GFI. https://gfi.org/fermentation/. + +## Notable Companies + +1. Essential Impact (East Africa) +2. De Novo Foodlabs (Cape Town, South Africa) +3. MycoTechnology (Aurora, Colorado, USA) +4. Quorn (Stokesley, UK) +5. Perfect Day (Berkeley, California, USA) + +## Challenges to be addressed + +1. Identify the correct molecules to manufacture in a cost-effective manner, +2. Develop the appropriate microbial strains for the relevant products, +3. Determine the appropriate feedstocks, +4. Design low-cost bioreactors and systems for scaling-up processes, and, +5. Improve end-product formulation to allow for better taste/texture [^28]. + +[^28]: The science of fermentation (2023) \| GFI. https://gfi.org/science/the-science-of-fermentation/. + +# Animal proteins from cultivated meat + +The cultivated meat industry develops animal proteins that are grown from animal cells directly. Here, tissue-engineering techniques commonly used in regenerative medicine aid in product development [^29]. Cells obtained from an animal are put into a bioreactor to replicate, and when they reach the optimal density, they are harvested via centrifugation, and the resulting muscle and fat tissue are formed into the recognizable meat structure. The advantages of producing meat in this way include: reduced contamination, decreased antibiotic use, and a lower environmental footprint [^30]. + +[^29]: What is cultivated meat? \| McKinsey. 
https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-cultivated-meat. + +[^30]: The science of cultivated meat \| GFI. https://gfi.org/science/the-science-of-cultivated-meat/. + +## Notable Companies + +1. WildBio (formerly Mogale Meat; Pretoria, South Africa) +2. Newform Foods (formerly Mzansi Meat; Cape Town, South Africa) +3. Mosa Meat (Maastricht, Netherlands) +4. Bluu Seafood (Berlin, Germany) +5. Eat Just (Singapore) +6. Clear Meat (Delhi NCR, India) +7. Sea-Stematic (Cape Town, South Africa) + +## Challenges to be addressed + +Even though many companies have entered the cultivated meat space, not many have received the requisite regulatory approval to sell their products with some countries temporarily halting development [^31]. Other challenges include: + +[^31]: Che Sorpresa! Italy U-Turns on Cultivated Meat Ban -- For Now. https://www.greenqueen.com.hk/italy-cultivated-meat-ban-lab-grown-food-cultured-protein-eu-tris-notification-francesco-lollobrigida/. + +1. Insufficient bioreactor capacity, +2. High cost of growth media and factors required for cultivation, +3. Lack of products in the market despite large investments [^32], and, +4. High final product cost [^33] and a major need for consumer education [^34]. + +[^32]: Is overhype dooming the cultivated meat industry? https://www.fastcompany.com/90966338/hype-built-the-cultivated-meat-industry-now-it-could-end-it. + +[^33]: Meat substitutes need to get a lot cheaper. https://www.sustainabilitybynumbers.com/p/meat-substitutes-price. + +[^34]: What is cultivated meat? \| McKinsey. https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-cultivated-meat. + +# What factors will influence the adoption of alternative proteins in low-income countries? + +The insect-based protein market has the potential to grow faster than the other three alternative protein segments (plant-based, fermentation-based, and cultivated meat) in low-income countries. 
This is because of fewer barriers to entry and lower setup costs. Therefore, to enhance the adoption of the other alternative protein segments in low-income countries, there is a need to build biomanufacturing capacity, increase R&D funding, and develop a strong workforce by recruiting more students and researchers to the field. Additionally, it would be important for national-level regulations and policies that support the sector to be implemented. On an individual level, several factors will affect the large-scale adoption of alternative proteins in low-income countries [^35]. These include: + +[^35]: The market for alternative protein: Pea protein, cultured meat, and more \| McKinsey. https://www.mckinsey.com/industries/agriculture/our-insights/alternative-proteins-the-race-for-market-share-is-on. + +1. Cost (dollars per kilogram of 100% protein), which will need to be similar or lower than that for conventional animal-derived proteins, +2. The protein digestibility-corrected amino acid score (PDCAAS) which is a tool used to measure a protein by its amino acid requirements and the ability of humans to digest it, +3. The economic impact on agricultural workers in the livestock and fishing industry, and, +4. Consumer adoption [^36] (which is dependent on perception, taste, texture, safety, and convenience). + +[^36]: Meat substitutes need to get a lot cheaper. https://www.sustainabilitybynumbers.com/p/meat-substitutes-price. + +# Conclusion + +In summary, I hope that I have convinced you that there is an urgent need to address the crisis of hunger and undernourishment worldwide. Second, this essay should have demonstrated to you that optimizing conventional agricultural practices may also simultaneously negatively impact the environment. Third, the reader should now have a basic understanding of alternative proteins and their potential to address undernutrition. 
Lastly, to tackle the problem of hunger and undernourishment, it is imperative for society to embrace novel alternative protein production technologies that can enhance food production while minimizing the environmental impact and contribution to climate change. diff --git a/posts/biotech/alt_protein/images/africa_fish.png b/posts/biotech/alt_protein/images/africa_fish.png new file mode 100644 index 0000000..7fc1e59 Binary files /dev/null and b/posts/biotech/alt_protein/images/africa_fish.png differ diff --git a/posts/biotech/alt_protein/images/bioreactor.jpeg b/posts/biotech/alt_protein/images/bioreactor.jpeg new file mode 100644 index 0000000..663b7cf Binary files /dev/null and b/posts/biotech/alt_protein/images/bioreactor.jpeg differ diff --git a/posts/biotech/alt_protein/images/biotech_cover.png b/posts/biotech/alt_protein/images/biotech_cover.png new file mode 100644 index 0000000..4f97857 Binary files /dev/null and b/posts/biotech/alt_protein/images/biotech_cover.png differ diff --git a/posts/biotech/alt_protein/images/continent_meat_1.png b/posts/biotech/alt_protein/images/continent_meat_1.png new file mode 100644 index 0000000..73e2de8 Binary files /dev/null and b/posts/biotech/alt_protein/images/continent_meat_1.png differ diff --git a/posts/biotech/alt_protein/images/undernourished_income.png b/posts/biotech/alt_protein/images/undernourished_income.png new file mode 100644 index 0000000..ec2abb4 Binary files /dev/null and b/posts/biotech/alt_protein/images/undernourished_income.png differ diff --git a/posts/biotech/alt_protein/images/undernourished_region.png b/posts/biotech/alt_protein/images/undernourished_region.png new file mode 100644 index 0000000..b45f843 Binary files /dev/null and b/posts/biotech/alt_protein/images/undernourished_region.png differ diff --git a/posts/biotech/alt_protein/post_1.ipynb b/posts/biotech/alt_protein/post_1.ipynb new file mode 100644 index 0000000..084d745 --- /dev/null +++ b/posts/biotech/alt_protein/post_1.ipynb @@ -0,0 
+1,263 @@ +{ + "cells": [ + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "---\n", + "title: \"Biotechnology for the Global South\"\n", + "subtitle: \"Leveraging biotechnological innovations to reduce childhood hunger and improve food production in low-income countries\"\n", + "author: \"William Okech\"\n", + "date: \"2023-10-27\"\n", + "image: \"images/bioreactor.jpeg\"\n", + "categories: [RStudio, R, Biotech for the Global South, Blog, Data Visualization]\n", + "format: html\n", + "toc: false\n", + "draft: false\n", + "warning: false\n", + "---" + ], + "id": "ee602deb" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "```{css, echo = FALSE}\n", + ".justify {\n", + " text-align: justify !important\n", + "}\n", + "```\n", + "\n", + "\n", + "![*Bioreactor image created using \"Bing Image Creator\" with the prompt keywords \"bioreactor, computer screen, bubbling liquids, cabinet, with a dark purple, sky-blue, neon lights, carpetpunk, backlit photography, and trillwave theme.\"*](images/Bioreactor.jpeg){fig-align=\"center\" width=\"80%\" fig-alt=\"Futuristic Bioreactor\"}\n", + "\n", + "# Key Points\n", + "\n", + "- Hunger is a significant problem affecting close to one-tenth (1/10) of the world's population.\n", + "- Low-income countries, particularly in Africa, bear the brunt of the global hunger crisis.\n", + "- Alternative proteins have the potential to transform the way humans produce proteins.\n", + "- Increased consumer education, reduced production costs, and efficient biotechnological protocols are key to the widespread adoption of alternative proteins.\n", + "\n", + "# Introduction\n", + "\n", + "In the year 2022, between 691 and 783 million people were hungry (or undernourished)[^1]. 
Hunger results from an \"insufficient consumption of dietary energy,\" and this staggering figure indicated that almost one-tenth (1/10) of the world's population (in 2022) did not consume enough calories to maintain a healthy and active lifestyle. In children, the two main signs of undernutrition are stunting (low height-for-age) and wasting (low weight-for-height), which can result in physical and/or mental disability[^2]. Moreover, in 2019, protein-energy undernutrition (an energy deficit resulting from deficiencies of many macronutrients, primarily proteins) which manifests in 3 main forms, namely marasmus (wasting), kwashiorkor (edema), and marasmic kwashiorkor (both edema and wasting), contributed to the deaths of 200,000 people worldwide (mainly under-5s and 70+ year olds), with approximately half from Africa[^3] [^4]. Despite the continual increases in global agricultural productivity (resulting from the expansion of agricultural land area, better yielding crops, and improved animal production methods) that have been witnessed over the past century, it is still discouraging that a large proportion of the human population faces the dual challenges of hunger and food insecurity. This is particularly true for low-income countries that have the highest share of their populations affected by undernourishment (@fig-undernourished-income).\n", + "\n", + "[^1]: Hunger \\| FAO \\| Food and Agriculture Organization of the United Nations. https://www.fao.org/hunger/en/.\n", + "\n", + "[^2]: Malnutrition. https://www.who.int/health-topics/malnutrition#tab=tab_1.\n", + "\n", + "[^3]: Deaths from protein-energy malnutrition, by age, World, 1990 to 2019. https://ourworldindata.org/grapher/malnutrition-deaths-by-age.\n", + "\n", + "[^4]: Protein-Energy Undernutrition (PEU) - Nutritional Disorders - MSD Manual Professional Edition. 
https://www.msdmanuals.com/professional/nutritional-disorders/undernutrition/protein-energy-undernutrition-peu.\n", + "\n", + "::: {#fig-undernourished-income fig-align=\"center\" width=\"80%\" fig-alt=\"Graph depicting the share of the population that is undernourished (grouped by income)\"}\n", + "![](images/undernourished_income.png)\n", + "\n", + "Share of the population that is undernourished (grouped by income)\n", + ":::\n", + "\n", + "Additionally, we observe that the two regions that had the highest share of the population that was undernourished were Sub-Saharan Africa (20.9%) and South Asia (15.9%) (@fig-undernourished-region).\n", + "\n", + "::: {#fig-undernourished-region fig-align=\"center\" width=\"80%\" fig-alt=\"Graph depicting the share of the population that is undernourished (grouped by region)\"}\n", + "![](images/undernourished_region.png)\n", + "\n", + "Share of the population that is undernourished (grouped by region)\n", + ":::\n", + "\n", + "It is widely acknowledged that interventions such as irrigation, fertilizer use, insect and pest control, gene-editing, and improved seed can boost crop yields and agricultural productivity. However, on the continent of Africa (which has the greatest burden of undernourishment and lowest crop yields), only 6% of cultivated land was irrigated and the rate of fertilizer application was approximately 17 kg/hectare which was way below the world average of 135 kg/hectare in 2018[^5]. Additionally, improved livestock rearing methods (such as selective breeding and improved nutrition), and fish and seafood harvesting techniques (such as selective harvesting and aquaculture systems) have increased global production significantly. 
Unfortunately, we observe again that the continent of Africa has seen very little improvement in its livestock, fish, and seafood production capacity, which leaves its fast-growing population susceptible to nutrient deficiencies if not addressed in a timely manner (@fig-meat_fish).\n", + "\n", + "[^5]: Africa Fertilizer Map 2020 -- AF-AP Partnership. https://afap-partnership.org/news/africa-fertilizer-map-2020/.\n", + "\n", + "::: {#fig-meat_fish layout-ncol=\"2\"}\n", + "![Global Meat Production](images/continent_meat_1.png){#fig-meat}\n", + "\n", + "![Global Fish Production](images/africa_fish.png){#fig-fish}\n", + "\n", + "Global Meat and Fish Production\n", + ":::\n", + "\n", + "One of the methods used to boost food production and subsequently reduce undernourishment has been to develop technologies that can directly improve agricultural productivity and crop yields. However, we now know that many of these interventions have been detrimental to the climate and local environment. With regard to crop production, some of these effects have included:\n", + "\n", + "1. The poisoning of fresh water and marine ecosystems by chemical runoff,\n", + "2. Excess water consumption and depletion of fresh water sources, and,\n", + "3. Deforestation, land degradation, and ecological destruction[^6].\n", + "\n", + "[^6]: Impact of Sustainable Agriculture and Farming Practices. https://www.worldwildlife.org/industries/sustainable-agriculture/.\n", + "\n", + "Moreover, we know that the livestock sector has contributed to a number of significant challenges facing the world today. These include:\n", + "\n", + "1. Contributing up to 15% of the human-induced greenhouse gas (GHG) emissions[^7],\n", + "2. Overgrazing, soil erosion, and deforestation[^8] [^9],\n", + "3. Conflict between pastoralists over finite resources like grazing land/water, and theft[^10], and,\n", + "4. 
The prevalence of antimicrobial resistance resulting from misuse/overuse[^11].\n", + "\n", + "[^7]: Moving Towards Sustainability: The Livestock Sector and the World Bank. https://www.worldbank.org/en/topic/agriculture/brief/moving-towards-sustainability-the-livestock-sector-and-the-world-bank.\n", + "\n", + "[^8]: How Industrialized Meat Production Causes Land Degradation. https://populationeducation.org/industrialized-meat-production-and-land-degradation-3-reasons-to-shift-to-a-plant-based-diet/.\n", + "\n", + "[^9]: Feltran-Barbieri, R. & Féres, J. G. Degraded pastures in Brazil: improving livestock production and forest restoration. R Soc Open Sci 8, (2021).\n", + "\n", + "[^10]: Pastoral conflict in Kenya -- ACCORD. https://www.accord.org.za/ajcr-issues/pastoral-conflict-in-kenya/.\n", + "\n", + "[^11]: Antimicrobial resistance and agriculture - OECD. https://www.oecd.org/agriculture/topics/antimicrobial-resistance-and-agriculture/.\n", + "\n", + "Overall, these observations indicate to us that simply enhancing agricultural productivity and crop yields to reduce undernourishment may not be the panacea we envision and may cause more long-term harm than good. Moreover, with an increasing number of extreme weather events (such as heat waves, floods, and droughts) resulting from climate change[^12], international conflicts that influence global crop and food prices, and rapidly changing world populations, there is an increasing need to develop and adopt bespoke biotechnological solutions that will address hunger. In this essay, I will not address the numerous challenges that affect commodity markets and agriculture/food production systems worldwide, but I will examine new and innovative biotechnological solutions that can significantly improve access to dietary energy and nutrients. 
Specifically, the focus will be on how innovations in the development of alternative proteins can be an effective, low-cost tool for generating large amounts of nutrients for both human and animal consumption.\n", + "\n", + "[^12]: Extreme Weather \\| Facts -- Climate Change: Vital Signs of the Planet. https://climate.nasa.gov/extreme-weather/.\n", + "\n", + "# What are alternative proteins?\n", + "\n", + "Alternative proteins are plant-based and food technology alternatives to animal-based proteins[^13]. Proteins are large, complex molecules made up of smaller units called amino acids that are a key component of quality nutrition and promote normal growth and maintenance[^14]. A significant advantage of these alternative proteins is the fewer inputs required (such as land and water) and the reduced greenhouse gas emissions and environmental pollution associated with their production. Major sources of alternative proteins include plant proteins, insects, cultured meat, and mycoproteins (via fermentation)[^15]. Generally, plant, insect, and fermentation-derived proteins are commercially available. However, cultivated meats are still in the research and development phase[^16].\n", + "\n", + "[^13]: Alternative proteins. https://sustainablecampus.unimelb.edu.au/sustainable-research/case-studies/alternative-proteins.\n", + "\n", + "[^14]: Protein. https://www.genome.gov/genetics-glossary/Protein.\n", + "\n", + "[^15]: The market for alternative protein: Pea protein, cultured meat, and more \\| McKinsey. https://www.mckinsey.com/industries/agriculture/our-insights/alternative-proteins-the-race-for-market-share-is-on.\n", + "\n", + "[^16]: Defining alternative protein \\| GFI. 
https://gfi.org/defining-alternative-protein/.\n", + "\n", + "# Plant proteins\n", + "\n", + "Plant proteins are harnessed directly from protein-rich seeds, and the main sources include leguminous (such as soy and pea), cereal (such as wheat and corn), and oilseed (such as peanut and flaxseed) proteins[^17]. The three major steps include protein extraction (centrifugation), protein purification (precipitation and ultrafiltration), and heat treatment (pasteurization)[^18]. Using these isolated proteins, specific products such as plant-based meats can be developed. To create these meats, the proteins are mixed with fibers and fats, then structured using heat and agitation, and lastly color, flavor, and aroma components are added to make the product more palatable.\n", + "\n", + "[^17]: Chandran, A. S., Suri, S. & Choudhary, P. Sustainable plant protein: an up-to-date overview of sources, extraction techniques and utilization. Sustainable Food Technology 1, 466--483 (2023).\n", + "\n", + "[^18]: Plant-based protein processing \\| Alfa Laval. https://www.alfalaval.com/industries/food-dairy-beverage/food-processing/protein-processing/plant-based-protein-processing/.\n", + "\n", + "## Companies and their products (notable examples)\n", + "\n", + "1. Beyond Meat (Los Angeles, California, USA) -- sells burgers that contain five main ingredients: pea protein isolate, canola and coconut oils, trace minerals, carbohydrates, and water, which is included in beet juice to simulate the red color of beef.\n", + "2. Impossible Foods (Redwood City, California, USA) -- sells burgers containing soy protein concentrate, soy protein isolate, and potato protein, as well as coconut and sunflower oils, natural flavors, and several hydrocolloids, minerals, and vitamins. Additionally, it also has soy leghemoglobin, a heme-containing protein from the roots of soy plants.\n", + "3. 
New Wave Foods (Stamford, Connecticut, USA) -- produces a plant-based shrimp alternative containing seaweed, soy protein, natural flavors, and other flavor and aroma components to enhance taste[^19].\n", + "\n", + "[^19]: How Plant-Based Meat and Seafood Are Processed - IFT.org. https://www.ift.org/news-and-publications/food-technology-magazine/issues/2019/october/columns/processing-how-plant-based-meat-and-seafood-are-processed.\n", + "\n", + "## Challenges to be addressed\n", + "\n", + "1. Develop crops optimized for plant-based meat that produce higher quantities of high-quality protein,\n", + "2. Improve protein extraction and processing methods,\n", + "3. Confirm that the taste, texture, and nutritional value are similar to conventional meats, and,\n", + "4. Ensure that the cost of production is competitive and the process is energy-efficient[^20].\n", + "\n", + "[^20]: The science of plant-based meat \\| GFI APAC. https://gfi-apac.org/science/the-science-of-plant-based-meat/.\n", + "\n", + "# Insect proteins\n", + "\n", + "Proteins derived from insects are referred to as insect proteins. Insects are rich in essential nutrients such as amino acids, vitamins, and minerals. Insect-derived proteins have a dual role, as they can be eaten directly by humans or used as animal feed. Significant advantages of insect-derived proteins include their low environmental footprint, low cost of production, and absence of disease-causing pathogens (post-processing)[^21]. Many communities across the world have traditionally consumed insects, and it is estimated that approximately 2,000 insect species are consumed in at least 113 countries[^22]; however, the reluctance to eat insects in many high-income countries and the abundance of other protein sources have limited widespread adoption. In contrast, insect-based proteins have great potential for use in the animal feed industry. 
Both black soldier fly and housefly-larvae have been used to replace fish meal and broiler feed, significantly reducing costs while not compromising final product quality[^23].\n", + "\n", + "[^21]: How Insect Protein can Revolutionize the Food Industry. https://mindthegraph.com/blog/insect-protein/.\n", + "\n", + "[^22]: Yen, A. L. Edible insects: Traditional knowledge or western phobia? Entomol Res 39, 289--298 (2009).\n", + "\n", + "[^23]: Kim, T. K., Yong, H. I., Kim, Y. B., Kim, H. W. & Choi, Y. S. Edible Insects as a Protein Source: A Review of Public Perception, Processing Technology, and Research Trends. Food Sci Anim Resour 39, 521 (2019).\n", + "\n", + "## Companies and their products (notable examples)\n", + "\n", + "1. Ynsect (Paris, France) -- produces animal, human, and plant foods made from mealworm beetles,\n", + "2. Protix (Dongen, Netherlands) -- produces insect ingredients (proteins and other nutrients) and animal feed from black soldier flies,\n", + "3. Hey Planet (Copenhagen, Denmark) -- creates food products out of buffalo beetles and crickets, and,\n", + "4. All Things Bugs (Oklahoma City, Oklahoma, USA) -- develops Griopro® Cricket Powder that is used in a wide variety of foods and drinks[^24].\n", + "\n", + "[^24]: 7 Insect-Based Food Startups on the Rise in 2023 \\| Moneywise. https://moneywise.com/investing/insect-based-food-startups.\n", + "\n", + "## Challenges to be addressed\n", + "\n", + "1. Develop tools for product scale-up,\n", + "2. Lower the production costs, and,\n", + "3. Change negative consumer attitudes towards insect-based foods[^25].\n", + "\n", + "[^25]: The Growing Animal Feed Insect Protein Market. https://nutrinews.com/en/the-growing-animal-feed-insect-protein-market-opportunities-and-challenges/.\n", + "\n", + "# Fermentation-derived proteins\n", + "\n", + "Fermentation involves the transformation of sugars into new products via chemical reactions carried out by microorganisms. 
This process has been referred to as \"humanity's oldest biotechnological tool\" because humans have previously used it to create foods, medicines, and fuels[^26]. The three main categories of fermentation include traditional, biomass, and precision fermentation[^27].\n", + "\n", + "[^26]: Taveira, I. C., Nogueira, K. M. V., Oliveira, D. L. G. de & Silva, R. do N. Fermentation: Humanity's Oldest Biotechnological Tool. Front Young Minds 9, (2021).\n", + "\n", + "[^27]: Fermentation for alternative proteins 101 \\| Resource guide \\| GFI. https://gfi.org/fermentation/.\n", + "\n", + "1. Traditional fermentation uses intact live microorganisms and microbial anaerobic digestion to process plant-based foods. This results in a change in the flavor and function of plant-based foods and ingredients.\n", + "2. Biomass fermentation uses the microorganisms that reproduce via this process as ingredients. The microorganisms naturally have high-protein content, and allowing them to reproduce efficiently makes large amounts of protein-rich food.\n", + "3. Precision fermentation uses programmed microorganisms as \"cellular production factories\" to develop proteins, fats, and other nutrients[^28].\n", + "\n", + "[^28]: Fermentation for alternative proteins 101 \\| Resource guide \\| GFI. https://gfi.org/fermentation/.\n", + "\n", + "## Companies and their products (notable examples)\n", + "\n", + "1. MycoTechnology (Aurora, Colorado, USA) -- ferments plant-based proteins to enhance flavor.\n", + "2. Quorn (Stokesley, UK) -- a meat-free super-protein produced from unprocessed microbial biomass.\n", + "3. Perfect Day (Berkeley, California, USA) -- creates biosynthetic dairy proteins by fermentation using fungi in bioreactors.\n", + "\n", + "## Challenges to be addressed\n", + "\n", + "1. Identify the correct molecules to manufacture in a cost-effective manner,\n", + "2. Develop the appropriate microbial strains for the relevant products,\n", + "3. 
Determine the appropriate feedstocks,\n", + "4. Design low-cost bioreactors and systems for scaling-up processes, and,\n", + "5. Improve end-product formulation to allow for better taste/texture[^29].\n", + "\n", + "[^29]: The science of fermentation (2023) \\| GFI. https://gfi.org/science/the-science-of-fermentation/.\n", + "\n", + "# Animal proteins from cultivated meat\n", + "\n", + "The cultivated meat industry develops animal proteins that are grown from animal cells directly. Here, tissue-engineering techniques commonly used in regenerative medicine aid in product development[^30]. Cells obtained from an animal are put into a bioreactor to replicate, and when they reach the optimal density, they are harvested via centrifugation, and the resulting muscle and fat tissue are formed into the recognizable meat structure. The advantages of producing meat in this way include: reduced contamination, decreased antibiotic use, and a lower environmental footprint[^31].\n", + "\n", + "[^30]: What is cultivated meat? \\| McKinsey. https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-cultivated-meat.\n", + "\n", + "[^31]: The science of cultivated meat \\| GFI. https://gfi.org/science/the-science-of-cultivated-meat/.\n", + "\n", + "## Companies and their products (notable examples)\n", + "\n", + "1. Mosa Meat (Maastricht, Netherlands) which is developing beef products\n", + "2. Bluu Seafood (Berlin, Germany) which is developing \"lab-grown\" seafood\n", + "3. Eat Just (Singapore) which sells cultured chicken[^32].\n", + "\n", + "[^32]: Will Europe Follow Singapore in Approving Cultured Meat? https://www.labiotech.eu/trends-news/cultured-meat-eat-just/.\n", + "\n", + "## Challenges to be addressed\n", + "\n", + "Even though many companies have entered the cultivated meat space, not many have received the requisite regulatory approval to sell their products with some countries temporarily halting development[^33]. 
Other challenges include:\n", + "\n", + "[^33]: Che Sorpresa! Italy U-Turns on Cultivated Meat Ban -- For Now. https://www.greenqueen.com.hk/italy-cultivated-meat-ban-lab-grown-food-cultured-protein-eu-tris-notification-francesco-lollobrigida/.\n", + "\n", + "1. A lack of sufficient bioreactor capacity,\n", + "2. The high cost of growth media and growth factors,\n", + "3. Delays in releasing products resulting in accusations of hype[^34], and,\n", + "4. High product cost[^35] and the need for consumer awareness[^36].\n", + "\n", + "[^34]: Is overhype dooming the cultivated meat industry? https://www.fastcompany.com/90966338/hype-built-the-cultivated-meat-industry-now-it-could-end-it.\n", + "\n", + "[^35]: Meat substitutes need to get a lot cheaper. https://www.sustainabilitybynumbers.com/p/meat-substitutes-price.\n", + "\n", + "[^36]: What is cultivated meat? \\| McKinsey. https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-cultivated-meat.\n", + "\n", + "# What factors will influence the adoption of alternative proteins in low-income countries?\n", + "\n", + "From the examples of companies provided above, it is clear that most research and product development in the field of alternative proteins is concentrated in high-income countries in the Global North. This is despite the fact that the urgent need for alternative proteins is heavily concentrated in low-income countries in the Global South. I believe that the first steps to address this disparity would be to build biomanufacturing capacity, increase R&D funding, and recruit more students and researchers to the field. Additionally, it will be important to implement regulations and policies that support the sector. On an individual level, a number of factors will affect the large-scale adoption of alternative proteins in low-income countries[^37]. These include:\n", + "\n", + "[^37]: The market for alternative protein: Pea protein, cultured meat, and more \\| McKinsey. 
https://www.mckinsey.com/industries/agriculture/our-insights/alternative-proteins-the-race-for-market-share-is-on.\n", + "\n", + "1. Cost (dollars per kilogram of 100% protein), which will need to be lower than that of conventional proteins to be economically viable for low-income earners,\n", + "2. The protein digestibility-corrected amino acid score (PDCAAS), which rates protein quality based on human amino acid requirements and the ability of humans to digest the protein,\n", + "3. The environmental impact and effect on the livelihoods of farmers, and,\n", + "4. Consumer acceptance (which is dependent on perception, taste, texture, safety, and convenience).\n", + "\n", + "# Conclusion\n", + "\n", + "First, I hope that I have convinced you that there is an urgent need to address hunger and undernourishment worldwide. Second, this essay should have demonstrated to you that optimizing conventional agricultural practices will not effectively yield the required results. Third, the reader should now have a basic understanding of alternative proteins and their potential to address undernutrition. Lastly, to tackle the problem of hunger and undernourishment, it is imperative for society to embrace novel biotechnologies that can enhance food production while minimizing environmental degradation and without contributing to climate change." 
+ ], + "id": "6c5682b4" + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/0_unpublished/danger_disturb.qmd b/posts/code_along_with_me/danger_disturb/danger_disturb.qmd similarity index 87% rename from 0_unpublished/danger_disturb.qmd rename to posts/code_along_with_me/danger_disturb/danger_disturb.qmd index e940c70..71de4f6 100644 --- a/0_unpublished/danger_disturb.qmd +++ b/posts/code_along_with_me/danger_disturb/danger_disturb.qmd @@ -1,9 +1,9 @@ --- title: "Code Along With Me (Episode 1)" -subtitle: "An assessment of the livestock numbers in the six counties declared to be 'dangerous and disturbed' in Kenya" +subtitle: "An assessment of the livestock numbers in the six counties declared to be 'disturbed and dangerous' in Kenya" author: "William Okech" date: "2023-11-28" -image: "" +image: "images/code_along_with_me_cover.png" categories: [RStudio, R, Tutorial, Blog] toc: true format: @@ -13,9 +13,11 @@ format: warning: false --- +![*Image created using "Bing Image Creator" with the prompt keywords "cows, sheep, indigenous cattle, running, dry savanna, river bed, traditional herdsman, nature photography, --ar 5:4 --style raw"*](images/running_livestock.jpeg){fig-align="center" width="80%" fig-alt="Pastoral livestock running through a dry savanna followed by a traditional herder on horseback"} + # Introduction -In February 2023, the government of Kenya described six counties as "disturbed" and "dangerous." This is because in the preceding six months, over 100 civilians and 16 police officers have lost their lives as criminals engaged in banditry and livestock theft have terrorized the rural villages. 
One of the main causes of the conflict is livestock theft, therefore, the goal of this analysis is to perform an exploratory data analysis of livestock numbers from the Kenya Population and Housing Census (2019) report. +In February 2023, the government of Kenya described six counties as "disturbed" and "dangerous." This is because in the preceding six months, over 100 civilians and 16 police officers had lost their lives, as criminals engaged in banditry terrorized the rural villages. One of the main causes of the conflict is livestock theft; therefore, my goal is to perform an exploratory data analysis of livestock numbers from the Kenya Population and Housing Census (2019) report. References @@ -145,6 +147,8 @@ data("DataCatalogue") ## b) Load the livestock data +Here, pastoral livestock are defined as sheep, goats, and indigenous cattle. + ```{r} # Select the livestock data from the census report @@ -507,8 +511,8 @@ livestock_area_subcounty %>% # Section 5: Conclusion -The goal of this analysis was to look at pastoral livestock distributions.... +In this post, I have assessed the pastoral livestock (indigenous cattle, sheep, and goats) populations in the Kenyan counties described as "disturbed and dangerous." Major contributors to this classification are banditry, livestock theft, and the limited amounts of pasture and water available to livestock owners. To get a better sense of the number of households engaged in farming and the pastoral livestock populations in these counties, I performed an exploratory data analysis and visualized my results. -Show the reader how to plot.... +Key findings from the study were that: 1. Turkana and Samburu had some of the highest numbers of pastoral livestock, yet they had the fewest households engaged in farming. This meant that these two counties had the highest average ratios of pastoral livestock to farming households in the region (Turkana: 55 per household and Samburu: 36 per household). 2. 
The top three (3) subcounties with the highest average ratios of pastoral livestock to farming households were in Turkana. They were Turkana East (126), Kibish (96), and Turkana West (56). Surprisingly, subcounties such as Keiyo North (4), Koibatek (4), and Nyahururu (4) had very low ratios of pastoral livestock to farming households despite having relatively high numbers of farming households. This may have resulted from the unsuitability of the land for grazing animals, small average land sizes per farming household, a switch to exotic dairy and beef livestock, or simply a preference for crop farming over livestock farming. -Conclude that... \ No newline at end of file +In the next analysis, I will assess the livestock numbers per area (square kilometers), the numbers of indigenous cattle, sheep, and goats in every county and subcounty, as well as the other animal husbandry and/or crop farming activities that take place in the region. The reader is encouraged to use this code and data package (rKenyaCensus) to come up with their own analyses to share with the world. 
diff --git a/posts/code_along_with_me/danger_disturb/images/code_along_with_me_cover.png b/posts/code_along_with_me/danger_disturb/images/code_along_with_me_cover.png new file mode 100644 index 0000000..e40dae7 Binary files /dev/null and b/posts/code_along_with_me/danger_disturb/images/code_along_with_me_cover.png differ diff --git a/posts/code_along_with_me/danger_disturb/images/running_livestock.jpeg b/posts/code_along_with_me/danger_disturb/images/running_livestock.jpeg new file mode 100644 index 0000000..b3530f6 Binary files /dev/null and b/posts/code_along_with_me/danger_disturb/images/running_livestock.jpeg differ diff --git a/posts/data_stories/asbestos_roof_kenya/asbestos_roof_kenya.qmd b/posts/data_stories/asbestos_roof_kenya/asbestos_roof_kenya.qmd new file mode 100644 index 0000000..bda9e1a --- /dev/null +++ b/posts/data_stories/asbestos_roof_kenya/asbestos_roof_kenya.qmd @@ -0,0 +1,68 @@ +--- +title: "A roof over your head" +subtitle: "The prevalence of asbestos-based roofing in Kenya and its potential effects" +author: "William Okech" +date: "2022-12-01" +image: "data_story_headline.png" +categories: [RStudio, R, Data Stories, Blog, Kenya Census] +format: html +toc: false +draft: false +warning: false +--- + +# Introduction + +An aerial view of most settlements in Kenya shows that many residential roofs are made of iron sheets. + +Indeed, this is confirmed by the Kenya Population and Housing Census (2019) report [^1] [^2] where we see that 4 out of every 5 households (total number = 12,043,016) in Kenya are roofed using iron sheets (Figure 1). Overall, the top 5 roofing materials are iron sheets (80.3%), concrete (8.2%), grass/twigs (5.1%), makuti (sun-dried coconut palm leaves; 1.6%), and asbestos (1.4%). 
Despite the widespread use of iron sheets, it is surprising to note that 1.4% (2.2% urban and 0.9% rural) of residential household roofs (which is approximately 170,000) are covered with asbestos-based roofing materials (NB: this figure does not include public buildings such as educational institutions and government facilities). + +[^1]: Kenya National Bureau of Statistics. The 2019 Kenya Population and Housing Census. Volume I: Population by County and Sub-County and Volume III: Distribution of Population by Age and Sex. + +[^2]: Shelmith Kariuki (2020). rKenyaCensus: 2019 Kenya Population and Housing Census Results. R package version 0.0.2. + +![](images/national_treemap.png){fig-align="center" width="90%"} + +Figure 1: Roof types in Kenya (visualizations generated in RStudio) + +# Asbestos and its potential risks + +Asbestos refers to a class of six minerals that naturally form a bundle of fibers. These fibers have many properties that make them attractive, including a lack of electrical conductivity, and chemical, heat, and fire resistance. Historically, asbestos has been used for various commercial and industrial applications, including roofing shingles, automobile brakes, and textured paints for walls and ceilings [^3]. However, using asbestos for products that come into regular contact with humans is quite problematic. Why? Asbestos is a known human carcinogen, and the primary risk factor for most mesotheliomas is asbestos exposure [^4] [^5]. Furthermore, asbestos exposure (depending on the frequency, amount, and type) can cause asbestosis, pleural disease, and cancer. If asbestos-based materials remain intact, there is minimal risk to the user, but if materials are damaged via natural degradation or during home demolition and remodeling, tiny asbestos fibers will be released into the air [^6] [^7]. In Kenya, Legal Notice No. 
121 of the Environmental Management and Coordination (Waste Management) Regulations (2006)[^8] states that waste containing asbestos is classified as hazardous. Why should Kenyans be concerned about this? In the 2013/2014 financial year, Kenya spent approximately one-tenth of its total health budget on asbestos-related cancers [^9] [^10] [^11]. + +[^3]: Learn About Asbestos (no date) EPA. Environmental Protection Agency. Available at: https://www.epa.gov/asbestos/learn-about-asbestos (Accessed: December 1, 2022). + +[^4]: Asbestos exposure and cancer risk fact sheet (no date) National Cancer Institute. Available at: https://www.cancer.gov/about-cancer/causes-prevention/risk/substances/asbestos/asbestos-fact-sheet (Accessed: December 1, 2022). + +[^5]: Asbestos (no date) World Health Organization. World Health Organization. Available at: https://www.iarc.who.int/risk-factor/asbestos/ (Accessed: December 1, 2022). + +[^6]: Learn About Asbestos (no date) EPA. Environmental Protection Agency. Available at: https://www.epa.gov/asbestos/learn-about-asbestos (Accessed: December 1, 2022). + +[^7]: Asbestos and your health (2016) Centers for Disease Control and Prevention. Centers for Disease Control and Prevention. Available at: https://www.atsdr.cdc.gov/asbestos/index.html (Accessed: December 1, 2022). + +[^8]: Act Title: ENVIRONMENTAL MANAGEMENT AND CO-ORDINATION (no date) No. 8 of 1999. Available at: http://kenyalaw.org:8181/exist/kenyalex/sublegview.xql?subleg=No.+8+of+1999 (Accessed: December 1, 2022). + +[^9]: Okoth, D. (2013) Slow transition from use of asbestos raises concern as cancer cases rise, The Standard. Available at: https://www.standardmedia.co.ke/lifestyle/article/2000096118/slow-transition-from-use-of-asbestos-raises-concern-as-cancer-cases-rise (Accessed: December 1, 2022). + +[^10]: GCR, S. (2016) Kenya faces cancer epidemic caused by asbestos roofs, Global Construction Review. 
Available at: https://www.globalconstructionreview.com/kenya-faces-cancer-epid7emic-caus7ed-asbe7stos/ (Accessed: December 1, 2022). + +[^11]: Irungu, S. (2020) Exposure to the noxious asbestos needs to be alleviated with a lot of care, Kenya Climate Innovation Center (KCIC). Available at: https://www.kenyacic.org/2019/11/exposure-to-the-noxious-asbestos-needs-to-be-alleviated-with-a-lot-of-care/ (Accessed: December 1, 2022). + +Where do we find high numbers of asbestos-based roofs in Kenya? As previously stated, 1.4% of households in Kenya have asbestos-based roofs. Figure 2 demonstrates the percentage of households with asbestos-based roofs in every county in Kenya. Interestingly, 4 (Nairobi, Kajiado, Machakos, and Kiambu) out of the top 6 counties (from a total of 47) fall within the Nairobi Metropolitan region. + +# Where do we find high numbers of asbestos-based roofs in Kenya? + +As previously stated, 1.4% of households in Kenya have asbestos-based roofs. Figure 2 demonstrates the percentage of households with asbestos-based roofs in every county in Kenya. Interestingly, 4 (Nairobi, Kajiado, Machakos, and Kiambu) out of the top 6 counties (from a total of 47) fall within the Nairobi Metropolitan region. + +![](images/all_counties_asbestos_barplot_map.png){fig-align="center" width="90%"} + +Figure 2: Percentage(%) of households with asbestos-based roofs distributed by county (visualizations generated using RStudio) + +Next, I investigated the subcounties with the highest number of households with asbestos-based roofs. The top 5 subcounties are located within Nairobi county, with Embakasi subcounty taking the lead with just over 8,000 households. 
+ +![](images/top_households_asbestos_raw.png){fig-align="center" width="90%"} + +Figure 3: The ten subcounties with the highest number of households that have asbestos-based roofs (visualizations generated using RStudio) + +# Conclusion + +Overall, this study demonstrates that a notable proportion of Kenyan households use asbestos-based roofing materials, with the Nairobi Metropolitan region accounting for the largest number of households. It is widely acknowledged that asbestos is harmful to our health, and asbestos-related diseases impose a significant burden on the economy. However, the impact of these roofs on the health of residents may not be fully apparent as asbestos exposure may also occur in other settings such as educational facilities and government institutions. To lessen the impact of asbestos exposure, it would be beneficial for local/county governments to educate residents about the dangers of asbestos and facilitate the complex and costly removal of asbestos-based roofing materials. 
diff --git a/posts/data_stories/asbestos_roof_kenya/data_story_2.R b/posts/data_stories/asbestos_roof_kenya/data_story_2.R new file mode 100644 index 0000000..4f94f0f --- /dev/null +++ b/posts/data_stories/asbestos_roof_kenya/data_story_2.R @@ -0,0 +1,381 @@ +# A roof over your head +# By @willyokech +# Data: rKenyaCensus + +# 1) National + +#1) Load the required packages + +#install.packages("devtools") +#devtools::install_github("Shelmith-Kariuki/rKenyaCensus") +library(rKenyaCensus) # Contains the 2019 Kenya Census data +library(tidyverse) +library(janitor) + +# 2) View the data available in the data catalogue + +data("DataCatalogue") + +# 3) Load the required data + +df_roof <- V4_T2.12 + +View(df_roof) + +# Table 1 for National Analysis +table_1 <- df_roof[1:3,] +View(table_1) + +glimpse(table_1) + +table_1 <- table_1 %>% + clean_names() + +table_1_select <- table_1 %>% + select(-c(conventional_households, admin_area, not_stated)) + +View(table_1_select) +glimpse(table_1_select) + + +# 5) ggplot2 visualization + +# Treemap + +#install.packages("treemapify") +library(treemapify) + +table_1_select_tidy <- table_1_select %>% + pivot_longer(c(grass_twigs:shingles), + names_to = "roof_type", values_to = "percentage") %>% + mutate(roof_type = ifelse(roof_type == "grass_twigs", "Grass/Twigs", + ifelse(roof_type == "makuti_thatch", "Makuti", + ifelse(roof_type == "dung_mud", "Dung/Mud", + ifelse(roof_type == "ironsheets", "Iron Sheets", + ifelse(roof_type == "tincans", "Tin Cans", + ifelse(roof_type == "asbestos_sheets", "Asbestos ", + ifelse(roof_type == "concrete_cement", "Concrete", + ifelse(roof_type == "tiles", "Tiles", + ifelse(roof_type == "canvas_tents", "Canvas/Tents", + ifelse(roof_type == "decra_versatile", "Decra", + ifelse(roof_type == "nylon_cartons_cardboard", "Nyl_Cart_Card", + ifelse(roof_type == "shingles", "Shingles", roof_type))))))))))))) + +table_1_select_tidy + +table_1_select_tidy$roof_type + +# National + +table_1_select_tidy_national <- 
table_1_select_tidy %>% + filter(sub_county == "KENYA") + + +# Remember that treemap is not a ggplot + +ggplot(table_1_select_tidy_national, + aes(area = percentage, fill = roof_type, + label = roof_type)) + + geom_treemap() + + labs(caption = "Visualization @willyokech | Source:rKenyaCensus") + + labs(fill = "Roof Type") + + theme(legend.position = "bottom") + + geom_treemap_text(colour = "black", + place = "centre", + size = 10, + grow = TRUE) + + scale_fill_brewer(palette = "Spectral") + +# 2 County + +#1) Load the required packages + +#install.packages("devtools") +#devtools::install_github("Shelmith-Kariuki/rKenyaCensus") +library(rKenyaCensus) # Contains the 2019 Kenya Census data +library(tidyverse) +library(janitor) + +# 2) View the data available in the data catalogue + +data("DataCatalogue") + +# 3) Load the required data + +df_roof <- V4_T2.12 + +View(df_roof) + +# Table 1 for County and Subcounty Analysis +table_1 <- df_roof[4:395,] +View(table_1) + +glimpse(table_1) + +table_1 <- table_1 %>% + clean_names() + +table_1 + +table_1_select <- table_1 %>% + select(c(county, sub_county, admin_area, asbestos_sheets)) + +View(table_1_select) +glimpse(table_1_select) + + +table_1_select_county <- table_1_select %>% + filter(admin_area == "County") + +table_1_select_subcounty <- table_1_select %>% + filter(admin_area == "SubCounty") + + +# 5) Load the packages required for the maps + +#install.packages("sf") +library(sf) # simple features + +#install.packages("tmap") #Thematic maps +library(tmap) + +#install.packages("leaflet") # Used for creating interactive maps +library(leaflet) + +# Load the shapefiles that are downloaded from online source +KenyaSHP <- read_sf("posts/series_2/new_post_2/kenyan-counties/County.shp", quiet = TRUE, stringsAsFactors = FALSE,as_tibble = TRUE) + + +# To easily view the shapefile in RStudio View pane, you can drop the geometry column and view the rest of the data. 
+ +View(KenyaSHP %>% st_drop_geometry()) + +# Shapefile Data Inspection + +print(KenyaSHP[5:9], n = 6) + +colnames(KenyaSHP) + +class(KenyaSHP) + +# Look at the variable data types + +glimpse(KenyaSHP) + +# View the geometry column + +KenyaSHP_geometry <- st_geometry(KenyaSHP) + +### View one geometry entry +KenyaSHP_geometry[[1]] + +# View the classes of the geometry columns + +class(KenyaSHP_geometry) #sfc, the list-column with the geometries for each feature + +class(KenyaSHP_geometry[[1]]) #sfg, the feature geometry of an individual simple feature + + +# Change the projection of the shapefiles (if necessary) + +KenyaSHP <- st_transform(KenyaSHP, crs = 4326) + +### Inspect the co-ordinate reference system +st_crs(KenyaSHP) + + +# 6) Clean the data, so that the counties match those in the shapefile + +### Inspect the county names in the asbestos dataset +table_1_select_county_unique <- unique(table_1_select_county$county) +table_1_select_county_unique + +### Inspect the county names of the shape file +counties_KenyaSHP <- KenyaSHP %>% + st_drop_geometry() %>% + select(COUNTY) %>% + pull() %>% + unique() + +counties_KenyaSHP + +### Convert the table_1_select_county county names to title case +table_1_select_county <- table_1_select_county %>% + ungroup() %>% + mutate(county = tools::toTitleCase(tolower(county))) + +### Inspect the county names of the asbestos data again +table_1_select_county_unique <- unique(table_1_select_county$county) + + +### Inspect the county names that are different in each of the datasets +unique(table_1_select_county$county)[which(!unique(table_1_select_county$county) %in% counties_KenyaSHP)] + + +table_1_select_county <- table_1_select_county %>% + mutate(county = ifelse(county == "Taita/Taveta", "Taita Taveta", + ifelse(county == "Tharaka-Nithi", "Tharaka", + ifelse(county == "Elgeyo/Marakwet", "Keiyo-Marakwet", + ifelse(county == "Nairobi City", "Nairobi", county))))) + +# Check again for county names that do not match the shapefile 
+unique(table_1_select_county$county)[which(!unique(table_1_select_county$county) %in% counties_KenyaSHP)] + +# 7) Join the shapefile and the data + +### Rename the COUNTY variable, to match the variable name in the shapefile data +table_1_select_county <- table_1_select_county %>% + rename(COUNTY = county) + +### Ensure that there are no leading or trailing spaces in the county variable +KenyaSHP$COUNTY <- trimws(KenyaSHP$COUNTY) +table_1_select_county$COUNTY <- trimws(table_1_select_county$COUNTY) + +### Merge the data +merged_df <- left_join(KenyaSHP, table_1_select_county, by = "COUNTY") + +### Sort the data so that the County variable appears first +merged_df <- merged_df %>% + select(COUNTY, everything()) + + +# 8) Inspect the merged data + +# View the data +View(merged_df) +View(merged_df %>% st_drop_geometry()) + +### Class of the merged data +class(merged_df) + +### Column names +colnames(merged_df) + +# Glimpse +glimpse(merged_df) + + + +# 9) Visualize the data + +#install.packages("ggbreak") +library(ggbreak) + +library(patchwork) + +barplot <- table_1_select_county %>% + ggplot(aes(x = reorder(COUNTY, asbestos_sheets), y = asbestos_sheets, fill = asbestos_sheets)) + + geom_bar(stat = "identity", width = 0.5) + + coord_flip() + + scale_fill_gradient(low = "darkred", high = "yellow") + + theme_classic()+ + labs(x = "County", + y = "Percentage(%) of households with asbestos-based roofs", + title = "", + subtitle = "", + caption = "", + fill = "Percentage (%)\nof households")+ + theme(axis.title.x =element_text(size = 14), + axis.title.y =element_text(size = 14), + plot.title = element_text(family = "URW Palladio L, Italic",size = 16, hjust = 0.5), + plot.subtitle = element_text(family = "URW Palladio L, Italic",size = 10, hjust = 0.5), + legend.title = element_text("Helvetica",size = 8, vjust = 1), + plot.caption = element_text(family = "URW Palladio L, Italic",size = 12), + panel.background = element_rect(fill = "white", colour = "white")) + + 
geom_hline(yintercept = 1.4, color = "black") + + geom_text(aes(x = 20 , y = 1.4, label = "Average (Kenya)"), + size = 3, + angle=90, vjust = 1.5) + + geom_hline(yintercept = 0.9, linetype = "dashed", color = "black") + + geom_text(aes(x = 20 , y = 0.9, label = "Average (Rural)"), + size = 3, + angle=90, vjust = 1.5) + + geom_hline(yintercept = 2.2, linetype = "dashed", color = "black") + + geom_text(aes(x = 20 , y = 2.2, label = "Average (Urban)"), + size = 3, + angle=90, vjust = 1.5) + +barplot + + +# Plot a base plot / map. + +plot(KenyaSHP$geometry, lty = 5, col = "green") + + +# ggplot2() + +# Legend in map is silenced because the bar graph has one + +map <- ggplot(data = merged_df)+ + geom_sf(aes(geometry = geometry, fill = asbestos_sheets))+ + theme_void()+ + labs(title = "", + caption = "Visualization @willyokech | Source:rKenyaCensus", + fill = "")+ + theme(plot.title = element_text(family = "URW Palladio L, Italic",size = 16, hjust = 0.5), + legend.title = element_blank(), + legend.position = "none", + plot.caption = element_text(family = "URW Palladio L, Italic",size = 12))+ + scale_fill_gradient(low = "darkred", high = "yellow") + +map + +# Save the plot + +barplot + map + + +# 3) Subcounty + +#1) Load the required packages + +#install.packages("devtools") +#devtools::install_github("Shelmith-Kariuki/rKenyaCensus") +library(rKenyaCensus) # Contains the 2019 Kenya Census data +library(tidyverse) + +# 2) View the data available in the data catalogue + +data("DataCatalogue") + +# 3) Load the required data + +df_1 <- V4_T2.12 +View(df_1) + +# 4) Preliminary filtering and cleanup + +# Subcounty table +table_2_sc <- df_1[4:395,] %>% + filter(AdminArea != "County") + +# Remove unnecessary columns +table_2_sc_new <- table_2_sc %>% + select(County, SubCounty, AsbestosSheets, ConventionalHouseholds) + +top_subcounty_raw <- table_2_sc_new %>% + unite(col = "county_sub", c("SubCounty", "County"), sep = ", ", remove = TRUE) %>% + mutate(AffectedHouseholds = 
round((AsbestosSheets/100) * ConventionalHouseholds)) %>% + arrange(desc(AffectedHouseholds)) %>% + slice(1:10) + +View(top_subcounty_raw) + +top_subcounty_raw_plot <- top_subcounty_raw %>% + ggplot(aes(x = reorder(county_sub, AffectedHouseholds), y = AffectedHouseholds)) + + geom_bar(stat = "identity", width = 0.5, fill = "darkorange") + + coord_flip() + + theme_classic()+ + labs(x = "Subcounty", + y = "Number of households with asbestos-based roofs", + title = "", + caption = "Visualization @willyokech | Source:rKenyaCensus") + + theme(axis.title.x =element_text(size = 15), + axis.title.y =element_text(size = 15), + plot.title = element_text(family = "URW Palladio L, Italic",size = 16, hjust = 0.5), + plot.subtitle = element_text(family = "URW Palladio L, Italic",size = 10, hjust = 0.5), + legend.title = element_text("URW Palladio L, Italic",size = 8, vjust = 1), + plot.caption = element_text(family = "URW Palladio L, Italic",size = 12)) + +top_subcounty_raw_plot diff --git a/posts/data_stories/asbestos_roof_kenya/data_story_headline.png b/posts/data_stories/asbestos_roof_kenya/data_story_headline.png new file mode 100644 index 0000000..56736c7 Binary files /dev/null and b/posts/data_stories/asbestos_roof_kenya/data_story_headline.png differ diff --git a/posts/data_stories/asbestos_roof_kenya/images/all_counties_asbestos_barplot_map.png b/posts/data_stories/asbestos_roof_kenya/images/all_counties_asbestos_barplot_map.png new file mode 100644 index 0000000..558316b Binary files /dev/null and b/posts/data_stories/asbestos_roof_kenya/images/all_counties_asbestos_barplot_map.png differ diff --git a/posts/data_stories/asbestos_roof_kenya/images/national_treemap.png b/posts/data_stories/asbestos_roof_kenya/images/national_treemap.png new file mode 100644 index 0000000..31e700d Binary files /dev/null and b/posts/data_stories/asbestos_roof_kenya/images/national_treemap.png differ diff --git a/posts/data_stories/asbestos_roof_kenya/images/top_households_asbestos_raw.png 
b/posts/data_stories/asbestos_roof_kenya/images/top_households_asbestos_raw.png new file mode 100644 index 0000000..3591aa2 Binary files /dev/null and b/posts/data_stories/asbestos_roof_kenya/images/top_households_asbestos_raw.png differ diff --git a/posts/series_2/new_post_1/kenyan-counties/County.dbf b/posts/data_stories/asbestos_roof_kenya/kenyan-counties/County.dbf similarity index 100% rename from posts/series_2/new_post_1/kenyan-counties/County.dbf rename to posts/data_stories/asbestos_roof_kenya/kenyan-counties/County.dbf diff --git a/posts/data_stories/asbestos_roof_kenya/kenyan-counties/County.prj b/posts/data_stories/asbestos_roof_kenya/kenyan-counties/County.prj new file mode 100644 index 0000000..f45cbad --- /dev/null +++ b/posts/data_stories/asbestos_roof_kenya/kenyan-counties/County.prj @@ -0,0 +1 @@ +GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]] \ No newline at end of file diff --git a/posts/series_2/new_post_1/kenyan-counties/County.sbn b/posts/data_stories/asbestos_roof_kenya/kenyan-counties/County.sbn similarity index 100% rename from posts/series_2/new_post_1/kenyan-counties/County.sbn rename to posts/data_stories/asbestos_roof_kenya/kenyan-counties/County.sbn diff --git a/posts/series_2/new_post_1/kenyan-counties/County.sbx b/posts/data_stories/asbestos_roof_kenya/kenyan-counties/County.sbx similarity index 100% rename from posts/series_2/new_post_1/kenyan-counties/County.sbx rename to posts/data_stories/asbestos_roof_kenya/kenyan-counties/County.sbx diff --git a/posts/series_2/new_post_1/kenyan-counties/County.shp b/posts/data_stories/asbestos_roof_kenya/kenyan-counties/County.shp similarity index 100% rename from posts/series_2/new_post_1/kenyan-counties/County.shp rename to posts/data_stories/asbestos_roof_kenya/kenyan-counties/County.shp diff --git a/posts/data_stories/asbestos_roof_kenya/kenyan-counties/County.shp.xml 
b/posts/data_stories/asbestos_roof_kenya/kenyan-counties/County.shp.xml new file mode 100644 index 0000000..516a48d --- /dev/null +++ b/posts/data_stories/asbestos_roof_kenya/kenyan-counties/County.shp.xml @@ -0,0 +1,2 @@ + +20111014225737001.0FALSEfile://\\ESRIEA-CLOUD\H\Esri\Professional Services\Solutions\OpenDataKenya\County\GIS\Share\county\v10\county_data0.gdbLocal Area Networkcounty002DefineProjection county GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]CopyFeatures D:\david\gisdata\Kecounty\GIS\Shp\county.shp D:\david\gisdata\Kecounty\GIS\Database\county.gdb\county # 0 0 0GeographicGCS_WGS_1984<GeographicCoordinateSystem xsi:type='typens:GeographicCoordinateSystem' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns:typens='http://www.esri.com/schemas/ArcGIS/10.0'><WKT>GEOGCS[&quot;GCS_WGS_1984&quot;,DATUM[&quot;D_WGS_1984&quot;,SPHEROID[&quot;WGS_1984&quot;,6378137.0,298.257223563]],PRIMEM[&quot;Greenwich&quot;,0.0],UNIT[&quot;Degree&quot;,0.0174532925199433],AUTHORITY[&quot;EPSG&quot;,4326]]</WKT><XOrigin>-399.99999999999989</XOrigin><YOrigin>-399.99999999999989</YOrigin><XYScale>1111948722.2222221</XYScale><ZOrigin>-100000</ZOrigin><ZScale>10000</ZScale><MOrigin>-100000</MOrigin><MScale>10000</MScale><XYTolerance>8.9831528411952133e-009</XYTolerance><ZTolerance>0.001</ZTolerance><MTolerance>0.001</MTolerance><HighPrecision>true</HighPrecision><LeftLongitude>-180</LeftLongitude><WKID>4326</WKID></GeographicCoordinateSystem>20110727105949002011072710594900ISO 19139 Metadata Implementation SpecificationMicrosoft Windows Server 2008 R2 Version 6.1 (Build 7601) Service Pack 1; ESRI ArcGIS 10.0.0.2414county002File Geodatabase Feature ClassdatasetEPSG7.4.1Simple4FALSE0TRUEFALSE0countyFeature Class0OBJECTIDOBJECTIDOID400Internal feature number.ESRISequential unique whole numbers that are automatically 
generated.ShapeShapeGeometry000Feature geometry.ESRICoordinates defining the features.AREAAREADouble800PERIMETERPERIMETERDouble800COUNTY3_COUNTY3_Double800COUNTY3_IDCOUNTY3_IDDouble800COUNTYCOUNTYString2000Shape_LengthShape_LengthDouble800Length of feature in internal units.ESRIPositive real numbers that are automatically generated.Shape_AreaShape_AreaDouble800Area of feature in internal units squared.ESRIPositive real numbers that are automatically generated.20110727 diff --git a/posts/series_2/new_post_1/kenyan-counties/County.shx b/posts/data_stories/asbestos_roof_kenya/kenyan-counties/County.shx similarity index 100% rename from posts/series_2/new_post_1/kenyan-counties/County.shx rename to posts/data_stories/asbestos_roof_kenya/kenyan-counties/County.shx diff --git a/posts/data_stories/cas_kenya_viz/cas_kenya_viz.qmd b/posts/data_stories/cas_kenya_viz/cas_kenya_viz.qmd new file mode 100644 index 0000000..2b4d6ed --- /dev/null +++ b/posts/data_stories/cas_kenya_viz/cas_kenya_viz.qmd @@ -0,0 +1,75 @@ +--- +title: "Data Visualization for Government" +subtitle: "Visualizing HR data from the Chief Administrative Secretary recruitment process in Kenya" +author: "William Okech" +date: "2023-04-13" +image: "data_story_headline.png" +categories: [RStudio, R, Data Stories, Blog, Data Visualization] +format: html +toc: false +draft: false +warning: false +--- + +# Introduction + +In January 2018, former President Uhuru Kenyatta announced the creation of the Chief Administrative Secretary (CAS) position. The primary role of the appointees would be to "help the Cabinet Secretary to better coordinate the running of the affairs of the respective ministries [^1]." Though highly controversial at the time (as civil society groups [^2] challenged the legality of the position), the High Court (in February 2023) eventually ruled that the decision to create the new position was lawful [^3]. + +[^1]: PSCU (2018) Uhuru Kenyatta's Full Statement On New Cabinet. 
Available at: https://www.citizen.digital/news/uhuru-kenyattas-full-statement-on-new-cabinet-189331 (Accessed: April 2, 2023). + +[^2]: Betty Njeru (2018) Activist Omtatah moves to court challenging the creation of new cabinet positions. Available at: https://www.standardmedia.co.ke/kenya/article/2001267705/okiya-omtatah-moves-to-court-challenging-new-cabinet-positions (Accessed: April 6, 2023). + +[^3]: Correspondent (2023) Labour Court Clears Way For Appointment Of Chief Administrative Secretaries. Available at: https://www.capitalfm.co.ke/news/2023/02/labour-court-clears-the-way-for-appointment-of-chief-administrative-secretaries/ (Accessed: April 10, 2023). + +The Public Service Commission (PSC) of Kenya is responsible for government recruitment, and Section 234 of the Constitution of Kenya states that part of its mandate includes the "establishment of public offices" and "appointment of persons to those offices." The most recent call for applications to the CAS position was circulated in October 2022 [^4]. Here, the PSC listed the roles, responsibilities, and requirements for all applicants. To ensure transparency in the recruitment process, the PSC provided a list of all the applicants [^5], the shortlisted applicants \[5\], and a schedule of the interview times \[5\]. After the interview, the names of successful candidates were forwarded to the President for appointment. On March 16th, 2023, President William Ruto appointed fifty (50) individuals to the CAS position, which was more than double the number of positions created by the PSC. The legality of these appointments has already been challenged in court and we await the final verdict [^6] [^7]. + +[^4]: PSC (2023) CALL FOR APPLICATIONS TO THE POSITION OF CHIEF ADMINISTRATIVE SECRETARY IN THE PUBLIC SERVICE. Available at: https://www.publicservice.go.ke/index.php/media-center/2/202-call-for-applications-to-the-position-of-chief-administrative-secretary-in-the-public-service (Accessed: April 10, 2023). 
+ +[^5]: PSC (2023) Shortlisted Candidates. Available at: https://www.publicservice.go.ke/index.php/recruitment/shortlisted-candidates (Accessed: April 10, 2023). + +[^6]: Susan Muhindi (2023) Ruto's 50 CAS nomination list challenged in court. Available at: https://www.the-star.co.ke/news/2023-03-20-rutos-50-cas-nomination-list-challenged-in-court/ (Accessed: April 2, 2023). + +[^7]: Emmanuel Wanjala (2023). Available at: https://www.the-star.co.ke/news/2023-03-17-ruto-has-created-27-illegal-cas-posts-lsks-eric-theuri/ (Accessed: April 5, 2023). + +Controversy aside, the goal of this article is to provide a data-driven assessment of publicly available recruitment information (specifically gender, disability status, and county of origin) to determine whether principles of diversity, equity, and inclusion (DEI) were promoted during the hiring process. + +# Summary of Findings + +1. Only 26% of the CAS appointments went to women, even though 33% of applicants and 36% of those shortlisted were female. This may not be in line with the "two-thirds gender rule" outlined in Article 27(8) of the Constitution of Kenya (2010), which states that "not more than two-thirds of the members of elective or appointive bodies shall be of the same gender." + +![](images/app_stl_nom_cas_gender.jpg){fig-align="center" width="90%"} + +2. Two percent (2%) of the nominees were Persons with Disabilities (PWDs), which is close to the percentage of Kenyans (at the national level) living with a disability (2.2%) [^8]. However, institutions such as the National Gender and Equality Commission should develop programs to increase the number of applicants from marginalized groups. + +[^8]: + Shelmith Kariuki (2020). rKenyaCensus: 2019 Kenya Population and Housing Census Results. R package version 0.0.2. + +![](images/pwd.jpg){fig-align="center" width="90%"} + +3. In total, there were 133 applicants from Busia County (2.5%). 
Despite this, it was the only county that did not have a representative on the shortlist. To remedy this, the PSC may need to put in place affirmative action policies that ensure that at least one applicant from each county makes it to the shortlist. + +![](images/stl_cas_county.jpg){fig-align="center" width="90%"} + +4. Nine counties are not represented in the final list of nominees. + +![](images/nom_county_equal_0.jpg){fig-align="center" width="90%"} + +5. Ten counties have more than one CAS nominee. + +![](images/nom_county_over_1.jpg){fig-align="center" width="90%"} + +# Conclusion + +Overall, this article provided a set of data visualizations to help understand the demographics of applicants for the CAS position in Kenya. Specifically, the gender, disability status, and county of origin were assessed at the applicant, shortlisting, and nomination stages. This study had several key findings: + +1. **Less than one-third of the nominees were women** even though 36% of the shortlisted applicants were female. +2. **Only one nominee was a person with a disability (PWD).** +3. **One county (Busia) failed to have any applicants make it to the shortlist** of 240 and nine counties in total had zero nominees out of 50. + +Future work will assess the data at a more granular level and determine the counties where there were low numbers of total applicants, low numbers of women at the application and shortlisting stages, and also determine which counties did not have PWD applicants. Additionally, it would be helpful if the PSC could provide age and education level data to help perform a more thorough analysis of the recruitment process. Lastly, it is commendable that the PSC makes this information publicly available, and I hope that further analysis of the data can result in initiatives to diversify the candidate pool at the applicant and shortlisting stages. + +# Fun Fact + +The most common applicant first names (male and female). 
+ +![](images/all_cas_first_name.png){fig-align="center" width="90%"} diff --git a/posts/data_stories/cas_kenya_viz/data_story_3.R b/posts/data_stories/cas_kenya_viz/data_story_3.R new file mode 100644 index 0000000..a5cedb8 --- /dev/null +++ b/posts/data_stories/cas_kenya_viz/data_story_3.R @@ -0,0 +1 @@ +# Check the PSC Job Data Analysis project file \ No newline at end of file diff --git a/posts/data_stories/cas_kenya_viz/data_story_headline.png b/posts/data_stories/cas_kenya_viz/data_story_headline.png new file mode 100644 index 0000000..56736c7 Binary files /dev/null and b/posts/data_stories/cas_kenya_viz/data_story_headline.png differ diff --git a/posts/data_stories/cas_kenya_viz/images/all_cas_first_name.png b/posts/data_stories/cas_kenya_viz/images/all_cas_first_name.png new file mode 100644 index 0000000..d40e444 Binary files /dev/null and b/posts/data_stories/cas_kenya_viz/images/all_cas_first_name.png differ diff --git a/posts/data_stories/cas_kenya_viz/images/app_stl_nom_cas_gender.jpg b/posts/data_stories/cas_kenya_viz/images/app_stl_nom_cas_gender.jpg new file mode 100644 index 0000000..451c50a Binary files /dev/null and b/posts/data_stories/cas_kenya_viz/images/app_stl_nom_cas_gender.jpg differ diff --git a/posts/data_stories/cas_kenya_viz/images/nom_county_equal_0.jpg b/posts/data_stories/cas_kenya_viz/images/nom_county_equal_0.jpg new file mode 100644 index 0000000..d19ca92 Binary files /dev/null and b/posts/data_stories/cas_kenya_viz/images/nom_county_equal_0.jpg differ diff --git a/posts/data_stories/cas_kenya_viz/images/nom_county_over_1.jpg b/posts/data_stories/cas_kenya_viz/images/nom_county_over_1.jpg new file mode 100644 index 0000000..75ac23a Binary files /dev/null and b/posts/data_stories/cas_kenya_viz/images/nom_county_over_1.jpg differ diff --git a/posts/data_stories/cas_kenya_viz/images/pwd.jpg b/posts/data_stories/cas_kenya_viz/images/pwd.jpg new file mode 100644 index 0000000..7c190b2 Binary files /dev/null and 
b/posts/data_stories/cas_kenya_viz/images/pwd.jpg differ diff --git a/posts/data_stories/cas_kenya_viz/images/stl_cas_county.jpg b/posts/data_stories/cas_kenya_viz/images/stl_cas_county.jpg new file mode 100644 index 0000000..e6cb0a5 Binary files /dev/null and b/posts/data_stories/cas_kenya_viz/images/stl_cas_county.jpg differ diff --git a/posts/data_stories/kenya_gender_dist/data_story_1.R b/posts/data_stories/kenya_gender_dist/data_story_1.R new file mode 100644 index 0000000..e223dcb --- /dev/null +++ b/posts/data_stories/kenya_gender_dist/data_story_1.R @@ -0,0 +1,281 @@ +# Does Kenya have more men than women? +# By @willyokech +# Data: rKenyaCensus + +# 1. The national human sex ratio + +# Load the required libraries +library(rKenyaCensus) # Contains the 2019 Kenya Census data +library(tidyverse) +library(janitor) +library(scales) + +# Obtain the data required to plot the national human sex ratio +df_pop <- V1_T2.2 +df_pop_national <- df_pop[1,] +df_pop_national_tidy <- df_pop_national %>% +pivot_longer(!County, names_to = "Gender", values_to = "Number") +df_pop_national_tidy <- df_pop_national_tidy +df_pop_national_tidy <- df_pop_national_tidy[1:2,] + +# Plot of the national human sex ratio +ggplot(df_pop_national_tidy, aes(County, Number, fill = Gender)) + +geom_bar(position="stack", stat="identity", width = 2) + +coord_flip() + +theme_void() + +scale_fill_manual(values = c("yellow", "blue")) + +labs(x = "", +y = "", +title = "Figure 1: The number of males and females in Kenya", +subtitle = "", +caption = "Visualization: @willyokech | Source: rKenyaCensus", +fill = "") + +geom_text(aes(label=comma(Number)),color="black",size=10,position=position_stack(vjust=0.5)) + +theme(axis.title.x =element_text(size = 14), +axis.title.y =element_text(size = 14), +plot.title = element_text(family = "URW Palladio L, Italic",size = 16, hjust = 0.5), +plot.subtitle = element_text(family = "URW Palladio L, Italic",size = 10, hjust = 0.5), +legend.title = 
element_text("Helvetica",size = 8, vjust = 1), +legend.position = "right", +plot.caption = element_text(family = "URW Palladio L, Italic",size = 10), +panel.background = element_rect(fill = "white", colour = "white")) + +ggsave("posts/series_2/new_post_1/images/national_sex_ratio.png", width = 8, height = 2) + +# 2. The human sex ratio by age + +df_age_sex_ratio <- V3_T2.2 +df_age_sex_ratio_filter <- df_age_sex_ratio %>% +filter(!grepl('0-|5-|100|Total|Not', Age)) +df_age_sex_ratio_filter$Age <- as.numeric(as.character(df_age_sex_ratio_filter$Age)) +df_age_sex_ratio_filter_select <- df_age_sex_ratio_filter %>% +mutate(m_f_ratio = round(Male *100/Female)) %>% +select(Age, m_f_ratio) + +# Plot of the national human sex ratio by age +df_age_sex_ratio_plot <- df_age_sex_ratio_filter_select%>% +ggplot(aes(Age, m_f_ratio)) + +geom_line(color = "darkblue", size = 1.5) + +theme_classic() + +labs(x = "Age (yrs)", +y = "Number of males per 100 females", +title = "Figure 2: The human sex ratio as a function of age", +caption = "Visualization: @willyokech | Source: rKenyaCensus") + +theme(axis.title.x =element_text(size = 14), +axis.title.y =element_text(size = 14, vjust = 2), +axis.text.x =element_text(size = 14), +axis.text.y =element_text(size = 14), +plot.title = element_text(family = "URW Palladio L, Italic",size = 16, hjust = 0.5), +plot.subtitle = element_text(family = "URW Palladio L, Italic",size = 10, hjust = 0.5), +legend.title = element_text("URW Palladio L, Italic",size = 8, vjust = 1), +plot.caption = element_text(family = "URW Palladio L, Italic",size = 10)) + +geom_hline(yintercept = 100, linetype="dashed", color = "purple", size=0.5) + +geom_text(aes(x = 75 , y = 100, label = "Male:Female ratio = 1:1"), size = 4, angle=0, vjust = -1) +df_age_sex_ratio_plot + +ggsave("posts/series_2/new_post_1/images/age_sex_ratio.png", width = 8, height = 6) + + +# 3. 
The county human sex ratio + +# Obtain the required data +df_1 <- V1_T2.2 + +# Calculate the male:female ratio per 100 +df_1_ratio <- df_1 %>% +mutate(m_f_ratio = Male/Female, +m_f_ratio_100 = round(m_f_ratio*100, 0)) + +# Keep the County, Total, and ratio columns +df_1_ratio_only <- df_1_ratio %>% +select(County, m_f_ratio_100, Total) + +# Remove the "Total" row +df_1_ratio_only_county <- df_1_ratio %>% +select(County, m_f_ratio_100, Total) %>% +filter(County != "Total") + +# Load the packages required for the maps + +#install.packages("sf") +library(sf) # simple features +#install.packages("tmap") #Thematic maps +library(tmap) +#install.packages("leaflet") # Used for creating interactive maps +library(leaflet) +# Load the county shapefile from the downloaded files +KenyaSHP <- read_sf("posts/series_2/new_post_1/kenyan-counties/County.shp", quiet = TRUE, stringsAsFactors = FALSE,as_tibble = TRUE) +# Change the projection of the shapefiles (if necessary) +KenyaSHP <- st_transform(KenyaSHP, crs = 4326) + +# Clean the data, so that the counties match those in the shapefile +# Inspect the county names in the male/female ratio dataset +df_1_ratio_only_county_unique <- unique(df_1_ratio_only_county$County) +df_1_ratio_only_county_unique +# Inspect the county names of the shape file +counties_KenyaSHP <- KenyaSHP %>% +st_drop_geometry() %>% +select(COUNTY) %>% +pull() %>% +unique() +# Convert the m_f_ratio county names to title case +df_1_ratio_only_county <- df_1_ratio_only_county %>% +ungroup() %>% +mutate(County = tools::toTitleCase(tolower(County))) +# Inspect the county names of the m_f_ratio data again +df_1_ratio_only_county_unique <- unique(df_1_ratio_only_county$County) +# Inspect the county names that are different in each of the datasets +unique(df_1_ratio_only_county$County)[which(!unique(df_1_ratio_only_county$County) %in% counties_KenyaSHP)] +df_1_ratio_only_county <- df_1_ratio_only_county %>% +mutate(County = ifelse(County == "Taita/Taveta", 
"Taita Taveta", +ifelse(County == "Tharaka-Nithi", "Tharaka", +ifelse(County == "Elgeyo/Marakwet", "Keiyo-Marakwet", +ifelse(County == "Nairobi City", "Nairobi", County))))) +# Check again for unique datasets +unique(df_1_ratio_only_county$County)[which(!unique(df_1_ratio_only_county$County) %in% counties_KenyaSHP)] + +# Join the shapefile and the data +# Rename the COUNTY variable, to match the variable name in the shapefile data +df_1_ratio_only_county <- df_1_ratio_only_county %>% +rename(COUNTY = County) +# Ensure that there are no leading or trailing spaces in the county variable +KenyaSHP$COUNTY <- trimws(KenyaSHP$COUNTY) +df_1_ratio_only_county$COUNTY <- trimws(df_1_ratio_only_county$COUNTY) +# Merge the data +merged_df <- left_join(KenyaSHP, df_1_ratio_only_county, by = "COUNTY") +# Sort the data so that the County variable appears first +merged_df <- merged_df %>% +select(COUNTY, everything()) + +# Visualize the data +#install.packages("ggbreak") +library(ggbreak) +library(patchwork) + +barplot <- df_1_ratio_only_county %>% +ggplot(aes(x = reorder(COUNTY, m_f_ratio_100), y = m_f_ratio_100, fill = m_f_ratio_100)) + +geom_bar(stat = "identity", width = 0.5) + +coord_flip() + +scale_fill_gradient(low = "blue", high = "yellow") + +scale_y_break(c(7.5, 80)) + +theme_classic()+ +labs(x = "County", +y = "Number of males per 100 females", +title = "", +subtitle = "", +caption = "", +fill = "Number of males\nper 100 females")+ +theme(axis.title.x =element_text(size = 14), +axis.title.y =element_text(size = 14), +axis.text.x =element_text(size = 12), +axis.text.y =element_text(size = 12), +plot.title = element_text(family = "URW Palladio L, Italic",size = 14, hjust = 0.5), +plot.subtitle = element_text(family = "URW Palladio L, Italic",size = 10, hjust = 0.5), +legend.title = element_text("URW Palladio L, Italic",size = 10, vjust = 1), +plot.caption = element_text(family = "URW Palladio L, Italic",size = 10), +panel.background = element_rect(fill = "white", colour = 
"white")) + +geom_hline(yintercept = 100, linetype="dashed", color = "purple", size=0.5) + +geom_text(aes(x = 14 , y = 100, label = "Male:Female ratio = 1:1"), size = 4, angle=90, vjust = 1.5) + +# Map +# Legend in map is silenced because the bar graph has one +map <- ggplot(data = merged_df)+ +geom_sf(aes(geometry = geometry, fill = m_f_ratio_100))+ +theme_void()+ +labs(title = "", +caption = "Visualization: @willyokech | Source: rKenyaCensus", +fill = "Number of males\nper 100 females")+ +theme(plot.title = element_text(family = "URW Palladio L, Italic",size = 20, hjust = 0.5), +legend.title = element_blank(), +legend.position = "none", +plot.caption = element_text(family = "URW Palladio L, Italic",size = 10))+ +scale_fill_gradient(low = "blue", high = "yellow") + +barplot + map + plot_annotation( +title = "Figure 3: The human sex ratio in different counties", +theme = theme(plot.title = element_text(size = 26, hjust = 0.5))) + +ggsave("posts/series_2/new_post_1/images/barplot_map.png", width = 12, height = 10) + +# 4. 
The subcounty human sex ratio + +# Load the required libraries +library(patchwork) +library(tidyverse) +library(ggbreak) + +# Obtain the required subcounty sex census data and clean +df_2 <- V1_T2.5 +df_2 + +# Calculate the male:female ratio per 100 +df_2_ratio <- df_2 %>% +mutate(m_f_ratio = Male/Female, +m_f_ratio_100 = round(m_f_ratio*100, 0)) + +# Remove the "Total" row and include the "Subcounty" +df_2_ratio_subcounty <- df_2_ratio %>% +filter(AdminArea == "SubCounty") %>% +select(County, SubCounty, m_f_ratio_100, Total) %>% +filter(County != "Total") + +# Find the top 20 subcounties (highest number of males per 100 females) +top_subcounty <- df_2_ratio_subcounty %>% +unite(col = "county_sub", c("SubCounty", "County"), sep = ", ", remove = TRUE) %>% +arrange(desc(m_f_ratio_100)) %>% +slice(1:20) + +top_subcounty_plot <- top_subcounty %>% +ggplot(aes(x = reorder(county_sub, m_f_ratio_100), y = m_f_ratio_100)) + +geom_bar(stat = "identity", width = 0.5, fill = "lightblue") + +coord_flip() + +scale_y_break(c(40, 80)) + +theme_classic()+ +labs(x = "Sub-county", +y = "Number of males per 100 females", +title = "Top 20 Sub-counties", +caption = "") + +theme(axis.title.x =element_text(size = 14), +axis.title.y =element_text(size = 14), +axis.text.x =element_text(size = 12), +axis.text.y =element_text(size = 12), +plot.title = element_text(family = "URW Palladio L, Italic",size = 14, hjust = 0.5), +plot.subtitle = element_text(family = "URW Palladio L, Italic",size = 10, hjust = 0.5), +legend.title = element_text("URW Palladio L, Italic",size = 8, vjust = 1), +plot.caption = element_text(family = "URW Palladio L, Italic",size = 10)) + +geom_hline(yintercept = 100, linetype="dashed", color = "purple", size=0.5) + +geom_text(aes(x = 5 , y = 100, label = "Male:Female ratio = 1:1"), size = 4, angle=90, vjust = 1.5) + +# Find the bottom 20 subcounties (lowest number of males per 100 females) +bottom_subcounty <- df_2_ratio_subcounty %>% +unite(col = "county_sub", c("SubCounty", "County"), sep = ", ", remove = TRUE) %>% +arrange(m_f_ratio_100) %>% 
+slice(1:20) + +bottom_subcounty_plot <- bottom_subcounty %>% +ggplot(aes(x = reorder(county_sub, m_f_ratio_100), y = m_f_ratio_100)) + +geom_bar(stat = "identity", width = 0.5, fill = "blue") + +coord_flip() + +scale_y_break(c(40, 80)) + +theme_classic()+ +labs(x = "Sub-county", +y = "Number of males per 100 females", +title = "Bottom 20 Sub-counties", +caption = "Visualization: @willyokech | Source: rKenyaCensus") + +theme(axis.title.x =element_text(size = 14), +axis.title.y =element_text(size = 14), +axis.text.x =element_text(size = 12), +axis.text.y =element_text(size = 12), +plot.title = element_text(family = "URW Palladio L, Italic",size = 14, hjust = 0.5), +plot.subtitle = element_text(family = "URW Palladio L, Italic",size = 10, hjust = 0.5), +legend.title = element_text("URW Palladio L, Italic",size = 8, vjust = 1), +plot.caption = element_text(family = "URW Palladio L, Italic",size = 10)) + +geom_hline(yintercept = 100, linetype="dashed", color = "purple", size=0.5) + +geom_text(aes(x = 5 , y = 100, label = "Male:Female ratio = 1:1"), size = 4, angle=90, vjust = -1.5) + +top_subcounty_plot + bottom_subcounty_plot + plot_annotation( +title = "Figure 4: The human sex ratio at the sub-county level", +theme = theme(plot.title = element_text(size = 26, hjust = 0.5))) + +ggsave("posts/series_2/new_post_1/images/top_bottom_plot.png", width = 12, height = 8) + diff --git a/posts/data_stories/kenya_gender_dist/data_story_headline.png b/posts/data_stories/kenya_gender_dist/data_story_headline.png new file mode 100644 index 0000000..56736c7 Binary files /dev/null and b/posts/data_stories/kenya_gender_dist/data_story_headline.png differ diff --git a/posts/data_stories/kenya_gender_dist/images/age_sex_ratio_2.png b/posts/data_stories/kenya_gender_dist/images/age_sex_ratio_2.png new file mode 100644 index 0000000..3088314 Binary files /dev/null and b/posts/data_stories/kenya_gender_dist/images/age_sex_ratio_2.png differ diff --git 
a/posts/data_stories/kenya_gender_dist/images/barplot_map.png b/posts/data_stories/kenya_gender_dist/images/barplot_map.png new file mode 100644 index 0000000..750d559 Binary files /dev/null and b/posts/data_stories/kenya_gender_dist/images/barplot_map.png differ diff --git a/posts/data_stories/kenya_gender_dist/images/sex_ratio.png b/posts/data_stories/kenya_gender_dist/images/sex_ratio.png new file mode 100644 index 0000000..a78b0f0 Binary files /dev/null and b/posts/data_stories/kenya_gender_dist/images/sex_ratio.png differ diff --git a/posts/data_stories/kenya_gender_dist/images/top_bottom_plot.png b/posts/data_stories/kenya_gender_dist/images/top_bottom_plot.png new file mode 100644 index 0000000..9f3a6f8 Binary files /dev/null and b/posts/data_stories/kenya_gender_dist/images/top_bottom_plot.png differ diff --git a/posts/data_stories/kenya_gender_dist/kenya_gender_dist.qmd b/posts/data_stories/kenya_gender_dist/kenya_gender_dist.qmd new file mode 100644 index 0000000..1020132 --- /dev/null +++ b/posts/data_stories/kenya_gender_dist/kenya_gender_dist.qmd @@ -0,0 +1,62 @@ +--- +title: "Exploring gender distributions in Kenya: Are there more women than men?" +subtitle: "Insights from the Kenya Population and Housing Census (2019)" +author: "William Okech" +date: "2022-06-30" +image: "data_story_headline.png" +categories: [RStudio, R, Data Stories, Blog, Kenya Census] +format: html +toc: false +draft: false +warning: false +--- + +To answer this question, I reviewed the Kenya Population and Housing Census (2019)[^1][^2] report, which provides data on population by sex and age at the county and subcounty levels. This analysis was inspired by Rose Mintzer-Sweeney's article "Sex and the Census," published on the Datawrapper website [^3]. + +[^1]: Kenya National Bureau of Statistics. The 2019 Kenya Population and Housing Census. Volume I: Population by County and SubCounty and Volume III: Distribution of Population by Age and Sex. + +[^2]: Shelmith Kariuki (2020). 
rKenyaCensus: 2019 Kenya Population and Housing Census Results. R package version 0.0.2. + +[^3]: Rose Mintzer-Sweeney's article: https://blog.datawrapper.de/gender-ratio-american-history/ + +Various biological, cultural, public health, and economic factors can influence the global human sex ratio. For instance, at birth, the human sex ratio is "male-biased," with approximately 105 males born per 100 girls. However, with increasing age, the susceptibility to infectious diseases, sex-selective abortions, and higher life expectancies for women can cause fluctuations in the human sex ratio[^4]. The total Kenyan population in 2019 (according to the census) was 47,564,296. When I compared the number of males to females at the national level (Figure 1), I found that there were 98 males for every 100 females in the country[^5]. + +[^4]: Hannah Ritchie and Max Roser (2019) - "Gender Ratio." Published online at OurWorldInData.org. Retrieved from: 'https://ourworldindata.org/gender-ratio' \[Online Resource\] + +[^5]: Additionally, there were also 1,524 individuals classified as intersex, but their low numbers prevented their inclusion in the analysis. + +![](images/sex_ratio.png) + +**Figure 1: At the national level, there are more females compared with males** + +Knowing there were more females than males, I sought to determine whether these differences persisted across all age groups (Figure 2). + +![](images/age_sex_ratio_2.png) + +**Figure 2: There are more males than females between 0--19 yrs and between 36--58 yrs.** + +As expected, I observed a higher number of males compared with females between 0 to 18 years. One reason could be the higher male-to-female ratio seen at birth globally[^6]. Between the ages of 19 to 34 years, the male-to-female ratio decreases rapidly, while from 35 to 56 years, the ratio increases rapidly. 
The cause of this fluctuation is not apparent, but various public health factors may be responsible for the shifts observed within these age groups. Finally, the number of males compared with females steadily decreases after age 60. One reason for this could be the prevalence of medical conditions that disproportionately affect men. Additionally, the decrease in the number of males to females could result from increases in life expectancy favoring women, as demonstrated by the Economic Survey 2022, which shows a life expectancy of 60.6 years for males vs. 66.5 years for females[^7]. + +[^6]: Hannah Ritchie and Max Roser (2019) - "Gender Ratio." Published online at OurWorldInData.org. Retrieved from: 'https://ourworldindata.org/gender-ratio' \[Online Resource\] + +[^7]: Kenya National Bureau of Statistics. The Economic Survey 2022 + +If we focus only on the national human sex ratio, we may wrongly assume that the male-to-female ratio is equal across all the regions in Kenya. Kenya has 47 diverse counties with different population densities, climatic conditions, economic opportunities, and levels of development. Not surprisingly, we find (Figure 3) that there is a wide range of human sex ratios (90--120 males per 100 females) across the different counties (administrative units). + +![](images/barplot_map.png) + +**Figure 3: Counties in the West of Kenya have higher female-to-male ratios, while counties in the North-East have higher male-to-female ratios.** + +The highest sex ratio is found in Garissa county (120 males per 100 females), and the lowest is observed in Siaya county (90 males per 100 females). Many counties with low sex ratios (more females) are primarily located in the west of Kenya, and counties with high sex ratios (more males) are found in the north of Kenya. According to the Economic Survey (2022)[^7], male life expectancy in the west of Kenya is the lowest in the country. 
Homa Bay and Migori recorded a male life expectancy of 50.5 years, which was approximately 10 years lower than that of females in the respective counties. By contrast, male life expectancy is only 3 to 5 years lower than female life expectancy in some of the counties in the north of Kenya. + +Within each of Kenya's 47 counties are smaller administrative units known as subcounties. For the final analysis, I thought it would be interesting to see whether the patterns observed at the county level were consistent across the various subcounties. + +![](images/top_bottom_plot.png) + +**Figure 4: Balambala subcounty has the highest male-to-female ratio, while Mandera Central has the lowest.** + +Having just observed that counties in the north of Kenya had the highest number of males per 100 females, I was surprised to find that Mandera Central (Mandera County) and Tarbaj (Wajir County) subcounties in the north were among the subcounties with the lowest number of males per 100 females (Figure 4). Why females tend to concentrate within specific regions in these two counties may be an interesting aspect to investigate in future studies. + +Overall, many factors may affect the human sex ratio at the county and subcounty levels and cause the differences in the human sex ratio seen with age. High rural-urban migration, public health factors (including the prevalence of various communicable and non-communicable diseases), climate, and a location's primary source of employment may skew the number of males to females in certain subcounties. Therefore, future investigations should focus on the causes of these variations in the human sex ratio and the implications for administrative planning at the national, county, and subcounty levels. 
+ +## diff --git a/posts/series_2/new_post_2/kenyan-counties/County.dbf b/posts/data_stories/kenya_gender_dist/kenyan-counties/County.dbf similarity index 100% rename from posts/series_2/new_post_2/kenyan-counties/County.dbf rename to posts/data_stories/kenya_gender_dist/kenyan-counties/County.dbf diff --git a/posts/data_stories/kenya_gender_dist/kenyan-counties/County.prj b/posts/data_stories/kenya_gender_dist/kenyan-counties/County.prj new file mode 100644 index 0000000..f45cbad --- /dev/null +++ b/posts/data_stories/kenya_gender_dist/kenyan-counties/County.prj @@ -0,0 +1 @@ +GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]] \ No newline at end of file diff --git a/posts/series_2/new_post_2/kenyan-counties/County.sbn b/posts/data_stories/kenya_gender_dist/kenyan-counties/County.sbn similarity index 100% rename from posts/series_2/new_post_2/kenyan-counties/County.sbn rename to posts/data_stories/kenya_gender_dist/kenyan-counties/County.sbn diff --git a/posts/series_2/new_post_2/kenyan-counties/County.sbx b/posts/data_stories/kenya_gender_dist/kenyan-counties/County.sbx similarity index 100% rename from posts/series_2/new_post_2/kenyan-counties/County.sbx rename to posts/data_stories/kenya_gender_dist/kenyan-counties/County.sbx diff --git a/posts/series_2/new_post_2/kenyan-counties/County.shp b/posts/data_stories/kenya_gender_dist/kenyan-counties/County.shp similarity index 100% rename from posts/series_2/new_post_2/kenyan-counties/County.shp rename to posts/data_stories/kenya_gender_dist/kenyan-counties/County.shp diff --git a/posts/data_stories/kenya_gender_dist/kenyan-counties/County.shp.xml b/posts/data_stories/kenya_gender_dist/kenyan-counties/County.shp.xml new file mode 100644 index 0000000..516a48d --- /dev/null +++ b/posts/data_stories/kenya_gender_dist/kenyan-counties/County.shp.xml @@ -0,0 +1,2 @@ + 
+20111014225737001.0FALSEfile://\\ESRIEA-CLOUD\H\Esri\Professional Services\Solutions\OpenDataKenya\County\GIS\Share\county\v10\county_data0.gdbLocal Area Networkcounty002DefineProjection county GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]CopyFeatures D:\david\gisdata\Kecounty\GIS\Shp\county.shp D:\david\gisdata\Kecounty\GIS\Database\county.gdb\county # 0 0 0GeographicGCS_WGS_1984<GeographicCoordinateSystem xsi:type='typens:GeographicCoordinateSystem' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns:typens='http://www.esri.com/schemas/ArcGIS/10.0'><WKT>GEOGCS[&quot;GCS_WGS_1984&quot;,DATUM[&quot;D_WGS_1984&quot;,SPHEROID[&quot;WGS_1984&quot;,6378137.0,298.257223563]],PRIMEM[&quot;Greenwich&quot;,0.0],UNIT[&quot;Degree&quot;,0.0174532925199433],AUTHORITY[&quot;EPSG&quot;,4326]]</WKT><XOrigin>-399.99999999999989</XOrigin><YOrigin>-399.99999999999989</YOrigin><XYScale>1111948722.2222221</XYScale><ZOrigin>-100000</ZOrigin><ZScale>10000</ZScale><MOrigin>-100000</MOrigin><MScale>10000</MScale><XYTolerance>8.9831528411952133e-009</XYTolerance><ZTolerance>0.001</ZTolerance><MTolerance>0.001</MTolerance><HighPrecision>true</HighPrecision><LeftLongitude>-180</LeftLongitude><WKID>4326</WKID></GeographicCoordinateSystem>20110727105949002011072710594900ISO 19139 Metadata Implementation SpecificationMicrosoft Windows Server 2008 R2 Version 6.1 (Build 7601) Service Pack 1; ESRI ArcGIS 10.0.0.2414county002File Geodatabase Feature ClassdatasetEPSG7.4.1Simple4FALSE0TRUEFALSE0countyFeature Class0OBJECTIDOBJECTIDOID400Internal feature number.ESRISequential unique whole numbers that are automatically generated.ShapeShapeGeometry000Feature geometry.ESRICoordinates defining the 
features.AREAAREADouble800PERIMETERPERIMETERDouble800COUNTY3_COUNTY3_Double800COUNTY3_IDCOUNTY3_IDDouble800COUNTYCOUNTYString2000Shape_LengthShape_LengthDouble800Length of feature in internal units.ESRIPositive real numbers that are automatically generated.Shape_AreaShape_AreaDouble800Area of feature in internal units squared.ESRIPositive real numbers that are automatically generated.20110727 diff --git a/posts/series_2/new_post_2/kenyan-counties/County.shx b/posts/data_stories/kenya_gender_dist/kenyan-counties/County.shx similarity index 100% rename from posts/series_2/new_post_2/kenyan-counties/County.shx rename to posts/data_stories/kenya_gender_dist/kenyan-counties/County.shx diff --git a/posts/series_2/new_post_1/male-female-ratio.png b/posts/data_stories/kenya_gender_dist/male-female-ratio.png similarity index 100% rename from posts/series_2/new_post_1/male-female-ratio.png rename to posts/data_stories/kenya_gender_dist/male-female-ratio.png diff --git a/posts/r_rstudio_basics/arithmetic/arithmetic.qmd b/posts/r_rstudio_basics/arithmetic/arithmetic.qmd new file mode 100644 index 0000000..e25c587 --- /dev/null +++ b/posts/r_rstudio_basics/arithmetic/arithmetic.qmd @@ -0,0 +1,97 @@ +--- +title: "The Basics of R and RStudio" +subtitle: "Part 1: Simple Arithmetic" +author: "William Okech" +date: "2022-06-15" +image: "r_and_rstudio.png" +categories: [RStudio, R, Tutorial, Blog] +toc: true +draft: false +--- + +## Introduction + +This is the first in a series of blog posts looking at the basics of R and RStudio. These programs allow us to perform various basic and complex calculations. + +To get started, first, we will open R or RStudio. In R, go to the console, and in RStudio, head to the console pane. Next, type in a basic arithmetic calculation such as "1 + 1" after the angle bracket (\>) and hit "Enter." + +An example of a basic calculation: + +```{r} +1+1 +``` + +The output will be observed next to the square bracket containing the number 1 (\[1\]). 
+ +![](r_console_1plus1.png){fig-align="center" width="90%"} + +Additionally, to include comments in a code block, we use the hash (#) symbol. Anything written after the hash symbol on the same line is treated as a comment and is not run. + +```{r} +# A comment describing the calculation (this line is not run because of the hash symbol) +1+1 +``` + +## Arithmetic operators available in R/RStudio + +Various arithmetic operators (listed below) can be used in R/RStudio. + +| Arithmetic Operator | Description | +|:-------------------:|:----------------------------------:| +| \+ | Addition | +| \- | Subtraction | +| \* | Multiplication | +| / | Division | +| \*\* or \^ | Exponentiation | +| %% | Modulus (remainder after division) | +| %/% | Integer division | + +## Examples + +### Addition + +```{r} +10+30 +``` + +### Subtraction + +```{r} +30-24 +``` + +### Multiplication + +```{r} +20*4 +``` + +### Division + +```{r} +93/4 +``` + +### Exponentiation + +```{r} +3^6 +``` + +### Modulus (remainder with division) + +```{r} +94%%5 +``` + +### Integer Division + +```{r} +54%/%7 +``` + +### Slightly more complex arithmetic operations + +```{r} +5-1+(4*3)/16*3 +``` diff --git a/posts/series_1/new_post_2/main_pic.png b/posts/r_rstudio_basics/arithmetic/main_pic.png similarity index 100% rename from posts/series_1/new_post_2/main_pic.png rename to posts/r_rstudio_basics/arithmetic/main_pic.png diff --git a/posts/r_rstudio_basics/arithmetic/r_and_rstudio.png b/posts/r_rstudio_basics/arithmetic/r_and_rstudio.png new file mode 100644 index 0000000..f3e3a34 Binary files /dev/null and b/posts/r_rstudio_basics/arithmetic/r_and_rstudio.png differ diff --git a/posts/r_rstudio_basics/arithmetic/r_console_1plus1.png b/posts/r_rstudio_basics/arithmetic/r_console_1plus1.png new file mode 100644 index 0000000..b1ab85f Binary files /dev/null and b/posts/r_rstudio_basics/arithmetic/r_console_1plus1.png differ diff --git a/posts/r_rstudio_basics/data_structures/data_structures.qmd 
b/posts/r_rstudio_basics/data_structures/data_structures.qmd new file mode 100644 index 0000000..d5c4d70 --- /dev/null +++ b/posts/r_rstudio_basics/data_structures/data_structures.qmd @@ -0,0 +1,292 @@ +--- +title: "The Basics of R and RStudio" +subtitle: "Part 7: Data Structures" +author: "William Okech" +date: "2022-11-16" +image: "r_and_rstudio.png" +categories: [RStudio, R, Tutorial, Blog] +toc: true +draft: false +--- + +## Introduction + +Data structures in R are tools for storing and organizing multiple values. + +They help to organize stored data in a way that the data can be used more effectively. Data structures vary according to the number of dimensions and the data types (heterogeneous or homogeneous) contained. The primary data structures are: + +1. Vectors ([link](/posts/series_1/new_post_6/post_6.html)) + +2. Lists + +3. Data frames + +4. Matrices + +5. Arrays + +6. Factors + +## Data structures + +### 1. Vectors + +Discussed in a previous [post](/posts/series_1/new_post_6/post_6.html) + +### 2. Lists + +Lists are objects/containers that hold elements of the same or different types. They can contain strings, numbers, vectors, matrices, functions, or other lists. Lists are created with the `list()` function. + +#### Examples + +#### a. Three element list + +```{r} +list_1 <- list(10, 30, 50) +``` + +#### b. Single element list + +```{r} +list_2 <- list(c(10, 30, 50)) +``` + +#### c. Three element list + +```{r} +list_3 <- list(1:3, c(50,40), 3:-5) +``` + +#### d. List with elements of different types + +```{r} +list_4 <- list(c("a", "b", "c"), 5:-1) +``` + +#### e. List which contains a list + +```{r} +list_5 <- list(c("a", "b", "c"), 5:-1, list_1) +``` + +#### f. Set names for the list elements + +```{r} +names(list_5) +names(list_5) <- c("character vector", "numeric vector", "list") +names(list_5) +``` + +#### g. Access elements + +```{r} +list_5[[1]] +list_5[["character vector"]] +``` + +#### h. 
Length of list + +```{r} +length(list_1) +length(list_5) +``` + +### 3. Data frames + +A data frame is one of the most common data objects used to store tabular data in R. Tabular data has rows representing observations and columns representing variables. Dataframes contain lists of equal-length vectors. Each column holds a different type of data, but within each column, the elements must be of the same type. The most common data frame characteristics are listed below: + +• Columns should have a name; + +• Row names should be unique; + +• Various data can be stored (such as numeric, factor, and character); + +• The individual columns should contain the same number of data items. + +### Creation of data frames + +```{r} +level <- c("Low", "Mid", "High") +language <- c("R", "RStudio", "Shiny") +age <- c(25, 36, 47) + +df_1 <- data.frame(level, language, age) +``` + +### Functions used to manipulate data frames + +#### a. Number of rows + +```{r} +nrow(df_1) +``` + +#### b. Number of columns + +```{r} +ncol(df_1) +``` + +#### c. Dimensions + +```{r} +dim(df_1) +``` + +#### d. Class of data frame + +```{r} +class(df_1) +``` + +#### e. Column names + +```{r} +colnames(df_1) +``` + +#### f. Row names + +```{r} +rownames(df_1) +``` + +#### g. Top and bottom values + +```{r} +head(df_1, n=2) +tail(df_1, n=2) +``` + +#### h. Access columns + +```{r} +df_1$level +``` + +#### i. Access individual elements + +```{r} + +df_1[3,2] +df_1[2, 1:2] + +``` + +#### j. Access columns with index + +```{r} +df_1[, 3] +df_1[, c("language")] +``` + +#### k. Access rows with index + +```{r} +df_1[2, ] +``` + +### 4. Matrices + +A matrix is a rectangular two-dimensional (2D) homogeneous data set containing rows and columns. It contains real numbers that are arranged in a fixed number of rows and columns. Matrices are generally used for various mathematical and statistical applications. + +#### a. 
Creation of matrices + +```{r} +m1 <- matrix(1:9, nrow = 3, ncol = 3) +m2 <- matrix(21:29, nrow = 3, ncol = 3) +m3 <- matrix(1:12, nrow = 2, ncol = 6) +``` + +#### b. Obtain the dimensions of the matrices + +```{r} +# m1 +nrow(m1) +ncol(m1) +dim(m1) + +# m3 +nrow(m3) +ncol(m3) +dim(m3) +``` + +#### c. Arithmetic with matrices + +```{r} +m1+m2 +m1-m2 +m1*m2 +m1/m2 +m1 == m2 +``` + +#### d. Matrix multiplication + +```{r} +m5 <- matrix(1:10, nrow = 5) +m6 <- matrix(43:34, nrow = 5) + +m5*m6 + +# m5%*%m6 will not work because the dimensions do not conform (both are 5 x 2). +# One of the matrices (here m6) needs to be transposed. + +# Transpose +m5%*%t(m6) +``` + +#### e. Generate an identity matrix + +```{r} +diag(5) +``` + +#### f. Column and row names + +```{r} +colnames(m5) +rownames(m6) +``` + +### 5. Arrays + +An array is a multidimensional vector that stores homogeneous data. It can be thought of as a stacked matrix and stores data in more than 2 dimensions (n-dimensional). An array is composed of rows by columns by dimensions. Example: an array with dimensions, dim = c(2,3,3), has 2 rows, 3 columns, and 3 matrices. + +#### a. Creating arrays + +```{r} +arr_1 <- array(1:12, dim = c(2,3,2)) + +arr_1 + +``` + +#### b. Filter array by index + +```{r} +arr_1[1, , ] + +arr_1[1, ,1] + +arr_1[, , 1] +``` + +### 6. Factors + +Factors are used to store integers or strings which are categorical. They categorize data and store the data in different levels. This form of data storage is useful for statistical modeling. Examples include TRUE or FALSE and male or female. 
+ +```{r} +vector <- c("Male", "Female") +factor_1 <- factor(vector) +factor_1 +``` + +OR + +```{r} +factor_2 <- as.factor(vector) +factor_2 +as.numeric(factor_2) +``` diff --git a/posts/r_rstudio_basics/data_structures/r_and_rstudio.png b/posts/r_rstudio_basics/data_structures/r_and_rstudio.png new file mode 100644 index 0000000..f3e3a34 Binary files /dev/null and b/posts/r_rstudio_basics/data_structures/r_and_rstudio.png differ diff --git a/posts/series_1/new_post_4/blog_series_pt_4.png b/posts/r_rstudio_basics/data_types/blog_series_pt_4.png similarity index 100% rename from posts/series_1/new_post_4/blog_series_pt_4.png rename to posts/r_rstudio_basics/data_types/blog_series_pt_4.png diff --git a/posts/r_rstudio_basics/data_types/data_types.qmd b/posts/r_rstudio_basics/data_types/data_types.qmd new file mode 100644 index 0000000..a8cf867 --- /dev/null +++ b/posts/r_rstudio_basics/data_types/data_types.qmd @@ -0,0 +1,153 @@ +--- +title: "The Basics of R and RStudio" +subtitle: "Part 3: Data Types" +author: "William Okech" +date: "2022-06-23" +image: "r_and_rstudio.png" +categories: [RStudio, R, Tutorial, Blog] +toc: true +draft: false +--- + +## Introduction + +R and RStudio utilize multiple data types to store different kinds of data. + +The most common data types in R are listed below. + +| **Data Type** | **Description** | +|---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Numeric | The most common data type. The values can be numbers or decimals (all real numbers). | +| Integer | Special case of numeric data without decimals. | +| Logical | Boolean data type with only 2 values (`TRUE` or `FALSE`). | +| Complex | Specifies imaginary values in R. | +| Character | Assigns a character or string to a variable. 
Character values can be enclosed in either single quotes ('character') or double quotes ("string"); R treats both the same way. | +| Factor | Special type of variable that represents categorical data such as gender. | +| Raw | Specifies values as raw bytes. It uses built-in functions to convert between raw and character (charToRaw() or rawToChar()). | +| Dates | Specifies the date variable. Date stores a date and POSIXct stores a date and time. Internally, the value is stored as the number of days (Date) or number of seconds (POSIXct) since 1970-01-01. | + +## Data types + +### 1. Numeric + +```{r} +89.98 + +55 +``` + +### 2. Integer + +```{r} +5L + +5768L +``` + +### 3. Logical + +```{r} +TRUE + +FALSE +``` + +### 4. Complex + +```{r} +10 + 30i + +287 + 34i +``` + +### 5. Character or String + +```{r} +'abc' + +"def" + +"I like learning R" +``` + +### 6. Dates + +```{r} +as.POSIXct("2022-06-23 14:39:21", tz = "Africa/Nairobi") + +as.Date("2022-06-23") +``` + +## Examining various data types + +Several functions exist to examine the features of the various data types. These include: + +1. `typeof()` -- what is the data type of the object (low-level)? +2. `class()` -- what is the data type of the object (high-level)? +3. `length()` -- how long is the object? +4. `attributes()` -- any metadata available? + +Let's look at how these functions work with a few examples. + +```{r} +a <- 45.84 +b <- 858L +c <- TRUE +d <- 89 + 34i +e <- 'abc' +``` + +### 1. Examine the data type at a low-level with `typeof()` + +```{r} +typeof(a) +typeof(b) +typeof(c) +typeof(d) +typeof(e) +``` + +### 2. Examine the data type at a high-level with `class()` + +```{r} +class(a) +class(b) +class(c) +class(d) +class(e) +``` + +### 3. Use the `is.____()` functions to determine the data type + +To test whether the variable is of a specific type, we can use the `is.____()` functions. + +First, we test the variable `a` which is numeric. 
+ +```{r} +is.numeric(a) +is.integer(a) +is.logical(a) +is.character(a) +``` + +Second, we test the variable `c` which is logical. + +```{r} +is.numeric(c) +is.integer(c) +is.logical(c) +is.character(c) +``` + +## Converting between various data types + +To convert between data types we can use the `as.____()` functions. These include: `as.Date()`, `as.numeric()`, and `as.factor()`. Additionally, other helpful functions include factor() which adds levels to the data and `nchar()` which provides the length of the data. + +### Examples + +```{r} +as.integer(a) +as.logical(0) +as.logical(1) +nchar(e) +``` diff --git a/posts/r_rstudio_basics/data_types/r_and_rstudio.png b/posts/r_rstudio_basics/data_types/r_and_rstudio.png new file mode 100644 index 0000000..f3e3a34 Binary files /dev/null and b/posts/r_rstudio_basics/data_types/r_and_rstudio.png differ diff --git a/posts/r_rstudio_basics/missing_data/missing_data.qmd b/posts/r_rstudio_basics/missing_data/missing_data.qmd new file mode 100644 index 0000000..8b7d4c3 --- /dev/null +++ b/posts/r_rstudio_basics/missing_data/missing_data.qmd @@ -0,0 +1,32 @@ +--- +title: "The Basics of R and RStudio" +subtitle: "Part 6: Missing Data" +author: "William Okech" +date: "2022-11-14" +image: "r_and_rstudio.png" +categories: [RStudio, R, Tutorial, Blog] +toc: true +draft: false +--- + +## Introduction + +R has two types of missing data, NA and NULL.[^1] + +[^1]: Adapted from Lander, J. P. (2014) R for everyone: Advanced analytics and graphics. Addison-Wesley. + +## NA + +R uses `NA` to represent missing data. The `NA` appears as another element of a vector. To test each element for missingness we use `is.na()`. Generally, we can use tools such as `mi`, `mice`, and `Amelia` (which will be discussed later) to deal with missing data. The deletion of this missing data may lead to bias or data loss, so we need to be very careful when handling it. In subsequent blog posts, we will look at the use of imputation to deal with missing data. 
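To make the role of `is.na()` concrete, here is a minimal sketch in base R. The vector `scores` and its values are hypothetical and exist only for illustration; they are not part of the original post. The sketch flags the missing element, counts it, and shows one common way to skip missing values when summarizing.

```r
# Hypothetical example data: a numeric vector with one missing value
scores <- c(12, NA, 7, 30)

# is.na() tests each element for missingness and returns a logical vector
missing_flags <- is.na(scores)
missing_flags

# Logical values sum as 0/1, so this counts the missing elements
n_missing <- sum(missing_flags)
n_missing

# Many summary functions return NA when any element is missing
mean(scores)

# na.rm = TRUE drops the missing values before computing the summary
mean_complete <- mean(scores, na.rm = TRUE)
mean_complete
```

Dropping values with `na.rm = TRUE` is a form of deletion, so the caution above about bias and data loss still applies.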
+ +## NULL + +`NULL` represents nothingness or the "absence of anything". [^2] + +[^2]: Adapted from Lander, J. P. (2014) R for everyone: Advanced analytics and graphics. Addison-Wesley. + +It does not mean missing but represents nothing. `NULL` cannot exist within a vector because it disappears. + +## Supplementary Reading + +1. An excellent post from the blog ["Data Science by Design"](https://datasciencebydesign.org/blog/when-we-miss-missingness) on the role of missingness. diff --git a/posts/r_rstudio_basics/missing_data/r_and_rstudio.png b/posts/r_rstudio_basics/missing_data/r_and_rstudio.png new file mode 100644 index 0000000..f3e3a34 Binary files /dev/null and b/posts/r_rstudio_basics/missing_data/r_and_rstudio.png differ diff --git a/posts/r_rstudio_basics/operators/operators.qmd b/posts/r_rstudio_basics/operators/operators.qmd new file mode 100644 index 0000000..4bb2cc7 --- /dev/null +++ b/posts/r_rstudio_basics/operators/operators.qmd @@ -0,0 +1,153 @@ +--- +title: "The Basics of R and RStudio" +subtitle: "Part 4: Operators" +author: "William Okech" +date: "2022-11-09" +image: "r_and_rstudio.png" +categories: [RStudio, R, Tutorial, Blog] +toc: true +draft: false +--- + +## Introduction + +R has many different types of operators that can perform different tasks. + +Here we will focus on 5 major types of operators. The major types of operators are: + +1. Arithmetic, + +2. Relational, + +3. Logical, + +4. Assignment, and + +5. Miscellaneous. + +## 1. Arithmetic Operators + +Arithmetic operators are used to perform mathematical operations. These operators have been highlighted in [Part 1](/posts/series_1/new_post_2/post_2.html) of the series. + +## 2. Relational Operators + +Relational operators are used to find the relationship between 2 variables and compare objects. The output of these comparisons is Boolean (`TRUE` or `FALSE`). The table below describes the most common relational operators. 
+ +| Relational Operator | Description | +|:-------------------:|:------------------------:| +| \< | Less than | +| \> | Greater than | +| \<= | Less than or equal to | +| \>= | Greater than or equal to | +| == | Equal to | +| != | Not Equal to | + +Assign values to variables + +```{r} +x <- 227 +y <- 639 +``` + +### a. Less than + +```{r} +x < y +``` + +### b. Greater than + +```{r} +x > y +``` + +### c. Less than or equal to + +```{r} +x <= 300 +``` + +### d. Greater than or equal to + +```{r} +y >= 700 +``` + +### e. Equal to + +```{r} +y == 639 +``` + +### f. Not Equal to + +```{r} +x != 227 +``` + +## 3. Logical Operators + +Logical operators are used to specify multiple conditions between objects. Logical operators work with basic data types such as logical, numeric, and complex data types. These operations return `TRUE` or `FALSE` values. When numbers are used as logical values, any non-zero number is treated as `TRUE` and `0` is treated as `FALSE`. The table below describes the most common logical operators. + +| Logical Operator | Description | +|:----------------:|:------------------------:| +| ! | Logical NOT | +| \| | Element-wise logical OR | +| & | Element-wise logical AND | + +Assign vectors to variables + +```{r} +vector_1 <- c(0,2) +vector_2 <- c(1,0) +``` + +### a. Logical NOT + +```{r} +!vector_1 +!vector_2 +``` + +### b. Element-wise Logical OR + +```{r} +vector_1 | vector_2 +``` + +### c. Element-wise Logical AND + +```{r} +vector_1 & vector_2 +``` +## 4. Assignment Operators + +These operators assign values to variables. A more comprehensive review can be obtained in [Part 2](/posts/series_1/new_post_3/post_3.html) of the series. + +## 5. Miscellaneous Operators + +These are helpful operators that can perform a variety of functions. A few common miscellaneous operators are described below. 
+ +| Miscellaneous Operator | Description | +|:-------------------:|:-------------------------------------------------:| +| %\*% | Matrix multiplication (to be discussed in subsequent chapters) | +| %in% | Does an element belong to a vector | +| : | Generate a sequence | + +### a. Sequence + +```{r} +a <- 1:8 +a +b <- 4:10 +b +``` + +### b. Element in a vector + +```{r} +a %in% b +9 %in% b +9 %in% a + +``` diff --git a/posts/r_rstudio_basics/operators/r_and_rstudio.png b/posts/r_rstudio_basics/operators/r_and_rstudio.png new file mode 100644 index 0000000..f3e3a34 Binary files /dev/null and b/posts/r_rstudio_basics/operators/r_and_rstudio.png differ diff --git a/posts/r_rstudio_basics/software_install/r_and_rstudio.png b/posts/r_rstudio_basics/software_install/r_and_rstudio.png new file mode 100644 index 0000000..f3e3a34 Binary files /dev/null and b/posts/r_rstudio_basics/software_install/r_and_rstudio.png differ diff --git a/posts/r_rstudio_basics/software_install/r_interface.png b/posts/r_rstudio_basics/software_install/r_interface.png new file mode 100644 index 0000000..274e214 Binary files /dev/null and b/posts/r_rstudio_basics/software_install/r_interface.png differ diff --git a/posts/r_rstudio_basics/software_install/rstudio_interface.png b/posts/r_rstudio_basics/software_install/rstudio_interface.png new file mode 100644 index 0000000..fbe05e8 Binary files /dev/null and b/posts/r_rstudio_basics/software_install/rstudio_interface.png differ diff --git a/posts/r_rstudio_basics/software_install/software_install.qmd b/posts/r_rstudio_basics/software_install/software_install.qmd new file mode 100644 index 0000000..b07bfc6 --- /dev/null +++ b/posts/r_rstudio_basics/software_install/software_install.qmd @@ -0,0 +1,64 @@ +--- +title: "Getting Started with R and RStudio" +subtitle: "Software Installation" +author: "William Okech" +date: "2022-06-08" +image: "r_and_rstudio.png" +categories: [RStudio, R, Tutorial, Blog] +toc: true +draft: false +--- + +# Welcome! 
+ +In this 1st post, the reader will be introduced to the R programming language and RStudio software. + +## Introduction + +This blog aims to introduce new R/RStudio users to the fundamentals of R and lay the groundwork for more in-depth statistical analysis, data visualization, and reporting methods. I hope to present the topics in a straightforward manner so that anyone new to programming is not intimidated. + +## What is R? + +R is a programming language and open-source (freely available) software invented by Ross Ihaka and Robert Gentleman in 1993 (published as open-source in 1995) when they were based at the University of Auckland. *Fun fact: R represents the first letter of the first names of the creators*. The software is utilized by individuals working for various organizations ranging from academic institutions and healthcare organizations to financial services and information technology companies. In May 2022, the [TIOBE](https://www.tiobe.com/tiobe-index/) index (a measure of programming language popularity) demonstrated that R was the 13th most popular programming language. R's popularity may result from its highly extensible nature that allows users to perform statistical data analysis, generate visualizations, and report findings. + +### What are the benefits of using R? + +As mentioned in the previous section, R is an open-source software that is highly extensible. Thousands of extensions (also known as packages) can be installed, allowing one to increase the number of available applications. The main advantages of R include: 1. A [large](https://community.rstudio.com/) [community](https://www.r-project.org/help.html) of [users and developers](https://stackoverflow.com/) that can provide learning support and assist with technical challenges, 2. The ability to perform reproducible research. 3. Its cross-platform nature, which means that it can be used on Linux, Windows, and Mac operating systems. 4. 
The ability to generate [high-quality](https://r-graph-gallery.com/) [graphics](https://r-charts.com/) from datasets of varying dimensions. + +### I'm looking for R. Where can I find it? + +To install R on your personal computer, visit The R Project for Statistical Computing's Comprehensive R Archive Network [(CRAN)](https://cran.r-project.org/), download the most recent version, and install it according to the website's instructions. Once you download R, you can now experiment with some of its features. + +![](r_interface.png){fig-align="center" width="90%"} + +Figure 1: The standard R interface (Windows) + +When you open R, you will notice that it has a basic graphical user interface (GUI), and the console displays a command-line interface (CLI; where each command is executed one at a time). This may be intimidating for new users; however, there is a workaround for those who are not comfortable working at the command line. For those who are not experienced programmers, R can be used with an application called RStudio. + +## What is RStudio and how does it differ from R? + +RStudio is an integrated development environment (IDE) for R that was developed by JJ Allaire. This software contains tools that make programming in R easier. RStudio extends R's capabilities by making it easier to import data, write scripts, and generate visualizations and reports. The RStudio IDE is available for download from the [RStudio](https://www.rstudio.com/products/rstudio/download/) website. + +![](rstudio_interface.png){fig-align="center" width="90%"} + +Figure 2: RStudio interface with four main panes (Windows) + +Once installed, the basic layout of RStudio reveals that there is a script (text editor), console, navigation, and environment/history window pane. The script pane (text editor) in the upper-left allows one to write, open, edit, and execute more extended programs compared with using the standalone R software. 
The console pane (bottom-left) displays the script's output and offers a command-line interface for typing code that is immediately executed. The environment pane (upper-right) displays information about the created objects, the history of executed code, and any external connections. Finally, the navigation pane (bottom-right) shows multiple tabs. Its primary tabs include the "Plot" tab, which shows graphics created by code, the "Packages" tab where the packages are installed, and the "Help" tab, which provides assistance for all things R and allows one to search the R documentation. + +### What are the primary benefits of RStudio? + +RStudio allows one to create projects (a collection of related files stored within a working directory). Additionally, RStudio can be customized using options available under the "Tools" tab. Lastly, RStudio has Git integration that allows for version control where you can back up your code at different timepoints and effortlessly transfer code between computers.[^1] + +[^1]: Summary of the benefits of R and RStudio obtained from Lander, J. P. (2014). R for everyone: Advanced analytics and graphics. Addison-Wesley. + +## Conclusion + +Hopefully, this was a helpful introduction to R and RStudio. In subsequent blog posts, we will focus on: + +1. [Part 1: Simple arithmetic](/posts/series_1/new_post_2/post_2.html), +2. [Part 2: Variables](/posts/series_1/new_post_3/post_3.html), +3. [Part 3: Data types](/posts/series_1/new_post_4/post_4.html), +4. [Part 4: Operators](/posts/series_1/new_post_5/post_5.html), +5. [Part 5: Vectors](/posts/series_1/new_post_6/post_6.html), +6. [Part 6: Missing data](/posts/series_1/new_post_7/post_7.html) +7. 
[Part 7: Data Structures](/posts/series_1/new_post_8/post_8.html) diff --git a/posts/series_1/new_post_3/blog_series_pt_2.png b/posts/r_rstudio_basics/variables/blog_series_pt_2.png similarity index 100% rename from posts/series_1/new_post_3/blog_series_pt_2.png rename to posts/r_rstudio_basics/variables/blog_series_pt_2.png diff --git a/posts/r_rstudio_basics/variables/env_pane_1.png b/posts/r_rstudio_basics/variables/env_pane_1.png new file mode 100644 index 0000000..f4d782e Binary files /dev/null and b/posts/r_rstudio_basics/variables/env_pane_1.png differ diff --git a/posts/r_rstudio_basics/variables/post_3.qmd b/posts/r_rstudio_basics/variables/post_3.qmd new file mode 100644 index 0000000..83f660e --- /dev/null +++ b/posts/r_rstudio_basics/variables/post_3.qmd @@ -0,0 +1,119 @@ +--- +title: "The Basics of R and RStudio" +subtitle: "Part 2: Variables" +author: "William Okech" +date: "2022-06-22" +image: "r_and_rstudio.png" +categories: [RStudio, R, Tutorial, Blog] +toc: true +draft: false +--- + +## Introduction + +Variables are instrumental in programming because they are used as "containers" to store data values. + +To assign a value to a variable, we can use `<-` or `=`. However, most R users prefer to use `<-`. + +## Variable assignment + +### 1. Using `<-` + +```{r} +variable_1 <- 5 +variable_1 +``` + +### 2. Using `=` + +```{r} +variable_2 = 10 +variable_2 +``` + +### 3. Reverse the value and variable with `->` + +```{r} +15 -> variable_3 +variable_3 +``` + +### 4. Assign one value to two variables + +```{r} +variable_4 <- variable_5 <- 30 +variable_4 +variable_5 +``` + +## Variable output + +The output of the variable can then be obtained by: + +1. Typing the variable name and then pressing "Enter," +2. Typing "print" with the variable name in brackets, `print(variable)`, and +3. Typing "View" with the variable name in brackets, `View(variable)`. + +`print()` and `View()` are two of the many built-in functions[^1] available in R. 
+ +[^1]: Functions are a collection of statements (organized and reusable code) that perform a specific task, and R has many built-in functions. + +In RStudio, the list of variables that have been loaded can be viewed in the environment pane. + +![](env_pane_1.png){fig-align="center" width="90%"} + +Figure 1: A screenshot of the environment pane with the stored variables. + +```{r} +print(variable_1) +``` + +```{r} +View(variable_2) +``` + +Output of `View()` will be seen in the script pane + +## The `assign()` and `rm()` functions + +In addition to using the assignment operators (`<-` and `=`), we can use the `assign()` function to assign a value to a variable. + +```{r} +assign("variable_6", 555) +variable_6 +``` + +To remove the assignment of the value to the variable, either delete the variable in the "environment pane" or use the `rm()` function. + +```{r} +variable_7 <- 159 +``` + +```{r} +rm(variable_7) +``` + +After running `rm()` look at the environment pane to confirm whether `variable_7` has been removed. + +## Naming variables + +At this point, you may be wondering what conventions are used for naming variables. First, variables need to have meaningful names such as current_temp, time_24_hr, or weight_lbs. However, we need to be mindful of the [variable](https://web.stanford.edu/class/cs109l/unrestricted/resources/google-style.html) [style guide](http://adv-r.had.co.nz/Style.html) which provides us with the appropriate rules for naming variables. + +Some rules to keep in mind are: + +1. R is case-sensitive (`variable` is not the same as `Variable`), +2. Names similar to typical outputs or functions (`TRUE`, `FALSE`, `if`, or `else`) cannot be used, +3. Appropriate variable names can contain letters, numbers, dots, and underscores. However, you cannot start with an underscore, number, or dot followed by a number. 
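The naming rules above can also be checked from the console. The chunk below is a minimal sketch, not part of the original post: it assumes base R's `make.names()`, which rewrites a string into a syntactically valid name, so a name is valid as written only when `make.names()` returns it unchanged. The helper name `is_valid_name` and the `candidates` vector are hypothetical and used for illustration only.

```{r}
# Hypothetical helper: TRUE when a name is syntactically valid as written
is_valid_name <- function(x) make.names(x) == x

# Candidate names, including two reserved words
candidates <- c("time_24_hr", ".time24_hr", "_24_hr.time",
                "24_hr_time", ".24_hr_time", "TRUE", "if")

# Label each result with the name that was tested
setNames(is_valid_name(candidates), candidates)
```

Only the first two candidates should come back as `TRUE`; the others break one of the rules listed above (a leading underscore or number, a dot followed by a number, or a reserved word).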
+ +## Valid and invalid names + +### Valid names: + +- time_24_hr +- .time24_hr + +### Invalid names: + +- \_24_hr.time +- 24_hr_time +- .24_hr_time diff --git a/posts/r_rstudio_basics/variables/r_and_rstudio.png b/posts/r_rstudio_basics/variables/r_and_rstudio.png new file mode 100644 index 0000000..f3e3a34 Binary files /dev/null and b/posts/r_rstudio_basics/variables/r_and_rstudio.png differ diff --git a/posts/r_rstudio_basics/vectors/r_and_rstudio.png b/posts/r_rstudio_basics/vectors/r_and_rstudio.png new file mode 100644 index 0000000..f3e3a34 Binary files /dev/null and b/posts/r_rstudio_basics/vectors/r_and_rstudio.png differ diff --git a/posts/r_rstudio_basics/vectors/vectors.qmd b/posts/r_rstudio_basics/vectors/vectors.qmd new file mode 100644 index 0000000..eb7f4ee --- /dev/null +++ b/posts/r_rstudio_basics/vectors/vectors.qmd @@ -0,0 +1,222 @@ +--- +title: "The Basics of R and RStudio" +subtitle: "Part 5: Vectors" +author: "William Okech" +date: "2022-11-12" +image: "r_and_rstudio.png" +categories: [RStudio, R, Tutorial, Blog] +toc: true +draft: false +--- + +## Introduction + +A vector is a collection of elements of the same data type, and it is a basic data structure in R programming. + +Atomic vectors cannot be of mixed data type. The most common way to create a vector is with `c()`, where "c" stands for combine. In R, vectors do not have dimensions; therefore, they cannot be defined by columns or rows. Vectors can be divided into atomic vectors and lists (discussed in [Part 7](https://www.williamokech.com/posts/series_1/new_post_8/post_8.html)). The atomic vector types include logical, character, and numeric (integer or double). 
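To see what the single-data-type rule means in practice, the short sketch below (not from the original post) combines values of different types with `c()`. R does not raise an error; it coerces every element to one common type. The variable names `mixed_num_chr` and `mixed_lgl_num` are hypothetical.

```{r}
# Mixing types in c() does not fail; R coerces to one common type
mixed_num_chr <- c(1, "a", TRUE)   # number, string, and logical together
typeof(mixed_num_chr)              # every element becomes a character string

mixed_lgl_num <- c(TRUE, 2.5)      # logical and number together
typeof(mixed_lgl_num)              # TRUE is stored as the double 1

# A plain vector has a length but no dimensions
dim(c(2, 10, 16, -5))              # NULL, because there is no dim attribute
length(c(2, 10, 16, -5))
```

Checking `typeof()` after building a vector is a quick way to catch an unintended coercion, for example a stray string turning a numeric vector into a character one.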
+ +Additionally, R is a vectorized language because mathematical operations are applied to each element of the vector without the need to loop through the vector. Examples of vectors are shown below: + +• Numbers: `c(2, 10, 16, -5)` + +• Characters: `c("R", "RStudio", "Shiny", "Quarto")` + +• Logicals: `c(TRUE, FALSE, TRUE)` + +## Sequence Generation + +To generate a vector with a sequence of consecutive numbers, we can use `:`, `sequence()`, or `seq()`. + +### Generate a sequence using `:` + +```{r} +a <- 9:18 +a +a_rev <- 18:9 +a_rev +a_rev_minus <- 5:-3 +a_rev_minus +``` + +### Generate a sequence using `sequence()` + +```{r} +b <- sequence(7) +b +c <- sequence(c(5,9)) +c +``` + +### Generate a sequence using `seq()` + +The `seq()` function has four main arguments: `seq(from, to, by, length.out)`, where "from" and "to" are the starting and ending elements of the sequence. Additionally, "by" is the difference between the elements, and "length.out" is the desired length of the sequence. + +```{r} +d <- seq(2,20,by=2) +d +f <- seq(2,20, length.out=5) +f +h <- seq(20,2,by=-2) +h +j <- seq(20, 2, length.out=3) +j +``` + +## Repeating vectors + +To create a repeating vector, we can use `rep()`. + +```{r} +k <- rep(c(0,3,6), times = 3) +k +l <- rep(2:6, each = 3) +l +m <- rep(7:10, length.out = 20) +m +``` + +## Vector Operations + +Vectors of equal length can be operated on together. If one vector is shorter, it is recycled: its elements are repeated until it matches the length of the longer vector. When using vectors of unequal lengths, the length of the longer vector should ideally be a multiple of the length of the shorter vector. + +### Basic Vector Operations + +```{r} +vec_1 <- 1:10 + +vec_1*12 # multiplication +vec_1+12 # addition +vec_1-12 # subtraction +vec_1/3 # division +vec_1^4 # power +sqrt(vec_1) # square root +``` + +### Operations on vectors of equal length + +Additionally, we can perform operations on two vectors of equal length. + +1.
Create two vectors + +```{r} +vec_3 <- 5:14 +vec_3 +vec_4 <- 12:3 +vec_4 +``` + +2. Perform various arithmetic operations + +```{r} +vec_3 + vec_4 +vec_3 - vec_4 +vec_3 / vec_4 +vec_3 * vec_4 +vec_3 ^ vec_4 +``` + +## Functions that can be applied to vectors + +The functions listed below can be applied to vectors: + +1. `any()` + +2. `all()` + +3. `nchar()` + +4. `length()` + +5. `typeof()` + +### Examples + +```{r} +any(vec_3 > vec_4) +any(vec_3 < vec_4) +``` + +```{r} +all(vec_3 > vec_4) +all(vec_3 < vec_4) +``` + +```{r} +length(vec_3) +length(vec_4) +``` + +```{r} +typeof(vec_3) +typeof(vec_4) +``` + +Determine the number of characters in each element of a character vector + +```{r} +vec_5 <- c("R", "RStudio", "Shiny", "Quarto") +nchar(vec_5) +``` + +## Recycling of vectors + +```{r} +vec_3 + c(10, 20) +vec_3 + c(10, 20, 30) # will result in a warning as the length of the longer vector is not a multiple of the length of the shorter one +``` + +## Accessing elements of a vector + +To access the elements of a vector, we can use numeric-, character-, or logical-based indexing. + +### Examples + +#### 1. Name the elements of a vector with `names()`. + +Create the vector. + +```{r} +vec_name <- 1:5 +vec_name +``` + +Name the individual elements. + +```{r} +names(vec_name) <- c("a", "c", "e", "g", "i") +vec_name +``` + +#### 2. Use the vector index to filter + +```{r} +vec_index <- 1:5 +vec_index +``` + +##### a) Logical vector as an index + +```{r} +vec_index[c(TRUE, FALSE, TRUE, FALSE, TRUE)] +``` + +##### b) Filter vector based on an index + +```{r} +vec_index[1:3] +``` + +##### c) Access a vector using its position + +```{r} +vec_index[4] +vec_index[c(2,4)] +``` + +##### d) Modify a vector using indexing + +```{r} +vec_index +vec_index[5] <- 1000 +vec_index +```
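The section on accessing elements mentions character-based indexing, but the examples use only logical and numeric indices. The chunk below is a minimal sketch of indexing by name, not part of the original post. It builds its own named vector, `vec_named` (a hypothetical name), with the same labels that were given to `vec_name`.

```{r}
# Hypothetical named vector with the same labels used for vec_name
vec_named <- c(a = 1L, c = 2L, e = 3L, g = 4L, i = 5L)

vec_named["c"]            # select one element by its name
vec_named[c("a", "i")]    # select several elements by name
unname(vec_named["g"])    # keep the value and drop the name
```

Indexing by name can be easier to read than indexing by position, because the code still selects the right element when the order of the vector changes.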