-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path05-Stratified-sampling.Rmd
682 lines (490 loc) · 31.5 KB
/
05-Stratified-sampling.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
# Stratified Simple Random Sampling
Stratified simple random sampling is a technique where the study area is divided into different `groups` or `strata` based on certain environmental traits and a number of random samples are taken from within each group. One of the primary advantages of stratified sampling is its ability to capture the diversity within a population by making sure each group is represented. It can provide a more accurate reflection of the entire population compared to random sampling, especially when the groups are distinct and have unique qualities. This approach is particularly beneficial when certain subgroups within the population are specifically noteworthy. It also allows for more precise estimates with a smaller total sample size compared to simple random choice. Stratified sampling presents some disadvantages. Achieving effective categories requires a proper definition and delineation of the initial information to create the `strata`. The classification of the environmental information into categories and ensuring fair portrayal of each can be intricate and time–taking and mislabeling elements into an improper group can lead to skewed outcomes.
## General Procedure
The creation of a stratified simple random sampling design involves the identification of relevant features describing the environmental diversity in the area (soil and landcover are the environmental variables generally used to define strata), delineation of the strata, determination of the number of samples to distribute to each stratum, followed by random sampling within it. By identifying relevant classes, combining them to define strata and allocating an appropriate number of samples to each stratum, a representative sample can be obtained. Random sampling within each stratum helps to ensure that the sample is unbiased and provides a fair representation of the overall conditions in the area.
The first question is about how many samples must be retrieved from each strata. The sampling scheme starts with the definition of the total number of samples to collect. In this case, the determination of the sample size is a complex and highly variable process based, among others, on the specific goals of the study, the variability of environmental proxies, the statistical requirements for accuracy and confidence, as well as additional considerations such as accessibility, costs and available resources. The optimal number of samples can be determined following the method proposed in Chapter 2 of this manual. The number of samples within each stratum is calculated using an area–weighted approach taking into account the relative area of each stratum. The sampling design in this section must also comply with the following requirements:
* All sampling strata must have a minimum size of 100 hectares.
* All sampling strata must be represented by at least 2 samples.
This sampling process ensures the representativeness of the environmental combinations present across the area while maintaining an efficient and feasible field sampling campaign.
### Strata creation
We must determine the kind of information that will be used to construct the `strata`. In this manual, we present a simple procedure to build strata based on data from two environmental layers: soil groups and landcover classification data. The information should be provided in the form of vector shapefiles with associated information databases. The data on both sets often comprises a large number of categories, that would lead to a very large number of `strata`. Thus, it is desirable to make an effort of aggregating similar categories within each input data set, to reduce, as much as possible, the number of categories while still capturing the most of the valuable variability in the area.
The fist step is to set–up the RStudio environment and load the required packages:
```{r strata–setup, eval=TRUE, include=FALSE, warning=FALSE}
# Load packages as a vector objects
packages <- c("sf", "terra", "tidyverse", "rmapshaper", "units","plyr", "mapview", "leaflet","stars","sgsR")
lapply(packages, require, character.only = TRUE) # Load packages
rm(packages) # Remove object to save memory space
# Set working directory to source file location
#setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
```
```{r load_wd_strata, eval=TRUE, include=FALSE, warning=FALSE}
## Set working directory to source file location
#setwd("/Users/luislado/Dropbox/Github/Sampling_Design_GSP")
```
We must define the number of samples to distribute in the sampling design and the soil and landcover information layers to build the strata. We also define a BOUNDARY parameter to account for a reduction of the sampling area according to a certain area using predefined bounding–box, that can be also here defined.
```{r strata–parameters, eval=TRUE, include=FALSE, echo=FALSE}
# Define number of samples
n <- 250
# Load soil data
soil <- st_read(paste0(shp.path,"/soils.shp"),promote_to_multi = FALSE)
# Load Landcover data
lc <- st_read(paste0(shp.path,"/landcover.shp"))
# Define a bounding box (optional). If BOUNDING = TRUE, then the bounding–box shapefile (bb)
# must be specified and the uncomment the line in the code below
BOUNDING = FALSE
# bb <- st_read("your_bounding_box.shp")
distance.buffer = 500 # Buffer distance for samples
```
We proceed with the calculation of soil groups. In this example, soil information is stored in the field `SRG`. We have analysed the extent to which the information in this field can be synthesized to eliminate redundancy when creating the `strata`. ^[This exploratory work is a prerequisite and must be adapted specifically to each soil and landcover dataset].
The results are shown in \@ref(fig:fig-10)
```{r soil–aggregation, eval=TRUE, include=FALSE, warning=FALSE}
# Clip soil data by bounding area if defined
if (BOUNDING) {
soil <- st_intersection(soil, bb)
} else {
soil <- soil
}
# Check geometry
soil <- st_make_valid(soil)
soil <- st_cast(soil, "MULTIPOLYGON")
#Aggregate information by reference soil group (SRG)
#unique(soil$RSG)
# Remove NAs (water bodies are deleted)
soil <- soil[!is.na(soil$RSG),]
#unique(soil$RSG)
# Dissolve polygons by reference soil group
soil <- soil %>% group_by(RSG) %>% dplyr::summarize()
```
```{r fig-10, fig.cap="Plot of the soil classes", eval=TRUE, include=TRUE}
## Plot aggregated soil classes
map <- leaflet(leafletOptions(minZoom = 8)) %>%
addTiles()
mv <- mapview(soil["RSG"], alpha=0, homebutton=T, layer.name = "soils", map=map)
mv@map
#ggplot() + geom_sf(data=soil, aes(fill = factor(RSG)))
```
A similar procedure is performed on the landcover dataset.
```{r landcover–aggregation, eval=TRUE, include=FALSE, warning=FALSE}
# Calculate Landcover Groups
# Clip land landcover by bounding area if exist
if (BOUNDING) {
lc <- st_intersection(lc, bb)
} else {
lc <- lc
}
# Check geometry
lc <- st_make_valid(lc)
# Agregate units
#unique(lc$landcover)
lc$landcover[lc$landcover=="Broadleaf forest"]<- "Forest"
lc$landcover[lc$landcover=="Needleleaf forest"]<- "Forest"
lc$landcover[lc$landcover=="Mixed forest"]<- "Forest"
lc <- lc[lc$landcover!="Water body",]
#unique(lc$landcover)
# Dissolve polygons by reference soil group
lc <- lc %>% group_by(landcover) %>% dplyr::summarize()
```
Figure \@ref(fig:fig-11) shows the landcover classes to build the strata.
```{r fig-11, fig.cap="Plot of the landcover classes", eval=TRUE, include=TRUE}
# Plot map with the land cover information
map <- leaflet() %>%
addTiles()
mv <- mapview(lc["landcover"], alpha=0, homebutton=T, layer.name = "Landcover", map=map)
mv@map
#ggplot() + geom_sf(data=lc, aes(fill = factor(landcover)))
```
To create the soil–landcover `strata` we must combine both classified datasets.
```{r combine–data, eval=TRUE, include=TRUE, warning=FALSE, message=FALSE}
# Combine soil and landcover layers
sf_use_s2(FALSE)
soil_lc <- st_intersection(st_make_valid(soil), st_make_valid(lc))
soil_lc$soil_lc <- paste0(soil_lc$RSG, "_", soil_lc$landcover)
soil_lc <- soil_lc %>% dplyr::select(soil_lc, geometry)
```
Finally, to comply with the initial requirements of the sampling design, we calculate the areas of each polygon, delete all features with extent lesser than 100 has.
```{r select–area, eval=TRUE, include=TRUE}
# Select by Area. Convert to area to ha and select polygons with more than 100 has
soil_lc$area <- st_area(soil_lc)/10000
soil_lc$area <- as.vector(soil_lc$area)
soil_lc <- soil_lc %>%
group_by(soil_lc) %>%
mutate(area = sum(area))
soil_lc <- soil_lc[soil_lc$area > 100,]
# Replace blank spaces with underscore symbol to keep names uniform
soil_lc$soil_lc <- str_replace_all(soil_lc$soil_lc, " ", "_")
# Create a column of strata numeric codes
soil_lc$code <- as.character(as.numeric(as.factor(soil_lc$soil_lc)))
```
```{r write-strata-05, eval=FALSE, include=TRUE}
# Write final sampling strata map
st_write(soil_lc, paste0(results.path,"strata.shp"), delete_dsn = TRUE)
```
The final strata map is shown in Figure \@ref(fig:fig-12).
```{r fig-12, fig.cap="Plot of strata", eval=TRUE, include=TRUE}
# Plot final map of stratum
map <- leaflet(options = leafletOptions(minZoom = 8.3)) %>%
addTiles()
mv <- mapview(soil_lc["soil_lc"], alpha=0, homebutton=T, layer.name = "Strata", map=map)
mv@map
```
## Stratified random sampling
This example demonstrates how to establish a stratified random sampling approach within the previously defined strata polygons. The allocation of sample points is proportionate to the stratum areas, with the condition that each stratum must contain a minimum of 2 samples. The determination of sampling points, referred to as `'target points'`, is made during the initial phase of the sampling design and takes into consideration factors such as the area to be sampled, budget constraints and available personnel. Additionally, a set number of `'replacement points'` must be designated to act as substitutes for 'target points' in cases where some of the original target points cannot be accessed or sampled. These 'replacement points' are systematically indexed, with each index indicating which 'target point' it serves as a substitute for.
```{r random–sampling, eval=TRUE, include=FALSE}
## Stratified Simple Random Sampling over Soil/Landcover Strata
# Read strata shapefile
polygons <- soil_lc
#polygons <- st_read(paste0(results.path,"strata.shp"))
# Calculate the area of each polygon
polygons$area <- st_area(polygons)
# Create a new column to group polygons by a common attribute
polygons$group <- polygons$soil_lc
# Drop units to allow computations
polygons <- drop_units(polygons)
# Calculate the total area of all polygons in each group
group_areas <- polygons %>%
dplyr::group_by(group) %>%
dplyr::summarize(total_area = sum(area))
# Add a code to each group
group_codes <- polygons %>% group_by(group) %>%
dplyr::summarize(code = first(code))
group_areas <- left_join(group_areas,st_drop_geometry(group_codes), by = "group")
# Ensure minimum of 2 samples at each polygon in each group
group_areas$sample_count <- 2
# Calculate the number of samples per group based on relative area
group_areas$sample_count <- group_areas$sample_count+round(group_areas$total_area/sum(group_areas$total_area) *
(n-sum(group_areas$sample_count)))
while (sum(group_areas$sample_count) != n) {
if (sum(group_areas$sample_count) > n) {
# Reduce sample count for the largest polygon until total count is n
max_index <- which.max(group_areas$sample_count)
group_areas$sample_count[max_index] <- group_areas$sample_count[max_index] - 1
} else {
# Increase sample count for the smallest polygon until total count is n
min_index <- which.min(group_areas$sample_count)
group_areas$sample_count[min_index] <- group_areas$sample_count[min_index] + 1
}
}
#sum(group_areas$sample_count)
polygons <- left_join(polygons, st_drop_geometry(group_areas), by = c("soil_lc"="group"))
polygons <- dplyr::select(polygons, soil_lc, code.x, sample_count, geometry)
# Generate random points within each strata of size 3 times the required samples for each strata ----
x <- spatSample(x = vect(group_areas), size = group_areas$sample_count * 3, method = "random")
# Compute sampling points for strata
z <- x %>%
st_as_sf() %>%
dplyr::group_by(code) %>%
dplyr::mutate(sample_count = as.numeric(sample_count),
order = seq_along(code),
ID = paste0(code, ".", order),
type = ifelse(sample_count >= order, "Target", "Replacement")) %>%
vect()
# Find missing samples
missing.data <- left_join(group_areas,data.frame(z) %>%
dplyr::filter(type=="Target") %>%
dplyr::group_by(code) %>%
tally()) %>%
dplyr::mutate(diff=sample_count-n)
# Determine missing sampled strata
missing.strata <- which(is.na(missing.data$diff))
# Determine missing sampling point in strata (undersampled strata)
missing.sample = which(missing.data$diff != 0)
missing.number <- as.numeric(unlist(st_drop_geometry(missing.data[(missing.sample <- which(missing.data$diff != 0)),7])))
# Compute sampling points for missing sampled strata
x.missing.strata <- x[1]
x.missing.strata$sample_count<- 0
for(i in missing.strata){
xx.missing.strata <- x[1]
xx.missing.strata$sample_count<- 0
nn=0
while (sum(xx.missing.strata$sample_count) <
group_areas[group_areas$code==i,][["sample_count"]]*5) {
while(nn < group_areas[group_areas$code==i,][["sample_count"]]*3){
my.missing.strata <- spatSample(x = vect(group_areas[group_areas$code %in% i,]),
size = group_areas[group_areas$code==i,][["sample_count"]]*5,
method = "random")
nn <- nn + nrow(data.frame(my.missing.strata))
}
xx.missing.strata <- rbind(xx.missing.strata,my.missing.strata)
print(sum(xx.missing.strata$sample_count))
}
print(i)
print(xx.missing.strata)
x.missing.strata <- rbind(x.missing.strata,xx.missing.strata)
}
# Join initial sampling with missing sampling strata data
x <- rbind(x, x.missing.strata)
# Compute sampling points for missing samples (random sampling)
x.missing.sample <- x[1]
for(i in missing.sample){
xx.missing.sample <- x[1]
xx.missing.sample$sample_count<- 0
while (sum(xx.missing.sample$sample_count) < (group_areas[group_areas$code==i,][["sample_count"]]*3)) {
my.missing.sample <- spatSample(x = vect(group_areas[group_areas$code %in% i,]),
size = as.numeric(vect(group_areas[group_areas$code %in% i,])[[4]])+
(group_areas[group_areas$code==i,][["sample_count"]]*3), method = "random")
xx.missing.sample <- rbind(xx.missing.sample,my.missing.sample)
print(sum(xx.missing.sample$sample_count))
}
print(i)
print(xx.missing.sample)
x.missing.sample <- rbind(x.missing.sample,xx.missing.sample)
}
# Join initial sampling with missing sampling strata data and with missing samples
x <- rbind(x, x.missing.sample)
# Remove extra artificial replacements
x <- x[x$sample_count > 0,]
# Convert to Shapefile
z <- x %>%
st_as_sf() %>%
dplyr::group_by(code) %>%
dplyr::mutate(sample_count = as.numeric(sample_count),
order = seq_along(code),
ID = paste0(code, ".", order),
type = ifelse(sample_count >= order, "Target", "Replacement")) %>%
vect()
```
```{r write-random–sampling, eval=FALSE, include=FALSE}
# Export data to Shapefile ----
# Write sampling points to shp
writeVector(z, paste0(results.path,"strat_randm_samples.shp"), overwrite=TRUE)
# Check whether the number of initial target points equals the final target points
n;nrow(z[z$type=="Target",])
```
Results are shown in Figure \@ref(fig:fig-13).
```{r fig-13, fig.cap="Plot of strata and random target and replacement points", eval=TRUE, include=TRUE}
map <- leaflet(options = leafletOptions(minZoom = 8.3)) %>%
addTiles()
mv <- mapview(soil_lc["soil_lc"], alpha=0, homebutton=T, layer.name = "Strata") +
mapview(sf::st_as_sf(z), zcol = 'type', color = "white", col.regions = c('royalblue', 'tomato'), cex=3, legend = TRUE,layer.name = "Samples")
mv@map
```
## Stratified simple random sampling for large areas
The implementation of a stratified simple random sampling, along with target and replacement points, can present operating difficulties when dealing with areas of significant size and with locations that are hard to reach. To address this issue, the sampling approach can be modified by excluding areas with limited accessibility.
This modification can streamline fieldwork operations and establish a feasible sampling method while still retaining the essence of the stratified simple random sampling framework. By excluding areas with limited accessibility, the sampling design can be adjusted to ensure a more practical and effective approach to data collection.
* **Delineation of sampling accessibility:** The sampling area can be further limited based on accessibility considerations. Areas with very limited accessibility, defined as regions located more than 1 kilometre away from a main road or access path, may be excluded from sampling areas. To accomplish this, a map of main roads and paths can be used to establish a sampling buffer that includes areas within a 1–kilometre buffer around the road infrastructures. This exclusion helps to eliminate the most remote and challenging–to–access areas. An additional layer of accessibility information can be incorporated based on population distribution in the country, considering that, if population is present, there is a high change that points in the surroundings can be accessible for sampling. In this case, populated nuclei are vectorized into points and a 250–meter buffer is then generated around each point. These resulting areas can be then added to the 1–kilometre buffer around the roads, which collectively defined the final sampling area.
* **Substitution of replacement points with replacement areas in close proximity to the target points:** The sampling design presented before included designated replacement points to serve as substitutes for each target point in the case that it would be inaccessible during fieldwork. However, this approach presented challenges, particularly for large areas, as the replacement point could be located far from the target point, resulting in significant logistical efforts. This limitation posed a risk of delays in completing the sampling campaign within the allocated time frame. To address this challenge, an alternative strategy is to replace the idea of replacement points with replacement areas situated in the immediate vicinity of the target point. The replacement area for each target point is now confined within a 500–meter buffer surrounding the target and falls within the same sampling stratum. This approach concentrates sampling and replacement activities within a specific geographic area, streamlining the overall process. By reducing the need for extensive travel, this method enhances efficiency and facilitates sample collection. Figure 2 illustrates the distribution of sampling points and replacement areas for visualization.
* **Additional area exclusion:** Some areas can be identified as not suitable for sampling purposes. This is the case of certain natural protected areas, conflict regions presenting risks for field operators, etc. These areas must be identified masked at an initial stage of the design to exclude them from the sampling strata.
The procedure is the same as that previously presented, with the difference that buffers and exclusion areas must be masked–out from the strata map before performing the random sampling.
```{r replacement–areas, eval=FALSE, include=TRUE}
# Compute replacement sampling areas -----
# Load strata
soil_lc <- st_read(paste0(results.path,"strata.shp"))
# Read sampling. points from previous step
z <- st_read(paste0(results.path,"strat_randm_samples.shp"))
z <- z[z$type=="Target",]
# Define buffer of 500 meters (coordinate system must be in metric base)
buf.samples <- st_buffer(z, dist=distance.buffer)
# Intersect buffers
samples_buffer = st_intersection(soil_lc, buf.samples)
samples_buffer <- samples_buffer[samples_buffer$type=="Target",]
samples_buffer <- samples_buffer[samples_buffer$soil_lc==samples_buffer$group,]
# Save Sampling areas
st_write(samples_buffer, paste0('../soil_sampling/JAM/replacement_areas_', samples.buffer, '.shp'), delete_dsn = TRUE)
# Write target points only
targets <- z[z$type=="Target",]
st_write(targets, '../soil_sampling/JAM/sampling_points_TAR.shp', delete_dsn = TRUE)
```
## Stratified simple regular sampling
The procedure for creating a stratified simple regular sampling design is identical to that presented for stratified simple random sampling, with the only distinction that the locations of the sampling points are distributed in a regular spatial grid. This transformation is achieved by changing the method from 'random' to 'regular' in the spatSample functions within the script above.
```{r regular–sampling, eval=TRUE, include=FALSE}
# Stratified Regular Sampling over Soil/landcover Strata ----
# Read strata shapefile
# polygons <- st_read(paste0(results.path,"strata.shp"))
polygons <- soil_lc
polygons$area <- st_area(polygons) # calculate the area of each polygon
# Create a new column to group polygons by a common attribute
polygons$group <- polygons$soil_lc
# Drop units to allow computations
polygons <- drop_units(polygons)
# Calculate the total area of all polygons in each group
group_areas <- polygons %>%
dplyr::group_by(group) %>%
dplyr::summarize(total_area = sum(area))
# Add a code to each group
group_codes <- polygons %>% group_by(group) %>%
dplyr::summarize(code = first(code))
group_areas <- left_join(group_areas,st_drop_geometry(group_codes), by = "group")
# Ensure minimum of 2 samples at each polygon in each group
group_areas$sample_count <- 2
# Calculate the number of samples per group based on relative area
group_areas$sample_count <- group_areas$sample_count+round(group_areas$total_area/sum(group_areas$total_area) *
(n-sum(group_areas$sample_count)))
while (sum(group_areas$sample_count) != n) {
if (sum(group_areas$sample_count) > n) {
# Reduce sample count for the largest polygon until total count is n
max_index <- which.max(group_areas$sample_count)
group_areas$sample_count[max_index] <- group_areas$sample_count[max_index] - 1
} else {
# Increase sample count for the smallest polygon until total count is n
min_index <- which.min(group_areas$sample_count)
group_areas$sample_count[min_index] <- group_areas$sample_count[min_index] + 1
}
}
polygons <- left_join(polygons, st_drop_geometry(group_areas), by = c("soil_lc"="group"))
polygons <- dplyr::select(polygons, soil_lc, code.x, sample_count, geometry)
# Generate regular points within each strata of size 3 times the required samples for each strata ----
x <- spatSample(x = vect(group_areas), size = group_areas$sample_count * 3, method = "regular")
# Compute sampling points for strata
z <- x %>%
st_as_sf() %>%
dplyr::group_by(code) %>%
dplyr::mutate(sample_count = as.numeric(sample_count),
order = seq_along(code),
ID = paste0(code, ".", order),
type = ifelse(sample_count >= order, "Target", "Replacement")) %>%
vect()
# Find missing samples
missing.data <- left_join(group_areas,data.frame(z) %>%
dplyr::filter(type=="Target") %>%
dplyr::group_by(code) %>%
tally()) %>%
dplyr::mutate(diff=sample_count-n)
# Determine missing sampled strata
missing.strata <- which(is.na(missing.data$diff))
# Determine missing sampling point in strata (undersampled strata)
missing.sample = which(missing.data$diff != 0)
missing.number <- as.numeric(unlist(st_drop_geometry(missing.data[(missing.sample <- which(missing.data$diff != 0)),7])))
# Compute sampling points for missing sampled strata
x.missing.strata <- x[1]
x.missing.strata$sample_count<- 0
for(i in missing.strata){
xx.missing.strata <- x[1]
xx.missing.strata$sample_count<- 0
nn=0
while (sum(xx.missing.strata$sample_count) <
group_areas[group_areas$code==i,][["sample_count"]]*5) {
while(nn < group_areas[group_areas$code==i,][["sample_count"]]*3){
my.missing.strata <- spatSample(x = vect(group_areas[group_areas$code %in% i,]),
size = group_areas[group_areas$code==i,][["sample_count"]]*5,
method = "random")
nn <- nn + nrow(data.frame(my.missing.strata))
}
xx.missing.strata <- rbind(xx.missing.strata,my.missing.strata)
print(sum(xx.missing.strata$sample_count))
}
print(i)
print(xx.missing.strata)
x.missing.strata <- rbind(x.missing.strata,xx.missing.strata)
}
# Join initial sampling with missing sampling strata data
x <- rbind(x, x.missing.strata)
# Compute sampling points for missing samples (regular sampling)
x.missing.sample <- x[1]
for(i in missing.sample){
xx.missing.sample <- x[1]
xx.missing.sample$sample_count<- 0
while (sum(xx.missing.sample$sample_count) < (group_areas[group_areas$code==i,][["sample_count"]]*3)) {
my.missing.sample <- spatSample(x = vect(group_areas[group_areas$code %in% i,]),
size = as.numeric(vect(group_areas[group_areas$code %in% i,])[[4]])+
(group_areas[group_areas$code==i,][["sample_count"]]*3), method = "regular")
xx.missing.sample <- rbind(xx.missing.sample,my.missing.sample)
print(sum(xx.missing.sample$sample_count))
}
print(i)
print(xx.missing.sample)
x.missing.sample <- rbind(x.missing.sample,xx.missing.sample)
}
# Join initial sampling with missing sampling strata data and with missing samples
x <- rbind(x, x.missing.sample)
# Remove extra artificial replacements
x <- x[x$sample_count > 0,]
# Convert to Shapefile
z <- x %>%
st_as_sf() %>%
dplyr::group_by(code) %>%
dplyr::mutate(sample_count = as.numeric(sample_count),
order = seq_along(code),
ID = paste0(code, ".", order),
type = ifelse(sample_count >= order, "Target", "Replacement")) %>%
vect()
```
```{r write-samples-05, eval=FALSE, include=TRUE}
# Export data to Shapefile ----
# Write sampling points to shp
writeVector(z, paste0(results.path,"sampling_points.shp"), overwrite=TRUE)
# Check whether the number of initial target points equals the final target points
n;nrow(z[z$type=="Target",])
```
```{r fig-14, fig.cap="Plot of strata and regular sampling points", eval=TRUE, include=TRUE}
map <- leaflet(options = leafletOptions(minZoom = 8.3)) %>%
addTiles()
mv <- mapview(soil_lc["soil_lc"], alpha=0, homebutton=T, layer.name = "Strata") +
mapview(sf::st_as_sf(z), zcol = 'type', color = "white", col.regions = c('royalblue', 'tomato'), cex=3, legend = TRUE,layer.name = "Samples")
mv@map
```
## Simple Random Sampling based on a stratified raster
Finally, it is also possible to create a stratified area weighted random sampling using raster strata. The procedure involves the creation of the strata as a raster file and implement a random sampling using the frequencies of the strata as a guideline for distribution of the samples proportionally to their frequencies. This method is easily implemented using the package 'sgsR' [@sgsr].
```{r sgsR-random,eval=TRUE, include=TRUE, message=FALSE, warning=FALSE}
#strata <- st_read(paste0(results.path,"strata.shp"),quiet = TRUE)
strata <- soil_lc
strata$code <- as.integer(strata$code)
# Create stratification raster
strata <- rast(st_rasterize(strata["code"],st_as_stars(st_bbox(strata), nx = 250, ny = 250)))
names(strata) <- "strata"
strata <- crop(strata, nghe, mask=TRUE)
# Create stratified simple random sampling
target <- sample_strat(
sraster = strata,
nSamp = n
)
target$type <- "target"
```
Figure \@ref(fig:fig-14b) show the histogram of frequencies of samples over the strata categories respectively.
```{r fig-14b, fig.cap="Frequencies of strata and random target samples", eval=TRUE, include=TRUE, message=FALSE, warning=FALSE}
# Histogram of frequencies
calculate_representation(
sraster = strata,
existing = target,
drop=0,
plot = TRUE
)
# Add index by strata
target <- target %>%
st_as_sf() %>%
dplyr::group_by(strata) %>%
dplyr::mutate(order = seq_along(strata),
ID = paste0(strata, ".", order)) %>%
vect()
```
The construction of replacement points is straightforward by creating a new stratified set of samples on the same strata. In this case, the sample size has been incremented 3 times to create 3 replacement points for each target point.
```{r replacements-raster, eval=TRUE, include=TRUE, message=FALSE, warning=FALSE}
# Create replacement points
replacement <- sample_strat(
sraster = strata,
nSamp = nrow(target)*3
)
replacement$type <- "replacement"
```
```{r fig-14c, fig.cap="Frequencies of strata and random replacement samples", eval=TRUE, include=TRUE, message=FALSE, warning=FALSE}
# Histogram of frequencies
calculate_representation(
sraster = strata,
existing = replacement,
drop=0,
plot = TRUE
)
# Add index by strata
replacement <- replacement %>%
st_as_sf() %>%
dplyr::group_by(strata) %>%
dplyr::mutate(order = seq_along(strata),
ID = paste0(strata, ".", order)) %>%
vect()
# Merge target and replacement points in a single object
sampling.points <- rbind(target,replacement)
```
Figures \@ref(fig:fig-14c) and \@ref(fig:fig-14d) show the frequencies of replacement samples over the strata categories and the distribution of target and replacement samples respectively.
```{r fig-14d, fig.cap="Plot of raster strata and sampling points", eval=TRUE, include=TRUE}
# Plot samples over strata
map <- leaflet(options = leafletOptions(minZoom = 8.3)) %>%
addTiles()
mv <- mapview(st_as_stars(strata),homebutton=T, layer.name = "Strata") +
mapview(sampling.points, zcol = 'type', color = "white", col.regions = c('royalblue', 'tomato'), cex=3, legend = TRUE,layer.name = "Samples")
mv@map
```
The resulting target and replacement points can finally be stored as shapefiles.
```{r write-raster-pts, eval=FALSE, include=TRUE}
## Export to shapefile
writeVector(sampling.points,
paste0(results.path,"pts_raster.shp"), overwrite=TRUE)
```