living-data-tutorials.github.io/_lessons/2022-04-07-alberta-trees-tutorial/Biodiversity_Alberta.Rmd at 70a2a488f3f04a23aee4ae8114be375bc60ee3cb · Living-Data-Tutorials/living-data-tutorials.github.io · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
---
title: "**Alberta Trees Tutorial**"
author: |
  | Contact us:
  | **Zihaohan Sang** (University of Alberta)
  | zihaohan@ualberta.ca
  | **Rolando Trejo** (Université de Montréal)
  | rolando.trejo.perez@umontreal.ca
  | **Jhoan Chavez** (University of Northern British Columbia)
  | chavez@unbc.ca
date: "`r format(Sys.time(), '%B %d, %Y')`"
output: learnr::tutorial
progressive: true
runtime: shiny_prerendered
---


```{r, include=FALSE,message=FALSE, echo=FALSE}
RequiredPackages <- c("learnr","htmlwidgets","vembedr","tidyverse", "readr","ggplot2","multcompView","MuMIn","rpart","rpart.plot","plotly","dplyr","jtools","interactions","shiny")
for (i in RequiredPackages) { #Installs packages if not yet installed
  if (!require(i, character.only = TRUE)) install.packages(i)
}

library(learnr)
library(htmlwidgets)
library(vembedr)
library(tidyverse)
library(readr)
library(ggplot2)
library(multcompView)
library(MuMIn)
library(rpart)
library(rpart.plot)
library(plotly)
library(dplyr)
library(jtools)
library(interactions)
library(shiny)

knitr::opts_chunk$set(echo = FALSE)
```

## **Tutorial objectives**

Forestry and ecology students are the optimal audience to follow this tutorial. Some prerequisites are necessary but not essential, i.e. [an introduction to linear regression](https://www.youtube.com/watch?v=gb4qqX4uhYA), [how to read a boxplot?](https://www.youtube.com/watch?v=7UK2DK7rblw), [the Bell Curve (Normal/Gaussian Distribution)](https://www.youtube.com/watch?v=DJzmb7hGmeM), [what is a residual plot?](https://www.youtube.com/watch?v=J5gRckrv44c).

At the end of this tutorial you will be able to:

1. Contextualize the global and boreal plant diversity distribution.
2. Understand the basis of the linear model approach linking plant diversity, space and environment.
3. Introduce yourself to the non-linear model relationship in ecological data.
4. Debut into the dendrochronology concept in forestry and ecology.
5. Explore the tree analysis over time using multiple datasets through graphic visualization.

We will focus on the plant vascular cover diversity and soil temperature in 2010. We will also use the dendrochronology historical datasets.

## Learning objectives

Through this tutorial you will acquire the following skills:

1. A global understanding of plant biodiversity and dendrochronology concepts.

2. Use real data coming from the Alberta legacy dataset, a project monitoring diversity in Western Canadian forest, to implement a linear model approach between diversity and space and environment.

3. Explore and interpret data patterns using data exploration, model codification, selection and validation.

4. Familiarize and code by yourself by adapting R code proposed in this tutorial.


## **Alberta legacy dataset background**

An interesting project monitoring diversity was the Seasonal and annual dynamics of western Canadian boreal forest plant communities: a legacy dataset spanning four decades. The primary purpose of the Seasonal Dynamics (SEADYN) and later Annual Dynamics (ANNDYN) research projects was to document seasonal changes in the vegetative composition during the snow-free season (May through October) and longer-term changes in vegetation and forest mensuration for boreal forest stands in Alberta, Canada dominated by Pinus banksiana (Lamb.) (see central image in the below figure).

Two regions were used for this study: one in the Hondo-Slave Lake (hereafter, Hondo) region of Alberta, which was surveyed from 1980 to 2015, and a second location in the Athabasca Oil Sands (hereafter, AOS) region in northeastern Alberta, which was surveyed from 1981 to 1984 and thought to have substantial atmospheric pollution due to regional industrial development (oil sands mining and processing).

```{r fig2, echo = FALSE, out.width = "100%", fig.cap = "Photo from "}
htmltools::img(
				src = "https://github.com/RolandoTrejo/website/blob/main/_lessons/2022-04-07-alberta-trees-tutorial/Images/Alberta_Project.png?raw=true", height = "500px", width = "100%", `data-external` = "1"
			)
```
*Photo from Seasonal Dynamics (SEADYN) and later Annual Dynamics (ANNDYN) research projects metadata, Alberta*

### **Experimental design**

The experimental design consisted of plots of 50x50 m subdivided into 50 5x5 m quadrants. Data coming from Hondo monitoring can allow us to state tree questions concerning soil temperature and stands.

```{r fig3, echo = FALSE, out.width = "100%", fig.cap = "Photo from "}
htmltools::img(
				src = "https://github.com/Living-Data-Tutorials/website/blob/main/_lessons/2022-04-07-alberta-trees-tutorial/Location.png?raw=true", height = "500px", width = "100%", `data-external` = "1"
			)
```
*Photo from Seasonal Dynamics (SEADYN) and later Annual Dynamics (ANNDYN) research projects metadata, Alberta*


## **Global and boreal plant diversity distribution.**{data-progressive=TRUE}

How BIG is Canada's Boreal Forest? *Video from Boreal Conservation [https://www.borealconservation.org]*

```{r, include=FALSE,message=FALSE, echo=FALSE}
library(vembedr)
```

```{r, message=FALSE, echo=FALSE}
embed_url("https://www.youtube.com/watch?v=_XjpzlVVdW8")%>%
use_align("center")
```

### **Vascular plant diversity: what it is?**


A key concept in biology is Diversity. If you have wondered what is the connection between species richness of plants versus space  and environment, this is the right place to achieve basic biological and statistical concepts. There are over 352 000 (391 000 according to Jin and Qian, 2019) species of vascular plants in the world. More than 95% of vascular plants are flowering plants, also called angiosperms (e.g. grasses, orchids, maple trees). The other types of vascular plants are gymnosperms (cone-bearing trees, e.g. pine trees, spruce trees) and seedless plants (e.g. ferns, horsetails) (see figure of vascular plants below) . 5111 species of vascular plants have been found in Canada(CESCC, 2010). Such an amazing quantity of types and forms of life definitely invite biologists to wander them selves how diversity works in nature.

```{r fig1, echo = FALSE, out.width = "100%", fig.cap = "Photo from "}
htmltools::img(
				src = "https://github.com/Living-Data-Tutorials/website/blob/main/_lessons/2022-04-07-alberta-trees-tutorial/Vascular_Plants.png?raw=true", height = "500px", width = "100%", `data-external` = "1"
			)
```
*Photos from The Gymnosperm Database and Go Botany (3.7)*

## **Linear model approach linking plant diversity, space and environment.**


Alberta, including 660 000 km2, is a diverse Canadian province. Almost 2000 species of vascular plants have been recorded (almost 1500 native) (Packer and Gould, 2017). In order to reveal how biodiversity connects with space and at least one environmental variable, we will focus on understanding the effect of stands and soil temperature in species richness in 2010 regarding only Hondo stands of this project.Hondo stands are north of Edmonton and east of Lesser Slave Lake, Alberta (AB), Canada (bottom right map panel). 2010 Hondo vascular plant is composed by 131 species. In this sites the maximum number of species found between 1980-2015 was 215. Regarding species diversity, we can state the following questions:

     A. Can we explain vascular diversity regarding soil temperature?

     B. Are stands a better predictor than soil temperature?

     C. Do we need to consider both variables together to understand vascular plant diversity variation?


### **Data exploration**


In order to answer our questions, we will use data from two different datasets from long-term tree and plant surveys in Alberta. We can respond to our questions using data from a specific year. In this tutorial, we will use 2010 data. Now we can better understand the regression line provided by graphing soil temperature and species richness. It basically follows a negative correlation (more species, less soil temperature). Regarding the boxplot graphic, we can see that stand 5 and 6 contain more species than stand 3 and 4. In stands 7 and 8 we can visualize outlines (extreme values).

**Species richness variation by soil temperature (°C)**

```{r,message=FALSE,warning=FALSE}
# Species richness variation by soil temperature (°C)
Hondo_VascularCover_2010_CLEAN<-read.csv("https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/Hondo_VascularCover_2010_CLEAN.csv", sep=";")
Hondo_SoilTemp_2010_CLEAN<- read.csv("https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/Hondo_SoilTemp_2010_CLEAN.csv", sep=";")

SR_SoilTemp <- data.frame(stand=as.factor(Hondo_SoilTemp_2010_CLEAN$stand),
                          SR=Hondo_VascularCover_2010_CLEAN$SR,
                          temp_C=Hondo_SoilTemp_2010_CLEAN$temp_C)

bc <- plot_ly(SR_SoilTemp, x=~temp_C, y=~SR, type = "scatter", size=~SR,
              color = 'Paired')%>% layout(title="",
              xaxis=list(title="Soil temperature (C)", showgrid = F),
              yaxis=list(title="Species richness", showgrid = F))

bc
```

**Species richness variation along stands**

```{r, message=FALSE, echo=FALSE,warning=FALSE}
# Species richness variation along stands
bp <- plot_ly(SR_SoilTemp, y=~SR, color = ~as.factor(stand),symbol="stand",
              type="box",boxpoints = "all",

              jitter = 0.4,pointpos = -1.8) %>%
              layout(title = "",
              xaxis = list(title = "Stand", showgrid = F),
              yaxis=list(title="Species richness", showgrid = F))

bp
```

**Counts of the number of species frequencies in each abundance class**

We already know the patterns among stands, soil temperature, and interaction. But, what do continuous variables (species richness and soil temperature) tell us regarding their frequency distributions?

Species richness clearly follows a normal distribution.

```{r,message=FALSE, echo=FALSE}
SRichness <- data.frame(SR=Hondo_VascularCover_2010_CLEAN$SR)
ab <- table(unlist(SRichness))
barplot(ab, las = 1, # make axis labels perpendicular to axis
        xlab = "Abundance class: species richness", ylab = "Frequency", # label axes
        col = grey(5:0/5)) # 5-colour gradient for the bars
```

**Counts of the number of soil temperature frequencies in each abundance class**

Soil temperature not necessarily follows a normal distribution, but it seems like can assume it.

```{r,message=FALSE, echo=FALSE}
STemp <- data.frame(temp_C=Hondo_SoilTemp_2010_CLEAN$temp_C)
ab <- table(unlist(STemp))
barplot(ab, las = 1, # make axis labels perpendicular to axis
        xlab = "Abundance class: soil temperature", ylab = "Frequency", # label axes
        col = grey(5:0/5)) # 5-colour gradient for the bars
```


### **Model codification: Get's started!**

In order to understand how soil temperature in Celsius and stand (x = independent variable) can affect biodiversity we can create five different models containing Species richness as response variable (y = dependent variable).

```{r,message=FALSE, echo=FALSE}
# Species richness as function of soil temperature (C) alone
M1 <- lm(SR ~ temp_C,data = SR_SoilTemp)
# Species richness as function of stand alone
M2 <-lm(SR ~ stand,data = SR_SoilTemp)
# Species richness as function of soil temperature (C) plus stand, and soil temperature-stand interaction
M3 <- lm(SR ~ temp_C*stand,data = SR_SoilTemp)
# Species richness as function of soil temperature (C) and stand
M4 <- lm(SR ~ temp_C+stand,data = SR_SoilTemp)
# Species richness as function of soil temperature-stand interaction alone
M5 <- lm(SR ~ temp_C:stand,data = SR_SoilTemp)
```

```
# Species richness as function of soil temperature (C) alone
M1 <- lm(SR ~ temp_C,data = SR_SoilTemp)
# Species richness as function of stand alone
M2 <-lm(SR ~ stand,data = SR_SoilTemp)
# Species richness as function of soil temperature (C) plus stand, and soil temperature-stand interaction
M3 <- lm(SR ~ temp_C*stand,data = SR_SoilTemp)
# Species richness as function of soil temperature (C) and stand
M4 <- lm(SR ~ temp_C+stand,data = SR_SoilTemp)
# Species richness as function of soil temperature-stand interaction alone
M5 <- lm(SR ~ temp_C:stand,data = SR_SoilTemp)
```


###  **Model selection**

Regarding  patterns associated to species richness in function of soil temperature and stand, can we use these results to formulate our ecological conclusions? Does putting together both soil temperature and stand can reveal a pattern hidden by modeling both variable independently? We can use AICc and R squared approach to select the best model. Here we can see that model M2 and M4 are the best options following a lm approach with fixed effects. However, M4 model presented a reduced R squared. M2 model, only species richness as function of stand, presented a lower AIC and a higher R squared.

```{r,message=FALSE, echo=FALSE}
R2_adjusted <- c(summary(M1)$adj.r.squared,
                 summary(M2)$adj.r.squared,
                 summary(M3)$adj.r.squared,
                 summary(M4)$adj.r.squared,
                 summary(M5)$adj.r.squared)
r2 <- c(summary(M1)$r.squared,
        summary(M2)$r.squared,
        summary(M3)$r.squared,
        summary(M4)$r.squared,
        summary(M5)$r.squared)
library(MuMIn)
AIC.table  <- MuMIn::model.sel( M1, M2, M3, M4, M5)
AIC.table <- AIC.table[ , c("df", "logLik", "AICc", "delta")]

Model_summary<-data.frame(R2_adjusted,r2,AIC.table)
Model_summary
```
df is the degree of freedom,
logLik is the loglikelihood, and
delta is the AICc difference with the lowest value

Let's use the M2 model use to see how it works species richness in function of stands. We can identify M2 models as the best one according to its lowest AICc and R squared. But, before doing an analysis of variance and a post-hoc mean comparisons we must check the linear regression model assumptions.

```
# Species richness as function of stand
M2 <-lm(SR ~ stand,data = SR_SoilTemp)
```
  </details>


### **Model validation**


**Homogeneity of the variance**

Plot predicted values vs residual values

```{r,message=FALSE, echo=FALSE}
M2 <-lm(SR ~ stand,data = SR_SoilTemp)
plot(resid(M2) ~ fitted(M2),
     xlab = 'Predicted values',
     ylab = 'Normalized residuals')
abline(h = 0, lty = 2)
```

Homogeneity of variance test

```{r,message=FALSE, echo=FALSE}
# Homogeneity of variance test
bartlett.test(SR ~ stand,data = SR_SoilTemp) # We have homogeneity of variance (p-value = 0.1793)
```

There is a homogeneous dispersion of the residuals regarding the graphic and the homogeneity of variance test(p-value = 0.1793). *The assumption is respected!*

**Independence of the model residuals**

Check the independence of the model residuals with stands

```{r,message=FALSE, echo=FALSE}

boxplot(resid(M2) ~ stand, data = SR_SoilTemp,
        xlab = "Stand", ylab = "Normalized residuals")
abline(h = 0, lty = 2)
```

Homogeneous dispersion of the residuals around 0 and no pattern of residuals depending on the variable. *The assumption is respected!!*

**Normality of the model residuals**

Histogram of model residuals
```{r,message=FALSE, echo=FALSE}
hist(resid(M2)) # Histogram of residuals
```

Shapiro test to check residuals normality

```{r,message=FALSE, echo=FALSE}
# Shapiro test to check normality
M2 <-lm(SR ~ stand,data = SR_SoilTemp)
ANOVA_M2=aov(M2)
shapiro.test(ANOVA_M2$residuals) # The errors follow a normal distribution (p-value = 0.2401)
```

The residuals follow a normal distribution regarding the histogram and the Shapiro test(p-value = 0.2401). *The assumption is respected !!!*


### **Model interpretation and visualization**

**Species richness as a function of  stand**

Once we have corroborated the linear regression model assumption we can continue with model interpretation and visualization. There are significant differences among stands (p < 2.2e-16). Stands 3 and 4 are associated with less species richness than stands 5 and 6.

```{r, include=TRUE,message=FALSE, echo=FALSE}

M2 <-lm(SR ~ stand,data = SR_SoilTemp)
ANOVA_M2=aov(M2) # There are significant differences among stands (p < 2.2e-16)
summary(ANOVA_M2)
```

```{r,message=FALSE, echo=FALSE}
ANOVA=aov(M2)
# Tukey test to study each pair of treatment :
TUKEY <- TukeyHSD(x=ANOVA, "stand", conf.level=0.95)

# Group the treatments that are not different each other together.
generate_label_df <- function(TUKEY, variable){

  # Extract labels and factor levels from Tukey post-hoc
  Tukey.levels <- TUKEY[[variable]][,4]
  Tukey.labels <- data.frame(multcompLetters(Tukey.levels)['Letters'])

  # Put the labels in the same order as in the boxplot :
  Tukey.labels$stand=rownames(Tukey.labels)
  Tukey.labels=Tukey.labels[order(Tukey.labels$stand) , ]
  return(Tukey.labels)
}

# Apply the function on the dataset
LABELS <- generate_label_df(TUKEY , "stand")


# A panel of colors to draw each group with the same color :
my_colors <- c(
  rgb(143,199,74,maxColorValue = 255),
  rgb(242,104,34,maxColorValue = 255),
  rgb(111,145,202,maxColorValue = 255),
  rgb(144,108,84,maxColorValue = 255),
  rgb(144,108,84,maxColorValue = 255),
  rgb(143,199,74,maxColorValue = 255),
  rgb(143,199,74,maxColorValue = 255))

# Draw the basic boxplot
a <- boxplot(SR_SoilTemp$SR ~ SR_SoilTemp$stand , ylim=c(min(SR_SoilTemp$SR) , 1.1*max(SR_SoilTemp$SR)) ,
             col=my_colors[as.numeric(LABELS[,1])] , ylab="Species richness" , main="",xlab="Stand")

# It writes the letter over each box. Over is how high letters are written.
over <- 0.1*max( a$stats[nrow(a$stats),] )

#Add the labels
#text( c(1:nlevels(SR_SoilTemp$stand)) , a$stats[nrow(a$stats),]+over , LABELS[,1],
 #     col=my_colors[as.numeric(LABELS[,1])] )
```

Different colours in the stands mean statistical differences.

## **Non-linear model relationship in ecological data**{data-progressive=TRUE}

Run the following code. Then observe the AIC table and the graphic visualization.

```{r Linear_not_linear, exercise=TRUE, exercise.lines = 25, message=FALSE}

# Data
x <- c(1.1,1.2,0.7,3.4,3.6,2.7,5.2,5.3,4.7,7.3,7.5,6.7,9.4,9.1,8.9,11.3,10.7,11)
y_response <- c(0.8,0.7,1.2,9.1,8.7,9.3,25.3,25.8,24.2,50,48.5,51,81.1,80.8,81.2,121.5,121,120.7)
Linear_not_linear <- data.frame(x,y_response)

# Linear model or not?
library(mgcv)
linear_model <- gam(y_response  ~ x, data = Linear_not_linear)
smooth_model <- gam(y_response  ~ s(x), data = Linear_not_linear)
AIC(linear_model, smooth_model)

# Visualization
library(ggplot2)
p1<-ggplot(Linear_not_linear, aes(x = x, y = y_response )) +
  geom_point() +
  geom_line(colour = "red", size = 1.2,
            aes(y = fitted(linear_model))) +
  xlab("x") +
  ylab("y") +
  labs(title = "Linear model or not?")+
  geom_line(colour = "blue", size = 1.2,
            aes(y = fitted(smooth_model))) +
  theme_bw()

p1

```

<details>
  <summary> **Linear model or not?** </summary>
  <p>
  You can see that the AIC and the trend of the smooth GAM justify that adding a smoothing function improves model performance. Non linearity is then supported by these data.
  <p>
## **Exercises**


**Species richness as a function of soil temperature (°C)**

Run the following code and answer the question below. Here, we use model one (M1) corresponding to species richness as a function of soil temperature (C) to verify if there is any relationship between species richness and soil temperature alone. We will consider that model assumption are respected.

```{r M1_plot, exercise=TRUE, exercise.lines = 36, message=FALSE}
# Importing data
Hondo_VascularCover_2010_CLEAN<-read.csv("https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/Hondo_VascularCover_2010_CLEAN.csv", sep=";")
Hondo_SoilTemp_2010_CLEAN<- read.csv("https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/Hondo_SoilTemp_2010_CLEAN.csv", sep=";")

# Merging data
SR_SoilTemp <- data.frame(stand=as.factor(Hondo_SoilTemp_2010_CLEAN$stand),
                          SR=Hondo_VascularCover_2010_CLEAN$SR,
                          temp_C=Hondo_SoilTemp_2010_CLEAN$temp_C)

# Species richness as function of soil temperature (C)
M1 <- lm(SR ~ temp_C,data = SR_SoilTemp)
# Residuals and coefficients of the model
(summ_M1 <- summary(M1)) #
# Simplified ggplot theme
library(ggplot2)
fig <- theme_bw() +
  theme(panel.grid.minor=element_blank(),
        panel.grid.major=element_blank(),
        panel.background=element_blank()) +
  theme(strip.background=element_blank(),
        strip.text.y = element_text()) +
  theme(legend.background=element_blank()) +
  theme(legend.key=element_blank()) +
  theme(panel.border = element_rect(colour="black", fill=NA))
# Plot
plot <- ggplot(aes(temp_C, SR), data = SR_SoilTemp)
Plot_AllData <- plot + geom_point() +
  xlab("Soil temperature (C)") +
  ylab("Species richness") +
  labs(title = "All data") + fig

# Add regression lines with the intercepts specific to each stand

Plot_AllData +
  geom_abline(intercept = 23.8756 ,
              slope     = -0.3346, col = "coral2")


```

Check the graphic, then the p-value of soil temperature (temp_C). Respond to the following question:

```{r quizM1}
quiz(
  question("**What is the relationship beween species richness and soil temperature?**",
    answer("More species richness, more soil temperature"),
    answer("More species richness, less soil temperature", correct = TRUE),
    answer("The relationship is not significant"),
    answer("None of the above answers is correct")
  )
  )
```

Now, let's explore our real species richness and soil temperature data relationship. Follow the next steps:

1. Code by yourself: Use and adapt R code to check linearity.
2. Use code from Hint: Click on Hint to review and paste the code. The run the code.

```{r print-linearityTest, exercise=TRUE,exercise.lines = 25, message=FALSE, exercise.eval=TRUE}

```

```{r print-linearityTest-hint}
# Data
Hondo_VascularCover_2010_CLEAN<-read.csv("https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/Hondo_VascularCover_2010_CLEAN.csv", sep=";")
Hondo_SoilTemp_2010_CLEAN<- read.csv("https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/Hondo_SoilTemp_2010_CLEAN.csv", sep=";")
SR_SoilTemp <- data.frame(stand=as.factor(Hondo_SoilTemp_2010_CLEAN$stand),
                          SR=Hondo_VascularCover_2010_CLEAN$SR,
                          temp_C=Hondo_SoilTemp_2010_CLEAN$temp_C)

# Linear model or not?
library(mgcv)
linear_model <- gam(SR  ~ temp_C, data = SR_SoilTemp)
smooth_model <- gam(SR  ~ s(temp_C), data = SR_SoilTemp)
AIC(linear_model, smooth_model)

# Visualization
library(ggplot2)
Non_linear<-ggplot(SR_SoilTemp, aes(x = temp_C, y = SR )) +
  geom_point() +
  xlab("Soil temperature (C)") +
  ylab("Species richness") +
  labs(title = "Smooth model curve: Linear model or not?")+
  geom_line(colour = "blue", size = 1.2,
            aes(y = fitted(smooth_model))) +
  theme_bw()

Non_linear

Linear<-ggplot(SR_SoilTemp, aes(x = temp_C, y = SR )) +
  geom_point() +
  geom_line(colour = "red", size = 1.2,
            aes(y = fitted(linear_model))) +
  xlab("Soil temperature (C)") +
  ylab("Species richness") +
  labs(title = "Linear model line:Linear model or not?")+
  theme_bw()

Linear
```

```{r quizLinearity}
quiz(
  question("**Can we use linear regression to model species richness and soil temperature relationship?**",
    answer("Non linearity is supported by these data"),
    answer("It is difficul to make conclusions"),
    answer("Linearity is supported by these data", correct = TRUE),
    answer("None of the above answers is correct")
  )
  )
```

**Interaction model**

Situation: we are interested to model the species richness of Hondo as a function of temperature (this is our main predictor of interest) but we don't know if we need to consider the stand as another important variable in the model. Observe the following graphics (M1, M3 and M4 model) and then the r squared and AICc into the summary table. Respond to the questions below.

**Species richness as a function of soil temperature (°C) plus stand, and soil temperature-stand interaction**

```{r, message=FALSE, echo=FALSE,warning=FALSE}
# Species richness as function of soil temperature (C)
M1 <- lm(SR ~ temp_C,data = SR_SoilTemp)
# Simplified ggplot theme
fig <- theme_bw() +
  theme(panel.grid.minor=element_blank(),
        panel.grid.major=element_blank(),
        panel.background=element_blank()) +
  theme(strip.background=element_blank(),
        strip.text.y = element_text()) +
  theme(legend.background=element_blank()) +
  theme(legend.key=element_blank()) +
  theme(panel.border = element_rect(colour="black", fill=NA))
# Plot
plot <- ggplot(aes(temp_C, SR), data = SR_SoilTemp)
Plot_AllData <- plot + geom_point(color="blue") +
  xlab("Soil temperature (C)") +
  ylab("Species richness") +
  labs(title = "M1 model") + fig

Plot_AllData +
  geom_abline(intercept = 23.8756 ,
              slope     = -0.3346, col = "coral2")
```

**Species richness as a function of soil temperature (°C) plus stand, and soil temperature-stand interaction**

```{r, message=FALSE, echo=FALSE,warning=FALSE}
M3 <- lm(SR ~ temp_C*stand,data = SR_SoilTemp)
interact_plot(M3, pred = temp_C, modx = stand, x.label = "Soil temperature (C)", y.label = "Species richness",
              main.title = "M3 model",  legend.main = "Stand",plot.points = TRUE)


```

**Species richness as a function of soil temperature (°C) and stand**

```{r, message=FALSE, echo=FALSE,warning=FALSE}
M4 <- lm(SR ~ temp_C+stand,data = SR_SoilTemp)
interact_plot(M4, pred = temp_C, modx = stand, x.label = "Soil temperature (C)", y.label = "Species richness",
              main.title = "M4 model",  legend.main = "Stand",plot.points = TRUE)
```

```{r,message=FALSE, echo=FALSE}
R2_adjusted <- c(summary(M1)$adj.r.squared,
                 summary(M3)$adj.r.squared,
                 summary(M4)$adj.r.squared)
r2 <- c(summary(M1)$r.squared,
        summary(M3)$r.squared,
        summary(M4)$r.squared)
library(MuMIn)
AIC.table  <- MuMIn::model.sel( M1, M3, M4)
AIC.table <- AIC.table[ , c("df", "logLik", "AICc", "delta")]

Model_summary<-data.frame(R2_adjusted,r2,AIC.table)
Model_summary
```
df is the degree of freedom,
logLik is the loglikelihood, and
delta is the AICc difference with the lowest value

```{r quizInteraction}
quiz(
  question("Regarding R squared and graphics, which is the best model?",
    answer("M3 model",correct = TRUE),
    answer("M1 model"),
    answer("M4 model"),
    answer("All the models are the best")
  )
)
```


### **Reproducibility**

The graphics and results presented in this tutorial were obtained using historical soil temperature data, and vascular diversity datasets from Hondo stands. Data is available at  [https://dataverse.scholarsportal.info/dataset.xhtml?persistentId=doi:10.5683/SP3/PZCAVE]. We imported the original datasets from the Import dataset in R Studio.

```
Hondo_VascularCover_1980_2015 # Historical
str(Hondo_VascularCover_1980_2015)
Hondo_SoilTemp_1980_2010 # Historical soil temperature
str(Hondo_SoilTemp_1980_2010)
```
**Dataset manipulation**

1.Generate a subset of data considering only 2010 data to simplify the statistical analyses. It is important to focus on the heart of species richness ecological concept connected to space and environment.

```
Hondo_VascularCover_2010 <- subset(Hondo_VascularCover_1980_2015,year== "2010" )  # Selecting from one category in rows
Hondo_SoilTemp_2010 <- subset(Hondo_SoilTemp_1980_2010,year== "2010" )
```

2. Save the 2010 subsets data in the computer to clean it and make it proper to work in R.
```
write.csv(x=Hondo_VascularCover_2010,file="Hondo_VascularCover_2010.csv", row.names=FALSE) # Export data in csv format
write.csv(x=Hondo_SoilTemp_2010,file="Hondo_SoilTemp_2010.csv", row.names=FALSE)
```

3. Open the 2010 subsets in excel and order both of them by stand and quad, then corroborate the perfect correspondence in order.

```{r,message=FALSE, echo=FALSE}
Hondo_VascularCover_2010_CLEAN<-read.csv("https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/Hondo_VascularCover_2010_CLEAN.csv", sep=";")
Hondo_SoilTemp_2010_CLEAN<- read.csv("https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/Hondo_SoilTemp_2010_CLEAN.csv", sep=";")

```

4. Generate a new data frame summarizing stand, quadrant, soil temperature and species richness. You can see here that quadrants and stands were merged adequately.

```{r,message=FALSE, echo=FALSE}
SR_SoilTemp <- data.frame(stand=as.factor(Hondo_SoilTemp_2010_CLEAN$stand),
                          stand2=as.factor(Hondo_VascularCover_2010_CLEAN$stand),
                          quad=Hondo_SoilTemp_2010_CLEAN$quad,
                          quad2=Hondo_VascularCover_2010_CLEAN$quad,
                          SR=Hondo_VascularCover_2010_CLEAN$SR,
                          temp_C=Hondo_SoilTemp_2010_CLEAN$temp_C)
```


## **Dendrochonology: what it is?**{data-progressive=TRUE}

Tree Stories: How Tree Rings Reveal Extreme Weather Cycles? *Video from Brigham Young University, Utah*

```{r, message=FALSE, echo=FALSE}
embed_url("https://www.youtube.com/watch?v=xmZO7aRgcW4")%>%
use_align("center")
```

An interesting concept in biology, and more in forestry, is the ring dynamics of trees over time. Dendrochronology is the dating and study of annual rings in trees (see https://ltrr.arizona.edu/about/treerings). Dating and studying annual rings allow us to do inferences in other tree fields of study, for example, linking dendrochronology with the weather. Here, Dendroclimatology studies and uses the growth ring patterns to reconstruct past variations in climate (Fritts. 1987). Since well-defined annual-growth rings can be observed in the wood (rings) from many species of temperate forest trees throughout the world, in certain circumstances, these growth rings contain useful information about varying environmental conditions affecting their growth like temperature changes and humidity as well as tree features (age and size), depending on the species and latitude for what other data analysis (climate data) should be included (Tumajer, J., & Lehejček, J. 2019).


## **Tree ring analysis over time**

Let's plot some graphics. We can plot the average ring width (mm) in axe y in the function of time (year) in axe x (see red line). But, we can also plot the average ring width (mm) in axe y as a function of time (year) in axe x, simultaneously considering the stands (see gray lines). Do you have some ideas about what these trends might be telling us?


**Rings data from Hondo-Slave Lake (hereafter, Hondo) region of Alberta**


```{r AOS_plot, exercise=TRUE, exercise.lines = 37, message=FALSE}

# Libraries
library(tidyverse)
library(readr)
library(dplyr)
library(ggplot2)

# AOS sites
# for BP, JS, LF, ML, OI, SA, WO, WY stands

# Files
aos_files <- c("https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/AOS_Dendrochronology_1983%20-%20BP.csv",
               "https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/AOS_Dendrochronology_1983%20-%20JS.csv",
               "https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/AOS_Dendrochronology_1983%20-%20LF.csv",
               "https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/AOS_Dendrochronology_1983%20-%20ML.csv",
               "https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/AOS_Dendrochronology_1983%20-%20OI.csv",
               "https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/AOS_Dendrochronology_1983%20-%20SA.csv",
               "https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/AOS_Dendrochronology_1983%20-%20WO.csv",
               "https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/AOS_Dendrochronology_1983%20-%20WY.csv"
               )

# Visualization
aos_dt <-
  do.call(rbind,
          lapply(aos_files, read.csv))
summary(aos_dt)
table(aos_dt$stand)
aos_dt %>% group_by(year, stand) %>%
  summarise(avg_ring_width_mm = mean(ring_width_mm),
            n = n()) %>%
  ggplot(aes(x = year, y = avg_ring_width_mm)) +
  geom_point(data = aos_dt, aes(x = year, y = ring_width_mm), shape = 21,
             size= 2, color = 'gray50', alpha = .2) + geom_smooth(aes(group = stand), alpha = .2, color = 'gray40') +
  theme_bw() +
  facet_grid(~stand) +
  geom_smooth(data = aos_dt %>% select(-stand), aes(x = year, y = ring_width_mm),
              color = 'red', linetype = 'dashed')

```


**Rings data from Athabasca Oil Sands (hereafter, AOS) region in northeastern Alberta**

```{r Hondo_plot, exercise=TRUE, exercise.lines = 33, message=FALSE}


# Libraries
library(tidyverse)
library(readr)
library(dplyr)
library(ggplot2)

## same for HONDO
#n=1, 2, 3 stands

# Files
hondo_files <- c("https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/Hondo_Dendrochronology_1983%20-%20STAND%201.csv",
                 "https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/Hondo_Dendrochronology_1983%20-%20STAND%202.csv",
                 "https://raw.githubusercontent.com/Living-Data-Tutorials/website/main/_lessons/2022-04-07-alberta-trees-tutorial/Hondo_Dendrochronology_1983%20-%20STAND%203.csv"
                 )
# Visualization
hondo_dt <-
  do.call(rbind,
          lapply(hondo_files, read.csv)) %>%
  mutate(stand = factor(stand))

hondo_dt %>% group_by(year, stand) %>%
  summarise(avg_ring_width_mm = mean(ring_width_mm),
            n = n()) %>%
  ggplot(aes(x = year, y = avg_ring_width_mm)) +
  geom_point(data = hondo_dt, aes(x = year, y = ring_width_mm, group = stand), shape = 21,
             size= 2, color = 'gray50', alpha = .2) +
  geom_smooth(aes(group = stand), alpha = .2, color = 'gray40') +
  theme_bw() +
  facet_grid(~stand) +
  geom_smooth(data = hondo_dt %>% select(-stand), aes(x = year, y = ring_width_mm),
              color = 'red', linetype = 'dashed')


```

## **Exercises**

<details>
  <summary> **What can we deduce from the graphics?** (Open responses) </summary>
  <p> BINGO!!!:

     1. Tree ring width decreases over time, and patterns change following an oscillating behaviour, suggesting that external (temperature, humidity) and internal (age, latitude, species) factors affect tree growth.

     2.  It seems that stands follow different patterns, perhaps they have a different composition, or why not, they can be more or less diverse in vascular species affecting overall tree growth.

  As you can see, the information observed in the graphics can bring us some insights about what is going on with ring tree dynamics.

</details>

<details>
  <summary> **Multiple choice questions** </summary>
  <p>
```{r quiz1}
quiz(
  question("**What dendrochronology is?**",
    answer("It studies the plant height growth"),
    answer("It studies the growht ring patterns regarding past variation in climate"),
    answer("It studies and dates annual rings in trees", correct = TRUE),
    answer("It studies the variation of tree species over time")
  ),
  question("What does it gray lines mean in graphics used in this tutorial?",
    answer("It is the average ring width trend varying over time"),
    answer("It is the average ring width trend varying over stand"),
    answer("It is the average ring width trend varying over time and stand", correct = TRUE),
    answer("None of the above answers is correct")
  )
)
```
</details>

## **References**

Hesketh, A., Loesberg, J., Bledsoe, E., Karst, J., & Macdonald, E. (2021). Seasonal and annual dynamics of western Canadian boreal forest plant communities: A legacy dataset spanning four decades [Data set]. Scholars Portal Dataverse. https://doi.org/10.5683/SP3/PZCAVE

Canadian Endangered Species Conservation Council(CESCC). 2010. Wild Species 2010: The general status of Species in Canada.

Jin, Y., and Qian, H. 2019. V.PhyloMaker: an R package that can generate very large phylogenies for vascular plants. Ecography, 42: 1353: 1359.

Packer, J.G., and Gould, A.J. 2017. Vascular plants of Alberta, part 1: Ferns, Fern Allies, Gymnosperms, and monocots. University of Calgary Press. 281 pages.

Earle, C.J. 2021.The Gymnossperm Database. Consulted on April 7, 2022:[https://www.conifers.org/zz/gymnosperms.php].
Go Botany (3.7). 2022. Native Plant Trust. Consulted on April 7, 2022: [https://gobotany.nativeplanttrust.org]

Fritts, H. C. (1987). TREE-RING ANALYSISTree-ring analysis. In Climatology (pp. 858–875). Springer US. https://doi.org/10.1007/0-387-30749-4_182

Tumajer, J., & Lehejček, J. (2019). Boreal tree-rings are influenced by temperature up to two years prior to their formation: A trade-off between growth and reproduction? Environmental Research Letters, 14(12), 124024. https://doi.org/10.1088/1748-9326/ab5134

NOAA. . Picture Climate: How Can We Learn from Tree Rings? | National Centers for Environmental Information (NCEI) formerly known as National Climatic Data Center (NCDC). (n.d.). Retrieved 8 April 2022, from https://www.ncdc.noaa.gov/news/picture-climate-how-can-we-learn-tree-rings

Rivet, A., Payette, S., Berteaux, D., & Girard, F. (2017). Pines and porcupines: A tree-ring analysis of browsing and dynamics of an overmature pine forest. Canadian Journal of Forest Research, 47, 257–268. https://doi.org/10.1139/cjfr-2016-0214

## **Glossary**

## **Français**

Version en français disponible sur https://rtrejo.shinyapps.io/Biodiversite_Arbres_Alberta