Wilke et al. immigration and (local) extinction dynamics of non-endemic diatoms

Three hundred and eighty diatom samples were analysed from the DEEP sediment succession, taken every 128 cm (successive samples are around 2,000–4,000 years apart) between 0 and 406.96 m c.d. and every 64 cm (around 2,000 years) between 406.96 and 446.65 m c.d. The higher sampling frequency in the early phase of Lake Ohrid was chosen because this period was environmentally very dynamic (Wagner et al. 2017, Panagiotopoulos et al. 2020). For comparison, we also analysed 123 samples from the phase before 1.36 Myr ago (446.65–583.92 m c.d.).

# Load required packages
library(coda)
library(simDES)

Input for richness analysis of non-endemic species

First part of processing the supplementary data is the same than for the endemic diatoms

# Read data
Diatoms <- read.table("Data/Supplementary material_Diatoms.csv", 
                      sep = "\t", header = TRUE, check.names = FALSE,
                      stringsAsFactors = FALSE)
# Exclude the endemic species we haven't found
Diatoms <- Diatoms[, -c(373:421)]
# Data cleaning
TaxonomyLifestyle <- Diatoms[1:12, -c(1:4)]
Diatoms <- Diatoms[-c(1:12), - c(1:3, 5)]
# Convert to numeric
Diatoms <- as.data.frame( apply(Diatoms, 2, as.numeric) ) 
# PyRate does not like special characters for species names:
SpeciesChangeNames <- data.frame(Species = colnames(Diatoms)[-1], 
                                 SpeciesPyRate = sprintf("SP_%03d", 1:(ncol(Diatoms)-1)),
                                 stringsAsFactors = FALSE)
colnames(Diatoms) <- c("Age", SpeciesChangeNames$SpeciesPyRate)
TaxonomyLifestyle <- rbind(TaxonomyLifestyle, rep(NA, ncol(TaxonomyLifestyle)))
TaxonomyLifestyle[13, ] <- c("PyRateCode", SpeciesChangeNames$SpeciesPyRate)

For the 202 non-endemic species that were found 1.36−0 Myr ago, we did not use a birth-death model because the taxa are likely to have originated elsewhere, thus making their first appearance in Lake Ohrid through dispersal, not speciation. Using taxa occurrences binned into intervals of 0.3 Myr, their richness through time was reconstructed by estimating per-lineage dispersal (d), local extinction (µ) and sampling (q) rates by a variant of the PyRateDES model for fossil biogeography (Silvestro et al. 2016), assuming dispersal into Lake Ohrid from a source pool of constant size.

SpNonEndemic <- colnames(Diatoms)[-1][TaxonomyLifestyle[12, -1] == "Non-endemic"]
WidespreadPresent <- colnames(Diatoms)[-1][TaxonomyLifestyle[11, -1] == "Extant" & 
                                           TaxonomyLifestyle[12, -1] == "Non-endemic"] 
DiaDes <- Diatoms[, c("Age", SpNonEndemic)]
DiaDes <- rbind(rep(0, ncol (DiaDes)), DiaDes)
DiaDes[1, colnames(DiaDes) %in% WidespreadPresent] <- 1 # Add extant species
DiaDes <- DiaDes[which(DiaDes$Age < 1365), ]
DiaDes <- DiaDes[, colSums(DiaDes) > 0] # Exclude species without counts
TimeDes <- seq(0, max(DiaDes$Age)+0.1, length.out = 50)
IntTimeDes <- findInterval(DiaDes$Age, TimeDes)
IntervalIndex <- unique(IntTimeDes)
BinnedDiaDes <- sapply(IntervalIndex, function(x) colSums(DiaDes[IntTimeDes == x, -1]))
BinnedDiaDes <- BinnedDiaDes[, ncol(BinnedDiaDes):1]
SpecimensBinnedDiaDes <- colSums(BinnedDiaDes)
BinnedDiaDesAbun <- t( apply(BinnedDiaDes, 1, function(x) x/SpecimensBinnedDiaDes) )
BinnedDiaDes <- ifelse(BinnedDiaDes == 0, 2, 3) # 2: Absent 3:Present in Lake Ohrid
BinnedDiaDes <- data.frame(scientificName = rownames(BinnedDiaDes), BinnedDiaDes)
colnames(BinnedDiaDes)[-1] <- rev(TimeDes)[-1] / 100
write.table(BinnedDiaDes, file = "Results/DES/DESin.txt", 
            sep = "\t", row.names = FALSE, quote = FALSE, na = "NaN")

We extended the PyRateDES implementation by adding a Gamma distributed heterogeneity in sampling (mirroring the model we used in the BDS analyses), continuous time-variable dispersal and local extinction rates, and parameter estimation via maximum likelihood (available on https://github.com/dsilvestro/PyRate). The same shifts in sampling rate as for the endemic species and an exponential relationship of dispersal and extinction rates with time were specified.

Time <- data.frame(Time = seq(13.7, 0, length.out = 138),
                   Cov = seq(0, 1, length.out = 138))
write.table(Time, "Results/DES/Time.txt",
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = TRUE)
# Maximize likelihood
system(paste0("python2 PyRate/PyRateDES2.py",
              " -d Results/DES/DESin.txt",
              " -mG ", # Preservation heterogeneity
              " -qtimes 11.5 8.66 4.55", # Shifts in sampling rate
              " -data_in_area 1", # Observations only in Ohrid
              " -varD Results/DES/Time.txt", # Time-variable dispersal
              " -varE Results/DES/Time.txt", # Time-variable (local) extinction
              " -A 3")) # Maximum likelihood
DesResult <- read.table("Results/DES/DESin_0_q_4.55_8.66_11.5_Dexp_Eexp_G_Mlsbplx.log",
                        header = TRUE, sep = "\t")
DesResult[, c("likelihood", "d21_t0", "e1_t0", "cov_d21", "cov_e1")]

##   likelihood    d21_t0     e1_t0    cov_d21    cov_e1
## 1  -3087.139 0.6258339 0.2556223 -0.7241003 -2.944525

Using the Maximum likelihood estimates as start parameters for a Bayesian inference informed on the uncertainty in dispersal, local extinction and sampling rates after sampling every 100th of 50,000 MCMC iterations and omitting the first 1,000 iterations as burn-in.

Due to computational limits we will run less iterations for this reproducible example.

system(paste0("python2 PyRate/PyRateDES2.py",
              " -d Results/DES/DESin.txt",
              " -mG ", # Preservation heterogeneity
              " -qtimes 11.5 8.66 4.55", # Shifts in sampling rate
              " -data_in_area 1", # Observations only in Ohrid
              " -varD Results/DES/Time.txt", # Time-variable dispersal
              " -varE Results/DES/Time.txt", # Time-variable (local) extinction
              " -n 2501 -s 100 -p 1000"))

Cov <- Time
Cov[, 2] <- Cov[, 2] - mean(Cov[, 2])

DesBi <- read.table("Results/DES/DESin_0_q_4.55_8.66_11.5_Dexp_Eexp_G.log",
                    header = TRUE, sep = "\t")
DesBi <- DesBi[DesBi$it >= 500, ] # Burn-in
DisThroughTimeBi <- apply(DesBi[, c("d21_t0", "cov_d21")], 1, function(x) x[1] * exp(x[2] * Cov$Cov))
DisThroughTimeBi <- 10 * DisThroughTimeBi
ExThroughTimeBi <- apply(DesBi[, c("e1_t0", "cov_e1")], 1, function(x) x[1] * exp(x[2] * Cov$Cov))
ExThroughTimeBi <- 10 * ExThroughTimeBi
Prob <- seq(0.95, 0.05, by = -0.05)
ExThroughTimeHpd <- DisThroughTimeHpd <- vector(mode = "list", length(Prob)) 
for(i in 1:length(Prob)){
  DisThroughTimeHpd[[i]] <- apply(DisThroughTimeBi, 1, 
                                  function(x) HPDinterval(coda:::mcmc(x), prob = Prob[i]))
  ExThroughTimeHpd[[i]] <- apply(ExThroughTimeBi, 1, 
                                 function(x) HPDinterval(coda:::mcmc(x), prob = Prob[i]))
} 

par(las = 1, mar = c(5.1, 4.5, 0.1, 0.8))
plot(1, 1, type = "n",
     xlim = c(14, 0), ylim = c(0, 12), xaxs = "i", xaxt = "n",
     ylab = expression(paste("Rate (events ", lineage^{-1}, " ", Myr^{-1}, ")")), 
     xlab = "Age (Myr ago)")
axis(side = 1, at = seq(14, 0, by = -2), labels = seq(1.4, 0, by = -0.2))
for(i in 1:length(DisThroughTimeHpd)){
  polygon(c(Time$Time, rev(Time$Time)), 
          c(DisThroughTimeHpd[[i]][1, ], rev(DisThroughTimeHpd[[i]][2, ])), 
          col = adjustcolor("dodgerblue", alpha = 1/length(Prob)), border = NA)
}
for(i in 1:length(DisThroughTimeHpd)){
  polygon(c(Time$Time, rev(Time$Time)), 
          c(ExThroughTimeHpd[[i]][1, ], rev(ExThroughTimeHpd[[i]][2, ])), 
          col = adjustcolor("red", alpha = 1/length(Prob)), border = NA)
}
lines(Time$Time, apply(DisThroughTimeBi, 1, median), col = "dodgerblue", lwd = 2)
lines(Time$Time, apply(ExThroughTimeBi, 1, median), col = "red", lwd = 2)
legend("topright", lty = c(1,1), col = c("dodgerblue", "red"), 
       legend = c("immigration", "(local) extinction"), bty = "n")

We then inferred the average non-endemic richness through time by running 1,000 stochastic diversity simulations based on the estimated dispersal, local extinction and sampling parameters, using the R package ‘simDES’ 0.1 (Hauffe 2019).

Due to computational limits we will only perform 100 stochastic diversity simulations here.

Boot <- boot_DES(100, Time = 13.7, Step = 0.1, BinSize = 0.5, Nspecies = 202,
                 # ML parameters
                 SimD = c(0, DesResult$d21_t0), 
                 SimE = c(DesResult$e1_t0, 0), 
                 SimQ = c(1, 1),
                 VarD = c(0, DesResult$cov_d21), 
                 VarE = c(DesResult$cov_e1, 0), 
                 Covariate = Time,
                 # First time-bin with almost perfect preservation
                 # and thus we start with the observed richness of 91 species
                 Observation = data.frame(time = rep(13.7, 202),
                                          area = c(rep(2, 111), rep(3, 91))),
                 DataInArea = 1, 
                 ConfInt = seq(0.95, 0.5, by = -0.05),
                 Ncores = 2)

par(las = 1, mar = c(5.1, 4.5, 0.1, 0.8))
plot(1, 1, type = "n",
     xlim = c(1.4, 0), ylim = c(50, 200), xaxs = "i",
     ylab = "Non-endemic species richness (N)", 
     xlab = "Age (Myr ago)")
LenCi <- length(Boot[[3]])
TimeSim <- as.numeric(rownames(Boot[[2]])) / 10
for (i in 1:LenCi) {
  polygon(c(TimeSim, rev(TimeSim)),
          c(Boot[[3]][[i]][[3]][, 1], rev(Boot[[3]][[i]][[3]][, 2])),
          col = adjustcolor("magenta", alpha = 0.5/LenCi),
          border = NA)
}
lines(TimeSim, Boot[[2]][, 3], col = "magenta")

Safe non-endemic richness as biotic covariate for the drivers of diversification.

TimeSeq <- seq(13.66, 0, length.out = 2000)
DivAppr <- approx(TimeSim, Boot[[2]]$MeanSimDivAB, xout = TimeSeq)$y
NonEnd <- data.frame(time = TimeSeq, NonEnd = DivAppr)
write.table(NonEnd, 
            file = "Results/Covariates/NonEnd.txt", 
            sep = "\t", row.names = FALSE, quote = FALSE)

Plattform and resources

Sys.time()

## [1] "2020-01-03 18:33:13 CET"

sessionInfo()

## R version 3.6.2 (2019-12-12)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] simDES_0.1.0 coda_0.19-3  knitr_1.25  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.2        codetools_0.2-16  mvtnorm_1.0-11   
##  [4] msm_1.6.7         lattice_0.20-38   foreach_1.4.7    
##  [7] digest_0.6.21     grid_3.6.2        magrittr_1.5     
## [10] evaluate_0.14     rlang_0.4.0       stringi_1.4.3    
## [13] doParallel_1.0.15 Matrix_1.2-17     rmarkdown_1.15   
## [16] splines_3.6.2     iterators_1.0.12  tools_3.6.2      
## [19] stringr_1.4.0     parallel_3.6.2    survival_2.44-1.1
## [22] xfun_0.10         yaml_2.2.0        compiler_3.6.2   
## [25] htmltools_0.4.0   expm_0.999-4