^{1}

^{*}

^{1}

^{2}

^{3}

^{1}

We used Species Distribution Modeling to predict the probability of Iberian pine (

Tree species distribution is mainly determined by climate and soil (

The Iberian pine (

Nowadays, forest harvesting has been reduced but the main risk faced by these populations is climate change. Previous studies have shown that black pine is sensitive to an increase in temperature and decrease in precipitation (

Species Distribution Modeling (SDM) forecasts the probability of species occurrences in response to environmental variables, and could predict the future habitat suitability of species under climate change scenarios (

The modeling of the distribution of endangered species, like

The study areas are located in Andalusia, southern Spain, a characteristic region of the Mediterranean Basin located in the south of the Iberian Peninsula (Fig. S1 in Supplementary material). The climate in this region is typically Mediterranean, with hot and dry summers and precipitation concentrated in spring and autumn. Forests of

Information on species occurrences and bioclimatic variables was compiled from the Andalusian Regional Government database (

The climate data consisted of 29 environmental variables (Tab. S1 in Supplementary material) averaged for the period 1960-2000 and the future predictions were carried out for three periods: 2011-2040; 2041-2070, and 2071-2099, following regional scenarios (

The initial 29 environmental variables were reduced by the variable inflation factor (VIF<10 -

Variable importance was quantified using the Variable Importance function within biomod2, which is implemented by calculating the Pearson’s correlation (

To deal with model selection, we used ensemble models from 10 SDM algorithms (

We used ensemble models calculated using the mean, median, coefficient of variation, upper and lower confidence interval, committee averaging and probability mean weight decay of the single model predictions (

The consensus methods employed in this study make up a representative sample of the most commonly used techniques. Four methods (Median, Mean, Coefficient of Variation, and Confidence Intervals) are based on their statistical functions, whereas the Probability Mean Weight Decay (WD) and Committee Averaging (CA) methods preselect the single models based on certain predefined criteria. In WD, half of the single-model outputs are preselected on the basis of the True Skills Statistic (TSS) values. The selected single models are combined using an averaging function. In CA the probabilities of the selected models are first converted into binary, according to the maximal TSS threshold, and the CA score is calculated as the average of the binary predictions (

We randomly divided our dataset into two subsets, which we used to train the model (70% of the data) and to evaluate it independently (the remaining 30%). This yielded 100 different fits per model, each with its own accuracy indicator, mean, and quantiles of model accuracy calculated. Models with higher mean values and smaller variations were considered as being the most accurate ones (

Model performance was evaluated by the AUC of the Receiver Operating Characteristic plot, Kappa, and TSS. The AUC represents the models’ discriminative capacity with regard to the data and is obtained by plotting the commission error (1-specificity; false positives) on the horizontal axis,

Cohen’s Kappa (κ) was derived from the 2 × 2 confusion matrix to measure the rate of agreement between actual and predicted values in the spatial space for categorical Kappa values; however, the matrix depends on the defined threshold for presence. Values of κ near to 0.5 indicate no discrimination capacity (random agreement), whereas a value of 1 represents the perfect discrimination model (

The TSS is concerned with omission and commission errors and is also prevalence independent. It ranges from -1 to +1, where +1 indicates perfect agreement and <0 indicates a random performance. The TSS measures the difference between the actual agreement and the randomly expected agreement; it is particularly useful for the modeling of rare species with limited point locations and it can be used to compare different modeling techniques. The TSS is defined as (

Finally, Kappa and TSS were calculated and evaluated, considering a threshold equal to prevalence (

We generated continuous probabilistic maps with values of between 0 and 1 for each grid point, for the present and future habitat distributions of

The results of the variable selection highlighted that 17 predictors out of 29 were affected by collinearity problems (VIF<10), of which eight were selected with high correlations with the first two principal components of the PCA and six were highlighted by AIC model selection. The final set of the six variables selected included the: average reference evapotranspiration (ETO); aridity index (IAR); average number of days with a minimum temperature equal to or below 0 °C (NDF); annual precipitation (PRC); annual sum of the positive differences between precipitation and reference evapotranspiration (SSUP); and average snow precipitation (SNOW -

Our results show that the probability of

The single model predictions were compared by their accuracy as given by TSS, Kappa, and AUC; overall, these showed good model accuracy (

All ensemble models presented TSS>0.77, Kappa> 0.90, and AUC> 0.98, the ensemble models committee averaging (CA) and probability mean weight decay (WD) being the best ones (

The ensemble SDM of

In the forecasted habitat suitability, the general probability of

The future predictions forecast a reduction in the potential habitat of

Species distribution models are a simple yet efficient statistics-based method used to map the spatial range of species and to forecast climate change impacts on species ranges (

The six main environmental variables used, listed in order of importance, were: NDF, PRC, IAR, SSUP, ETO, and SNOW (

The dependence of

The biomod2 software has been used to predict the habitat of forest species of different forms and parameterizations (

Recent studies have predicted a reduction in

Our results suggest a dramatic reduction in locations representing a suitable habitat for

Alternatively, the populations could extend northwards to central Spain in response to future conditions, and we assumed here that the species fully achieves its potential changes in distribution (meaning that we did not assume any dispersal limitation). That is, however, an unlikely and optimistic assumption. The fact that

However, projections of potential future distributions also need to be interpreted with caution. Even if the models presented in this study are quite accurate and are commonly used to assess the impact of global change (

Our results have serious implications for species adaptation and forest management (

Given the accumulating evidence of recent climate changes, an evaluation of the current and future distributions of

We acknowledge the financial support of ESPECTRAMED (CGL2017-86161-R), University of Córdoba - Campus de Excelencia CEIA_{3}, and AEMET (

RMNC conceived the study, analyzed data, performed research, and wrote the paper; JDL analyzed data and contributed new methods; RDM conceived the study and performed research; RSS and GPR wrote the paper.

Response curve of the ensemble model Committee Averaging for the explanatory variables: average reference evapotranspiration (ETO), aridity index (IAR), average number of days with a minimum temperature equal to or below 0 °C (NDF), annual precipitation (PRE), annual sum of the positive differences between precipitation and reference evapotranspiration (SSUP), and average snow precipitation (SNOW).

Adjustment values obtained for 10 different distribution models in Andalusia, including Artificial Neural Networks (ANN), Boosted Regression Trees (BRT), Classification and Regression Tress (CART), Flexible Discriminate Analysis (FDA), Generalized Additive Models (GAM), Generalized Linear Models (GLM), Multivariate Adaptive Regression Splines (MARS), Maximum Entropy (MaxEnt), Random Forest (RF), and Surface Range Envelop (SRE).

Present probability of occurrence of

Percentage loss of the area of habitat suitability of

Variable importance ranking and parametric characterization of

Rank | Variable | Min | 1^{st} Qtl |
Mean | 3^{rd} Qtl |
Max | Units |
---|---|---|---|---|---|---|---|

1 | Average number of days with a minimum temperature equal to or below 0 °C (NDF) | 35.30 | 65.40 | 76.30 | 90.00 | 158.90 | days |

2 | Annual precipitation (PRC) | 306.00 | 558.00 | 760.00 | 919.00 | 1513.00 | mm |

3 | Aridity index (IAR) | 35.00 | 94.00 | 116.00 | 160.00 | 296.00 | - |

4 | Annual sum of the positive differences between precipitation and reference evapotranspiration (SSUP) | 1.00 | 142.00 | 312.00 | 453.00 | 1046.00 | mm |

5 | Average reference evapotranspiration (ETO) | 92.00 | 840.00 | 891.00 | 934.00 | 1087.00 | mm |

6 | Average snow precipitation (SNOW) | 27.00 | 134.00 | 266.00 | 446.00 | 1170.00 | mm |

- | Aspect (ASPECT) | -453.00 | -27.00 | 8.00 | 49.00 | 333.00 | degree |

- | Sum of water balances at the end of each month (BH) | 1.00 | 685.00 | 1894.00 | 3011.00 | 7707.00 | mm |

- | Digital elevation model (DEM) | 7.00 | 1164.00 | 1423.00 | 1705.00 | 2995.00 | m |

- | Average net primary production (DF) | 182.00 | 1223.00 | 1928.00 | 2471.00 | 3371.00 | hours |

- | Average number of days with a maximum temperature equal to or above 35 °C (NDC) | 0.10 | 5.70 | 10.20 | 16.20 | 44.40 | days |

- | Annual radiation (RN) | 26.00 | 95.00 | 103.00 | 107.00 | 126.00 | Julian m^{-2} |

- | Annual sum of the negative differences between precipitation and reference evapotranspiration (SDEF) | 84.00 | 394.00 | 440.00 | 506.00 | 657.00 | mm |

- | Slope (SLOPE) | 1.00 | 11.00 | 17.00 | 24.00 | 100.00 | % |

- | Average maximum temperature (T_MAX) | 12.20 | 16.60 | 17.60 | 18.60 | 22.20 | °C |

- | Average mean temperature (T_MED) | 4.10 | 10.20 | 11.20 | 12.20 | 15.60 | °C |

- | Average minimum temperature (T_MIN) | -4.50 | 3.70 | 5.00 | 6.10 | 9.50 | °C |

- | Maximum of the monthly average maximum temperatures (TMAXC) | 24.40 | 28.80 | 30.00 | 31.10 | 34.70 | °C |

- | Average maximum temperature of all months (TMC) | 14.30 | 20.20 | 21.20 | 22.50 | 26.00 | °C |

- | Average minimum temperature of all months (TMF) | -1.80 | 3.10 | 4.00 | 4.90 | 8.50 | °C |

- | Minimum of the monthly average minimum temperatures (TMINF) | -8.40 | -2.10 | -1.00 | -0.10 | 3.30 | °C |

Mean variable importance values for 100 runs, for each selected variable and the habitat prediction model techniques. Generalized Linear Models (GLM), Generalized Additive Models (GAM), Classification and Regression Trees (CART), Flexible Discriminate Analysis (FDA), Artificial Neural Networks (ANN), Multivariate Adaptive Regression Splines (MARS), Maximum Entropy Modelling (MaxEnt), Random Forest (RF) and Boosted Regression Trees (BRT) and one similar to BIOCLIM, Surface Range Envelop (SRE). Variable description in

Variable | ANN | BRT | CART | FDA | GAM | GLM | MARS | MaxEnt | RF | SRE |
---|---|---|---|---|---|---|---|---|---|---|

ETO | 0.19 | 0.02 | 0.03 | 0.06 | 0.30 | 0.18 | 0.28 | 0.01 | 0.03 | 0.31 |

IAR | 0.07 | 0.01 | 0.03 | 0.06 | 0.12 | 0.04 | 0.13 | 0.01 | 0.04 | 0.11 |

NDF | 0.73** | 0.83** | 0.90** | 0.96** | 0.67* | 0.82** | 0.73* | 0.75* | 0.58* | 0.57* |

PRC | 0.56* | 0.00 | 0.03 | 0.98** | 0.99** | 0.80** | 0.97** | 0.01 | 0.02 | 0.08 |

SNOW | 0.24 | 0.00 | 0.05 | 0.38 | 0.39 | 0.32 | 0.39 | 0.00 | 0.02 | 0.07 |

SSUP | 0.07 | 0.00 | 0.01 | 0.03 | 0.04 | 0.04 | 0.03 | 0.07 | 0.03 | 0.03 |

Pearson’s correlation matrix of the concordance of the different spatial distribution models analyzed: Generalized Linear Models (GLM), Generalized Additive Models (GAM), Classification and Regression Trees (CART), Flexible Discriminate Analysis (FDA), Artificial Neural Networks (ANN), Multivariate Adaptive Regression Splines (MARS), Maximum Entropy Modelling (MaxEnt), Random Forest (RF) and Boosted Regression Trees (BRT) and one similar to BIOCLIM, Surface Range Envelop (SRE). All correlation were significant with p<0.001 (2-tailed).

Model | ANN | BRT | CART | FDA | GAM | GLM | MARS | MaxEnt | RF | SRE |
---|---|---|---|---|---|---|---|---|---|---|

ANN | 1.000 | - | - | - | - | - | - | - | - | - |

BRT | 0.885 | 1.000 | - | - | - | - | - | - | - | - |

CART | 0.843 | 0.966 | 1.000 | - | - | - | - | - | - | - |

FDA | 0.890 | 0.888 | 0.865 | 1.000 | - | - | - | - | - | - |

GAM | 0.920 | 0.832 | 0.782 | 0.856 | 1.000 | - | - | - | - | - |

GLM | 0.891 | 0.878 | 0.841 | 0.862 | 0.971 | 1.000 | - | - | - | - |

MARS | 0.831 | 0.876 | 0.828 | 0.844 | 0.873 | 0.875 | 1.000 | - | - | - |

MaxEnt | 0.821 | 0.881 | 0.856 | 0.751 | 0.762 | 0.799 | 0.872 | 1.000 | - | - |

RF | 0.921 | 0.950 | 0.912 | 0.867 | 0.851 | 0.889 | 0.889 | 0.801 | 1.000 | - |

SRE | 0.603 | 0.666 | 0.651 | 0.623 | 0.588 | 0.601 | 0.620 | 0.685 | 0.642 | 1.000 |

Adjustment values obtained with the ensemble models of

Ensemble model | Kappa | TSS | AUC | Sensitivity | Specificity | Threshold |
---|---|---|---|---|---|---|

Mean | 0.786 | 0.904 | 0.986 | 0.9725 | 0.9313 | 0.869 |

Lower Confident interval | 0.786 | 0.904 | 0.986 | 0.9714 | 0.9332 | 0.846 |

Upper Confident interval | 0.784 | 0.904 | 0.986 | 0.9735 | 0.9311 | 0.881 |

Median | 0.774 | 0.903 | 0.983 | 0.9724 | 0.9307 | 0.913 |

Committee averaging | 0.802 | 0.905 | 0.987 | 0.9756 | 0.9298 | 0.800 |

Probability mean weight decay | 0.786 | 0.904 | 0.986 | 0.9725 | 0.9319 | 0.806 |

Tab. S1 - Environmental data used for the habitat prediction of

Tab. S2 - Total (ha) and percentage (in parentheses) area loss, relative to the values for 1961-2000, of

Fig. S1 - Iberian (a), Spanish (b), and Andalusian (c) distribution of

Fig. S2 - Future