vol. 10, pp. 348-352
Copyright © 2016 by the Italian Society of Silviculture and Forest Ecology
doi: 10.3832/ifor1884-009

Short Communications

# Determining Pleiades satellite data capability for tree diversity modeling

Hassan Akbari, Siavash Kalbi

# Introduction

The exponential increase of the world population in the last decades has brought about an unparalleled human exploitation of natural resources worldwide, leading to a global reduction of the naturalness of many environments. This may result in the reduction of biodiversity as well as environmental functions and ecological processes which act to generate and maintain soils, convert solar energy into plant tissue, regulate climatic parameters, and provide multiple forest products ([14]). The Hyrcanian forest is the natural ecosystem with the highest plant and animal diversity in Iran. However, these forests are being more and more impoverished by degradation and conversion to other land uses. Moreover, increasing and maintaining the structural diversity in forest stands has become an important forest management strategy to face the climate change.

Accurate and practical methods for estimating biodiversity are needed to develop effective strategies for the conservation and management of forests ([12]). In recent years, the increased spatial and spectral resolution of remote sensors has made it increasingly feasible to conduct direct mapping of biodiversity through mapping plant and tree canopies and assemblages, and in some cases, through the identification of individual species of trees. Recent studies have indicated that remote sensing has a high capability of providing useful information on biodiversity ([12], [17], [24], [22]). Numerous remotely-detected parameters can be used as proxies of the occurrence, distribution and abundance of species. Remotely sensed data allow to quantify net primary productivity (NPP) through the Normalized Difference Vegetation Index (NDVI). This index has shown to be either positively or negatively associated with NPP, depending on the scale ([31]). Also, NDVI has been used in several studies for modeling species occurrence ([18]) or species richness at regional and local scales ([25], [7]). A further factor related to species richness is the structural and compositional complexity of habitats, which can be measured by either the variability of spectral indices (such as NDVI) or a band in the immediate neighborhood of each sampling unit, or using approaches that calculate spectral variability using multiple bands ([28], [26]). Such a spectral variability is directly related to the heterogeneity of resource distribution (mixed habitats), which may be assessed by analyzing the variability in the reflectance values among pixels using the texture of a remotely sensed image ([10]). Mohammadi & Shataee ([22]) investigated the possibility to estimate tree diversity using Landsat Enhanced Thematic Mapper Plus (ETM+) satellite data in the Hyrcanian forests of the Golestan Province in Iran. Kalbi et al. ([16]) compared two nonparametric methods such as the classification and regression tree (CART) and Random forest (RF) for predicting tree diversity distribution using the High Resolution Geometric sensor (HRG) of earth-observing satellites (SPOT-HRG) data. Bawa et al. ([4]) reported a statistically significant relationship between species diversity and the normalized difference vegetation index (NDVI) derived from Indian Remote Sensing Satellite (IRS-1C) imagery, so that NDVI may be used to characterize areas of high and low tree species richness in tropical forests where biodiversity loss is high. Moreover, regression analysis approaches have been broadly applied in ecological surveys ([19], [5], [32]). Regression approaches have proven particularly useful in modeling the spatial distribution of species and communities ([9]). The development of advanced nonparametric regression and machine learning techniques are opening up many opportunities for modeling tree diversity of greater accuracy compared with linear regression ([2]). Recently, generalized linear models ([19]) and generalized additive models ([11]) using presence-absence survey data have also been given much more attention. Presence-absence niche models use algorithms that can model the presence or the absence of a species as binary response. Moisen & Frescino ([23]) investigated the performance of non-parametric techniques such as CART, generalized additive models (GAM) and artificial neural networks (ANN) and compared them to parametric techniques for the prediction of several species-independent forest characteristics in the interior western United States. MARS and ANN models seemed to work best when applied to simulated data, though their application to real data appeared less suitable, in which case a linear modeling (LM) approach often provided comparable results. Overall, GAMs and MARS were marginally best for modeling forest characteristics.

The aim of this study was to: (i) investgate the relationships between field-based tree diversity and the spectral and textural features of remote-sensed data (multispectral images of Pleiades satellite); (ii) compare two statistical non-parametric techniques (GAM and MARS) for modeling tree species diversity.

# Materials and methods

## Study area

The study area is located within the Hyrcanian forests, District 1 of Darabkola’s forests, Sari, northern Iran (lat. 36° 28’ - 36° 33′ N, long. 53° 16′ - 53° 20′ W - Fig. 1). The Darabkola forest covers about 2.600 hectares and consists of natural temperate and uneven aged stands. The main tree species are Quercus castaneafolia (chestnut-leaved oak), Carpinus betulus (hornbeam), Acer velutinum (velvet maple), Alnus subcordata (Caucasian alder), Tilia begonifolia (linden tree), Parrotia persica (Persian parotia), Ulmus glabra (wych elm), Acer platanoides (Norway maple), Diospyros lotus (date pulm), Zelkova carpinifolia (Siberian elm), Fagus orientalis (Oriental beech) and Acer cappadocicum (coliseum maple).

Fig. 1 - Location of the study area in the Mazandaran Province, northern Iran (left panel) and distribution of the sample plots in the study area (right panel).

## Field data

As species richness and diversity indices depend on the size of the sample plot, phytosociological data were collected based on a systematic sampling method during the period 5 June to 15 July 2010. The size and the number of quadrats were determined based on the species area curve ([21]). The choice of the sample size and the number of sampling units to select is a key part of planning a survey. One hundred square-shaped sample plots (60 × 60 m) were placed over the study area using a stratified systematic sampling and an overlying grid of 450 × 500 m (see Fig. 1, right panel). For each plot, the main characteristics (species, tree health, etc.) of trees with a diameter at breast height (DBH) greater than 7.5 cm were measured. The geographic coordinates of each plot center were recorded using a Trimble 3 DGPS receiver.

The Simpson’s diversity index (D), the Shannon’s diversity index (H′), and the reciprocal of Simpson’s diversity index (1/D) were calculated for each sampling plot based on the proportion of tree species recorded during the field survey. Other indices commonly used for describing the forest structural diversity or the dissimilarity of species across the landscape (e.g., mingling index, coefficient of segregation, etc. - [27], [3], [13]) were not considered in this study.

## Preprocessing and processing of satellite images

Multispectral images of the Pleiades satellite (Airbus Defence and Space, Munich, Germany - ⇒ http:/­/­www.­intelligence-airbusds.­com/­pleiades/­) were acquired in April 2013. All images had a 16-bit radiometric resolution. Geometric correction and orthorectification were applied to images before their use. The geometrical correction of the images was optimized by comparing the image data with vector layers of the roads in the studied area.

Pleiades satellite images have four spectral bands (Blue, Red, Green and NIR) with a spatial resolution of 2 m and a panchromatic (PAN) band with a resolution of 0.5 m. In this study, we considered the aforementioned four spectral bands, the PAN band, the Normalized Difference Vegetation Index (NDVI), and the derived texture features of these bands as predictors in the model analysis (Tab. 1).

Tab. 1 - Overview of the predictor variables selected by the tree biodiversity models developed in this study.

Texture analysis is one of the most suitable processing methods to estimate the characteristics of the forest structure from remote-sensed data ([15]). It can be applied to estimate local softness, roughness, smoothness and regularity of each variable based on the spatial variation of its pixel values ([9]). For each variable considered in this study, several parameters were derived by texture analysis, such as mean, variance, entropy, dissimilarity, homogeneity, angular second moment, correlation and grey-level-difference vector ([6], [29], [30]). Texture analysis was carried out over the study area using multiple window sizes (9×9, 11×11, 13×13, 15×15 and 17×17 pixels), in order to define the area used for statistical calculations.

All the above derived parameters were calculated over the whole study area at the original pixel resolution.

## Spectral signature extraction of the plots

All the images (for both the original variables and the derived parameters) were aggregated at a resolution of 60 m (consistently with the size of the field plots - 60×60 m) and their pixel values averaged. Average values for each variable were then extracted from the location of each field plot.

## Statistical models

Generalized additive models (GAMs) are semi-parametric regression models ([11]). Response curves in GAMs are estimated with smoothing functions, allowing a wide range of response curves to be fitted to the input data ([33]). The main advantage of GAM is its ability to deal with highly non-linear and non-monotonic relationships between the response variable and a set of explanatory variables. Thus, GAMs are sometimes referred to as a data-driven rather than model-driven process ([9]). Being N the number of observations and p the number of explanatory variables, the generalized additive model is defined as (eqn. 1):

$$G \left [ \mu(x) \right ] =y_{i} = \beta_0 +\sum_{j=1}^{p} f_{j} (x_{ij})$$

where fj are smoothed functions estimated from input data, μ represents the expected value, G is the link function, and i = 1, 2, 3, …, N. These functions are based on the assumption that E[fj(xj)] = 0 (j = 1, 2, …, N). In such a model, the dependent variable can be non-Gaussian and not necessarily continuous in nature, thereby allowing the construction of more flexible models.

### Multivariate adaptive regression splines (MARS)

The multivariate adaptive regression spline (MARS) method was first introduced by Friedman & Stueltze ([8]) to overcome some limitations of the regression trees. MARS is a regression method that is suitable to high dimensional data. The MARS procedure builds flexible regression models by using basis functions to fit separate splines to distinct intervals (ranges) of the input predictor variables. Both the variables in use and the end points of the intervals (knots), are located using an exhaustive search procedure that relies on a special class of basic functions ([1]). MARS models the target variable using a linear combination of splines, which are automatically built (matching the boundaries of each region) from an increasing set of piecewise-defined linear basic functions ([23]). MARS automatically selects the amount of smoothing required for each predictor as well as the interaction order of the predictors. It is considered a projection method where variable selection is not of concern, though the maximum level of interaction has to be determined. We considered 15 variants of these models formed by different combinations of the parameter nk that sets the maximum number of terms before pruning; two variants (1 and 2) of the parameter degree that sets the maximum degree of interaction; and four variants (0.01, 0.001, 0.005 and 0.0005) of the parameter thresh specifying the forward stepwise stopping threshold.

## Model evaluation and performance assessment

The available data were randomly split into two subsets, 70% of the data for modeling and 30% for validation and testing. For each tested model, several statistics were recorded, including the squared coefficient of determination (R2) and the adjusted coefficient of determination (adjusted R2). The latter was used to estimate the expected shrinkage in R2 due to over-fitting and the inclusion of too many independent variables in the regression model. Thus, when the adjusted R2 value is much lower than the R2 value, the regression may be over-fitted to the sample, and therefore poorly generalizable.

Model performances were assessed on the validation subset using several regression diagnostics metrics such as the root mean square error (RMSE), relative RMSE, bias and relative bias, calculated as follows (eqn. 2 to eqn. 5):

$$RMSe = \sqrt {\sum_{i=1}^{m} (est_{i} -obs_{i})} / {m}$$
$$RMSe \text{%} = {\frac{ \sqrt {\sum_{i=1}^{m} (est_{i} -obs_{i})} / {m}} { \sqrt {\sum_{i=1}^{m}obs_{i}} }} \cdot 100$$
$$Bias \text{%} = 100 \cdot {\frac{ \sqrt {\sum_{i=1}^{m} (est_{i} -obs_{i})} / {m}}{ \sqrt {\sum_{i=1}^{m}obs_{i}} }/m}$$
$$Bias = \sqrt {\sum_{i=1}^{m} (est_{i} -obs_{i})} / {m}$$

where esti is the value predicted by the model at the i-th pixel of the validation subset, obsi is the observed values at the same pixel and m is the number of pixels in the validation subset. In addition, some common graphical diagnostic tools ([20]) were used to evaluate the quality of model performances.

# Results

A high tree species diversity was observed in the study area, as inferred from the three diversity indices obtained from the field survey. The main descriptive statistics of the Simpson’s diversity index (D), the Shannon’s diversity index (H’), and the reciprocal of the Simpson’s diversity index (1/D) in the two datasets (training and validation subsets) are reported in Tab. 2. Overall, D varied between 0.105 to 0.86 across the analyzed sample plots, while H’ ranged from 0.12 to 2.56 and 1/D ranged from 1.105 to 4.02.

Tab. 2 - Descriptive statistics of model and validation samples for indices. (SD): standard deviation.

Regarding the window size used for texture analysis, the highest correlation between texture-derived parameters and all tree diversity indexes was found using the window size 9×9 pixels, which was then used in modeling to extrapolate the texture features of the analyzed spectral variables.

All the models were critically investigated for confounding factors and checked for all basic assumptions. The number of predictor variables entering the models ranged from two to eight, differing among both the models and the diversity indices considered. For example, regarding the index D, the best predictors were: NIR (mean and contrast), Red, and NDVI (variance) using the GAM models; and NIR (mean and entropy) and NDVI using the MARS model (Tab. 1). Overall, NIR (both as variable and its texture-analysis derived parameters) was the most represented predictor across models and indices (12 times), followed by Red (6 times), and NDVI and Green band (both 2 times).

The performance statistics for each model are summarized in Tab. 3. The proportion of the total variance accounted for by the different models (as inferred from the adj-R2 values) ranged from 54.2% (MARS, 1/D) to 73.1% (MARS, D), indicating a fairly good predicting ability of the models.

Tab. 3 - Performance indices of all SI-models for the three tree species and five modelling techniques. (*): best model performance for every evaluation measure.

The best model performance was evaluated based on the highest R2, highest adjusted R2 and lowest RMSE, RMSEr Bias and Biasr values. In the most cases, the best goodness-of-fit between the predicted and the observed tree diversity index values at the field plots was obtained by the GAM, which had the lowest values for RMSE and Bias and the highest adjusted R2. However, the best fitting was obtained when the tree diversity MARS model was used to predict the Simpson’s index D.

# Discussion

Hyrcanian forests of northern Iran comprise a highly diverse vegetation cover and are increasingly degraded and converted to other land uses. Understanding the main factors that influence the spatial distribution of both local species richness and spatial species turnover is important to adequately map tree diversity. In this study, we assessed the utility of Pleiades satellite image data and two regression techniques for modeling tree diversity in a Hyrcanian Forest. These results are similar to those obtained by other studies aimed at identifying broad patterns of tree species diversity by satellite data ([12], [22], [32]).

All the statistical models applied in this study provided fairly successful predictions of forest tree diversity based on remote-sensed data. In particular, GAM and MARS modeling regressions were successfully applied to identify those parts of the study area where tree species richness is above the average. In a comparable study, Wang et al. ([32]) report that GAM performed better and showed a better adaptability to extreme observations than other non-linear and non-parametric techniques, according to previous studies ([2]). Brenning ([5]) showed that the application of simple models, such as GAM, are similarly successful as compared with complex machine learning techniques. Our study confirm these findings, as GAM models provided in most cases a better fitting than MARS based on most evaluation measures.

The results of this study are not directly comparable with other relevant researches in particular regarding the use of variables derived by texture analysis as predictors. Furthermore, most studies published in the literature used satellite imagery with different spectral/spatial resolutions and/or were conducted in different forest conditions.

Our results showed that Pleiades satellite data and non-parametric regression models could be conveniently used by resource manager to achieve useful indications on tree diversity distribution over large areas in northeastern Iran, as well as to assess and monitor the status of tree diversity of Hyrcanian forests.

A strong limitation faced by conservation biologists and managers of natural resources is the lack of information concerning species distribution patterns. To this purpose, precise biodiversity mapping produced by accurate modeling could help in the selection and effectiveness of protected natural areas.

# References

(1)
Abraham A, Steinberg D (2001). MARS: still an alien planet in soft computing? International Conference on Computational Science, Springer-Verlag Germany, pp. 235-244.
(2)
Aertsen W, Kint V, Orshoven JV, Ozkan K, Muys B (2010). Comparison and ranking of different modelling techniques for prediction of site index in Mediterranean mountain forests. Ecological Modelling 221: 1119-1130.
(3)
Aguirre O, Hui G, Gadow KV, Jiménez J (2003). An analysis of spatial forest structure using neighborhood-based variables. Forest Ecology Management 183: 137-145.
(4)
Bawa K, Rose J, Ganeshaiah KN, Barve N, Kiran MC, Umashaanker R (2002). Assessing biodiversity from space: an example from the Western Ghats India. Conservation Ecology 6 (2): 7.
(5)
Brenning A (2005). Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazard and Earth System Science 5: 538-862.
(6)
Carr JR, Miranda FP (1998). The semivariogram in comparison to the co-occurrence matrix for classification of image texture. IEEE Transactions on Geosciences and Remote Sensing 36 (6): 1945-1952.
(7)
Fairbanks HK, McGwire KC (2004). Patterns of floristic richness in vegetation communities of California: regional scale analysis with multi-temporal NDVI. Global Ecology and Biogeography 13: 221-235.
(8)
Friedman J, Stueltze W (1981). Projection pursuit regression. Journal of American Statistical Association 76: 817-823.
(9)
Guisan A, Lehmann A, Ferrier S, Austin M, Overton JMC, Aspinall R, Hastie T (2006). Making better biogeographical predictions of species’ distributions. Journal of Applied Ecology 43: 386-392.
(10)
Haralick RM, Shanmugam K, Dinstein I (1973). Textural features for image classification.IEEE Transactions on Systems, Man, and Cybernetics 3 (6): 610-621.
(11)
Hastie TJ, Tibshirani RJ (1990). Generalized additive models. Chapman and Hall, New York, USA, pp. 352.
(12)
Hernandez-Stefanoni JL, Dupuy J (2007). Mapping species density of trees, shrubs and vines in a tropical forest, using field measurements, satellite multispectral imagery and spatial interpolation. Biodiversity and Conservation 16 (13): 3817-3833.
(13)
Hui G, Zhao X, Zhao Z, Gadow KV (2011). Evaluating tree species spatial diversity based on neighborhood relationships. Forest Science 57: 292-300.
(14)
Isik K, Yaltikik F, Akesen A (1997). The interrelationship of forests, biological diversity and the maintenance of natural resources. Unasylva FAO 48: 190-191.
(15)
Kayitakire F, Hamel C, Defourny P (2006). Retrieving forest structure variables based on image texture analysis and IKONOS-2 imagery. Remote Sensing of Environment 102: 390-401.
(16)
Kalbi S, Fallah A, Hojjati M (2014). Using and comparing two nonparametric (CART and RF) and SPOT-HRG satellite data to predictive tree diversity distribution methods. Nusantara Bioscience 6 (1): 57-62.
(17)
Kerr JT, Southowood TRE, Chilar J (2001). Remotely sensing habitat diversity predicts butterfly species richness and community similarity in Canada. Proceeding of the National Academy of Sciences USA 98: 11365-11370.
(18)
Laurent EJ, Shi H, Gatziolis D, LeBoutonc JP, Walters MB, Liu J (2005). Using the spatial and spectral precision of satellite imagery to predict wildlife occurrence pattern. Remote Sensing of Environment 97 (2): 249-262.
(19)
McCullagh P, Nelder JA (1997). Generalized linear models (2nd edn). Chapman and Hall, London, UK, pp. 194.
(20)
McRoberts RE, Tomppo EO, Finley AO, Heikkinen J (2007). Estimating areal means and variances using the k-nearest neighbors technique and satellite imagery. Remote Sensing Environment 111: 466-480.
(21)
Misra R (1968). Ecology workbook. Oxford and IBH Publishing Co., New Delhi, India, pp. 244-245.
(22)
Mohammadi J, Shataee S (2007). Forest stand density mapping using Landsat-ETM+ data, Loveh’s forest, north of Iran. In: Proceedings of the “28th Asian Conferences of Remote Sensing”. Malaysia, 12-16 Nov 2007, pp. 10-27.
(23)
Moisen GG, Frescino TS (2002). Comparing five modeling techniques for predicting forest characteristics. Ecological Modeling 157: 209-225.
(24)
Nagendra H (2001). Using remote sensing to assess biodiversity. International Journal of Remote Sensing 22 (12): 2377-2400.
(25)
Oindo BO, Skidmore AK (2002). Inter annual variability of NDVI and species richness in Kenya. International Journal of Remote Sensing 23: 285-298.
(26)
Oldeland J, Wesuls D, Rocchini D, Schmidt M, Jürgens N (2010). Does using species abundance data improve estimates of species diversity from remotely sensed spectral heterogeneity? Ecological Indicators 10 (2): 390-396.
(27)
Pretzsch H (1997). Analysis and modeling of spatial stand structures. Methodological considerations based on mixed beech-larch stands in lower Saxony. Forest Ecology Management 97: 237-253.
(28)
Rocchini D, Ricotta C, Chiarucci A (2007). Using satellite imagery to assess plant species richness: The role of multispectral systems. Applied Vegetation Science 10 (3): 325-331.
(29)
Soh LK, Satsoulis C (1999). Texture analysis of SAR sea ice imagery using gray level co-occurrence matrices. IEEE Transaction on Geosciences and remote sensing 37 (2): 780-795.
(30)
Solberg AHS (1999). Contextual data fusion applied to forest map revision. IEEE Transaction on Geosciences and remote sensing 37 (3): 1234-1243.
(31)
Waide RB, Willig MR, Steiner SF, Mittelbach G, Gough L, Dodson SI, Juday GP, Parmenter R (1999). The relationship between productivity and species richness. Annual Review of Ecology and Systematics 30: 257-300.
(32)
Wang YH, Raulier F, Ung CH (2005). Evaluation of spatial predictions of site index obtained by parametric and nonparametric methods - a case study of Lodgepole pine productivity. Forest Ecology Management 214: 201-211.
(33)
Yee TW, Mitchell ND (1991). Generalized additive models in plant ecology. Journal Vegetation Science 2: 587-602.

### Cited by

 Akbari H, Kalbi S (2016). Determining Pleiades satellite data capability for tree diversity modeling iForest - Biogeosciences and Forestry 10: 348-352. - doi: 10.3832/ifor1884-009 Close
 Original Size Reduce the image size Enlarge the image < > > < First Previous Next Last
Close