The Sierra Madre Occidental mountain range (Durango, Mexico) is of great ecological interest because of the high degree of environmental heterogeneity in the area. The objective of the present study was to estimate the biomass of mixed and unevenaged forests in the Sierra Madre Occidental by using Landsat5 TM spectral data and forest inventory data. We used the ATCOR3^{® }atmospheric and topographic correction module to convert remotely sensed imagery digital signals to surface reflectance values. The usual approach of modeling stand variables by using multiple linear regression was compared with a hybrid model developed in two steps: in the first step a regression tree was used to obtain an initial classification of homogeneous biomass groups, and multiple linear regression models were then fitted to each node of the pruned regression tree. Crossvalidation of the hybrid model explained 72.96% of the observed stand biomass variation, with a reduction in the RMSE of 25.47% with respect to the estimates yielded by the linear model fitted to the complete database. The most important variables for the binary classification process in the regression tree were the albedo, the corrected readings of the shortwave infrared band of the satellite (2.082.35 µm) and the topographic moisture index. We used the model output to construct a map for estimating biomass in the study area, which yielded values of between 51 and 235 Mg ha^{1}. The use of regression trees in combination with stepwise regression of corrected satellite imagery proved a reliable method for estimating forest biomass.
Regression TreesStepwise RegressionRemote SensingATCOR3Terrain FeaturesImage TextureRemote Sensing and Information SystemIntroduction
The Sierra Madre Occidental is considered an area of special ecological interest because of the high levels of biodiversity, which are attributed to diverse physiographic and climatic conditions (Challenger 1998). The area is also important because of the presence of some of the most important commercial species of pine and oak in Mexican ecosystems (Sánchez et al. 2003).
Quantification of forest biomass and carbon sequestration is a relevant issue in the management of these forest stands. Reliable information is required for accurate biomass estimation, which should also take into account variable external factors that can be modeled, e.g., climate change (IPCC 2003, Ryu et al. 2004, Sun & Ranson 2009). However, given the diversity of environmental, topographical and biophysical conditions in forest ecosystems in different locations, there is no universal, transferable technique for estimating biomass (Keller et al. 2001, Foody et al. 2003, Lu 2005, Cutler et al. 2012).
In general, forest biomass can be measured directly (destructive analysis) or it can be estimated indirectly (Brown & Lugo 1984). The direct method is usually accurate, but it is expensive and timeconsuming and can only be used in small areas (Ketterings et al. 2001, Zianis & Mencuccini 2004, Walker et al. 2011). These difficulties have largely been resolved by the appearance and further development of quantitative satellite systems and aerial remote sensing, together with the development of parametric and nonparametric statistical methods for modeling variables of interest. The stand variables usually measured in traditional forest inventories can now be estimated faster, at lower cost and over larger areas (Liang 2007). The application of spatial technologies has allowed estimation of biomass in different ecosystems (Muukkonen & Heiskanen 2005, Fuchs et al. 2009, HernándezStefanoni et al. 2011, AguirreSalado et al. 2014).
Preprocessing of satellite imagery is important for improving the quality and interpretation of data. Forest ecosystems generally cover rough terrain where the topographic conditions lead to variations in reflectance values because of the position of the sun (Meyer et al. 1993, Richter & Schläpfer 2011, Richter 2013). Thus, the quality of the final product largely depends on accurate calibration of the sensors and on radiometric correction to minimize distortion and atmospheric effects (Li et al. 2010). In this respect, the use of atmospheric and topographic correction is therefore essential to counteract such effects, and digital elevation model (DEM) parameters such as slope, orientation, shadows cast, sky view and altitude can be used for such purposes (Balthazar et al. 2012, Richter 2013). These primary parameters, together with biophysical parameters such as vegetation indexes (Gilabert et al. 1997) and indexes derived by the analysis of the image texture (by quantification of the spatial variation in grey tones using a grey level cooccurrence matrix  GLCM), are very useful for identifying areas or objects of interest in the image (Haralick et al. 1973, Botero & Restrepo 2010). They can also be combined with terrain parameters to model vegetation characteristics (Lu 2005, Kayitakire et al. 2006, DíazVarela et al. 2011), as well as to describe hydrological, geomorphological and ecological processes (Moore & Nieber 1989, Wilson & Gallant 2000).
One of the methods most commonly used for this purpose is the classification and regression trees method, initially proposed by Breiman et al. (1984). This is a nonparametric, multivariate, supervised, inductive learning method that basically searches for classification and prediction rules by recursive partitioning. In this technique, a series of binary combinations (yes, no) expressed in terms of a single independent variable is used to identify certain profiles and vectors that enable description of the individual parameters under study (Hu et al. 2010).
The objective of the present study was to model the forest biomass in mixed and unevenaged forests in the Sierra Madre Occidental (Mexico) by using remote sensing Landsat5 TM imagery, terrain parameters and forest inventory data obtained from a network of permanent plots sampled in a traditional (ground based) survey. Two different approaches were compared: the usual modeling method of fitting a linear relationship to stand biomass and site variables obtained from remote sensing images, and a new approach consisting of a hybrid model combining regression trees and linear models for the final tree nodes. As far as we know, this is the first time this hybrid approach has been used to model stand biomass with remote sensing data.
Material and methodsStudy area
The study area is located in the UMAFOR1001 (Unidad de Manejo Forestal Regional or regional forest management unit) in the Sierra Madre Occidental, in the north of the state of Durango (Mexico), and covers an area of 1 142 916 ha (Fig. 1). The vegetation comprises pine, oak, Douglas fir, pineoak and oakpine forest, according to the description in the Land Use and Vegetation Cover Chart, scale 1:250.000, Series V (INEGI 2012).
Field data
A network of 99 permanent sampling plots was established during the winter of 2011, following the method described by CorralRivas et al. (2009). The plots were located by systematic sampling (with some exceptions to avoid non forested areas) with a grid of equidistant points separated by three to five kilometers, depending on the orography of the study area. In each plot (50 × 50 meters), the tree species were recorded and the diameter at breast height > 7.5 cm and total height (m) of all standing trees were measured.
Speciesspecific statistical models developed by VargasLarreta (2013) were used to estimate individual (at tree level) aboveground biomass. The goodness of fit for such statistical models ranged from 0.87 to 0.99 (R^{2}), and the root mean square error (RMSE) varied from 22.8 to 95.2 kg. Once the tree aboveground biomass was estimated, all values from each sampling plot were summed and expressed on a per hectare basis. Summary statistics including number of observations, mean, standard deviation, minimum, and maximum values of the main stand variables are shown in Tab. 1.
DatasetsSource of spectral data
Three Landsat5 TM (Thematic Mapper) satellite images (path/row: 31/42, 32/41 and 32/42), obtained in April and May 2011 and covering the entire study area, were examined (available from the US Geological Service webpage  http://glovis.usgs.gov/). The available images are subjected to cubic convolution geometric correction for discrete data (level L1T), with a RMSE of the geometric residuals lower than 1 pixel, and they are therefore suitable for image processing (Keys 1981).
Landsat5 TM data have spatial resolution of 30 m with a 16 day revisit period. The swath width is 185 km with seven spectral bands in the following wavelength regions of the electromagnetic spectrum: blue (0.450.52 µm), green (0.520.60 µm), red (0.630.69 µm), near infrared (0.780.89 µm), shortwave infrared (1.551.75 µm) and shortwave infrared (2.082.35 µm). These wavelength regions correspond respectively to bands 1, 2, 3, 4, 5 and 7 of the Landsat5 TM satellite (NASA 2011). Given its thermal characteristics, band 6 was not used in the present study.
Atmospheric and topographic correction (ATCOR3<sup>®</sup>)
The satellite images were subjected to radiometrical, atmospherical and topographical correction using the ATCOR3^{®} module, regarded as particularly suitable for mountainous zones (Geosystems 2013) and implemented in the ERDAS^{®} IMAGINE^{®} 2013 software (ERDAS 2013). After correction, the original image digital levels (DL) were converted to ground reflectance values for each band. A number of vegetation indexes and other derived parameters (Tab. 2) were computed from the atmospherically and topographically corrected image bands and then included in the biomass estimation models for their evaluation as possible regressor variables. Vegetation indexes are regarded as good indicators of vegetation cover “greenness” (understood as a combination of attributes such as leaf chlorophyll content, leaf area, canopy cover and structure  Glenn et al. 2008) and are good indicators of vegetation canopy biomass. Hence, the Normalized Difference Vegetation Index (Rouse et al. 1974) and Soil Adjusted Soil Vegetation Index (Huete 1988) were included as indexes correlated with green biomass content, with the former being particularly suited for scattered vegetation land cover. The Leaf Area Index (Baret & Guyot 1991) derived from NDVI was also included as a good indicator of green biomass. Albedo (Asrar 1989), photosynthetically active radiation (Asrar et al. 1984) and absorbed shortwave solar radiation (Brutsaert 1975) were also included as comprehensive indicators of the interaction between land cover and solar radiation in the visible and nearinfrared regions of the electromagnetic spectrum.
The ATCOR3^{®} module (Geosystems 2013) first calculates the radiance at sensor level (Wsr^{1} m^{2}) from the image pixels DL. Several input parameters were required for this calculation and were retrieved from the image metadata (header file): date of acquisition, scale factors, geometry (solar zenith angle and solar azimuth) and other information about the sensor calibration file (“gain and bias”). Other parameters were adjusted by taking into account particularities of the input datasets and the conditions of the imagery dates, e.g., visibility (35 km), pixel size of the DEM (15 m), aerosol type (rural), among others. As the image was cloudless and no suitable water vapor bands were available, dehazing/ cloud removal and atmospheric water retrieval settings were retained as “default”, which, in this case, is recommended by the ATCOR3^{®} User Manual (Geosystems 2013).
As a prior requisite for application of the ATCOR3^{®} module, three topographic parameters (namely slope, orientation, skyview and shadows cast  Richter 2013) were computed from a DEM of the study area with a spatial resolution of 15 m (INEGI 2014). Prior to these calculations, a low pass filter with a 5×5 moving window was applied to the original DEM in order to reduce the banding effects present in the original file.
After radiometric correction, the three scenes corresponding to the study area were mosaicked using the ERDAS® IMAGINE® 2013 software (ERDAS 2013).
Texture parameters
With the aim of including information that combines the spatial and spectral domain of the remote sensed imagery in the biomass estimation models, the following texture parameters were calculated from the NDVI image based on grey level cooccurrence matrices (Tab. 3): homogeneity, contrast, dissimilarity, mean, standard deviation, entropy, second order angular moment and correlation (Haralick et al. 1973). Calculations were done at three different analysis scales, corresponding to window sizes of 3×3, 5×5 and 7×7 pixels, respectively. The original NDVI image values were resampled to a grey level depth of 256 (8 bits) to reduce computational costs (Haralick et al. 1973). This procedure was carried out using the software package PCI Geomatica 2013^{®} (PCI Geomatics 2013).
Terrain variables
Terrain variables are directly related to forest species composition, tree height growth, and other forest stand variables, and enable these to be modeled (McNab 1989, Roberts & Cooper 1989). Therefore, first and second order terrain parameters (Tab. 3) were derived from the 5×5 low pass filtered DEM (INEGI 2014) and included as candidate variables in the models. The selected first order terrain parameters were elevation, slope, transformed aspect, profile curvature, plan curvature and curvature, while second order terrain parameters were terrain shape index and wetness index. These parameters are potentially related to key features for forest stand development, such as overall climate characteristics, insolation, evapotranspiration, runoff, infiltration, wind exposure and site productivity. Some of these terrain features are widely used in hydrological, geomorphological and ecological studies (Wilson & Gallant 2000), whereas others are used more specifically for vegetation and forest assessment (McNab 1989, Roberts & Cooper 1989).
Dataset integration
The sample plots were geopositioned with the aim of extracting the pixel value average with an associated buffer of 25 m for each potential predictor. This extraction was carried out using the statistical software R (R Core Team 2014) and the “raster” package. Finally, a database was constructed with the mean biomass values for each plot: the corrected bands of the Landsat5 TM sensor (6 bands: 1, 2, 3, 4, 5 and 7), the vegetation indexes (6 indexes), the texture variables derived from the Normalized Difference Vegetation Index (NDVI  24 variables), and the terrain variables derived from the DEM (8 variables).
Models fitted
The biomass of the sample plots was estimated using two different methods. In the first, the ordinary least squares (OLS) method was used to fit a linear model to estimate stand biomass. The best set of independent variables was selected by using the stepwise variables selection method. The second method consisted of a twostep hybrid approach. In the first step, a regression tree was used to classify the sample plots in homogeneous groups according to their biomass values by a binary rulebased method. In the second step, the ordinary least square method was used to fit linear models to estimate stand biomass for each group by using the stepwise variable selection method to select the best set of independent variables. In both methods, the 45 spectral, texture and terrain variables were taken into account as possible independent variables.
Regression tree analysis (CaRT) is a nonparametric technique for the sequential partitioning of a data set composed of a continuous response variable and any number of potential continuous or categorical predictor variables, by using dichotomous criteria (Breiman et al. 1984). After each split, the algorithm identifies the predictor variable that provides the most effective binary separation of the range in the response variable. As a result, predictor variables can be used more than once. The regression tree analysis was performed using the “rpart” package in R (Therneau & Atkinson 2012, R Core Team 2014). This approach partitions the data set sequentially, considering twoway splits at each tree node. The best split at each node t is the split that maximizes the following quantity (eqn. 1):
where P_{L} and P_{R} are the proportions of sample plots that fall respectively to the left and right branch of node t; Err(t_{L}) and Err(t_{R}) are the error of the left and right branches; Err(t) is the mean square error at node t given by (eqn. 2):
and (bar)y_{t} is the stand biomass assigned to node t, calculated as the mean of the stand biomass of all the sample plots in node t.
Instead of applying stopping rules, a sequence of subtrees was generated by growing a large tree and pruning it back until only the root node was left. The error of each subtree was then estimated by crossvalidation, and the subtree with the lowest error was chosen by analyzing the values of the complexity parameter defined by Breiman et al. (1984).
Once the sample plots of each final node were obtained, a multiple linear model was fitted to estimate the stand biomass, using stepwise selection methods to select the best set of independent variables, with the SAS/STAT^{®} software package (SAS Institute Inc 2007). Two criteria were considered for the evaluation of model performances: the coefficient of determination (R^{2}) and the RMSE. The expressions of these statistics are summarized as follows (eqn. 3, eqn. 4):
where y_{i}, (hat)y_{i} and (bar)y are the observed, estimated and mean biomass values, n is the total number of observations used to fit the model, and p is the number of model parameters.
The main problem associated with such multiple linear models is the multicollinearity. This refers to the existence of high correlations between certain independent variables representing or measuring similar phenomena. Although the leastsquares estimates of regression coefficients remain unbiased and consistent under the presence of multicollinearity, they are no longer efficient (Myers 1990). This may seriously affect the standard errors of the coefficients, thus invalidating statistical tests and confidence intervals (Neter et al. 1990). One of the main sources of multicollinearity is the use of overfitted models that include several polynomial and crossproduct terms. To evaluate the presence of multicollinearity between variables in the models, the condition number, defined as the square root of the ratio of the largest to the smallest eigenvalue of the correlation matrix, was used. According to Belsey (1991), condition numbers between 5 and 10 indicate that multicollinearity will not be a major problem, while those in the range 30100 indicate moderate multicollinearity and those in the range 10003000 indicate severe multicollinearity. Therefore, independent variables with condition numbers higher than 30 were not used in the models.
Finally, since the quality of fit does not necessarily reflect its predictive performance (Myers 1990), an assessment of the validity of the models with an independent dataset is recommended (Kozak & Kozak 2003). Due to the difficulties associated with collecting such data, crossvalidation was applied in this study. Validation of the model fitted to the complete database (method 1) and of the model of each final node was thus based on the values of coefficient of determination and root mean square error, using the predicted residual sum of squares (PRESS), i.e., each sample plot is removed in turn and the model is refitted using the remaining sample plots. The outofsample predicted value is calculated for the omitted sample plot in each case, and the PRESS statistic is calculated as the sum of the squares of all the resulting prediction errors.
The equations obtained with the best method were finally used to generate a map of biomass by means of the map algebra and conditional tools of the GIS package ArcGIS 10^{®} (ESRI 2012) from the vector vegetation layer (INEGI 2012).
Results
The parameter estimates of the linear model fitted to the complete database using OLS and the stepwise variables selection method is shown in Tab. 4. All the parameters were significant at a 5% level, and up to 5 independent variables were included in the model without generating multicollinearity problems. The model explained 58.83% of the observed stand biomass variability with a RMSE of 27.88 Mg ha^{1} (31.32% of the mean stand biomass). Based on the results of crossvalidation, the model explained 51.33% of the total observed variation in stand biomass with a RMSE value of 30.31 Mg ha^{1} (34.04% of the mean stand biomass).
The regression tree shown in Fig. 2 was generated in the first step of the second methods. This tree has a root node that contains all 99 sample plots with an assigned mean biomass value of 89.03 Mg ha^{1}. The limiting value of 121.5, for the variable albedo, divided these samples into two groups of plots. Each subgroup was then sequentially divided by the limiting values of the variables band 7, band 5, LAI, contrast texture with a 5×5 window and wetness index.
However, the problem with the regression trees method is that it tends to overfit the data, and therefore the most general model may not be obtained when a new set of independent data is used (Breiman et al. 1984). These authors suggested that once the tree is constructed, it should be exhaustively pruned by successively removal of branches or terminal nodes that contribute little to explaining the response variable, to yield an appropriatelysized tree. The mean value of the complexity parameter (CP) defined by Breiman et al. (1984) and obtained by crossvalidation, was used in this case to select the number of branches on the final tree, and the result is shown in Fig. 3 (the number of tree input variables was reduced to three, namely albedo, band 7 and wetness index).
Direct application of the regression tree to the data from the 99 permanent sample plots used in this study resulted in 56.76% of the observed variability in stand biomass being explained by the model, with an RMSE value of 28.57 Mg ha^{1} (32.09% of the mean stand biomass). Once the four groups shown in Fig. 3 were obtained from the tree, linear models were fitted to each. The parameter estimates, their standard errors and the goodnessoffit statistics obtained by crossvalidation are shown in Tab. 5.
The intercepts of models for groups A, B and D were not significant at the 5% level, and therefore the models were refitted without this term. In all cases, the parameters were significant and the condition number values indicate no problems associated with multicollinearity. Analysis of the graph of the residuals plotted against the predicted values also indicated the absence of problems associated with variance heterogeneity or lack of normal distribution of the residuals (Fig. 4).
Crossvalidation of the hybrid model comprising the pruned regression tree (Fig. 3) and the linear models for each terminal node explained 72.96% of the observed variation in stand biomass, with a RMSE value of 22.59 Mg ha^{1} (25.37% of the mean stand biomass).
The spatial distribution of the biomass estimation (Mg ha^{1}) in the study area obtained by application of the classification rules included in the regression tree model and the posterior estimations yielded by multiple linear regression are shown in Fig. 5. The grey and green pixels reflect biomass contents lower than 80 Mg ha^{1} and between 80 and 157 Mg ha^{1}, respectively, whereas the yellow and red pixels represent the highest biomass values found in temperate forest (mainly pine and pineoak cover) in the study area, in accordance with the INEGI’s Land Use and Vegetation chart, series V (INEGI 2012).
Discussion
The results of the present research showed that integration of spectral information, texture variables derived from the NDVI and terrain indexes (DEM) was essential for forest biomass estimation. Indeed, these variables were reported in previous studies as being closely related to the development and growth of this type of ecosystem and are also useful for ecosystem evaluation and monitoring (McNab 1993, Chen et al. 2004, DíazVarela et al. 2011).
The combined use of regression trees and linear models including spectral, texture and terrain variables proved to be a good method for identifying patterns and defining biomass trends in the study area. The independent variable albedo, defined as the average solar reflectance (Liang 2000), was the main discriminating factor in the regression tree, and highest values occurred in the areas with the lowest forest biomass. Kuusinen et al. (2014) obtained similar results and observed an inverse relationship between stand age and albedo, so that the value of this variable was lower in mature stands because of the lower level of incident radiation absorbed in such stands. This relationship can be used to discriminate zones with different levels of forest biomass. The other two discriminant variables were spectral band 7 (shortwave infrared) and the topographical wetness index. Because of its spectral characteristics, band 7 is directly related to the moisture content of soil and vegetation recorded by the image. Thus, the reflectivity in this band increased as the surface wetness captured by the sensor decreased. In afforested areas, this band displays low reflectivity; as moisture levels are high in forest stands, the highest values for this band represent lower amounts of biomass in the classification tree. These results are similar to those reported by García et al. (2005) for pure stands of Pinus halepensis and P. sylvestris in Spain, i.e., there was an inverse relationship between the values for band 7 of the Landsat5 TM sensor and the moisture content of the residual forest biomass. Finally, inclusion of the topographic wetness index in the model confirms the previous findings, as high values of this index are associated with high levels of moisture, which coincide with zones with high amounts of biomass. In various studies, use of the relationship between the wetness index and the vegetation biomass has enabled identification of the distribution of vegetation (Moore et al. 1993, Zinko et al. 2005) and of potential areas for establishing forest plantations (Holmgren 1994). These results indicate water availability as a key factor controlling biomass production in arid and semiarid environments such as the Sierra Madre Occidental (SalinasZavala et al. 2002, MéndezBarroso et al. 2009, Zhao & Running 2010, Forzieri et al. 2014).
The hybrid model combining the nonparametric method of regression trees and multiple linear models yielded a reduction in the RMSE (25.47%) and an increase in R^{2} (42.14%) with respect to the same statistics obtained by crossvalidation of the linear model fitted to the complete database. According to the results shown in Tab. 5, the main RMSE reduction was obtained in group D (46.14%), probably because this group is associated with sample plots with higher stand biomass values. On the other hand, the reduction in RMSE in group C was only 1.05%, possibly because this is the group with the lowest coefficient of variation of stand biomass (27.23% compared with a mean value of 48.56%).
The parameters selected by the hybrid model included single band values in the visible (Band 1blue) and mid infrared (Band 5) regions of the electromagnetic spectrum. The mid infrared regions have already been included and discussed in the initial (Band 5 and Band 7) and pruned (Band 7) regression tree model, indicating a relationship between the biomass and spectral response of forest cover in the imagery bands related to water content. As expected, vegetation indexes, such as NDVI, and other indicators of the radiationland cover interaction, such as the Absorbed Shortwave Solar Radiation, also emerged as valuable predictors of biomass due to their potential relationship with biomass. The relationship between remote sensed NDVI and biomass content, which has been the matter of discussion as strongly dependent on the scale of analyses and characteristics of the imagery, has nonetheless been regarded in the literature as one of the most widely used predictors of biomass content (Foody et al. 2003, GonzálezAlonso et al. 2006).
Interestingly, apart from these variables in the pure spectral domain, up to five variables of the spectralspatial domain (i.e., texture variables Entropy 3×3, Correlation 3×3, Dissimilarity 5×5, Mean 7×7, Stand. Dev. 7×7) were included in the mixed model. This indicates the importance of the spatial arrangement of spectral values at different spatial scales (from a 3×3 kernel corresponding with an area of 0.81 ha to a 7×7 kernel corresponding with an area of 4.41 ha) for forest stand characterization, as reported in previous studies (Franklin et al. 2001, Kayitakire et al. 2006, DíazVarela et al. 2011, Nichol & Sarker 2011).
The value of R^{2} finally obtained with the hybrid model (0.7296) is slightly higher than that obtained by Sun et al. (2011) in a study carried out in the US, with high resolution LIDAR sensors and SAR data, to model fieldmeasured biomass by linear models and stepwise selection of variables (R^{2}, 0.71 and RMSE, 31.33 Mg ha^{1}).
Estimates obtained with sensors of medium spatial resolution usually display a low predictive power for each band of the sensor. Thus, Foody et al. (2003) found the strongest predictive relationship for biomass with a sampling network specifically designed for different sites (r > 0.71) based on indexes obtained for tropical forest in Brazil by using Landsat TM data. On the other hand, Houghton et al. (2007) estimated the biomass of Russian forests by using data derived from a MODIS sensor and regression trees in 500×500 m plots, in which the percentage of variance explained by regression trees ranged from 1 to 67%.
In the present study, the consideration of biophysical variables derived from satellite images along with other complementary data and the use of nonparametric multivariate techniques, improved the quality of the estimates, thus indicating that this is a promising line of research.
Conclusions
This study explored possible improvements in forest biomass prediction involving use of field data and geodata derived from atmospherically and topographically corrected satellite images (provided by the Landsat5 TM sensor), texture indexes and DEMderived terrain variables. A new approach combining the nonparametric regression trees method and multiple regression analysis of the groups defined in the pruned tree was compared with the usual method of fitting a linear model to the complete database. Crossvalidation of both methods indicated that the proposed new approach improved the performance of the linear model. Moisture content was an important covariate in the final model and was directly related to biomass distribution in the temperate forest under study. The proposed approach deserves further attention in future studies aimed at estimating stand variables by using remote sensing data, especially for more complicated stand structures, such as mixed and uneven aged forests, in which the use of a mean value for each node cannot accurately represent the intranode stand variation.
Acknowledgements
This research was supported by SEPPROMEP (Project: Seguimiento y Evaluación de Sitios Permanentes de Investigación Forestal y el Impacto Socioeconómico del Manejo Forestal en el Norte de México). We thank three anonymous reviewers for their useful comments and suggestions, which helped to improve an earlier version of the manuscript.
ReferencesAguirreSalado CA, AguirreSalado EJ, TreviñoGarza OA, AguirreCalderón J, JiménezPérez MA, GonzálezTagle JR, ValdézLazalde G, SánchezDíaz RH, AguirreSalado AI, MirandaAragón LMapping aboveground biomass by integrating geospatial and forest inventory data through a knearest neighbour strategy in North Central Mexico. Journal of Arid Land 1: 8096.2014Asrar GTheory and applications of optical remote sensing. John Wiley and Sons, New York, USA, pp. 734.1989Asrar G, Fuchs M, Kanemasu ET, Hatfield JLEstimating absorbed photosynthetically active radiation and leaf area index from spectral reflectance in wheat. Agronomy Journal 76 (2): 300306.1984Balthazar V, Veerle V, Eric FLEvaluation and parameterization of ATCOR3 topographic correction method for forest cover mapping in mountain areas. International Journal of Applied Earth Observation and Geoinformation 18: 436450.2012Baret F, Guyot GPotentials and limits of vegetation indices for LAI and APAR assessment. Remote Sensing of Environment 35: 161173.1991Belsey DAConditioning diagnostics, collinearity and weak data in regression. John Wiley and Sons Inc, New York, USA, pp. 396.1991Botero VJS, Restrepo MAAnálisis de textura en panes usando la matriz de coocurrencia [Analysis of texture in breads using the cooccurrence matrix]. Revista Politécnica 6 (10): 7480. [in Spanish]2010Breiman L, Friedman JH, Olshen RA, Stone CJClassification and regression trees. Chapman and Hall, New York, USA, pp. 368.1984Brown S, Lugo AEBiomass of tropical forests: a new estimate based on forest volumes. Science 223 (4642): 12901293.1984Brutsaert WOn a derivable formula for longwave radiation from clear skies. Water Resources Research 11: 742744.1975Challenger AUtilización y conservación de los ecosistemas terrestres de México: pasado, presente y futuro [Use and conservation of terrestrial ecosystems of Mexico: past, present and future]. Comisión Nacional para el Conocimiento y Uso de la Biodiversidad/Instituto de Biología, UNAM/Agrupación Sierra Madre, México, DF, pp. 847. [in Spanish]1998Chen D, Stow DA, Gong PExamining the effect of spatial resolution and texture window size on classification accuracy: an urban environment case. International Journal of Remote Sensing 25 (11): 21772192.2004CorralRivas J, Vargas B, Wehenkel C, Aguirre O, Alvarez JG, Rojo AGuía para el establecimiento de sitios de inventario periódico forestal y de suelos del estado de Durango [Guidelines for the establishment of permanent sample plots in forests of Durango State]. Facultad de Ciencias Forestales, Universidad Juárez del Estado de Durango, Mexico, pp. 81. [in Spanish]2009Cutler MEJ, Boyd DS, Foody GM, Vetrivela AEstimating tropical forest biomass with a combination of SAR image texture and Landsat TM data: an assessment of predictions between regions, ISPRS Journal of Photogrammetry and Remote Sensing 70: 6667.2012DíazVarela RA, Álvarez P, DíazVarela E, Calvo SPrediction of stand quality characteristics in sweet chestnut forests in NW Spain by combining terrain attributes spectral textural features and landscape metrics. Forest Ecology and Management 261: 19621972.2011ERDASErdas Imagine 2013. Hexago AB, Intergraph Corporation, Madison, AL, USA.2013ESRIArcGIS for Desktop 10. Redwoods, CA, USA.2012Foody GM, Boyd DS, Cutler MEJPredictive relations of tropical forest biomass from Landsat TM data and their transferability between regions. Remote Sensing of Environment 85 (4): 463474.2003Forzieri G, Feyen L, Cescatti A, Vivoni ERSpatial and temporal variations in ecosystem response to monsoon precipitation variability in southwestern North America. Journal of Geophysical Research: Biogeosciences 119 (10): 19992017.2014Franklin SE, Maudie AJ, Lavigne MBUsing spatial cooccurrence texture to increase forest structure and species composition classification accuracy. Photogrammetric Engineering and Remote Sensing 67: 849856.2001Fuchs H, Magdon P, Kleinn C, Flessa HEstimating aboveground carbon in a catchment of the Siberian forest tundra: combining satellite imagery and field inventory. Remote Sensing of Environment 113: 518531.2009García MA, Pérez CF, De la Riva J, Fernández J, Pascual PE, Herranz AEstimación de la biomasa residual forestal mediante técnicas de teledetección y SIG en masas puras de Pinus halepensis y P. sylvestris [Estimation of residual forest biomass using remote sensing technology and GIS in pure stands of Pinus halepensis and P. sylvestris]. In: Proceedings of the “IV Congreso Forestal Español de la Sociedad Española de Ciencias Forestales”. Paper no. 4CFE05342T1, 308, Sociedad Española de Ciencias Forestales, Palencia, Spain, pp. 5. [in Spanish]2005GeosystemsHaze reduction, atmospheric and topographic correction. User Manual ATCOR2 and ATCOR3, Geosystems GmbH, Germering, Germany, pp. 238.2013Gilabert MA, Piqueros JG, Haro JGAcerca de los índices de vegetación [About vegetation indices]. Revista de Teledetección 8: 110. [in Spanish]1997Glenn EP, Huete AR, Nagler PL, Nelson SGRelationship between remotelysensed vegetation indices, canopy attributes and plant physiological processes: what vegetation indices can and cannot tell us about the landscape. Sensors 8: 21362160.2008GonzálezAlonso F, MerinoDeMiguel S, RoldánZamarrón A, GarcíaGigorro S, Cuevas JMForest biomass estimation through NDVI composites. The role of remotely sensed data to assess Spanish forests as carbon sinks. International Journal of Remote Sensing 27: 54095415.2006Haralick RM, Shanmugam K, Dinstein ITextural features for image classification. IEEE Transactions of Systems, Man, and Cybernetics 6: 610621.1973HernándezStefanoni JL, GallardoCruz JA, Meave JA, Dupuy JMCombining geostatistical models and remotely sensed data to improve tropical plant richness mapping. Ecological Indicators 11: 10461056.2011Holmgren PTopographic and geochemical influence on the forest site quality, with respect to Pinus sylvestris and Picea abies in Sweden. Scandinavian Journal of Forest Research 9: 7582.1994Houghton RA, Butman D, Bunn AG, Krankina ON, Schlesinger P, Stone TAMapping Russian forest biomass with data from satellites and forest inventories. Environmental Research Letters 2 (4): 045032.2007Hu W, Mengersen K, Tong SRisk factor analysis and spatiotemporal CART model of cryptosporidiosis in Queensland, Australia. BMC infectious diseases 10 (1): 311.2010Huete ARA soiladjusted vegetation index (SAVI). Remote Sensing of Environment 25: 295309.1988INEGIUso del suelo y vegetación escala 1:250.000 [Land use and vegetation scale 1:250.000]. Serie V, Información vectorial, Instituto Nacional de Estadística Geográfica e Informática, México. [in Spanish]2012INEGIContinuo de elevaciones mexicano 3.0 (CEM 3.0) [Mexican continuous elevation 3.0 (CEM3.0)]. Instituto Nacional de Estadística Geográfica e Informática, México. [in Spanish]2014IPCCGood practice guidance for land use, landuse change and forestry. IPCC National Greenhouse Gas Inventories Programme, Hayama, Japan, pp. 590.2003Kayitakire F, Hamel C, Defourny PRetrieving forest structure variables based on image texture analysis and IKONOS2 imagery. Remote Sensing of Environment 102 (34): 390401.2006Keller M, Palace M, Hurtt GBiomass estimation in the Tapajos National Forest, Brazil: examination of sampling and allometric uncertainties. Forest Ecology and Management 154 (3): 371382.2001Ketterings QM, Coe R, van Noordwijk M, Ambagau Y, Palm CAReducing uncertainty in the use of allometric biomass equations for predicting aboveground tree biomass in mixed secondary forests. Forest Ecology and Management 146 (13): 199209.2001Keys RGCubic convolution interpolation for digital image processing. IEEE Transactions on Acoustics, Speech, and Signal Processing 29: 11531160.1981Kozak A, Kozak RADoes cross validation provide additional information in the evaluation of regression models? Canadian Journal of Forest Research 33: 976987.2003Kuusinen N, Tomppo E, Shuai Y, Berninger FEffects of forest age on albedo in boreal forests estimated from MODIS and Landsat albedo retrievals. Remote Sensing of Environment 145: 145153.2014Li F, Jupp D.L, Reddy S, Lymburner L, Mueller N, Tan P, Islam, AAn evaluation of the use of atmospheric and BRDF correction to standardize Landsat data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (3): 257270.2010Liang SNarrowband to broadband conversions of land surface albedo I algorithms. Remote Sensing of Environment 76: 213238.2000Liang SRecent developments in estimating land surface biogeophysical variables from optical remote sensing. Progress in Physical Geography 31: 501516.2007Lu DAboveground biomass estimation using Landsat TM data in the Brazilian Amazon Basin. International Journal of Remote Sensing 26 (12): 25092525.2005McNab WHTerrain shape index: quantifying effect of minor landforms on tree height. Forest Science 35: 91104.1989McNab HWA topographic index to quantify the effect of mesoscale landform on site productivity. Canadian Journal of Forest Research 23: 11001107.1993MéndezBarroso LA, Vivoni ER, Watts CJ, Rodríguez JCSeasonal and interannual relation between precipitation, surface soil moisture and vegetation dynamics in the North American monsoon region. Journal of Hydrology 377 (12): 5970.2009Meyer P, Itten KI, Kellenberger T, Sandmeier S, Sandmeier RRadiometric corrections of topographically induced effects on Landsat TM data in an Alpine environment. ISPRS Journal of Photogrammetry and Remote Sensing 48 (4): 1728.1993Moore ID, Nieber JLLandscape assessment of soil erosion and nonpoint source pollution. Journal of the Minnesota Academy of Science 55 (1): 1824.1989Moore ID, Norton TW, Williams JEModelling environmental heterogeneity in forested landscapes. Journal of Hydrology 150 (24): 717747.1993Muukkonen P, Heiskanen JEstimating biomass for boreal forests using ASTER satellite data combined with standwise forest inventory data. Remote Sensing of Environment 99: 434447.2005Myers RHClassical and modern regression with applications (2nd edn). Duxbury Press, Belmont, CA, USA, pp. 488.1990NASALandsat 7 science data users handbook. National Aeronautics and Space Administration, NASA, Washington, DC, USA, pp. 3031.2011Neter J, Wasserman W, Kutner MHApplied linear statistical models: regression, analysis of variance and experimental designs (3rd edn). Irwin, Boston, MA, USA, pp. 842.1990Nichol JE, Sarker MLRImproved biomass estimation using the texture parameters of two highresolution optical sensors. IEEE Transactions on Geoscience and Remote Sensing 49: 930948.2011PCI GeomaticsPCI Geomatics 2013. PCI Geomatics Inc, Ontario, Canada.2013Qi J, Chehbouni A, Huete AR, Kerr YHModified soil adjusted vegetation index (MSAVI). Remote Sensing of Environment 48 (2): 119126.1994R Core TeamR: a language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria, pp. 3551.2014Richter RUser manual ATCOR2 and ATCOR3, ATCOR for IMAGINE 2013: haze reduction, atmospheric and topographic correction. DLR Oberpfaffenhofen, Institute of Ptoelectronics, D82234, Version 15.01.2013, Wessling, Germany, pp. 240.2013Richter R, Schläpfer DAtmospheric/topographic correction for satellite imagery. ATCOR 2/3 User Guide, Version 8.0.2, ReSe, Wil, Switzerland, pp. 252.2011Roberts DW, Cooper SVConcepts and techniques of vegetation mapping. In: “Land classifications based on vegetation: applications for resource management” (Ferguson D, Morgan P, Johnson FD eds). Gen. Tech. Rep. INT257, Intermountain Forest and Range Experiment Station, USDA Forest Service, Ogden, UT, USA, pp. 9096.1989Rouse JW, Haas RH, Schiell JA, Deferino DW, Harlan JCMonitoring the vernal advancement of retrogradation of natural vegetation. NASA/OSFC, Type III Final Report, Remote Sensing Center, Texas A&M University, College Station, TX, USA, pp. 87.1974Ryu S, Chen J, Crow TR, Saunders SCAvailable fuel dynamics in nine contrasting forest ecosystems in North America. Environmental Management 33 (1): 87107.2004SalinasZavala CA, Douglas AV, Diaz HFInterannual variability of NDVI in northwest Mexico: associated climatic mechanisms and ecological implications. Remote Sensing of Environment 82 (23): 417430.2002Sánchez O, Vega E, Peters E, MonrroyVilchis OConservación de ecosistemas templados de montaña en México [Conservation of temperate mountain ecosystems in Mexico]. Instituto Nacional de Ecología, INESEMARNAT, México, DF, pp. 315. [in Spanish]2003SAS Institute IncSAS OnlineDoc (version 9.2). SAS Institute Inc, Cary NC, USA.2007Sun G, Ranson JKRadiometric slope correction for forest biomass estimation from SAR data in the Western Sayani Mountains, Siberia. Remote Sensing of Environment 79 (23): 279287.2009Sun G, Ranson KJ, Guo C, Zhang Z, Montesano P, Kimes DForest biomass mapping from lidar and radar synergies. Remote Sensing of Environment 115: 290629162011Therneau TM, Atkinson BPackage “rpart” (version 3:152). Web site.2012VargasLarreta BEstimación del potencial de los bosques de Durango para la mitigación del cambio climático. Modelización de la biomasa forestal [Assessing the potential of forests of Durango for mitigating climate change. Modelling of forest biomass]. Proyecto FOMIXDGO2011C01165681, Comisión Nacional de Ciencia y Tecnología, CONACYT, Mexico, pp. 53. [in Spanish]2013Walker W, Baccini A, Nepstad M, Horning N, Knight D, Braun E, Bausch AGuía de campo para la estimación de biomasa y carbono forestal [Field guide to estimate forest biomass and carbon] (version 1.0). Woods Hole Research Center, Falmouth, MA, USA, pp. 53. [in Spanish]2011Wilson JP, Gallant JCDigital terrain analysis. In: “Terrain Analysis: Principles and Applications” (Wilson JP, Gallant JC eds). John Wiley and Sons, Inc, New York, USA, pp. 128.2000Zevenbergen LW, Thorne CRQuantitative analysis of land surface topography. Earth Surface Processes and Landforms 12: 4756.1987Zhao M, Running SWDroughtinduced reduction in global terrestrial net primary production from 2000 through 2009. Science 329 (5994): 940943.2010Zianis D, Mencuccini MOn simplifying allometric analyses of forest biomass. Forest Ecology and Management 187(2): 311332.2004Zinko U, Seibert J, Dynesius, M, Nilsson CPlant species numbers predicted by a topography based groundwaterflow index. Ecosystems 8: 430441.2005
Geographical location of the study area and sample plots used in the study.
Classification tree obtained by the regression tree method. n is the number of sample plots in each node and W is the biomass value for each node (Mg ha^{1}).
Classification tree obtained by pruning the regression tree (left) and plot of the relationships between the costcomplexity parameter (CP), the cross validation error (xval Relative Error) and tree size (number of nodes). The dashed vertical line represents the maximum number of nodes (corresponding costcomplexity parameter) for which the cross validation error is greater than the standard error (right).
Plot of residual values against estimated biomass for groups obtained from the classification tree.
Biomass distribution in the study area.
Statistics of the main stand variables. Dominant height was calculated as the mean height of the 100 thickest trees per hectare.
Variable
Mean
Standarddeviation
Minimumvalue
Maximumvalue
Number of stems per ha
655.47
322.25
224.00
2264.00
Stand basal area (m^{2} ha^{1})
20.30
6.42
8.22
41.12
Dominant height (m)
14.62
3.72
6.87
24.81
Stand biomass (Mg ha^{1})
89.03
43.45
2.70
234.03
Vegetation indexes analyzed in the present study. (NIR): Nearinfrared band (0.83 μm); (RED): Red band (0.63 μm); (ρ): Reflectance; (1ρ(λ)): Absorbed part of radiation; (E_{g}(λ)): the global (direct plus diffuse) solar flux on the ground; (C): Constant value 0.8; (A): Constant value 1; (B): Constant value 0.4; (int_{0.32.5μm}): extrapolation for region of the 0.32.5 μm (bands) of most satellite sensors; (dλ): adjustable parameter used to derive direct albedo on solar zenith angle.
Additional variables for biomass modelling. (N): number of grey levels; (P): normalized symmetric GLCM of dimension N×N; (V): vector difference normalized grey level of dimension N; P(i, j): matrix of cooccurrence normalized, so that Σ_{(i,j=0;N1) Σ(i,j=0;N1) }P(ij); V(k): normalized grey level difference vector Σ_{(i,j=0;N1)}Σ_{(i,j=0;N1)}P(ij) ij  = k; (Z): Average elevation; (R): Point radio altitude units; (As): Drainage area specified; (tan(β)): Local slope angle; (VA): Variance; (ME): Mean. D, F, G and H were derived according to equation of Zevenbergen & Thorne (1987).
Group variable
Variable
Formula
Reference
Texture (NDVI)
Homogeneity (HO)
HO=Σ_{(i,j=0;N1) }i[P_{i,j} /1+(ij)^{2}]
Haralick et al. (1973)
Contrast (CO)
CO=Σ_{(i,j=0;N1) }i P_{i,j} (ij)^{2}
Dissimilarity (DI)
DI=Σ_{(i,j=0;N1)} i P_{i,j} (ij)
Mean (ME)
ME=Σ_{(i,j=0;N1)} i P_{i,j}
Standard Deviation (Sdt)
Sdt=√VA
Entropy (EN)
EN=Σ_{(i,j=0;N1)} i P_{i,j} [ln(i)P_{i,j}]
Angular Second Moment (ASM)
ASM=Σ_{(i,j=0;N1)} i P^{2}_{i,j}
Correlation (CR)
CR=Σ_{(i,j=0;N1)} i P^{i,j} [(iME)(jME)/√(VAi VAj)]
Terrain (DEM)
Elevation
Digital Elevation Model

Slope (β)
β= arctan(G^{2} + H^{2)1/2}

Transformed Aspect (Trasp)
Trasp=1cos[(π /180)(α 30)]/2
Roberts & Cooper (1989)
Terrain Shape Índex (TSI)
TSI=Z/R
McNab (1989)
Wetness Index (WI)
W= ln[As/tan(β)]
Moore & Nieber (1989)
Profile curvature (Ø)
Ø=2[DG^{2}+EH^{2}+FGH]/(G^{2}+H^{2})
Wilson & Gallant (2000)
Plan curvature (ω)
ω=2[DH^{2}+EG^{2}+FGH]/(G^{2}+H^{2})
Curvature (x)
x= ω  Ø
Model obtained for the total database by OLS with stepwise selection of independent variables (*): RMSE expressed as a percentage of mean stand biomass.
Parameter
Estimate
Standard error
OLS
Crossvalidation
RMSE(Mg ha^{1})
R^{2}
RMSE(Mg ha^{1})
R^{2}
Intercept
572.4939
172.2304
27.88 (31.32%)*
0.5833
30.31 (34.04%)*
0.5133
SAVI
0.3673
0.1566
Band 7
2.7166
0.6893
Abs. Shortw. solar rad.
0.0797
0.0245
LAI
0.0166
0.0037
Modified SAVI
2215.8205
717.3984
NDVI
2072.8881
628.6811
Wetness index
3.4027
1.5666
Contrast 7×7
0.2073
0.0926
Model obtained for the final nodes of the regression tree by OLS and stepwise selection of independent variables (*): RMSE expressed as a percentage of mean stand biomass; (**): percentage difference between the crossvalidation RMSE of the hybrid model compared with the same statistic obtained by crossvalidation of the linear model fitted to the complete database.