This paper addresses the regionalization of the height-diameter model at the stand level. For this purpose, we selected two different modeling techniques. As a parametric method, we chose a linear mixed effects model (LME) with calibrated conditional prediction, calibrated on randomly selected trees either close to the mean diameter or within three diameter intervals spread across the diameter range. As a nonparametric method, we chose classification and regression trees (CART). Both methods were also compared with a local model fitted by ordinary least squares regression. The results show that LME with calibrated conditional prediction based on height measurements in three diameter intervals gave results very close to the local model, especially when six to nine trees were measured. We recommend this technique for the regionalization of the global model. The CART method gave worse results than LME, except for the parameters of the residual distribution. Nevertheless, CART is very user-friendly, as creating and especially interpreting the regression tree is relatively simple, and it could be recommended when larger deviations are acceptable.
Modeling the relationship between the tree height and its diameter at breast height is one of the most common and oldest paradigms in forestry, especially in forest inventory (
The use of models with mixed effects is a successful approach to overcome the problem of spatial variability in the data. Mixed models include a fixed component, which covers the entire analyzed dataset (
The main advantage of using a mixed model is the possibility to calibrate an already existing model (
Random parameters are set to 0. No measured values of the dependent variable are available, so the prediction uses fixed effects only (often referred to as “fixed effects marginal prediction”) and relates to the entire population.
Random parameters are estimated separately for each sampling unit. This requires at least one (preferably several) measured value of the dependent variable. The prediction then uses both fixed and random effects (also referred to as “calibrated conditional prediction”) and applies to individual areas.
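The difference between the two predictions can be illustrated with a minimal sketch. It assumes the simplest case of a single random intercept per plot, for which the best linear unbiased predictor reduces to a shrunken mean of the plot-level residuals. The function names and all numerical values are hypothetical and are not taken from the model fitted in this study.

```python
def random_intercept_blup(residuals, var_random, var_residual):
    """Predict a plot-level random intercept from calibration residuals.

    residuals    : observed minus fixed-effects prediction for measured trees
    var_random   : variance of the random intercept (assumed known)
    var_residual : residual variance (assumed known)
    """
    n = len(residuals)
    if n == 0:
        # No measurements: the random effect stays at its expected value 0,
        # which corresponds to the fixed effects marginal prediction.
        return 0.0
    shrinkage = n * var_random / (n * var_random + var_residual)
    return shrinkage * sum(residuals) / n


def predict(fixed_prediction, random_intercept=0.0):
    """Combine the fixed-effects prediction with a plot-level random effect."""
    return fixed_prediction + random_intercept


# Hypothetical example: three measured trees deviate by 1, 2 and 3 units
# from the fixed-effects prediction.
marginal = predict(25.0)
u = random_intercept_blup([1.0, 2.0, 3.0], var_random=1.0, var_residual=1.0)
conditional = predict(25.0, u)
```

As the number of measured trees grows, the shrinkage factor approaches 1, so the plot-level prediction relies increasingly on the local measurements.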
These calibration techniques have a wide range of applications, though so far they have been most frequently used in modeling growth (
When too few trees are available in the experimental area to apply the calibrated conditional prediction, tree height has to be modeled using the LME with fixed effects marginal prediction instead. In such cases, a suitable alternative could be the method of classification and regression trees (CART), a nonparametric technique that makes effective use of local data (
Although far less used in forestry than the LME, the CART method has been successfully applied for modeling tree mortality (
The aim of this paper is to compare the use of the LME (parametric) and the CART (nonparametric) models to regionalize the global model of height-diameter relationship.
The study site is located in the Training Forest Enterprise Masaryk Forest Krtiny (lat. 49° 17′ 43″ N, long. 16° 45′ 01″ E), southern Moravia (Czech Republic), north of Brno. Forty-six circular sample plots were selected in 23 different Norway spruce (
All the analyses were carried out using the software packages R (
(ii) the Petterson function (
(iii) the Näslund function (
(iv) the Levakovic function (
(v) the Meyer function (
where
To select the best function for the global model, the following criteria were considered: (i) mean of residuals; (ii) residual standard deviation; (iii) standard error of residuals; (iv) Akaike’s information criterion (AIC -
We built the actual mixed model as a two-level model, according to the methodology proposed by
where
Building the first-level model consisted of three steps: fitting (1) the global model; (2) the model with a random effect of the intercept only; and (3) the model with random effects of the intercept and the regression parameter. Through these steps, we tested the significance of the structure in the data, the importance of the predictor, and the random effects of the model. Each model was compared with the preceding one using the likelihood ratio test and the AIC. After the first-level model was obtained, we fitted the second-level model by testing the contributions of the following stand variables: number of trees per ha, age of the forest stand, stand basal area, mean diameter, and the site index. In some cases, a logarithmic transformation was applied to the stand variables because of the non-linear relationships observed between them and the model parameter estimates.
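The comparison of two nested models by the likelihood ratio test and the AIC can be sketched as follows. This is a minimal illustration using only the Python standard library. The log-likelihood values and parameter counts are hypothetical, and the chi-square reference distribution is the usual large-sample assumption rather than a detail confirmed by this study.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of the chi-square distribution (integer df)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    terms = sum(half ** (i + 0.5) / math.gamma(i + 1.5) for i in range(df // 2))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Return the test statistic and p-value for two nested models."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, extra_params)


def aic(loglik, n_params):
    """Akaike's information criterion; lower values indicate a better model."""
    return 2.0 * n_params - 2.0 * loglik


# Hypothetical log-likelihoods: the fuller model adds one parameter.
statistic, p_value = likelihood_ratio_test(-120.0, -115.0, extra_params=1)
```

In this hypothetical case the statistic equals 10 with one degree of freedom, so the additional parameter would be retained at the usual 5% level.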
The stand variables were included in the second-level model using the following procedure: the stand variable showing the highest correlation with the parameter estimates was included in the model; after its inclusion, the relationship between the remaining stand variables and the parameter estimates was tested again. We repeated this procedure as long as the parameter estimates showed a statistically significant relationship (correlation coefficient) with any of the remaining stand variables.
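The selection loop described above can be sketched in a few lines. This is only one possible reading of the procedure: it assumes that, after each inclusion, the remaining variables are tested against the residuals of a simple regression on the variable just added, and that the critical correlation for the chosen significance level is supplied by the user. The names `forward_select` and `r_crit` and all data values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def ols_residuals(x, y):
    """Residuals of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def forward_select(estimates, stand_variables, r_crit):
    """Add stand variables one by one while the best correlation is significant."""
    remaining = dict(stand_variables)
    residual = list(estimates)
    selected = []
    while remaining:
        name = max(remaining, key=lambda k: abs(pearson_r(remaining[k], residual)))
        if abs(pearson_r(remaining[name], residual)) < r_crit:
            break  # no remaining variable is significantly related
        residual = ols_residuals(remaining.pop(name), residual)
        selected.append(name)
    return selected


# Hypothetical plot-level parameter estimates and stand variables.
estimates = [2.3, 3.9, 5.6, 8.4, 10.1, 11.7]
variables = {
    "age": [30, 50, 70, 90, 110, 130],
    "basal_area": [40, 40, 30, 30, 40, 40],
}
chosen = forward_select(estimates, variables, r_crit=0.8)
```

In this constructed example only the stand age is retained, because the second variable is unrelated to what remains after the age is accounted for.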
The resulting mixed model was calibrated using the calibrated conditional prediction in order to extend its applicability to other plots. This step requires at least one value of the dependent variable (here, tree height) to be measured in a given plot in order to calculate the random component of the parameter estimates. The calibrated conditional prediction can be written using the equation by Calama & Montero (
where the meaning of the symbols is the same as in
For the calculation of the random components of parameter estimation, we used the best linear unbiased predictor (BLUP) proposed by
where
The calibration process was carried out using two different procedures of tree selection. In the first procedure, we measured the height of 1 to 5 randomly selected trees that were in the range of ± 2 cm of the mean diameter. In the second case, three diameter (
The calibration was performed on six forest stands that differed significantly in age and other stand characteristics. To assess the quality of the mixed model as compared with local models, we used the following criteria: (i) coefficient of determination (
where
Because tree height is a continuous variable, we selected the regression tree method, which produces homogeneous groups of individuals sharing the same predicted height (
The regression tree model was tested on the same data used for the mixed model described above, but the original dataset was divided into a training subset (860 records) and a validation subset (730 records). The training subset was used to build the regression tree model, while the validation subset was used to verify its accuracy.
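Building a regression tree on the training subset proceeds by repeated binary splitting of the data. A single splitting step can be sketched as follows, assuming the usual least-squares criterion and thresholds placed midway between neighboring predictor values. The function names and the data are hypothetical, and the sketch handles one predictor only.

```python
def rss(values):
    """Residual sum of squares around the mean of a group."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_leaf=1):
    """Find the threshold on x that minimizes the summed RSS of both children.

    Returns (threshold, total_rss), or None if no valid split exists.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        left = [h for _, h in pairs[:i]]
        right = [h for _, h in pairs[i:]]
        total = rss(left) + rss(right)
        if best is None or total < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best = (threshold, total)
    return best


# Hypothetical diameters (cm) and heights (m) of six trees.
dbh = [10, 12, 14, 30, 32, 34]
height = [12, 13, 14, 28, 29, 30]
split = best_split(dbh, height)
```

A full tree repeats this search over all predictors within each resulting group until a stopping rule is met.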
In the process of binary data splitting of the tree nodes, the algorithm tries to reduce the residual sum of squares. To prevent model overfitting and ensure a good estimate of
Pruning based on ten-fold cross-validation was used to reduce the number of branches in the resulting regression trees. After cross-validation, the regression tree models were tested on the validation subset. The quality of the resulting trees was assessed on both subsets using the following criteria: (i) coefficient of determination; (ii) mean of residuals; (iii) standard error of residuals; (iv) residual standard deviation.
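The four criteria listed above can be computed as in the following minimal sketch. It assumes the sample standard deviation (divisor n - 1) and a standard error defined as the standard deviation divided by the square root of the number of records. The function name and the example values are hypothetical.

```python
import math


def fit_statistics(observed, predicted):
    """Return R2, mean of residuals, standard error and SD of residuals."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_res = sum(residuals) / n
    # Sample standard deviation of residuals (divisor n - 1, an assumption).
    sd_res = math.sqrt(sum((r - mean_res) ** 2 for r in residuals) / (n - 1))
    se_res = sd_res / math.sqrt(n)
    mean_obs = sum(observed) / n
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mean_residual": mean_res,
        "se_residual": se_res,
        "sd_residual": sd_res,
    }


# Hypothetical observed and predicted heights (m).
stats = fit_statistics([10, 12, 14, 16], [11, 11, 15, 15])
```

This definition of the standard error is consistent with the values reported for the training subset (1.590 / √860 ≈ 0.054).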
We compared the results of the mixed model and the regression tree using data from the six forest stands for which the mixed model calibration was carried out. In addition, the results of both methods were compared with those obtained with a local linear model of the height-diameter function fitted using the classical ordinary least squares (OLS) method. As comparison criteria, we used the coefficient of determination (
The resulting global linear model using the Petterson H-D function can be represented by the following formula (
where
The successive stages of the mixed model (the global model, the model with a random effect of the intercept only, and the model with random effects of the intercept and the regression parameter) were compared using the likelihood ratio test.
When compiling the second-level model, we tested the contribution of individual stand variables. The obtained results suggest that the estimates of the parameter
where (
and (
in which
The second-level mixed-effects model fitted the data better than the first-level model (likelihood ratio test: p < 0.001, AIC = -10927.7). This was confirmed by its lower dispersion of residuals, as depicted in
The calibrated conditional prediction of tree height based on a random sample of 1 to 5 trees within ± 2 cm of the mean diameter provided biologically inconsistent results. Indeed, an inverse relationship was found, whereby tree height was predicted to decrease as diameter increased. This inconsistency could not be eliminated even by using a larger number of measurements (max 5). Due to the poor quality of this calibration, a low coefficient of determination was obtained, as well as a high average deviation between the fitted values of the mixed model and those of the local model.
The second calibration method (tree height measured in three diameter intervals - see above) yielded significantly better results.
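The selection of calibration trees from three diameter intervals can be sketched as follows. The sketch assumes 4 cm wide intervals placed at the lower end, around the mean, and at the upper end of the plot's diameter range, with the same number of trees drawn at random from each. The exact interval placement, the function name, and the data are assumptions made for illustration.

```python
import random


def select_calibration_trees(dbh, per_interval, width=4.0, seed=0):
    """Return indices of trees drawn at random from three diameter intervals."""
    rng = random.Random(seed)
    low, high = min(dbh), max(dbh)
    mean = sum(dbh) / len(dbh)
    intervals = [
        (low, low + width),                      # thinnest trees
        (mean - width / 2.0, mean + width / 2.0),  # trees around the mean
        (high - width, high),                    # thickest trees
    ]
    chosen = []
    for lower, upper in intervals:
        # Exclude trees already chosen in case the intervals overlap.
        pool = [i for i, d in enumerate(dbh)
                if lower <= d <= upper and i not in chosen]
        chosen.extend(rng.sample(pool, min(per_interval, len(pool))))
    return chosen


# Hypothetical plot with diameters from 10 to 50 cm in 1 cm steps.
diameters = [float(d) for d in range(10, 51)]
indices = select_calibration_trees(diameters, per_interval=2)
```

With `per_interval` set to 1, 2 or 3, the function returns 3, 6 or 9 trees, matching the sample sizes compared in this study.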
In the CART model, all the selected variables were significant in the resulting regression tree, with a degree of significance of 100 for dbh, 97 for the mean diameter, 94 for stand age, and 57 for the site index. The resulting tree consists of 19 splitting nodes and 20 homogeneous groups (final nodes).
Further, based on the results from the CART model, trees from stands with lower site index were taller than those from stands with higher site index, which also seems illogical. A deeper analysis of the stands with lower site index revealed that the selected trees in these stands were mostly dominant and growing in better conditions than trees with mean diameter and height. Indeed, the site index in this study was not determined according to the height of dominant trees but to that of trees with mean height. The regression tree therefore correctly ranked these individuals into one homogeneous group based on more predictors than only site index (especially according to diameter at breast height). All the independent variables used in the CART model are thus biologically justified.
The goodness-of-fit statistics obtained with each regression technique are compared in
The LME technique provided good parameter estimates of the residual distribution, though their values were higher than those of the CART model. However, its predictions were very close to those achieved with the local model using the OLS method (
The results of this study revealed that tree height can be accurately predicted from diameter at breast height by including the stand age and its logarithmic transformation as random factors in the mixed model. Moreover, the mixed model removed the heteroscedasticity occurring in the global model (
The practical applicability of the proposed mixed model was assessed by comparing different ways of selecting trees for the height measurements used in the calibrated conditional prediction. In the first method, 1 to 5 trees within ± 2 cm of the mean diameter were selected, while the second was based on 3, 6, or 9 trees randomly selected within 4 cm intervals from the thinnest, medium and thickest trees. Other papers also considered two ways of tree selection, namely random selection of different numbers of trees and a selection focused only on a portion of the diameter range. For example, random selection of trees has been applied by
Calibration results showed that random selection of trees close to the mean diameter is not suitable, as it yields predictions that are either too biased compared with the local models or biologically unjustified (height decreasing as diameter increases).
The second method, based on the selection of trees from more diameter intervals, proved to yield better results in this study. Similarly,
The results obtained using the CART model indicated that tree height depends on the diameter at breast height, mean diameter, stand age, and site index. Strongly nonlinear relationships exist between the height and the above variables. According to
Indeed, the strong dependence among independent variables (
Our results showed that tree height can be quite easily predicted using the CART method. CART models have been previously applied to estimate different variables in forestry, such as tree mortality (
In this study, the comparison of mixed model, regression tree, and the local model indicated that LME performed better than CART in predicting tree height based on dbh. The CART method also provided results significantly worse than those obtained with the OLS method (local model), except for parameters of the residual distribution.
Regarding the comparison between the LME and OLS models, we found that the predictions obtained with LME differ only slightly from those of the local model.
The authors thank two anonymous reviewers whose comments helped to improve an earlier version of the manuscript. We would also like to thank Jan Kadavý for his valuable comments.
Comparison of residuals (grey circles) of the global model (a) and the second-level mixed effects model (b) for the prediction of tree height. Black dots represent the mean of residuals in each diameter class, while the black lines represent the corresponding confidence intervals.
Basic characteristics of tree and stand variables. (DBH): diameter at breast height of a tree; (H): height; (BA): basal area; (V): volume; (DBHm): mean diameter; (Hm): mean height; (BAs): stand basal area; (Vm): mean tree volume; (N): number of trees; (SI): site index; (A): stand age; (95% CI): 95% confidence interval of mean value; (min): minimal value; (Max): maximal value; (STD): standard deviation.
Level | Variable | Mean ± 95% CI | Min | Max | STD |
---|---|---|---|---|---|
Tree | DBH (cm) | 32.2 ± 0.5 | 9.0 | 75.0 | 10.9 |
Tree | H (m) | 28.4 ± 0.3 | 10.0 | 41.5 | 6.4 |
Tree | BA (cm²) | 911.0 ± 29.7 | 63.6 | 4417 | 603 |
Tree | V (m³) | 1.17 ± 0.04 | 0.01 | 6.55 | 0.91 |
Stand | DBHm (cm) | 33.1 ± 0.4 | 18.6 | 47.4 | 8.3 |
Stand | Hm (m) | 28.8 ± 2.8 | 14.6 | 36.5 | 6.4 |
Stand | BAs (m² ha⁻¹) | 35.7 ± 0.3 | 22.2 | 46.3 | 6.34 |
Stand | Vm (m³) | 1.17 ± 0.03 | 0.16 | 2.56 | 0.66 |
Stand | N (n ha⁻¹) | 498.0 ± 13 | 221 | 1240 | 272 |
Stand | SI (m) | 33.6 ± 0.1 | 32 | 38 | 1.7 |
Stand | A (years) | 81.6 ± 1.6 | 30 | 136 | 33.0 |
Comparison of global height-diameter models. (
Function | Mean of residuals | Residual standard deviation | Standard error of residuals | AIC |
---|---|---|---|---|
Michailov | 0.0103 | 2.8101 | 0.0705 | 7802.8 |
Petterson | -0.0006 | 2.8023 | 0.0703 | 7794.0 |
Näslund | -0.0071 | 2.8081 | 0.0704 | 7800.6 |
Levakovic | 0.0085 | 2.8069 | 0.0704 | 7799.3 |
Meyer | -0.0313 | 2.8244 | 0.0708 | 7819.2 |
Comparison of first-level models. (
Model | | | | | | AIC | p |
---|---|---|---|---|---|---|---|
Global | 0.2648 | 2.0848 | - | - | 0.0149 | -8861 | - |
Random effect of intercept | 0.2952 | 1.2624 | 0.0192 | - | 0.0076 | -10863 | < 0.001 |
Random effect of intercept and regression parameter | 0.2944 | 1.2919 | 0.0194 | 0.1726 | 0.0001 | -10881 | < 0.001 |
Goodness-of-fit of the calibrated second-level mixed effects model using 3 to 9 tree height measurements. (N): number of measured trees; (
| Statistics | N | Stand 3 | Stand 5 | Stand 11 | Stand 14 | Stand 19 | Stand 22 | Mean (± 95% CI) |
|---|---|---|---|---|---|---|---|---|
| | 3 | 0.549 | 0.752 | 1.144 | 1.032 | 0.694 | 1.315 | 0.914 ± 0.310 |
| | 6 | 0.542 | 0.339 | 0.665 | 0.764 | 0.590 | 1.115 | 0.669 ± 0.295 |
| | 9 | 0.400 | 0.239 | 0.632 | 0.642 | 0.299 | 0.493 | 0.451 ± 0.177 |
| | 3 | 0.763 | 0.702 | 0.730 | 0.200 | 0.534 | 0.390 | 0.553 |
| | 6 | 0.777 | 0.817 | 0.812 | 0.265 | 0.569 | 0.433 | 0.612 |
| | 9 | 0.809 | 0.835 | 0.818 | 0.307 | 0.623 | 0.512 | 0.651 |
| | local | 0.848 | 0.854 | 0.860 | 0.368 | 0.644 | 0.534 | 0.685 |
| AIC | 3 | 18.421 | 42.177 | 98.792 | 120.580 | 103.958 | 131.355 | 85.880 |
| AIC | 6 | 17.199 | 11.372 | 76.200 | 110.891 | 99.130 | 127.379 | 73.695 |
| AIC | 9 | 10.095 | 6.847 | 74.243 | 106.892 | 87.499 | 119.566 | 67.524 |
| AIC | local | -1.140 | -6.145 | 54.581 | 100.960 | 82.563 | 117.055 | 57.979 |
| RMSE | 3 | 1.191 | 1.333 | 1.901 | 2.458 | 1.785 | 3.186 | 1.976 |
| RMSE | 6 | 1.166 | 1.060 | 1.615 | 2.256 | 1.728 | 3.074 | 1.817 |
| RMSE | 9 | 1.084 | 1.012 | 1.591 | 2.190 | 1.617 | 2.858 | 1.725 |
| RMSE | local | 0.971 | 0.952 | 1.401 | 2.093 | 1.573 | 2.794 | 1.631 |
| | 3 | 0.921 | 1.022 | 1.530 | 1.889 | 1.417 | 2.461 | 1.540 |
| | 6 | 0.893 | 0.789 | 1.288 | 1.808 | 1.394 | 2.376 | 1.425 |
| | 9 | 0.825 | 0.952 | 1.261 | 1.757 | 1.288 | 2.200 | 1.380 |
| | local | 0.768 | 0.698 | 1.074 | 1.636 | 1.251 | 2.176 | 1.267 |
| | 3 | 7.003 | 9.597 | 6.695 | 11.070 | 15.330 | 14.798 | 10.749 |
| | 6 | 7.500 | 6.420 | 8.304 | 10.867 | 17.376 | 17.215 | 11.280 |
| | 9 | 5.565 | 6.510 | 8.097 | 9.912 | 13.852 | 15.041 | 9.830 |
| | local | 5.794 | 6.611 | 9.888 | 14.152 | 12.311 | 17.593 | 11.058 |
Results of the regression tree construction. (LB): left branch; (RB): right branch; (N): number of trees in node; (
Node | LB of the node | RB of the node | N | Predicted height (m) | SP | Value of SP |
---|---|---|---|---|---|---|
1 | 2 | 3 | 860 | 27.97 | dbhm | 30.1 |
2 | 4 | 5 | 380 | 22.09 | A | 48 |
4 | 6 | 7 | 197 | 19.09 | dbh | 20.5 |
6 | 8 | 9 | 93 | 17.04 | dbh | 12.5 |
8f | - | - | 11 | 12.82 | - | - |
9 | 10 | 11 | 82 | 17.60 | dbh | 16.5 |
10f | - | - | 26 | 16.29 | - | - |
11 | 12 | 13 | 56 | 18.21 | A | 39 |
12f | - | - | 41 | 17.54 | - | - |
13f | - | - | 15 | 20.07 | - | - |
7 | 14 | 15 | 104 | 20.92 | A | 39 |
14f | - | - | 47 | 19.16 | - | - |
15f | - | - | 57 | 22.37 | - | - |
5 | 18 | 19 | 183 | 25.33 | dbh | 23.5 |
18 | 20 | 21 | 78 | 22.54 | dbh | 16.5 |
20f | - | - | 16 | 19.63 | - | - |
21f | - | - | 62 | 23.29 | - | - |
19 | 24 | 25 | 105 | 27.40 | SI | 32 |
24f | - | - | 46 | 29.01 | - | - |
25 | 26 | 27 | 59 | 26.14 | dbh | 30.5 |
26f | - | - | 42 | 25.62 | - | - |
27f | - | - | 17 | 27.44 | - | - |
3 | 28 | 29 | 480 | 32.62 | dbh | 32.5 |
28 | 30 | 31 | 126 | 28.99 | dbh | 23.5 |
30f | - | - | 16 | 24.56 | - | - |
31 | 32 | 33 | 110 | 29.64 | A | 75 |
32f | - | - | 23 | 27.85 | - | - |
33 | 34 | 35 | 87 | 30.11 | dbhm | 39.3 |
34f | - | - | 18 | 31.81 | - | - |
35f | - | - | 69 | 29.67 | - | - |
29 | 40 | 41 | 354 | 33.91 | dbh | 46.5 |
40 | 42 | 43 | 269 | 33.18 | A | 75 |
42f | - | - | 25 | 29.70 | - | - |
43 | 44 | 45 | 244 | 33.53 | dbhm | 39.3 |
44f | - | - | 45 | 35.52 | - | - |
45 | 46 | 47 | 199 | 33.08 | dbh | 40.5 |
46f | - | - | 111 | 32.51 | - | - |
47f | - | - | 88 | 33.80 | - | - |
41f | - | - | 85 | 36.22 | - | - |
Goodness-of-fit statistics obtained using the CART model. (
Statistics | Training data | Validation data |
---|---|---|
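Coefficient of determination | 0.940 | 0.804 |
Mean of residuals | 0.000 | -0.585 |
Standard error of residuals | 0.054 | 0.102 |
Residual standard deviation | 1.590 | 2.746 |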
Comparison of goodness-of-fit statistics for LME (mixed model), CART (regression tree model) and OLS (local model) techniques. (
| Model | Stats | Stand 3 | Stand 5 | Stand 11 | Stand 14 | Stand 19 | Stand 22 | Mean |
|---|---|---|---|---|---|---|---|---|
| LME | | 0.809 | 0.835 | 0.818 | 0.307 | 0.623 | 0.512 | 0.651 |
| LME | | 0.825 | 0.952 | 1.261 | 1.757 | 1.288 | 2.200 | 1.380 |
| LME | | 5.565 | 6.510 | 8.097 | 9.912 | 13.852 | 15.041 | 9.830 |
| LME | | 0.400 | 0.239 | 0.632 | 0.642 | 0.299 | 0.493 | 0.451 |
| CART | | 0.783 | 0.727 | 0.767 | 0.202 | 0.570 | 0.465 | 0.586 |
| CART | | 0.089 | 0.094 | 0.148 | -0.530 | -0.021 | 0.144 | -0.013 |
| CART | | 1.146 | 1.293 | 1.788 | 2.272 | 1.718 | 2.963 | 1.863 |
| CART | | 3.546 | 2.303 | 2.220 | 1.258 | 1.179 | 1.216 | 1.954 |
| OLS | | 0.848 | 0.854 | 0.860 | 0.368 | 0.644 | 0.534 | 0.685 |
| OLS | | 0.768 | 0.698 | 1.074 | 1.636 | 1.251 | 2.176 | 1.267 |
| OLS | | 5.794 | 6.611 | 9.888 | 14.152 | 12.311 | 17.593 | 11.058 |