## Estimation of canopy attributes of wild cacao trees using digital cover photography and machine learning algorithms

iForest - Biogeosciences and Forestry, Volume 14, Issue 6, Pages 517-521 (2021)
doi: https://doi.org/10.3832/ifor3936-014

Short Communications

Surveying canopy attributes during fieldwork in the rainforest is time-consuming. Low-cost imagery such as digital cover photography is a potential source of information to speed up vegetation assessments and reduce costs during expeditions. This study presents an image-based, non-destructive method to estimate canopy attributes of wild cacao trees in two regions of the Colombian rainforest using digital cover photography and machine learning algorithms. Upward-looking photographs taken at the base of each cacao tree and machine learning algorithms were used to estimate the gap fraction (GF), foliage cover (FC), crown cover (CC), crown porosity (CP), clumping index (Ω), and leaf area index (LAI) of the canopy. We used wild cacao trees found in forest plots as a case study to test the application of low-cost imagery to the extraction and analysis of canopy attributes. Canopy attributes were successfully extracted from the canopy cover imagery, with a classification accuracy of 92% for the structural attributes of the canopy. Canopy cover attributes allowed us to differentiate between the canopy structures of the Amazon and Pacific rainforest sites, suggesting that wild cacao trees are associated with different vegetation types. We also compared the classification results of the computer extraction of canopy attributes against a digital canopy cover benchmark. We conclude that our approach was effective for quickly surveying the canopy features of crop wild relatives of cacao and their associated vegetation. This study allows highly reproducible estimates of canopy attributes using cover photography and state-of-the-art machine learning algorithms such as deep learning Convolutional Neural Networks.

# Introduction

Colombia is considered one of the main centers of diversity for crop wild relatives of cacao ([14]). The genera Theobroma and Herrania, as well as wild Theobroma cacao L., are the main taxonomic entities of cacao ([11]). They grow in remote areas of rainforests where much of their diversity is found, but accessing those regions is challenging. Studying crop wild relatives is a priority for the conservation of genetic resources ([20]). Unfortunately, the available information about these crop wild relatives of high agricultural, economic, and cultural importance is limited. Accurate estimation of forest canopy structure is central to a wide range of ecological studies and applications. Because direct measurements are difficult, indirect methods have been widely used. Canopy photographic methods are among the most widely used because they are simple, fast, and cost-effective.

In the past, tree crown attributes have been estimated using vertical digital photography ([3], [22]). Digital cover photography (DCP) is a high-resolution, restricted-view-angle method that provides mainly vertical sampling of the canopy ([25], [8], [9], [1], [10]) and is an emerging method to estimate canopy attributes ([6], [10]). DCP overcomes the difficulties of hemispherical photography, whose estimates are sensitive to image processing steps that are tedious and time-consuming ([7], [1]). Cover photographs also provide higher resolution than hemispherical photographs. In terms of image processing, machine learning algorithms have been used for forest attribute imputation from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data ([24]), as well as from other remote sensing images ([21]). For instance, canopy hemispherical photography (CHP) segmentation and gap fraction (GF) calculation have been performed using deep learning neural networks ([19]), and deep learning regression has been used to make hemispherical photography independent of sunlight illumination conditions ([12]). Here, we use upward-looking DCP (as in [7]) rather than downward-looking photographs because of load and human-resource constraints during the expeditions.

This work addresses how to compute canopy cover properties using DCP and machine learning algorithms. Classification accuracy is estimated using cross-validation and by comparison with a digital canopy cover benchmark ([15]).

# Materials and methods

## Study area

The study of crop wild relatives (CWR) of cacao was carried out in the rainforests of three Colombian departments, Caquetá, Putumayo, and Chocó (Fig. S1 in the Supplementary material), between 2018 and 2019. The first cacao-BIO expedition traveled to the Caguán and upper Caquetá rivers in the Caquetá and Putumayo departments, where five plots, each surrounding a wild Theobroma tree, were examined across different landscapes (flooded, firm ground, and riverbanks). The second expedition took place in La Victoria municipality, part of the Canton de San Pablo (Chocó department), where three plots, each surrounding a wild cacao tree in the rainforest, were examined. The expeditions collected a total of eight upward-looking DCPs, one for each plot.

## Estimation of canopy properties

A key step in estimating canopy attributes is to separate large canopy gaps from normal-size gaps ([25], [6], [7], [1]). One recent tool is the free Python library “Canopy Cover” (CaCo - https://github.com/alivernini/caco), which segments DCP images into canopy and gaps and then, using statistical methods, separates large gaps from normal-size gaps ([1] - see Fig. 1 for an illustration of the canopy and of small and large gaps). CaCo does not currently provide all the canopy attributes computed here, since it focuses on providing statistics of the gaps found. In this work, six machine learning algorithms are used to classify upward-looking DCPs into sky (gaps), leaves, and trunks: K-Nearest Neighbors (KNN - [2]), Support Vector Machines (SVM - [4]), Random Forests (RF - [16]), Extreme Gradient Boosting (XGBoost - [5]), Multilayer Perceptron (MLP - [23]), and deep learning Convolutional Neural Networks (CNN - [18]). The sky class detected by these supervised classifiers is then passed to the CaCo algorithm that statistically separates large gaps from normal-size canopy gaps. The Python code used here is available on GitHub (https://github.com/julioduarte2020/CanopyCover) and includes a modified version of CaCo that estimates all canopy attributes. The novelty of this work is the use of DCP and supervised machine learning algorithms to classify the images before estimating the canopy attributes.

Fig. 1 - Canopy large and small gaps (black: canopy; white: small gaps; gray: large gaps) in the rain forest in Colombia.

Samples of sky, trunk, and leaves were selected on each DCP image using the free software MultiSpec (https://engineering.purdue.edu/~biehl/MultiSpec/) to form the training data. Five-fold cross-validation was used to estimate the performance of each classifier, i.e., the selected samples were randomly split five times into training (80%) and testing (20%) subsets, so that every sample was used for testing exactly once. The performance of each classifier was measured in terms of classification accuracy, sensitivity, and specificity. The KNN, SVM, and RF classifiers were implemented in Python using the scikit-learn library, XGBoost using the XGBoost library, and the MLP and CNN classifiers using the Keras library with a TensorFlow backend.

Encouraging results were obtained by setting 5 neighbors and a leaf size of 100 for the KNN classifier. The linear SVM classifier was used with default settings. For RF, the best results were obtained using 100 estimators and otherwise default settings. For XGBoost, encouraging results were obtained using 100 estimators and trees as the booster. For the MLP classifier, the best results were obtained using two dense layers of size three with batch normalization ([17]) and ReLU activation ([13]). Encouraging results were also obtained for the CNN classifier using a sliding window of size 9 pixels around each pixel; a first convolutional layer with a kernel of size 3 and 20 filters, batch normalization, and a dropout layer ([26]) of 0.2; and a second convolutional layer with a kernel of size 5 and 40 filters, batch normalization, dropout of 0.2, and 2×2 max-pooling. The two convolutional layers are followed by a flatten layer and two MLP layers, each half the size of the preceding layer, with batch normalization, dropout of 0.2, and ReLU activation.
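The text does not say how sensitivity and specificity were defined for three classes. The stdlib-only sketch below assumes a one-vs-rest definition per class and a shuffled five-fold split; the function names `kfold_indices` and `per_class_metrics` are hypothetical and are not taken from the authors' repository. It shows how each sample lands in exactly one test fold and how the three metrics can be computed from predicted labels.

```python
import random
from collections import Counter

CLASSES = ("sky", "leaves", "trunk")


def kfold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation.

    Every index appears in exactly one test fold, so with n_folds=5 each
    split trains on about 80% of the samples and tests on the other 20%.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment keeps fold sizes within one sample of each other.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, test_idx


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return overall accuracy and one-vs-rest sensitivity/specificity per class."""
    counts = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    n = len(y_true)
    accuracy = sum(counts[(c, c)] for c in classes) / n
    metrics = {}
    for c in classes:
        tp = counts[(c, c)]
        fn = sum(v for (t, p), v in counts.items() if t == c and p != c)
        fp = sum(v for (t, p), v in counts.items() if t != c and p == c)
        tn = n - tp - fn - fp
        metrics[c] = {
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        }
    return accuracy, metrics


if __name__ == "__main__":
    # Toy labels for illustration only.
    y_true = ["sky", "sky", "leaves", "leaves", "trunk", "trunk"]
    y_pred = ["sky", "leaves", "leaves", "leaves", "trunk", "sky"]
    acc, per_class = per_class_metrics(y_true, y_pred)
    print(f"accuracy = {acc:.3f}")
    for name, m in per_class.items():
        print(name, m)
```

With scikit-learn, the same splitting is available as `sklearn.model_selection.KFold(n_splits=5, shuffle=True)`, and the KNN and RF settings reported above can be expressed as `KNeighborsClassifier(n_neighbors=5, leaf_size=100)` and `RandomForestClassifier(n_estimators=100)`.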

With each DCP image classified into trunk, leaves, and sky, the following canopy attributes can be estimated ([1] - eqn. 1 to eqn. 5):

$$FC=1 -{\frac{g_{T}}{p_{C}}} = \frac{L}{p_{C}}$$
$$CC = 1 -{\frac{g_L} {p_C}}$$
$$CP=1 -{\frac{FC} {CC}}$$
$$\Omega = \frac{(1-CP) \ln (1-FC)} {FC \ln (CP)}$$
$$LAI = -CC \cdot {\frac{ \ln CP} {k}}$$

where FC is the foliage cover, CC is the crown cover, CP is the crown porosity, Ω is the clumping index, and LAI is the leaf area index; $g_T$ is the total number of gap (sky) pixels; L is the number of leaf pixels; $p_C$ is the number of pixels in the image minus the number of trunk pixels, i.e., the number of canopy pixels; $GF = g_T/p_C$ is the total gap fraction; $g_L$ is the number of pixels in large gaps, i.e., gaps whose size exceeds the mean size of all gaps by more than one standard deviation ([1]); and k is the extinction coefficient, assumed to be 0.5 as in Alivernini et al. ([1]).
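To make eqn. 1 to eqn. 5 concrete, the sketch below computes the attributes from a small classified label grid. It is not the CaCo or CanopyCover implementation: the label codes are hypothetical, gaps are assumed to be 4-connected sky regions, and "one standard deviation" is assumed to be the population standard deviation of gap sizes. It also assumes the image contains leaves and at least some normal-size gaps, so that 0 < CP < 1 and the logarithms are defined.

```python
import math
import statistics
from collections import deque

# Hypothetical label codes for a classified DCP image.
SKY, LEAF, TRUNK = 0, 1, 2


def gap_sizes(labels):
    """Sizes (in pixels) of 4-connected sky regions in a 2D label grid."""
    rows, cols = len(labels), len(labels[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != SKY or seen[r][c]:
                continue
            # Breadth-first flood fill of one gap.
            seen[r][c] = True
            queue, size = deque([(r, c)]), 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and labels[ny][nx] == SKY):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


def canopy_attributes(labels, k=0.5):
    """GF and eqn. 1 to eqn. 5 (FC, CC, CP, Omega, LAI) from a label grid."""
    flat = [v for row in labels for v in row]
    g_t = flat.count(SKY)                  # all gap (sky) pixels
    n_leaf = flat.count(LEAF)              # L: leaf pixels
    p_c = len(flat) - flat.count(TRUNK)    # canopy pixels: image minus trunk
    sizes = gap_sizes(labels)
    # Large gaps: size above mean + 1 SD of all gap sizes (population SD assumed).
    threshold = statistics.mean(sizes) + statistics.pstdev(sizes) if sizes else 0.0
    g_l = sum(s for s in sizes if s > threshold)
    fc = n_leaf / p_c                      # equals 1 - g_t / p_c
    cc = 1 - g_l / p_c
    cp = 1 - fc / cc                       # must lie in (0, 1) for the logs below
    omega = (1 - cp) * math.log(1 - fc) / (fc * math.log(cp))
    lai = -cc * math.log(cp) / k
    return {"GF": g_t / p_c, "FC": fc, "CC": cc, "CP": cp, "Omega": omega, "LAI": lai}


# Short aliases for a toy 6 x 6 example grid.
S, L, T = SKY, LEAF, TRUNK
EXAMPLE = [
    [S, S, S, S, L, L],
    [S, S, S, S, L, S],
    [L, L, L, L, L, L],
    [S, L, S, L, L, L],
    [L, L, L, L, S, L],
    [T, T, T, T, L, L],
]

if __name__ == "__main__":
    print(gap_sizes(EXAMPLE))
    for name, value in canopy_attributes(EXAMPLE).items():
        print(f"{name} = {value:.3f}")
```

In the toy grid there is one 8-pixel gap and four 1-pixel gaps, so the large-gap threshold is 2.4 + 2.8 = 5.2 pixels and only the 8-pixel gap counts as large. With 12 sky, 20 leaf, and 4 trunk pixels this gives GF = 0.375, FC = 0.625, CC = 0.75, and CP = 1/6.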

## Benchmark

Besides cross-validation accuracy and accuracy with respect to the training data, we also tested the two best classifiers (CNN and RF), as well as CaCo, on a digital canopy cover benchmark ([15]). The benchmark consists of 315 DCP images distributed over seven test sites (45 images per test site), sampling the hemisphere at zenith angles between 2.5° and 72.5° at 5° intervals, together with total gap fraction (GF) estimates derived from the 3D point cloud of a terrestrial laser scanner (TLS). From these 315 images, we selected the 63 upward-looking DCP images (zenith angles between 2.5° and 12.5°), with their respective GF and effective leaf area index (LAIe) measurements. The LAIe can be computed as ([10] - eqn. 6):

$$LAI_{e} = -\frac{\ln (GF)}{k} = LAI \cdot \Omega$$

The LAIe of the benchmark was computed as -ln(GF)/k using the benchmark GF and k = 0.5, so that it follows the same equations used here. The LAIe of the DCP images was computed as LAI · Ω, consistent with eqn. 1 to eqn. 5. Training samples of trunk, leaves, and sky were selected on twenty-one of the benchmark images, chosen based on the availability of those classes in each image.
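Substituting eqn. 4 and eqn. 5 into the product LAI · Ω shows why the two LAIe routes are directly comparable. From eqn. 3, CC(1 - CP) = FC, and from eqn. 1, 1 - FC = $g_T/p_C$ = GF, so:

$$
\begin{aligned}
LAI \cdot \Omega &= \left(-CC \, \frac{\ln CP}{k}\right) \cdot \frac{(1-CP)\ln(1-FC)}{FC \ln CP} \\
&= -\frac{CC\,(1-CP)}{FC} \cdot \frac{\ln(1-FC)}{k} \\
&= -\frac{\ln(1-FC)}{k} \qquad \text{since } CC\,(1-CP) = CC \cdot \frac{FC}{CC} = FC \\
&= -\frac{\ln GF}{k} \qquad \text{since } 1-FC = \frac{g_T}{p_C} = GF
\end{aligned}
$$

Under eqn. 1 to eqn. 5, therefore, crown cover and the large-gap threshold cancel out of the image-based LAIe, so for the same k its differences from the benchmark LAIe reflect differences in the estimated GF alone.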

# Results

## Performance of canopy classification algorithms

Fig. 2 shows the performance of each classifier, where CNN and RF have the best performance in terms of classification accuracy, sensitivity, and specificity.

Fig. 2 - Classification performance.

## Canopy attributes

Fig. 3 shows the classification accuracy of the three best classifiers (CNN, RF, and XGBoost) on each test site. CNN appears to perform best on test sites 1, 2, 4, 5, and 6 and worst on site 8. RF and XGBoost perform similarly across all sites and outperform CNN on sites 3, 7, and 8. In general, CNN performs well on all sites except site 8, where it falls behind RF and XGBoost by 13% in accuracy.

Fig. 3 - Classification accuracy on each test site.

The only difference we found between the site 8 image and the images from the other sites is that it is much sunnier, which probably explains the poor performance of CNN on this image.

Figs. S2-S6 (Supplementary material) show the canopy attributes estimated using CNN, RF, XGBoost, and CaCo on each test site. The first three sites correspond to Chocó and the last five to Caquetá and Putumayo. The different methods showed that, in general, the Chocó canopies are thicker and denser than those of the Caquetá and Putumayo sites.

Fig. 4 plots the benchmark TLS total gap fraction (GF, x-axis) against the GF estimated (y-axis) using (a) CNN, (b) RF, and (c) CaCo. RF obtains the best R² statistic, followed by CaCo and CNN, while CaCo obtains the best slope. Fig. 5 plots the LAIe computed from the benchmark GF (x-axis) against the LAIe estimated from the images (y-axis) using (a) CNN, (b) RF, and (c) CaCo. Again, RF obtains the best R², followed by CaCo and CNN, while CNN obtains the best slope, followed by RF and CaCo.
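Fig. 4 and Fig. 5 are judged by R² and slope, but the text does not state how the regressions were fitted. The stdlib sketch below assumes an ordinary least-squares line with an intercept (the function name `fit_line` is ours). Under this reading, R² measures how tightly the points follow the fitted line, while a slope closer to 1 means the estimates scale with the benchmark more faithfully; reading "best slope" as "closest to 1" is our interpretation.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r2), where r2 is the coefficient of
    determination, equal to the squared Pearson correlation for a fit
    with an intercept.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


if __name__ == "__main__":
    # Toy benchmark (x) vs. estimated (y) values, for illustration only.
    benchmark = [0.10, 0.20, 0.30, 0.40]
    estimated = [0.12, 0.19, 0.33, 0.41]
    slope, intercept, r2 = fit_line(benchmark, estimated)
    print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R2 = {r2:.3f}")
```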

Fig. 4 - Estimated gap fraction (GF) using (a) CNN, (b) RF, and (c) CaCo vs. TLS.

Fig. 5 - Estimated LAIe using (a) CNN, (b) RF, and (c) CaCo vs. TLS.

# Discussion

Canopy cover attributes estimated with CaCo tend to vary less from one site to the next, whereas those estimated with the CNN, RF, and XGBoost classifiers vary more, indicating greater sensitivity to the varying conditions at each test site (Figs. S2-S6 in Supplementary material). The estimation of canopy cover attributes, as given in eqn. 1 to eqn. 5, depends on a good classification of sky, tree trunks, and leaves, as provided by the three best classifiers, i.e., CNN, RF, and XGBoost. In contrast, CaCo only considers two classes, sky and canopy; nevertheless, CaCo results are not far from those of CNN, RF, and XGBoost, showing that even though CaCo was designed for sky-canopy classification, it also provides good results. In some cases, CaCo results are closer to those of CNN than to those of RF or XGBoost, which also indicates a high sensitivity of the estimated canopy attributes to classification accuracy. RF and XGBoost have similar classification performance (Fig. 3) and yield similar canopy attributes. The benchmark GF estimates rely on a sky-canopy segmentation algorithm ([10]) similar to that of CaCo ([1]), which could be expected to favor CaCo; nevertheless, CNN and RF also obtained good overall prediction accuracy for GF and LAIe.

The proposed technique can be used in agroforestry systems to estimate canopy attributes from upward- or downward-looking DCP images, for instance to determine whether cacao trees are grown on a well-shaded farm or whether cultural practices such as canopy pruning need to be scheduled. Canopy cover should be higher the warmer and drier the climate, and there is a non-linear relationship between shade and yield (https://climatesmartcocoa.guide/entry-points/shading-and-agroforestry/). The percentage of shade can be easily computed from the classification images obtained with the proposed supervised classification method.

# Conclusions

A method to estimate canopy cover attributes from upward-looking DCP and machine learning algorithms has been proposed here. Because canopy cover attributes are very sensitive to classification accuracy, obtaining an accurate classification of sky, tree trunks, and leaves is of utmost importance. Deep learning convolutional neural networks provided, in general, the best classification results compared with other well-known classification methods. Since CNN, RF, and CaCo were compared against a known benchmark with satisfactory results, we are confident that the canopy attributes estimated using DCP images and machine learning algorithms are close to reality.

# Acknowledgments

This work is part of the “Expedicion Colombia CacaoBIO” project carried out by AGROSAVIA and the University of the Andes under special cooperation agreement no. FP44842-142-2018 of the Colombian government. The Administrative Department of Science, Technology and Innovation of Colombia is acknowledged for financing the project. We thank Angela Sanchez Galán (University of the Andes) for her support in the field activities.

We thank Dr. Francesco Chianucci (CREA-FL, Arezzo, Italy) for providing us with the digital canopy cover benchmark as well as the GF and LAIe measurements for those images, and for his feedback on our research through peer-review.

# References

(1) Alivernini A, Fares S, Ferrara C, Chianucci F (2018). An objective image analysis method for estimation of canopy attributes from digital cover photography. Trees - Structure and Function 32: 713-723.

(2) Altman NS (1992). An introduction to kernel and nearest-neighbor nonparametric regression. American Statistician 46: 175-185.

(3) Brown PL, Doley D, Keenan RJ (2000). Estimating tree crown dimensions using digital analysis of vertical photographs. Agricultural and Forest Meteorology 100: 199-212.

(4) Chen HF (2009). In silico log p prediction for a large data set with support vector machines, radial basis neural networks and multiple linear regression. Chemical Biology and Drug Design 74: 142-147.

(5) Chen T, Guestrin C (2016). XGBoost: a scalable tree boosting system. In: “The Association for Computing Machinery’s Special Interest Group on Knowledge Discovery and Data Mining”. ACM, San Francisco, CA, USA, pp. 785-794.

(6) Chianucci F, Cutini A (2012). Digital hemispherical photography for estimating forest canopy properties: Current controversies and opportunities. iForest - Biogeosciences and Forestry 5: 290-295.

(7) Chianucci F, Chiavetta U, Cutini A (2014a). The estimation of canopy attributes from digital cover photography by two different image analysis methods. iForest - Biogeosciences and Forestry 7: 255-259.

(8) Chianucci F, Cutini A, Corona P, Puletti N (2014b). Estimation of leaf area index in understory deciduous trees using digital photography. Agricultural and Forest Meteorology 198: 259-264.

(9) Chianucci F, Disperati L, Guzzi D, Bianchini D, Nardino V, Lastri C, Rindinella A, Corona P (2016). Estimation of canopy attributes in beech forests using true colour digital images from a small fixed-wing UAV. International Journal of Applied Earth Observation and Geoinformation 47: 60-68.

(10) Chianucci F (2020). An overview of in situ digital canopy photography in forestry. Canadian Journal of Forest Research 50: 227-242.

(11) Cuatrecasas J (1964). Cacao and its allies: a taxonomic revision of the genus Theobroma. In: “Systematic Plant Studies”. Smithsonian Institution Press, Washington, DC, USA, pp. 379-614.

(12) Díaz GM, Negri PA, Lencinas JD (2021). Toward making canopy hemispherical photography independent of illumination conditions: a deep-learning-based approach. Agricultural and Forest Meteorology 296: 108234.

(13) Glorot X, Bordes A, Bengio Y (2011). Deep sparse rectifier neural networks. In: Proceedings of the “14th International Conference on Artificial Intelligence and Statistics” (AISTATS). Fort Lauderdale (FL, USA) 11-13 Apr 2011. Proceedings of Machine Learning Research 15: 315-323.

(14) González-Orozco CE, Galán AAS, Ramos PE, Yockteng R (2020). Exploring the diversity and distribution of crop wild relatives of cacao (Theobroma cacao L.) in Colombia. Genetic Resources and Crop Evolution 67: 2071-2085.

(15) Grotti M, Calders K, Origo N, Puletti N, Alivernini A, Ferrara C, Chianucci F (2020). An intensity, image-based method to estimate gap fraction, canopy openness and effective leaf area index from phase-shift terrestrial laser scanning. Agricultural and Forest Meteorology 280: 107766.

(16) Ho TK (1995). Random decision forests. In: Proceedings of the “ICDAR - 3rd International Conference on Document Analysis and Recognition”. Montreal (QC, Canada) 14-16 Aug 1995. IEEExplore 1: 278-282.

(17) Ioffe S, Szegedy C (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proceedings of the “ICML 2015 - 32nd International Conference on Machine Learning”. Proceedings of Machine Learning Research 37: 448-456.

(18) Krizhevsky A, Sutskever I, Hinton G (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM 60 (6): 84-90.

(19) Li K, Huang X, Zhang J, Sun Z, Huang J, Sun C, Xie Q, Song W (2020). A new method for forest canopy hemispherical photography segmentation based on deep learning. Forests 11: 1-16.

(20) Maxted N, Scholten M, Codd R, Ford-Lloyd B (2007). Creation and use of a national inventory of crop wild relatives. Biological Conservation 140: 142-159.

(21) Noorian N, Shataee-Jouibary S, Mohammadi J (2016). Assessment of different remote sensing data for forest structural attributes estimation in the Hyrcanian forests. Forest Systems 25 (3): 1-11.

(22) Patterson MF, Wiseman PE, Winn MF, Lee SM, Araman PA (2011). Effects of photographic distance on tree crown attributes calculated using urbancrowns image analysis software. Arboriculture and Urban Forestry 37: 173-179.

(23) Rumelhart DE, Hinton GE, Williams RJ (1986). Learning internal representations by error propagation. In: “Parallel Distributed Processing: Explorations in the Microstructure of Cognition” (Rumelhart DE, McClelland JL, Williams RJ eds). MIT Press, Cambridge, MA, USA, vol. 1, pp. 318-362.

(24) Shataee S, Kalbi S, Fallah A, Pelz D (2012). Forest attribute imputation using machine-learning methods and ASTER data: comparison of k-NN, SVR and random forest regression algorithms. International Journal of Remote Sensing 33: 6254-6280.

(25) Smith M-L, Anderson J, Fladeland M (2008). Forest canopy structural properties. In: “Field Measurements for Forest Carbon Monitoring” (Hoover CM ed). Springer, Dordrecht, Netherlands, pp. 179-176.

(26) Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15: 1929-1958.

#### Authors’ Affiliation

(1)
Julio Martin Duarte-Carvajalino 0000-0001-7117-2051
Corporación Colombiana de Investigación Agropecuaria - AGROSAVIA, Centro de Investigación Tibaitatá, Km 14 vía Mosquera, Bogotá, Cundinamarca (Colombia)
(2)
Mónica Paramo-Alvarez 0000-0001-8682-2651
Corporación Colombiana de Investigación Agropecuaria - AGROSAVIA, Sede Central, Km 14 vía Mosquera, Bogotá (Colombia)
(3)
Pablo Fernando Ramos-Calderón 0000-0002-0748-8534
Corporación Colombiana de Investigación Agropecuaria - AGROSAVIA, Centro de Investigación Nataima, Km 9 vía Espinal-Chicoral, Tolima, Sede Florencia, Caquetá (Colombia)
(4)
Carlos Eduardo González-Orozco 0000-0002-4593-9113
Corporación Colombiana de Investigación Agropecuaria - AGROSAVIA, Centro de Investigación La Libertad, Km 14 vía Villavicencio, Puerto López, Meta (Colombia)

#### Corresponding author

Julio Martin Duarte-Carvajalino
jmduarte@agrosavia.co

#### Citation

Duarte-Carvajalino JM, Paramo-Alvarez M, Ramos-Calderón PF, González-Orozco CE (2021). Estimation of canopy attributes of wild cacao trees using digital cover photography and machine learning algorithms. iForest 14: 517-521. - doi: 10.3832/ifor3936-014

#### Paper history

Accepted: Sep 08, 2021

First online: Nov 17, 2021
Publication Date: Dec 31, 2021
Publication Time: 2.33 months

© SISEF - The Italian Society of Silviculture and Forest Ecology 2021
