TL;DR: BIRCH-Trees is the first benchmark for joint individual tree height estimation and species identification from UAV images. Our model, DINOvTree, leverages a Vision Foundation Model (VFM) to extract features and predicts the height and species of the center tree in the input image with two separate heads.
Accurate estimation of forest biomass, a major carbon sink, relies heavily on tree-level traits such as height and species. Unoccupied Aerial Vehicles (UAVs) capturing high-resolution imagery from a single RGB camera offer a cost-effective and scalable approach for mapping and measuring individual trees.
We introduce BIRCH-Trees, the first benchmark for individual tree height and species estimation from tree-centered UAV images, spanning three datasets: temperate forests, tropical forests, and boreal plantations.
We also present DINOvTree, a unified approach using a Vision Foundation Model (VFM) backbone with task-specific heads for simultaneous height and species prediction. Through extensive evaluations on BIRCH-Trees, we compare DINOvTree against commonly used vision methods, including VFMs, as well as biological allometric equations. We find that DINOvTree achieves top overall results with accurate height predictions and competitive classification accuracy while using only 54% to 58% of the parameters of the second-best approach.
BIRCH-Trees (Benchmark for Individual Recognition of Class and Height of Trees) is the first benchmark for individual tree height and species estimation from tree-centered UAV images. We formulate individual tree height estimation and species identification as regression and classification tasks, respectively. We provide high-resolution RGB drone images, alongside corresponding Digital Surface Models (DSMs), of tree canopies from three distinct environments: temperate forests (Quebec Trees), tropical forests (BCI), and boreal plantations (Quebec Plantations).
Representative examples from the Quebec Trees dataset test split, illustrating one example per class alongside its corresponding ground truth class and height.
Representative examples from the BCI dataset test split, illustrating one example per class alongside its corresponding ground truth class and height.
Representative examples from the Quebec Plantations dataset test split, illustrating one example per class alongside its corresponding ground truth class and height.
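The unified formulation above (one shared backbone with task-specific heads for height regression and species classification) can be sketched as follows. This is a minimal illustration with a random stand-in for the VFM features; the feature dimension, class count, and single-linear-layer heads are assumptions for the example, not DINOvTree's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 768      # assumed pooled feature size of the backbone
NUM_CLASSES = 14    # assumed number of tree species (hypothetical)

def backbone(images: np.ndarray) -> np.ndarray:
    """Stand-in for the VFM backbone: maps a batch of tree-centered
    images to one pooled feature vector per image."""
    batch = images.shape[0]
    return rng.standard_normal((batch, FEAT_DIM))

# Two independent task-specific heads on the shared features.
W_height = rng.standard_normal((FEAT_DIM, 1)) * 0.01
W_species = rng.standard_normal((FEAT_DIM, NUM_CLASSES)) * 0.01

def predict(images: np.ndarray):
    feats = backbone(images)
    height = (feats @ W_height).squeeze(-1)  # (batch,) regressed height in meters
    logits = feats @ W_species               # (batch, NUM_CLASSES) species scores
    species = logits.argmax(axis=1)          # (batch,) predicted species index
    return height, species

heights, species = predict(np.zeros((4, 3, 224, 224)))
print(heights.shape, species.shape)  # (4,) (4,)
```

The key design point illustrated here is that both tasks share one feature extractor, so the parameter cost of the second task is only its head, rather than a full second model.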
We compare our DINOvTree against competing methods in terms of parameter count, height estimation, and classification on the three datasets of BIRCH-Trees. We group models by parameter count: fewer than 300M (top) and more than 300M (bottom). For competing methods, 'Param.' denotes the combined parameter count of two independently trained task-specific models. We report the mean ± standard error over 5 seeds. Bold and underline denote best and second-best. * indicates an oracle with ground truth class and segmentation; † denotes the Mask R-CNN variant from Hao et al. and Fu et al.; ❄ indicates a frozen backbone.
| Method | Param. (M) ▼ | δ1.25 (%) ▲ | MSLE (10⁻²) ▼ | MAE (m) ▼ | RMSE (m) ▼ | F1 (%) ▲ | Acc (%) ▲ |
|---|---|---|---|---|---|---|---|
| Allometric equations* | 0 | 48.74 | 10.97 | 2.98 | 3.73 | - | - |
| Mask R-CNN† | 88 | 74.34 ± 0.32 | 4.09 ± 0.06 | 1.73 ± 0.02 | 2.26 ± 0.02 | 65.49 ± 0.45 | 65.86 ± 0.60 |
| ResNet50 | 47 | 68.16 ± 0.13 | 5.45 ± 0.09 | 1.96 ± 0.01 | 2.48 ± 0.01 | 76.67 ± 0.38 | 74.01 ± 0.50 |
| ResNet50 w/ DSM | 47 | 68.11 ± 0.16 | 5.12 ± 0.05 | 1.96 ± 0.01 | 2.47 ± 0.01 | 76.36 ± 0.36 | 73.58 ± 0.50 |
| ConvNext-B | 175 | 84.82 ± 0.39 | 2.47 ± 0.08 | 1.32 ± 0.02 | 1.71 ± 0.02 | 84.87 ± 0.42 | 82.91 ± 0.40 |
| SwinV2-B | 174 | 77.39 ± 0.87 | 3.38 ± 0.12 | 1.73 ± 0.04 | 2.17 ± 0.05 | 85.78 ± 0.45 | 83.77 ± 0.56 |
| MambaVision-B | 193 | 68.47 ± 0.70 | 60.19 ± 4.47 | 2.36 ± 0.06 | 3.42 ± 0.10 | 85.37 ± 0.28 | 83.42 ± 0.40 |
| AnySat | 250 | 78.03 ± 0.28 | 3.35 ± 0.05 | 1.57 ± 0.01 | 2.04 ± 0.01 | 69.99 ± 0.61 | 68.98 ± 0.44 |
| PECore-B | 187 | 84.85 ± 1.03 | 2.31 ± 0.14 | 1.29 ± 0.04 | 1.68 ± 0.05 | 86.82 ± 0.33 | 84.57 ± 0.53 |
| DINOv3-B | 171 | 86.35 ± 0.26 | 2.10 ± 0.04 | 1.25 ± 0.01 | 1.62 ± 0.01 | 87.46 ± 0.53 | 85.10 ± 0.80 |
| DINOvTree-B (Ours) | 100 | 88.29 ± 0.32 | 1.85 ± 0.05 | 1.17 ± 0.02 | 1.54 ± 0.02 | 86.89 ± 0.31 | 85.03 ± 0.45 |
| DINOv3-L-Sat (❄) | 606 | 53.07 ± 0.04 | 10.71 ± 0.02 | 2.81 ± 0.00 | 3.40 ± 0.00 | 27.43 ± 0.12 | 25.03 ± 0.11 |
| DINOv3-L (❄) | 606 | 58.28 ± 0.06 | 8.42 ± 0.01 | 2.48 ± 0.00 | 3.06 ± 0.00 | 28.76 ± 0.07 | 26.64 ± 0.06 |
| DINOv3-L | 606 | 88.03 ± 0.38 | 1.86 ± 0.04 | 1.17 ± 0.01 | 1.53 ± 0.01 | 88.85 ± 0.37 | 86.92 ± 0.57 |
| DINOvTree-L (Ours) | 328 | 89.28 ± 0.28 | 1.71 ± 0.04 | 1.12 ± 0.02 | 1.46 ± 0.02 | 88.38 ± 0.38 | 87.25 ± 0.36 |
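For reference, the four height-estimation metrics reported in the tables can be computed as below. δ1.25 is the standard threshold accuracy: the share of predictions whose ratio to the ground truth, in either direction, is below 1.25. This is a sketch assuming positive heights in meters; whether the benchmark computes MSLE with log or log1p is an assumption here (the tables report MSLE scaled by 10⁻²... i.e. raw values multiplied by 100).

```python
import numpy as np

def height_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Height-estimation metrics; pred and gt are positive heights in meters."""
    ratio = np.maximum(pred / gt, gt / pred)
    delta_125 = 100.0 * np.mean(ratio < 1.25)             # δ1.25 (%)
    msle = np.mean((np.log1p(pred) - np.log1p(gt)) ** 2)  # MSLE (log1p assumed)
    mae = np.mean(np.abs(pred - gt))                      # MAE (m)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))             # RMSE (m)
    return {"delta_1.25": delta_125, "MSLE": msle, "MAE": mae, "RMSE": rmse}

# Toy example with made-up heights, not benchmark data.
gt = np.array([10.0, 15.0, 20.0, 5.0])
pred = np.array([11.0, 14.0, 26.0, 5.5])
print(height_metrics(pred, gt))
# delta_1.25 = 75.0: three of the four ratios (1.10, 1.07, 1.30, 1.10) are below 1.25
```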
| Method | Param. (M) ▼ | δ1.25 (%) ▲ | MSLE (10⁻²) ▼ | MAE (m) ▼ | RMSE (m) ▼ | F1 (%) ▲ | Acc (%) ▲ |
|---|---|---|---|---|---|---|---|
| ConvNext-B | 175 | 85.91 ± 0.96 | 2.53 ± 0.07 | 3.68 ± 0.09 | 4.75 ± 0.08 | 45.99 ± 0.55 | 48.16 ± 0.63 |
| SwinV2-B | 174 | 86.38 ± 0.69 | 2.53 ± 0.09 | 3.51 ± 0.07 | 4.64 ± 0.07 | 45.73 ± 1.21 | 46.94 ± 1.42 |
| PECore-B | 187 | 89.22 ± 0.49 | 2.18 ± 0.07 | 3.21 ± 0.04 | 4.24 ± 0.06 | 41.92 ± 1.94 | 42.31 ± 1.59 |
| DINOv3-B | 171 | 85.74 ± 0.69 | 2.61 ± 0.09 | 3.61 ± 0.07 | 4.69 ± 0.07 | 46.44 ± 1.32 | 46.44 ± 1.32 |
| DINOvTree-B (Ours) | 100 | 91.25 ± 0.40 | 1.95 ± 0.05 | 3.11 ± 0.08 | 4.14 ± 0.10 | 48.96 ± 0.85 | 50.22 ± 1.00 |
| ConvNext-L | 392 | 86.14 ± 0.74 | 2.49 ± 0.07 | 3.63 ± 0.10 | 4.71 ± 0.09 | 46.98 ± 0.70 | 48.79 ± 0.71 |
| PECore-L | 634 | 91.88 ± 0.49 | 1.93 ± 0.03 | 3.01 ± 0.02 | 3.99 ± 0.03 | 51.93 ± 1.17 | 52.97 ± 0.59 |
| DINOv3-L | 606 | 86.78 ± 1.39 | 2.32 ± 0.12 | 3.41 ± 0.10 | 4.44 ± 0.11 | 53.49 ± 0.61 | 53.87 ± 1.20 |
| DINOvTree-L (Ours) | 328 | 94.14 ± 0.23 | 1.68 ± 0.01 | 2.75 ± 0.01 | 3.75 ± 0.02 | 54.39 ± 0.67 | 55.58 ± 0.63 |
| Method | Param. (M) ▼ | δ1.25 (%) ▲ | MSLE (10⁻²) ▼ | MAE (m) ▼ | RMSE (m) ▼ | F1 (%) ▲ | Acc (%) ▲ |
|---|---|---|---|---|---|---|---|
| ConvNext-B | 175 | 84.70 ± 0.31 | 1.63 ± 0.04 | 0.39 ± 0.00 | 0.61 ± 0.01 | 83.78 ± 1.47 | 85.95 ± 1.47 |
| SwinV2-B | 174 | 80.37 ± 0.58 | 2.16 ± 0.11 | 0.47 ± 0.01 | 0.71 ± 0.02 | 79.33 ± 2.34 | 80.21 ± 2.22 |
| PECore-B | 187 | 84.57 ± 0.31 | 1.56 ± 0.04 | 0.39 ± 0.01 | 0.58 ± 0.01 | 84.34 ± 1.36 | 84.67 ± 1.92 |
| DINOv3-B | 171 | 86.71 ± 0.39 | 1.48 ± 0.04 | 0.37 ± 0.00 | 0.58 ± 0.01 | 82.56 ± 1.86 | 84.00 ± 1.76 |
| DINOvTree-B (Ours) | 100 | 85.13 ± 0.40 | 1.54 ± 0.03 | 0.38 ± 0.00 | 0.57 ± 0.01 | 81.88 ± 1.20 | 83.97 ± 0.45 |
| ConvNext-L | 392 | 84.34 ± 0.32 | 1.65 ± 0.02 | 0.40 ± 0.00 | 0.61 ± 0.01 | 84.46 ± 0.45 | 85.25 ± 0.87 |
| PECore-L | 634 | 86.74 ± 0.74 | 1.44 ± 0.04 | 0.36 ± 0.01 | 0.58 ± 0.01 | 86.61 ± 0.70 | 86.90 ± 1.35 |
| DINOv3-L | 606 | 86.26 ± 0.59 | 1.39 ± 0.03 | 0.36 ± 0.00 | 0.57 ± 0.01 | 85.94 ± 0.37 | 85.80 ± 0.38 |
| DINOvTree-L (Ours) | 328 | 85.09 ± 0.26 | 1.47 ± 0.02 | 0.37 ± 0.00 | 0.58 ± 0.01 | 87.80 ± 1.19 | 87.63 ± 1.33 |
@article{endres2026treeheightspecies,
title = {Estimating Individual Tree Height and Species from UAV Imagery},
author = {Endres, Jannik and Lalibert{\'e}, Etienne and Rolnick, David and Ouaknine, Arthur},
journal = {arxiv:ToDo [cs.CV]},
year = {2026}
}