Lidar and SfM Point Cloud Python KDTree Comparison

This short entry describes a comparison of KDTree implementations for lidar point clouds (PC) and Structure-from-Motion (SfM) datasets.

A more detailed analysis can be found in Comparing Python KD-Tree Implementations with Focus on Point Cloud Processing and in the GitHub repository LidarPC-KDTree.

KDTree and KNN Algorithms

One of the core processing steps for irregular PCs is to understand the neighborhood of each point (kNN, k-nearest neighbors). This is often done using KD-Trees. There exist myriad implementations for various applications, and KD-Trees have become an important tool for deep learning, where they underpin kNN (k-nearest neighbor) algorithms. Many of these approaches have been optimized for multi-dimensional datasets (n > 5 and up to 50 dimensions). In recent months and years, KD-Trees relying on CUDA or OpenCL implementations have become more common and easily approachable through Python or Matlab interfaces.

Here, we briefly explore existing algorithms and test which ones perform favorably for 3D lidar PCs (or SfM PCs). We only use three dimensions (x, y, z), but future implementations may rely on four dimensions (adding intensity) or higher-dimensional lidar PCs. We focus on implementations accessible through Python or C.
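As a concrete starting point, the following minimal sketch builds a KD-Tree over a synthetic 3D point cloud and queries the k nearest neighbors with scipy.spatial.cKDTree; the random array is just a stand-in for a real lidar/SfM PC:

```python
import numpy as np
from scipy.spatial import cKDTree

# Synthetic stand-in for a lidar/SfM point cloud: an (n, 3) array of x, y, z.
rng = np.random.default_rng(42)
points = rng.random((100_000, 3)) * 100.0

# Build the tree once, then query the k nearest neighbors of every point.
tree = cKDTree(points, leafsize=20)
k = 5
# k + 1 because every point is its own nearest neighbor at distance 0.
# workers=-1 uses all CPU cores (the parameter was named n_jobs in SciPy < 1.6).
distances, indices = tree.query(points, k=k + 1, workers=-1)
distances, indices = distances[:, 1:], indices[:, 1:]  # drop the self-matches
print(distances.shape)  # -> (100000, 5)
```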

We note that there exist other algorithm and parameter comparisons (e.g., knn-benchmarking in python and knn-benchmarking), and these are very useful and helpful – but they are neither tailored to lidar/SfM PCs nor do they use recent implementations. Most comparisons also focus on the general applicability of KD-Tree algorithms and explore the impact of leaf sizes and dimensionality – both parameters that do not change for lidar PCs.

Method and Approach

We construct the following scenarios:

  1. Deriving k=5, 10, 50, 100, 500, and 1000 nearest neighbors from four lidar/SfM point clouds with 14, 38, 69, and 232 million points.
  2. Timing the generation of the KD-Tree and the queries separately for each setup (see the timing sketch after this list).
  3. Searching for neighbors within a given search radius/sphere (not supported by all algorithms).
  4. Using the k nearest neighbors to estimate point density or to perform further classification on the neighborhood structure of points (e.g., curvature).
  5. Comparing approaches across three computing setups: (1) AMD Ryzen 3900X (3.8 GHz, 12 cores); (2) AMD Ryzen 2970WX (2.9 GHz, 24 cores); (3) Intel Xeon Gold 6230 (2.10 GHz, 2x20 cores).
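The following sketch shows how tree generation and queries can be timed separately (scenarios 1 and 2), using cKDTree on synthetic data; the real point clouds are omitted and the sizes here are illustrative:

```python
import time
import numpy as np
from scipy.spatial import cKDTree

def time_build_and_query(points, k, n_runs=3):
    """Average KD-Tree construction and query times over n_runs runs."""
    build_s, query_s = [], []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        tree = cKDTree(points, leafsize=20)
        build_s.append(time.perf_counter() - t0)

        t0 = time.perf_counter()
        tree.query(points, k=k, workers=-1)  # all cores; n_jobs in SciPy < 1.6
        query_s.append(time.perf_counter() - t0)
    return np.mean(build_s), np.mean(query_s)

points = np.random.default_rng(0).random((1_000_000, 3))
for k in (5, 10, 50, 100):
    b, q = time_build_and_query(points, k)
    print(f"k={k:4d}: build {b:.2f} s, query {q:.2f} s")
```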
| Name | Reference and Documentation | Comments |
|---|---|---|
| scipy.spatial.KDTree | Manual | Pure Python implementation of the KD-Tree. Querying is very slow and its use is not suggested; not used here. Single-core CPU processing. |
| scipy.spatial.cKDTree | Manual | KDTree implementation in Cython. Single- and multi-core CPU processing. |
| sklearn.neighbors.KDTree | Manual | KDTree implementation in sklearn. Single-core CPU processing. |
| pyKDTree | GitHub page and PyPI project page | Fast implementation for common use cases (low dimensions and low numbers of neighbors) for both tree construction and queries. Based on scipy.spatial.cKDTree and libANN, combining the best features of both with a focus on implementation efficiency. Multi-core CPU processing. |
| pyflann | GitHub | Python bindings for FLANN (Fast Library for Approximate Nearest Neighbors); see the FLANN Manual 1.8.4. Multi-core CPU processing. |
| cyflann | GitHub | A Cython interface for FLANN (Fast Library for Approximate Nearest Neighbors); see the FLANN Manual 1.8.4. Multi-core CPU processing. |
| NearestNeighbors | cuml-kNN | GPU implementation of kNN via RAPIDS; currently only supports a brute-force algorithm and is not competitive. |
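For illustration, the sketch below runs the same kNN query with cKDTree and pyKDTree and adds a fixed-radius sphere query with cKDTree (scenario 3). It assumes the pykdtree package, whose exact signatures may differ slightly between versions:

```python
import numpy as np
from scipy.spatial import cKDTree
from pykdtree.kdtree import KDTree as PyKDTree  # pip install pykdtree

points = np.random.default_rng(1).random((100_000, 3)).astype(np.float32) * 100.0

# cKDTree: kNN query plus a fixed-radius sphere search (scenario 3).
ctree = cKDTree(points, leafsize=20)
dist, idx = ctree.query(points, k=10, workers=-1)
in_sphere = ctree.query_ball_point(points[0], r=1.0)  # indices within a 1 m sphere

# pyKDTree: the same kNN query pattern; queries run multi-threaded internally.
ptree = PyKDTree(points, leafsize=20)
dist_p, idx_p = ptree.query(points, k=10)
```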

Datasets

Subset of Pozo catchment: Airborne Lidar data from Santa Cruz Island, California

The airborne point cloud from Santa Cruz Island, California, represents natural terrain without buildings, but with lower point density. The dataset contains 3,348,668 points with a point density of 7.2 pts/m^2 and has been ground-classified using LAStools. The points have been colored using an airphoto from the same time as the lidar flight. The test area is a small subset of the Pozo catchment in the southwestern part of the island. These data are openly accessible from OpenTopography and were originally acquired by the USGS in 2010. The geologic and geomorphic setting of Santa Cruz Island has been described in several peer-reviewed scientific publications (e.g., Perroy et al., 2010, Perroy et al., 2012, Neely et al., 2017, Rheinwalt et al., 2019, Clubb et al., 2019).
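To make the input format concrete: a common way to load such a point cloud into the (n, 3) NumPy layout used in the sketches here is via laspy; the filename below is a placeholder, not the actual OpenTopography tile:

```python
import laspy
import numpy as np

# Hypothetical filename; substitute the LAS/LAZ tile downloaded from
# OpenTopography (reading LAZ requires the lazrs or laszip extra of laspy).
las = laspy.read("pozo_subset.laz")
points = np.vstack((las.x, las.y, las.z)).T  # (n, 3) array of coordinates in meters
print(points.shape)
```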

Summary of Results for the Pozo catchment (n=3,348,668 points)

We ran tests on an AMD Ryzen Threadripper 2970WX 24-core processor (2019) with an NVIDIA Titan RTX 24 GB running Ubuntu 18.04 (CUDA 10.1) and an AMD Ryzen 9 3900X 12-core processor with an NVIDIA GeForce RTX 2080 SUPER running Ubuntu 20.04 (CUDA 11.0).

A first test uses standard single-core and multi-core algorithms for n=3,348,668 queries on n=3,348,668 points. Note that the query timings for scipy.spatial.KDTree have not been included, because they were too slow. Also, for the single-core sklearnKDTree approach, higher numbers of neighbors have not been included (too slow). All results show times in seconds (s) and have been averaged over n=3 runs.

Comparing the traditional and widely used sklearnKDTree (single-core) and cKDTree (multi-core) approaches, we note the following results:

  1. The leaf size is an important parameter for speeding up single-core tree queries. Depending on the point cloud structure, different leaf sizes produce very different results and can improve query times. We note that the default leaf size does not generate useful results for real-world airborne lidar data and that there exists a minimum time representing an optimal leaf size (a leaf-size sweep sketch follows this list).
  2. The sklearnKDTree (single-core) is slow on these massive queries. The option dualtree=True was used to speed up processing.
  3. cKDTree with n_jobs=-1 set for querying (workers=-1 in SciPy >= 1.6) outperforms single-core approaches, especially on modern multi-core systems. Leaf size does not have a significant impact on multi-core processing, but has some effect for larger neighborhood queries (k > 100).
  4. There are minimal differences between the approaches. For example, the maximum difference between sklearnKDTree and cKDTree results is 0.2 m and the median difference is 0.0 m.
  5. Comparing cKDTree on 12-, 24-, and 48-core processors indicates a clear advantage for multi-threading systems. We emphasize that in order to take full advantage of multi-threaded processing, an increase in available DRAM is needed (i.e., more cores require more DRAM). We note that pyKDTree has a lower peak memory requirement than cKDTree.
  6. The FLANN (Fast Library for Approximate Nearest Neighbors) family of approaches provides additional advancements, especially for large datasets and massive queries.
  7. Initial tests with cuML (CUDA RAPIDS) show that the implemented brute-force approach for nearest-neighbor searches is not competitive with the multi-core approaches (cKDTree and pyKDTree) and the highly optimized FLANN approaches. However, there are other processing advantages to data analysis using CUDA DataFrames (cudf).
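A sketch of the leaf-size sweep behind point 1, using cKDTree on synthetic data; the sweep range is illustrative and the optimal leaf size is dataset-dependent:

```python
import time
import numpy as np
from scipy.spatial import cKDTree

def best_leafsize(points, k=10, leafsizes=range(8, 40, 2)):
    """Return the leaf size with the fastest multi-core query for this cloud."""
    timings = {}
    for leafsize in leafsizes:
        tree = cKDTree(points, leafsize=leafsize)
        t0 = time.perf_counter()
        tree.query(points, k=k, workers=-1)
        timings[leafsize] = time.perf_counter() - t0
    return min(timings, key=timings.get), timings

points = np.random.default_rng(2).random((500_000, 3))
best, timings = best_leafsize(points)
print(f"fastest leaf size: {best} ({timings[best]:.2f} s)")
```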
Figure: Generation and query times for single-core sklearnKDTree for varying leaf sizes.
Figure: The multi-core cKDTree implementation in scipy.spatial.cKDTree performs well, but you need to set the n_jobs=-1 parameter (workers=-1 in SciPy >= 1.6) in the query to achieve the best results and use all available cores (only during queries).
Figure: Comparison of pyKDTree and cKDTree for different numbers of cores (both use a leaf size of 20). cKDTree outperforms pyKDTree and shows a nearly linear rise in time for increasing values of k nearest neighbors. Higher numbers of cores result in faster processing times, most notably at higher values of k. Processing times for lower k are faster for higher CPU speeds (3.9 GHz vs. 2.1 GHz).
Figure: Comparison of cyFLANN and cKDTree. With higher numbers of cores, cKDTree performs better than cyFLANN for large numbers of neighbors. cyFLANN performs better for lower numbers of neighbors (small k; a factor of 2 at k=500 neighbors with 40 cores).
| Algorithm | Generate KDTree (s) | Query k=5 (s) | Query k=10 (s) | Query k=50 (s) | Query k=100 (s) | Query k=500 (s) | Query k=1000 (s) |
|---|---|---|---|---|---|---|---|
| KDTree | 5.25 | nan | nan | nan | nan | nan | nan |
| sklearnKDTree | 1.51 | 36.93 | nan | nan | nan | nan | nan |
| cKDTree | 0.32 | 0.23 | 0.31 | 0.91 | 1.67 | 7.47 | 15.13 |
| pyKDTree | 0.07 | 0.23 | 0.32 | 1.1 | 2.58 | 35.17 | 129.81 |
| pyflannKDTree | 0.19 | 0.17 | 0.24 | 0.97 | 2.11 | 12.54 | 27.4 |
| cyflannKDTree | 0.26 | 0.2 | 0.26 | 1 | 2.2 | 9.69 | 20.01 |

Table: Comparison of the fastest processing times (any leaf size) for all implemented algorithms, in seconds. Note that queries with the standard scipy KDTree have not been processed due to very slow processing times. All times are the average of 3 iterations.

| Algorithm | Generate KDTree (leaf size) | Query k=5 (leaf size) | Query k=10 (leaf size) | Query k=50 (leaf size) | Query k=100 (leaf size) | Query k=500 (leaf size) | Query k=1000 (leaf size) |
|---|---|---|---|---|---|---|---|
| KDTree | 10 | 8 | 8 | 8 | 8 | 8 | 8 |
| cKDTree | 36 | 14 | 16 | 24 | 14 | 38 | 38 |
| sklearnKDTree | 28 | 12 | 8 | 8 | 8 | 8 | 8 |
| pyKDTree | 36 | 16 | 20 | 16 | 28 | 26 | 32 |

Table: Best leaf sizes (fastest times). Note the differences for varying numbers of neighbors.

Figure: Creating and querying the cKDTree for three values of k and varying leaf sizes. Creating the cKDTree does not show large variability, but query time mostly depends on the number of neighbors k.
Figure: Creating and querying the cKDTree with 1 to 8 threads. Generating the cKDTree only improves slightly with an increasing number of threads, but querying improves significantly for higher numbers of threads.
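The thread-scaling test in the last figure can be sketched as follows with cKDTree; workers (n_jobs in SciPy < 1.6) caps the number of query threads:

```python
import time
import numpy as np
from scipy.spatial import cKDTree

points = np.random.default_rng(3).random((1_000_000, 3))
tree = cKDTree(points, leafsize=20)  # built once; construction is single-threaded

# Repeat the same query with 1 to 8 threads to observe the scaling.
for workers in range(1, 9):
    t0 = time.perf_counter()
    tree.query(points, k=50, workers=workers)
    print(f"{workers} thread(s): {time.perf_counter() - t0:.2f} s")
```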
