Who can use this feature
Owners and Editors of projects within Pro and Pro+ workspaces
Selection requirements for the tool to work
At least two point layers (minimum two features in each), AND
A dataset
The Classify dataset tool in GeoNadir allows you to quickly perform Random Forest classification using training points you define. This enables you to categorise land cover, vegetation types, or other features in your dataset with high accuracy, turning raw data into actionable insights in just a few clicks.
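GeoNadir runs the classification for you in the cloud, but if you're curious what the technique looks like in general, here is a minimal illustrative sketch in Python using the scikit-learn library. It is not GeoNadir's actual implementation; the band values, class labels, and image shape are made up for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training samples: one row of band values per training point,
# with a class label for each point (0 = water, 1 = trees, for example).
X_train = np.array([[0.12, 0.34, 0.56], [0.10, 0.30, 0.50],
                    [0.60, 0.20, 0.10], [0.62, 0.22, 0.12]])
y_train = np.array([0, 0, 1, 1])

# Train a Random Forest on the labelled points.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Classify every pixel of a (rows, cols, bands) image by treating each pixel
# as a feature vector of band values.
image = np.random.rand(100, 100, 3)          # stand-in for your dataset
classified = model.predict(image.reshape(-1, 3)).reshape(100, 100)
```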
Step by step
To build your classification, follow these steps:
Create point layers as training sites of your features of interest.
You need to have at least two layers, and each layer must have at least two features in it.
The points need to overlap with the dataset you are going to classify.
We recommend a minimum of 30 points per feature / class of interest. 70% of these points will be selected at random to train your classification, and the remaining 30% will be used to validate it.
Try to capture the natural variability of your feature when you select these points (e.g. if you are interested in a certain tree species, don't just use a single tree for all of your points).
For best results, use the colour styling for your points that you would like to see on your final classification.
In the table of contents, select your point layers AND the dataset you are going to classify (hold Ctrl or Cmd as you click to select multiple items).
From the top menu bar, click the 'Classify dataset' button.
Select a layer for the algorithm to include in the classification.
Click Run.
You will be able to see your classification in the table of contents of your project, with a 'Processing' status as shown below.
Depending on the size of your dataset, the classification process can take some time, so please be patient. Don't worry, though: you can continue to work on your project in GeoNadir while it's processing, or even close the project, your browser, and your computer without disrupting it. It's all done in the cloud, so as soon as you've set it to run, you can relax and we'll take care of it for you.
When the classification is complete, it will automatically appear on your map, and you will see the updated status in your table of contents - see below for an example.
+++THE FOLLOWING FEATURES ARE COMING SOON+++
Calculating classification statistics
Curious to delve into the statistics of your newly created classification? Follow these steps:
Select your classification in the table of contents
From the top menu bar, click on the Calculate statistics button.
A pop-up will appear with a range of statistics for you to view and interact with.
Understanding the data
Click on any graph to expand it - this will make it easier to see and interact with.
You can show or hide the results for each class on the graphs by clicking its name in the legend.
Total area
The Donut Chart visually compares the area coverage of each category in the classification. The total area of all classes is displayed in the center. Hovering over a segment of the graph reveals the class name and its area. This helps you quickly understand how different classes contribute to the overall area distribution.
Overall accuracy
Overall accuracy is a measure of how well your classification performed, shown as a percentage. It’s calculated by comparing all your sample points—both training and validation—to the results of the classification. We check how many points were correctly classified and divide that by the total number of points.
For example, if you created 100 sample points and 87 of them were correctly matched to the right class in the final classification, your overall accuracy would be 87%.
In GeoNadir, we automatically split your input sample points: 70% are used to train the model, and 30% are used to test it—but the final accuracy is based on all 100% of your points to give you a clear picture of how well the model performed overall.
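If you'd like to see how these numbers fit together, here is a small illustrative sketch in Python (scikit-learn) of a 70/30 split and an overall accuracy calculation. The sample values are randomly generated stand-ins, and this is not GeoNadir's actual code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical sample points: band values (X) and their true classes (y).
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = rng.integers(0, 3, size=100)

# 70% of the points train the model; the remaining 30% validate it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Overall accuracy: correctly classified points divided by the total number
# of points, evaluated over all 100% of the sample points.
overall_accuracy = accuracy_score(y, model.predict(X))
print(f"Overall accuracy: {overall_accuracy:.0%}")
```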
Sankey chart
The Sankey chart helps you see how well each class in your dataset was classified.
Each colored bar on the left shows the true class of your sample points, while the bars on the right show how those points were predicted by the classification model. The lines (or “flows”) between them show where the points went—correct matches flow straight across, and misclassifications flow into the wrong class.
The thicker the line, the more points it represents. You can hover over each bar or line to see the class name and number of points.
This is a great way to spot which classes were most accurate and where confusion occurred between similar-looking areas.
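The flows in a Sankey chart like this correspond to the cells of a confusion matrix: true classes down one side, predicted classes along the other. As an illustration only (not GeoNadir's actual code), here is how such a matrix could be computed in Python with hypothetical class names and points:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted classes for nine sample points.
y_true = ["water", "water", "water", "trees", "trees",
          "grass", "grass", "grass", "grass"]
y_pred = ["water", "water", "trees", "trees", "trees",
          "grass", "grass", "water", "grass"]

labels = ["water", "trees", "grass"]
cm = confusion_matrix(y_true, y_pred, labels=labels)

# cm[i][j] counts points whose true class is labels[i] but which the model
# predicted as labels[j]. Diagonal cells are the straight-across flows in the
# Sankey chart; off-diagonal cells are flows into the wrong class.
print(cm)
```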
Compare
The Lollipop chart gives you a quick visual comparison of classification results per class. You can switch between different metrics to explore how well each class performed:
Class accuracy – the percentage of points in each class that were correctly classified
False positives – points incorrectly predicted as this class
False negatives – points that belong to this class but were misclassified as something else
Total area – the total area that the model assigned to this class. The unit for area measurements (e.g. m², ft²) will be the same as your project units. If you would like to change your measurement units, follow these instructions.
This makes it easy to spot which classes performed well and where the model may have over- or under-predicted.
Below the chart, you'll find a summary table showing the same statistics in detail, so you can dig deeper into the numbers if needed.
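For readers who want to see how class accuracy, false positives, and false negatives relate to each other, here is a small illustrative Python sketch that derives all three from a confusion matrix. The class names and point counts are hypothetical, and this is not GeoNadir's implementation.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical validation results for three classes.
labels = ["water", "trees", "grass"]
y_true = ["water", "water", "trees", "trees", "trees", "grass", "grass", "grass"]
y_pred = ["water", "trees", "trees", "trees", "water", "grass", "grass", "trees"]

cm = confusion_matrix(y_true, y_pred, labels=labels)
for i, name in enumerate(labels):
    correct = cm[i, i]                    # points of this class classified correctly
    false_neg = cm[i, :].sum() - correct  # this class, predicted as something else
    false_pos = cm[:, i].sum() - correct  # other classes, predicted as this class
    class_accuracy = correct / cm[i, :].sum()
    print(f"{name}: accuracy {class_accuracy:.0%}, "
          f"false positives {false_pos}, false negatives {false_neg}")
```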