VectorClassifier

Performs a classification of the input vector data according to a model file.

Description

This application performs a vector data classification based on a model file produced by the TrainVectorClassifier application.Features of the vector data output will contain the class labels decided by the classifier (maximal class label = 65535). There are two modes: 1) Update mode: add of the ‘cfield’ field containing the predicted class in the input file. 2) Write mode: copies the existing fields of the input file to the output file and add the ‘cfield’ field containing the predicted class. If you have declared the output file, the write mode applies. Otherwise, the input file update mode will be applied.

This application has several output images and supports “multi-writing”. Instead of computing and writing each image independently, the streamed image blocks are written in a synchronous way for each output. The output images will be computed strip by strip, using the available RAM to compute the strip size, and a user defined streaming mode can be specified using the streaming extended filenames (type, mode and value). Note that multi-writing can be disabled using the multi-write extended filename option: &multiwrite=false, in this case the output images will be written one by one. Note that multi-writing is not supported for MPI writers.

Parameters

Name of the input vector data -in filename [dtype] Mandatory
The input vector data file to classify.

Statistics file -instat filename [dtype]
A XML file containing mean and standard deviation to centerand reduce samples before classification, produced by ComputeImagesStatistics application.

Model file -model filename [dtype] Mandatory
Model file produced by TrainVectorClassifier application.

Output field -cfield string Default value: predicted
Field containing the predicted class.Only geometries with this field available will be taken into account. The field is added either in the input file (if ‘out’ off) or in the output file. Caution, the ‘cfield’ must not exist in the input file if you are updating the file.

Field names to be calculated -feat string1 string2...
List of field names in the input vector data used as features for training. Put the same field names as the TrainVectorClassifier application.

Confidence map -confmap bool Default value: false
Confidence map of the produced classification. The confidence index depends on the model:

  • LibSVM: difference between the two highest probabilities (needs a model with probability estimates, so that classes probabilities can be computed for each sample)
  • Boost: sum of votes
  • DecisionTree: (not supported)
  • KNearestNeighbors: number of neighbors with the same label
  • NeuralNetwork: difference between the two highest responses
  • NormalBayes: (not supported)
  • RandomForest: Confidence (proportion of votes for the majority class). Margin (normalized difference of the votes of the 2 majority classes) is not available for now.
  • SVM: distance to margin (only works for 2-class models)

Output vector data file -out filename [dtype]
Output vector data file storing sample values (OGR format).If not given, the input vector data file is updated.

Examples

From the command-line:

otbcli_VectorClassifier -in vectorData.shp -instat meanVar.xml -model svmModel.svm -out vectorDataLabeledVector.shp -feat perimeter  area  width -cfield predicted

From Python:

import otbApplication

app = otbApplication.Registry.CreateApplication("VectorClassifier")

app.SetParameterString("in", "vectorData.shp")
app.SetParameterString("instat", "meanVar.xml")
app.SetParameterString("model", "svmModel.svm")
app.SetParameterString("out", "vectorDataLabeledVector.shp")
app.SetParameterStringList("feat", "perimeter  area  width")
app.SetParameterString("cfield", "predicted")

app.ExecuteAndWriteOutput()

Limitations

Shapefiles are supported, but the SQLite format is only supported in update mode.