4.8.10 Train a classifier from multiple images

Train a classifier from multiple pairs of images and training vector data.

Detailed description

This application trains a classifier from multiple pairs of input images and training vector data. Samples are composed of the pixel values in each band, optionally centered and reduced using an XML statistics file produced by the ComputeImagesStatistics application.
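
Centering and reduction means that, for every band, the band mean is subtracted from the pixel value and the result is divided by the band standard deviation. The following is a minimal sketch, assuming the per-band means and standard deviations have already been read from the statistics XML file (parsing is not shown). The function name center_reduce is hypothetical, and the handling of a zero standard deviation is an assumption, not documented OTB behavior.

```python
from typing import List, Sequence


def center_reduce(pixel: Sequence[float],
                  means: Sequence[float],
                  stddevs: Sequence[float]) -> List[float]:
    """Return (value - mean) / stddev for every band of one pixel sample."""
    if not len(pixel) == len(means) == len(stddevs):
        raise ValueError("pixel, means and stddevs need one entry per band")
    reduced = []
    for value, mean, std in zip(pixel, means, stddevs):
        # Assumption: a constant band (std == 0) is only centered, not scaled
        reduced.append((value - mean) / std if std != 0 else value - mean)
    return reduced


# One 3-band pixel with made-up statistics
print(center_reduce([120.0, 80.0, 60.0], [100.0, 90.0, 60.0], [10.0, 5.0, 0.0]))
# [2.0, -2.0, 0.0]
```
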
The training vector data must contain polygons with a positive integer field representing the class label. The name of this field can be set using the "Class label field" parameter (key sample.vfn). Training and validation sample lists are built such that each class is equally represented in both lists. The sample.vtr parameter controls the ratio between the number of samples in the training and validation sets, and the sample.mt and sample.mv parameters bound the size of the training and validation sets per class and per image.
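
The sampling rules above can be illustrated with a small sketch. It is not OTB's implementation; it assumes that vtr is the fraction of each class sent to the validation list, that max_train and max_valid play the role of sample.mt and sample.mv with -1 meaning "no cap", that bound_by_min mimics sample.bm by cutting every class to the size of the smallest one, and that a fixed seed (the role of the rand parameter) makes the selection reproducible. The function split_samples is hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[Hashable, int]  # (sample identifier, class label)


def split_samples(samples: Sequence[Sample],
                  vtr: float = 0.5,
                  max_train: int = -1,
                  max_valid: int = -1,
                  bound_by_min: bool = True,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Split labelled samples into class-balanced training and validation lists."""
    if not 0.0 <= vtr <= 1.0:
        raise ValueError("vtr must be in [0, 1]")
    by_class: Dict[int, List[Hashable]] = defaultdict(list)
    for sample, label in samples:
        by_class[label].append(sample)
    if not by_class:
        return [], []

    rng = random.Random(seed)  # fixed seed -> reproducible selection
    n_min = min(len(pool) for pool in by_class.values())
    train: List[Sample] = []
    valid: List[Sample] = []
    for label in sorted(by_class):
        pool = list(by_class[label])
        rng.shuffle(pool)
        if bound_by_min:
            pool = pool[:n_min]  # same number of samples for every class
        n_valid = round(len(pool) * vtr)  # share of this class used for validation
        valid_part, train_part = pool[:n_valid], pool[n_valid:]
        if max_train >= 0:
            train_part = train_part[:max_train]
        if max_valid >= 0:
            valid_part = valid_part[:max_valid]
        train.extend((s, label) for s in train_part)
        valid.extend((s, label) for s in valid_part)
    return train, valid


# Class 1 has 10 samples, class 2 only 4: both end up with 2 training + 2 validation
data = [(i, 1) for i in range(10)] + [(i, 2) for i in range(100, 104)]
train, valid = split_samples(data, vtr=0.5, seed=42)
print(len(train), len(valid))
# 4 4
```
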
Several classifier parameters can be set depending on the chosen classifier. In the validation process, the confusion matrix is organized as follows: rows are reference labels and columns are produced labels. In the header of the optional confusion matrix output file, the validation (reference) and predicted (produced) class labels are listed in the same order as the rows and columns of the confusion matrix.
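
To make the row/column convention concrete, here is a minimal sketch that builds a confusion matrix with reference labels as rows and produced labels as columns, plus a header listing the labels in that order. The helper names confusion_matrix and to_csv, and the exact header wording, are illustrative assumptions rather than the file layout OTB writes.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_matrix(reference: Sequence[int],
                     produced: Sequence[int]) -> Tuple[List[int], List[List[int]]]:
    """Count (reference, produced) pairs; rows = reference, columns = produced."""
    if len(reference) != len(produced):
        raise ValueError("reference and produced must have the same length")
    labels = sorted(set(reference) | set(produced))
    index: Dict[int, int] = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for ref, prod in zip(reference, produced):
        matrix[index[ref]][index[prod]] += 1
    return labels, matrix


def to_csv(labels: List[int], matrix: List[List[int]]) -> str:
    """Render the matrix with a header listing labels in row/column order (hypothetical format)."""
    header = ",".join(str(label) for label in labels)
    lines = [f"#Reference labels (rows):{header}",
             f"#Produced labels (columns):{header}"]
    lines += [",".join(str(count) for count in row) for row in matrix]
    return "\n".join(lines)


labels, matrix = confusion_matrix([1, 1, 2, 2, 3], [1, 2, 2, 2, 3])
print(to_csv(labels, matrix))
```

Here one sample of class 1 was predicted as class 2, so the first row reads 1,1,0.
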
This application is based on LibSVM and on OpenCV Machine Learning classifiers, and is compatible with OpenCV 2.3.1 and later.

Parameters

This section describes the parameters available for this application in detail. Table 4.132, page 740 summarizes these parameters and the parameter keys to be used in command-line and programming languages. The application key is TrainImagesClassifier.

Parameter key                 Parameter type              Parameter description
-------------                 --------------              ---------------------
io                            Group                       Input and output data
io.il                         Input image list            Input Image List
io.vd                         Input vector data list      Input Vector Data List
io.imstat                     Input File name             Input XML image statistics file
io.confmatout                 Output File name            Output confusion matrix
io.out                        Output File name            Output model
elev                          Group                       Elevation management
elev.dem                      Directory                   DEM directory
elev.geoid                    Input File name             Geoid File
elev.default                  Float                       Default elevation
sample                        Group                       Training and validation samples parameters
sample.mt                     Int                         Maximum training sample size per class
sample.mv                     Int                         Maximum validation sample size per class
sample.bm                     Int                         Bound sample number by minimum
sample.edg                    Boolean                     On edge pixel inclusion
sample.vtr                    Float                       Training and validation sample ratio
sample.vfn                    String                      Name of the discrimination field
classifier                    Choices                     Classifier to use for the training
classifier libsvm             Choice                      LibSVM classifier
classifier boost              Choice                      Boost classifier
classifier dt                 Choice                      Decision Tree classifier
classifier gbt                Choice                      Gradient Boosted Tree classifier
classifier ann                Choice                      Artificial Neural Network classifier
classifier bayes              Choice                      Normal Bayes classifier
classifier rf                 Choice                      Random forests classifier
classifier knn                Choice                      KNN classifier
classifier.libsvm.k           Choices                     SVM Kernel Type
classifier.libsvm.k linear    Choice                      Linear
classifier.libsvm.k rbf       Choice                      Gaussian radial basis function
classifier.libsvm.k poly      Choice                      Polynomial
classifier.libsvm.k sigmoid   Choice                      Sigmoid
classifier.libsvm.m           Choices                     SVM Model Type
classifier.libsvm.m csvc      Choice                      C support vector classification
classifier.libsvm.m nusvc     Choice                      Nu support vector classification
classifier.libsvm.m oneclass  Choice                      Distribution estimation (One Class SVM)
classifier.libsvm.c           Float                       Cost parameter C
classifier.libsvm.opt         Boolean                     Parameters optimization
classifier.libsvm.prob        Boolean                     Probability estimation
classifier.boost.t            Choices                     Boost Type
classifier.boost.t discrete   Choice                      Discrete AdaBoost
classifier.boost.t real       Choice                      Real AdaBoost (technique using confidence-rated predictions and working well with categorical data)
classifier.boost.t logit      Choice                      LogitBoost (technique producing good regression fits)
classifier.boost.t gentle     Choice                      Gentle AdaBoost (technique setting less weight on outlier data points and, for that reason, being often good with regression data)
classifier.boost.w            Int                         Weak count
classifier.boost.r            Float                       Weight Trim Rate
classifier.boost.m            Int                         Maximum depth of the tree
classifier.dt.max             Int                         Maximum depth of the tree
classifier.dt.min             Int                         Minimum number of samples in each node
classifier.dt.ra              Float                       Termination criteria for regression tree
classifier.dt.cat             Int                         Cluster possible values of a categorical variable into K <= cat clusters to find a suboptimal split
classifier.dt.f               Int                         K-fold cross-validations
classifier.dt.r               Boolean                     Set Use1seRule flag to false
classifier.dt.t               Boolean                     Set TruncatePrunedTree flag to false
classifier.gbt.w              Int                         Number of boosting algorithm iterations
classifier.gbt.s              Float                       Regularization parameter
classifier.gbt.p              Float                       Portion of the whole training set used for each algorithm iteration
classifier.gbt.max            Int                         Maximum depth of the tree
classifier.ann.t              Choices                     Train Method Type
classifier.ann.t reg          Choice                      RPROP algorithm
classifier.ann.t back         Choice                      Back-propagation algorithm
classifier.ann.sizes          String list                 Number of neurons in each intermediate layer
classifier.ann.f              Choices                     Neuron activation function type
classifier.ann.f ident        Choice                      Identity function
classifier.ann.f sig          Choice                      Symmetrical Sigmoid function
classifier.ann.f gau          Choice                      Gaussian function (Not completely supported)
classifier.ann.a              Float                       Alpha parameter of the activation function
classifier.ann.b              Float                       Beta parameter of the activation function
classifier.ann.bpdw           Float                       Strength of the weight gradient term in the BACKPROP method
classifier.ann.bpms           Float                       Strength of the momentum term (the difference between weights on the 2 previous iterations)
classifier.ann.rdw            Float                       Initial value Delta_0 of update-values Delta_ij in RPROP method
classifier.ann.rdwm           Float                       Update-values lower limit Delta_min in RPROP method
classifier.ann.term           Choices                     Termination criteria
classifier.ann.term iter      Choice                      Maximum number of iterations
classifier.ann.term eps       Choice                      Epsilon
classifier.ann.term all       Choice                      Max. iterations + Epsilon
classifier.ann.eps            Float                       Epsilon value used in the Termination criteria
classifier.ann.iter           Int                         Maximum number of iterations used in the Termination criteria
classifier.rf.max             Int                         Maximum depth of the tree
classifier.rf.min             Int                         Minimum number of samples in each node
classifier.rf.ra              Float                       Termination Criteria for regression tree
classifier.rf.cat             Int                         Cluster possible values of a categorical variable into K <= cat clusters to find a suboptimal split
classifier.rf.var             Int                         Size of the randomly selected subset of features at each tree node
classifier.rf.nbtrees         Int                         Maximum number of trees in the forest
classifier.rf.acc             Float                       Sufficient accuracy (OOB error)
classifier.knn.k              Int                         Number of Neighbors
rand                          Int                         set user defined seed
inxml                         XML input parameters file   Load otb application from xml file
outxml                        XML output parameters file  Save otb application to xml file

Table 4.132: Parameters table for Train a classifier from multiple images.

Input and output data: This group of parameters sets the input and output data.

Elevation management: This group of parameters allows managing elevation values. Supported formats are SRTM, DTED or any GeoTIFF. The DownloadSRTMTiles application can be used to list or download the tiles related to a product.

Training and validation samples parameters: This group of parameters sets the parameters of the training and validation sample lists.

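Classifier to use for the training: Choice of the classifier to use for the training. Available choices are libsvm (LibSVM classifier), boost (Boost classifier), dt (Decision Tree classifier), gbt (Gradient Boosted Tree classifier), ann (Artificial Neural Network classifier), bayes (Normal Bayes classifier), rf (Random forests classifier) and knn (KNN classifier).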
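
In the parameters table, each classifier's own settings live under the classifier.<choice>. key prefix (for example classifier.rf.nbtrees for random forests). The sketch below filters a parameter dictionary down to the settings of the selected classifier. It assumes, as the table layout suggests, that settings written for a non-selected choice are simply not used; the helper classifier_params is hypothetical and does not call OTB.

```python
from typing import Dict, Union

Value = Union[str, int, float, bool]

# Choice keys of the "classifier" parameter, as listed in the parameters table
CLASSIFIERS = ("libsvm", "boost", "dt", "gbt", "ann", "bayes", "rf", "knn")


def classifier_params(params: Dict[str, Value]) -> Dict[str, Value]:
    """Return only the classifier.<choice>.* settings of the selected classifier."""
    choice = params["classifier"]
    if choice not in CLASSIFIERS:
        raise ValueError(f"unknown classifier: {choice!r}")
    prefix = f"classifier.{choice}."
    return {key: value for key, value in params.items() if key.startswith(prefix)}


params = {
    "classifier": "rf",
    "classifier.rf.max": 5,
    "classifier.rf.nbtrees": 100,
    "classifier.libsvm.c": 1.0,  # belongs to another choice: dropped
    "sample.mt": 100,            # not a classifier setting: dropped
}
print(classifier_params(params))
# {'classifier.rf.max': 5, 'classifier.rf.nbtrees': 100}
```

Note that the bayes choice has no sub-parameters in the table, so the filter returns an empty dictionary for it.
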

Set user defined seed: Set a specific seed, given as an integer value.

Load OTB application from XML file: Load the application parameters from an XML file (key inxml).

Save OTB application to XML file: Save the application parameters to an XML file (key outxml).

Example

To run this example from the command line, use the following:

otbcli_TrainImagesClassifier -io.il QB_1_ortho.tif -io.vd VectorData_QB1.shp -io.imstat EstimateImageStatisticsQB1.xml -sample.mv 100 -sample.mt 100 -sample.vtr 0.5 -sample.edg false -sample.vfn Class -classifier libsvm -classifier.libsvm.k linear -classifier.libsvm.c 1 -classifier.libsvm.opt false -io.out svmModelQB1.txt -io.confmatout svmConfusionMatrixQB1.csv
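
Each command-line argument is a -key followed by its value(s), using the same parameter keys as the table. List parameters such as io.il take several space-separated values after a single key. As a sketch, the hypothetical helper build_otbcli_args below rebuilds the command above from a Python dictionary, which may help when scripting many runs. It only assembles strings and does not call OTB; passing booleans as true/false follows the command above.

```python
import shlex
from typing import Dict, List, Sequence, Union

Value = Union[str, int, float, bool, Sequence[str]]


def build_otbcli_args(app: str, params: Dict[str, Value]) -> List[str]:
    """Build ['otbcli_<app>', '-key', 'value', ...] from a {key: value} dict."""
    args = [f"otbcli_{app}"]
    for key, value in params.items():
        args.append(f"-{key}")
        if isinstance(value, bool):  # check before int: bool is an int subclass
            args.append("true" if value else "false")
        elif isinstance(value, (list, tuple)):
            args.extend(str(v) for v in value)  # list parameters take several values
        else:
            args.append(str(value))
    return args


params = {
    "io.il": ["QB_1_ortho.tif"],
    "io.vd": ["VectorData_QB1.shp"],
    "io.imstat": "EstimateImageStatisticsQB1.xml",
    "sample.mv": 100,
    "sample.mt": 100,
    "sample.vtr": 0.5,
    "sample.edg": False,
    "sample.vfn": "Class",
    "classifier": "libsvm",
    "classifier.libsvm.k": "linear",
    "classifier.libsvm.c": 1,
    "classifier.libsvm.opt": False,
    "io.out": "svmModelQB1.txt",
    "io.confmatout": "svmConfusionMatrixQB1.csv",
}
args = build_otbcli_args("TrainImagesClassifier", params)
print(shlex.join(args))
# To actually run it (requires the OTB command-line launchers on PATH):
# import subprocess; subprocess.run(args, check=True)
```

Building a list rather than a single string avoids shell quoting issues when file names contain spaces.
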

To run this example from Python, use the following code snippet:

#!/usr/bin/python 
 
# Import the otb applications package 
import otbApplication 
 
# The following line creates an instance of the TrainImagesClassifier application 
TrainImagesClassifier = otbApplication.Registry.CreateApplication("TrainImagesClassifier") 
 
# The following lines set all the application parameters: 
TrainImagesClassifier.SetParameterStringList("io.il", ['QB_1_ortho.tif']) 
 
TrainImagesClassifier.SetParameterStringList("io.vd", ['VectorData_QB1.shp']) 
 
TrainImagesClassifier.SetParameterString("io.imstat", "EstimateImageStatisticsQB1.xml") 
 
TrainImagesClassifier.SetParameterInt("sample.mv", 100) 
 
TrainImagesClassifier.SetParameterInt("sample.mt", 100) 
 
TrainImagesClassifier.SetParameterFloat("sample.vtr", 0.5) 
 
# sample.edg is left unset: it is disabled by default, matching -sample.edg false in the command line
 
TrainImagesClassifier.SetParameterString("sample.vfn", "Class") 
 
TrainImagesClassifier.SetParameterString("classifier","libsvm") 
 
TrainImagesClassifier.SetParameterString("classifier.libsvm.k","linear") 
 
TrainImagesClassifier.SetParameterFloat("classifier.libsvm.c", 1) 
 
# classifier.libsvm.opt is left unset: it is disabled by default, matching -classifier.libsvm.opt false in the command line
 
TrainImagesClassifier.SetParameterString("io.out", "svmModelQB1.txt") 
 
TrainImagesClassifier.SetParameterString("io.confmatout", "svmConfusionMatrixQB1.csv") 
 
# The following line executes the application
TrainImagesClassifier.ExecuteAndWriteOutput()

Limitations

None

Authors

This application has been written by OTB-Team.

See also

These additional resources can be useful for further information: