This chapter introduces the tools available in OTB for the estimation of geometric disparities between
images.
The problem we address is the automatic estimation of disparity maps between images acquired
with different sensors. By different sensors, we mean sensors which produce images with different
radiometric properties, that is, sensors which measure different physical quantities: optical sensors
operating in different spectral bands, radar and optical sensors, etc.
For this kind of image pair, the classical fine correlation approach [82, 43] cannot always provide the
required accuracy, since its similarity measure (the correlation coefficient) can only capture
similarities up to an affine transformation of the radiometries.
There are two main questions which can be asked about what we want to do:
- Can we define what the similarity is between, for instance, a radar and an optical image?
- What does fine registration mean in the case where the geometric distortions are so big and
the source of information can be located in different places (for instance, the same edge can
be produced by the edge of the roof of a building in an optical image and by the wall-ground
bounce in a radar image)?
We can answer by saying that the images of the same object obtained by different sensors are two different
representations of the same reality. For the same spatial location, we have two different measures. Both
measurements come from the same source and thus share a great deal of information. This
relationship may not be perfect, but it can be evaluated in a relative way: different geometric
distortions are compared, and the one leading to the strongest link between the two measures is
kept.
When working with images acquired with the same (type of) sensor, one can use a very effective approach.
Since the correlation coefficient is robust and fast for similar images, one can afford to apply it to
every pixel of one image in order to search for the corresponding homologous point (HP) in the other image.
One can thus build a deformation grid (a sampling of the deformation map). If the sampling step of this grid
is short enough, interpolation with an analytical model is not needed and high-frequency deformations can be
estimated. The obtained grid can then be used as a re-sampling grid to produce the registered
images.
This approach, combined with image interpolation techniques (in order to estimate sub-pixel
deformations) and multi-resolution strategies, yields the best performance in terms of
deformation estimation, and hence in terms of automatic image registration.
Unfortunately, in the multi-sensor case, the correlation coefficient cannot be used. We will therefore look for
similarity measures which can be applied in the multi-sensor case with the same approach as the correlation
coefficient.
We start by giving several definitions which allow for the formalization of the image registration problem.
First of all, we define the master image and the slave image:
Definition 1 Master image: image to which other images will be registered; its geometry is
considered as the reference.
Definition 2 Slave image: image to be geometrically transformed in order to be registered to the
master image.
Two main concepts are those of similarity measure and geometric transformation:
Definition 3 Let I and J be two images and let c be a similarity criterion. We call similarity measure any
scalar, strictly positive function
\[ S_c(I, J) = f(I, J, c) \qquad (10.1) \]
$S_c$ has an absolute maximum when the two images I and J are identical in the sense of the criterion
c.
Definition 4 A geometric transformation T is an operator which, applied to the coordinates (x,y)
of a point in the slave image, gives the coordinates (u,v) of its HP in the master image:
\[ (u, v) = T(x, y) \qquad (10.2) \]
Finally we introduce a definition for the image registration problem:
Definition 5 Registration problem:
- determine a geometric transformation T which maximizes the similarity between a master image I
and the result of the transformation T ∘J:
\[ \hat{T} = \arg\max_{T} S_c\left(I, T \circ J\right) \qquad (10.3) \]
- re-sampling of J by applying T .
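As a simple illustration (not part of the original text), if T is restricted to pure translations,
$T_{\Delta x, \Delta y}(x, y) = (x + \Delta x, y + \Delta y)$, equation (10.3) reduces to
\[ (\widehat{\Delta x}, \widehat{\Delta y}) = \arg\max_{\Delta x, \Delta y} S_c\bigl(I(x, y), J(x + \Delta x, y + \Delta y)\bigr) \]
which is exactly the search over a correlation surface described later in this chapter, when $S_c$ is the
correlation coefficient of equation (10.4).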
The geometric transformation of definition 4 is used to correct the existing deformation between
the two images to be registered. This deformation contains information linked to the observed
scene and to the acquisition conditions. It can be classified into three classes depending on its physical
source:
- deformations linked to the mean attitude of the sensor (incidence angle, presence or absence
of yaw steering, etc.);
- deformations linked to a stereo vision (mainly due to the topography);
- deformations linked to attitude evolution during the acquisition (vibrations which are mainly
present in push-broom sensors).
These deformations are characterized by their spatial frequencies and intensities which are summarized in
table 10.1.
| Deformation source | Intensity | Spatial frequency |
| Mean attitude | Strong | Low |
| Stereo | Medium | High and medium |
| Attitude evolution | Low | Low to medium |
Table 10.1: Characterization of the geometric deformation sources
Depending on the type of deformation to be corrected, its model will be different. For example, if the only
deformation to be corrected is the one introduced by the mean attitude, a physical model for the acquisition
geometry (independent of the image contents) will be enough. If the sensor is not well known,
this deformation can be approximated by a simple analytical model. When the deformations
to be modeled are of high frequency, analytical (parametric) models are not suitable for fine
registration. In this case, one has to use a fine sampling of the deformation, that is,
deformation grids. These grids give, for a set of pixels of the master image, their location in the slave
image.
The following points summarize the problem of the deformation modeling:
- An analytical model is just an approximation of the deformation. It is often obtained as
follows:
- Directly from a physical model without using any image content information.
- By estimation of the parameters of an a priori model (polynomial, affine, etc.). These
parameters can be estimated:
- Either by solving the equations obtained by taking HP. The HP can be manually or
automatically extracted.
- Or by maximization of a global similarity measure.
- A deformation grid is a sampling of the deformation map.
The last point implies that the sampling period of the grid must be short enough to
account for high-frequency deformations (Shannon sampling theorem). Of course, if the deformations
are non-stationary (as is usually the case for topographic deformations), the sampling can be
irregular.
As a conclusion, we can say that definition 5 poses the registration problem as an optimization problem.
This optimization can be either global or local with a similarity measure which can also be either local or
global. All this is synthesized in table 10.2.
| Geometric model | Similarity measure | Optimization of the deformation |
| Physical model | None | Global |
| Analytical model with a priori HP | Local | Global |
| Analytical model without a priori HP | Global | Global |
| Grid | Local | Local |
Table 10.2: Approaches to image registration
The ideal approach would consist in a registration which is locally optimized, both in similarity and
deformation, in order to have the best registration quality. This is the case when deformation grids with
dense sampling are used. Unfortunately, this case is the most computationally heavy and one often uses
either a low sampling rate of the grid, or the evaluation of the similarity in a small set of pixels for the
estimation of an analytical model. Both of these choices lead to local registration errors which, depending
on the topography, can amount to several pixels.
Even if this registration accuracy can be sufficient for many applications (ortho-registration, import into a
GIS, etc.), it is not acceptable in the case of data fusion, multi-channel segmentation or change
detection [130]. This is why we will focus on the problem of deformation estimation using dense
grids.
The fine modeling of the geometric deformation we are looking for requires estimating the
coordinates of nearly every pixel of the master image inside the slave image. In the classical mono-sensor
case, where the correlation coefficient is used, we proceed as follows.
The geometric deformation is modeled by local rigid displacements. One wants to estimate the coordinates
of each pixel of the master image inside the slave image. This can be represented by a displacement vector
associated with every pixel of the master image. Each of the two components (lines and columns) of this
vector field will be called a deformation grid.
We use a small window taken in the master image and we test the similarity for every possible shift within
an exploration area inside the slave image (figure 10.1).
That means that for each position we compute the correlation coefficient. The result is a correlation surface
whose maximum gives the most likely local shift between both images:
\[ \rho(\Delta x, \Delta y) = \frac{1}{N} \frac{\sum_{x,y} \left(I(x,y) - m_I\right)\left(J(x + \Delta x, y + \Delta y) - m_J\right)}{\sigma_I \, \sigma_J} \qquad (10.4) \]
In this expression, N is the number of pixels of the analysis window, $m_I$ and $m_J$ are the estimated mean
values inside the analysis window of image I and image J respectively, and $\sigma_I$ and $\sigma_J$ are their standard
deviations.
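To make the exploration procedure concrete, here is a minimal sketch in plain C++ (independent of the OTB
classes; all function and variable names are illustrative) that evaluates equation (10.4) for every integer
shift within an exploration area and keeps the shift with the highest correlation coefficient. Sub-pixel
refinement by interpolation, mentioned below, is not included.
#include <cmath>
#include <cstddef>
#include <vector>
// Illustrative only: images are stored row-major in std::vector<float>.
struct Shift
{
  int dx = 0;
  int dy = 0;
  double rho = -1.0;
};
// Correlation coefficient of equation (10.4) between a master window of radius r
// centered on (x0, y0) in I and the window of J shifted by (dx, dy).
double correlation(const std::vector<float>& I, const std::vector<float>& J,
                   std::size_t width, int x0, int y0, int dx, int dy, int r)
{
  const int n = (2 * r + 1) * (2 * r + 1);
  double mI = 0.0, mJ = 0.0;
  for (int y = -r; y <= r; ++y)
    for (int x = -r; x <= r; ++x)
    {
      mI += I[(y0 + y) * width + (x0 + x)];
      mJ += J[(y0 + dy + y) * width + (x0 + dx + x)];
    }
  mI /= n;
  mJ /= n;
  double cov = 0.0, vI = 0.0, vJ = 0.0;
  for (int y = -r; y <= r; ++y)
    for (int x = -r; x <= r; ++x)
    {
      const double a = I[(y0 + y) * width + (x0 + x)] - mI;
      const double b = J[(y0 + dy + y) * width + (x0 + dx + x)] - mJ;
      cov += a * b;
      vI += a * a;
      vJ += b * b;
    }
  // Equivalent to (covariance / N) / (sigma_I * sigma_J); the epsilon avoids a
  // division by zero on perfectly flat windows.
  return cov / (std::sqrt(vI) * std::sqrt(vJ) + 1e-12);
}
// Explore every integer shift within the search radius and keep the best one.
Shift bestShift(const std::vector<float>& I, const std::vector<float>& J,
                std::size_t width, int x0, int y0, int windowRadius, int searchRadius)
{
  Shift best;
  for (int dy = -searchRadius; dy <= searchRadius; ++dy)
    for (int dx = -searchRadius; dx <= searchRadius; ++dx)
    {
      const double rho = correlation(I, J, width, x0, y0, dx, dy, windowRadius);
      if (rho > best.rho)
      {
        best.dx = dx;
        best.dy = dy;
        best.rho = rho;
      }
    }
  return best;
}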
Quality criteria can be applied to the estimated maximum in order to give a confidence factor to the
estimated shift: width of the peak, maximum value, etc. Sub-pixel shifts can be measured by applying
fractional shifts to the sliding window. This can be done by image interpolation.
The interesting parameters of the procedure are:
- The size of the exploration area: it determines the computational load of the algorithm (which
we want to keep low), but it has to be large enough to cope with large deformations (a rough
estimate of this load is given after the list).
- The size of the sliding window: the robustness of the correlation coefficient estimation
increases with the window size, but the hypothesis of local rigid shifts may not be valid for
large windows.
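As a rough, illustrative estimate (not taken from the original text): with a sliding window of radius $r_w$
and an exploration area of radius $r_s$, each pixel of the master image requires $(2 r_s + 1)^2$ evaluations
of the correlation coefficient, each computed over $(2 r_w + 1)^2$ pixels, i.e. on the order of
$(2 r_s + 1)^2 (2 r_w + 1)^2$ elementary operations per pixel. For example, $r_s = 10$ and $r_w = 5$ already
give $441 \times 121 \approx 5.3 \times 10^4$ operations per pixel, which is why the exploration area is kept
as small as the expected deformation allows.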
The correlation coefficient cannot be used with original grey-level images in the multi-sensor case. It could
be used on extracted features (edges, etc.), but the feature extraction can introduce localization errors.
Also, when the images come from sensors using very different modalities, it can be difficult
to find similar features in both images. In this case, one can try to find the similarity at the
pixel level, but with other similarity measures and apply the same approach as we have just
described.
The concept of similarity measure has been presented in definition 3. The difficulty of the procedure
lies in finding the function f which properly represents the criterion c. We also need f to
be easily and robustly estimated on small windows. We extend here what we proposed in
[68].
We recall here the computation of the correlation coefficient between two image windows I and J. The
coordinates of the pixels inside the windows are denoted by (x,y):
\[ \rho(I, J) = \frac{1}{N} \frac{\sum_{x,y} \left(I(x,y) - m_I\right)\left(J(x,y) - m_J\right)}{\sigma_I \, \sigma_J} \qquad (10.5) \]
In order to qualitatively characterize the different similarity measures we propose the following
experiment. We take two images which are perfectly registered and we extract a small window of
size N ×M from each of the images (this size is set to 101×101 for this experiment). For the
master image, the window will be centered on coordinates (x0,y0) (the center of the image)
and for the slave image, it will be centered on coordinates (x0 +Δx,y0). With different values
of Δx (from -10 pixels to 10 pixels in our experiments), we obtain an estimate of ρ(I,J) as a
function of Δx, which we write as ρ(Δx) for short. The obtained curve should have a maximum
for Δx = 0, since the images are perfectly registered. We would also like to have an absolute
maximum with a high value and with a sharp peak, in order to have a good precision for the shift
estimate.
The source code for this example can be found in the file
Examples/DisparityMap/FineRegistrationImageFilterExample.cxx.
This example demonstrates the use of the otb::FineRegistrationImageFilter. This filter performs
deformation estimation by looking for the extremum of an image-to-image metric within a search
window.
The first step toward the use of these filters is to include the proper header files.
#include "otbFineRegistrationImageFilter.h"
Several types of otb::Image are required to represent the input image, the metric field, and the
deformation field.
typedef otb::Image<PixelType, ImageDimension> InputImageType;
typedef otb::Image<PixelType, ImageDimension> MetricImageType;
typedef otb::Image<DisplacementPixelType, ImageDimension> DisplacementFieldType;
To make the metric estimation more robust, the first required step is to blur the input images. This is done
using the itk::RecursiveGaussianImageFilter :
typedef itk::RecursiveGaussianImageFilter<InputImageType, InputImageType> InputBlurType;
InputBlurType::Pointer fBlur = InputBlurType::New();
fBlur->SetInput(fReader->GetOutput());
fBlur->SetSigma(atof(argv[7]));
InputBlurType::Pointer mBlur = InputBlurType::New();
mBlur->SetInput(mReader->GetOutput());
mBlur->SetSigma(atof(argv[7]));
Now, we declare and instantiate the otb::FineRegistrationImageFilter which is going to perform
the registration:
typedef otb::FineRegistrationImageFilter<InputImageType, MetricImageType, DisplacementFieldType> RegistrationFilterType;
RegistrationFilterType::Pointer registrator = RegistrationFilterType::New();
registrator->SetMovingInput(mBlur->GetOutput());
registrator->SetFixedInput(fBlur->GetOutput());
Some parameters need to be specified to the filter:
- The area where the search is performed. This area is defined by its radius:
typedef RegistrationFilterType::SizeType RadiusType;
RadiusType searchRadius;
searchRadius[0] = atoi(argv[8]);
searchRadius[1] = atoi(argv[8]);
registrator->SetSearchRadius(searchRadius);
- The window used to compute the local metric. This window is also defined by its radius:
RadiusType metricRadius;
metricRadius[0] = atoi(argv[9]);
metricRadius[1] = atoi(argv[9]);
registrator->SetRadius(metricRadius);
We need to set the sub-pixel accuracy we want to obtain:
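A plausible call, assuming the filter exposes a SetSubPixelAccuracy() setter and that the accuracy value is
passed as the next command-line argument (both are assumptions, not taken from the example), would be:
// Hypothetical: setter name and argv index are assumptions.
registrator->SetSubPixelAccuracy(atof(argv[10]));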
The default matching metric used by the otb::FineRegistrationImageFilter is standard
correlation. However, we may also use any other image-to-image metric provided by ITK. For
instance, here is how we would use the itk::MeanReciprocalSquareDifferenceImageToImageMetric (do not
forget to include the proper header).
typedef itk::MeanReciprocalSquareDifferenceImageToImageMetric<InputImageType, InputImageType> MRSDMetricType;
MRSDMetricType::Pointer mrsdMetric = MRSDMetricType::New();
registrator->SetMetric(mrsdMetric);
The itk::MeanReciprocalSquareDifferenceImageToImageMetric produces low values for poor matches;
therefore, the filter has to maximize the metric:
registrator->MinimizeOff();
The execution of the otb::FineRegistrationImageFilter will be triggered by the Update() call on
the writer at the end of the pipeline. Make sure to use an otb::ImageFileWriter if you want to benefit
from the streaming features.
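As a minimal sketch (not part of the example excerpt), assuming that the filter's main output carries the
metric image and using a hypothetical output file name, the pipeline could be closed as follows:
typedef otb::ImageFileWriter<MetricImageType> MetricWriterType;
MetricWriterType::Pointer metricWriter = MetricWriterType::New();
metricWriter->SetInput(registrator->GetOutput()); // metric image output (assumption)
metricWriter->SetFileName("fineRegistrationMetric.tif"); // hypothetical file name
metricWriter->Update(); // triggers the whole registration pipeline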
Figure 10.2 shows the result of applying the otb::FineRegistrationImageFilter .
Taking figure 10.1 as a starting point, we can generalize the approach by letting the user choose:
- the similarity measure;
- the geometric transform to be estimated (see definition 4);
In order to do this, we will use the ITK registration framework locally on a set of nodes. Once the disparity
is estimated on a set of nodes, we will use it to generate a deformation field: the dense, regular vector field
which gives the translation to be applied to a pixel of the secondary image to be positioned on its
homologous point of the master image.
The source code for this example can be found in the file
Examples/DisparityMap/SimpleDisparityMapEstimationExample.cxx.
This example demonstrates the use of the otb::DisparityMapEstimationMethod together with a
deformation field generator (the otb::BSplinesInterpolateDisplacementFieldGenerator in the code
below). The first filter performs deformation estimation according to a given transform, using the embedded
ITK registration framework. It takes as input a possibly non-regular point set and produces a point set with
associated point data representing the deformation.
The deformation field generator then builds a dense deformation field from the deformation values attached
to the point set. Several generators are available, from simple nearest neighbor interpolation
(otb::NearestPointDisplacementFieldGenerator) to more advanced interpolation methods.
The first step toward the use of these filters is to include the proper header files.
#include "otbDisparityMapEstimationMethod.h" #include "itkTranslationTransform.h"
#include "itkNormalizedCorrelationImageToImageMetric.h"
#include "itkWindowedSincInterpolateImageFunction.h" #include "itkGradientDescentOptimizer.h"
#include "otbBSplinesInterpolateDisplacementFieldGenerator.h" #include "itkWarpImageFilter.h"
Then we must decide what pixel type to use for the image. We choose to do all the computation
in floating point precision and rescale the results between 0 and 255 in order to export PNG
images.
typedef double PixelType;
typedef unsigned char OutputPixelType;
The images are defined using the pixel type and the dimension. Please note that the
deformation field generator produces an otb::VectorImage to represent the
deformation field in both image directions.
typedef otb::Image<PixelType, Dimension> ImageType;
typedef otb::Image<OutputPixelType, Dimension> OutputImageType;
The next step is to define the transform chosen to model the deformation. In this example the
deformation is modeled as an itk::TranslationTransform .
typedef itk::TranslationTransform<double, Dimension> TransformType;
typedef TransformType::ParametersType ParametersType;
Then we define the metric used to evaluate the local registration between the fixed and the moving
image. In this example we choose the itk::NormalizedCorrelationImageToImageMetric .
typedef itk::NormalizedCorrelationImageToImageMetric<ImageType, ImageType> MetricType;
Disparity map estimation implies evaluating the moving image at non-grid positions. Therefore, an
interpolator is needed. In this example we choose the itk::WindowedSincInterpolateImageFunction .
typedef itk::Function::HammingWindowFunction<3> WindowFunctionType;
typedef itk::ZeroFluxNeumannBoundaryCondition<ImageType> ConditionType;
typedef itk::WindowedSincInterpolateImageFunction<ImageType, 3, WindowFunctionType,
ConditionType, double> InterpolatorType;
To perform local registration, an optimizer is needed. In this example we chose the
itk::GradientDescentOptimizer .
typedef itk::GradientDescentOptimizer OptimizerType;
Now we define the point set representing the points where the local disparity will be computed.
typedef itk::PointSet<ParametersType, Dimension> PointSetType;
Now we define the disparity map estimation filter.
typedef otb::DisparityMapEstimationMethod<ImageType, ImageType, PointSetType> DMEstimationType;
typedef DMEstimationType::SizeType SizeType;
The input image reader also has to be defined.
typedef otb::ImageFileReader<ImageType> ReaderType;
Two readers are instantiated: one for the fixed image and one for the moving image.
ReaderType::Pointer fixedReader = ReaderType::New();
ReaderType::Pointer movingReader = ReaderType::New();
fixedReader->SetFileName(argv[1]);
movingReader->SetFileName(argv[2]);
fixedReader->UpdateOutputInformation();
movingReader->UpdateOutputInformation();
We then create a regular point set where the local disparity will be computed.
SizeType fixedSize = fixedReader->GetOutput()->GetLargestPossibleRegion().GetSize();
unsigned int NumberOfXNodes = (fixedSize[0] - 2 * atoi(argv[7]) - 1) / atoi(argv[5]);
unsigned int NumberOfYNodes = (fixedSize[1] - 2 * atoi(argv[7]) - 1) / atoi(argv[6]);
ImageType::IndexType firstNodeIndex;
firstNodeIndex[0] = atoi(argv[7]);
firstNodeIndex[1] = atoi(argv[7]);
PointSetType::Pointer nodes = PointSetType::New();
unsigned int nodeCounter = 0;
for (unsigned int x = 0; x < NumberOfXNodes; x++)
  {
  for (unsigned int y = 0; y < NumberOfYNodes; y++)
    {
    PointType p;
    p[0] = firstNodeIndex[0] + x * atoi(argv[5]);
    p[1] = firstNodeIndex[1] + y * atoi(argv[6]);
    nodes->SetPoint(nodeCounter++, p);
    }
  }
We build the transform, interpolator, metric and optimizer for the disparity map estimation
filter.
TransformType::Pointer transform = TransformType::New();
OptimizerType::Pointer optimizer = OptimizerType::New();
optimizer->MinimizeOn();
optimizer->SetLearningRate(atof(argv[9]));
optimizer->SetNumberOfIterations(atoi(argv[10]));
InterpolatorType::Pointer interpolator = InterpolatorType::New();
MetricType::Pointer metric = MetricType::New();
metric->SetSubtractMean(true);
We then set up the disparity map estimation filter. This filter will perform a local registration at each point
of the given point set using the ITK registration framework. It will produce a point set whose point data
reflects the disparity locally around the associated point.
The point data will contain the following information:
- The final metric value found in the registration process,
- the deformation value in the first image direction,
- the deformation value in the second image direction,
- the final parameters of the transform.
Please note that in the case of an itk::TranslationTransform , the deformation values and the transform
parameters are the same.
DMEstimationType::Pointer dmestimator = DMEstimationType::New();
dmestimator->SetTransform(transform);
dmestimator->SetOptimizer(optimizer);
dmestimator->SetInterpolator(interpolator);
dmestimator->SetMetric(metric);
SizeType windowSize, explorationSize;
explorationSize.Fill(atoi(argv[7]));
windowSize.Fill(atoi(argv[8]));
dmestimator->SetWinSize(windowSize);
dmestimator->SetExploSize(explorationSize);
The initial transform parameters can be set via the SetInitialTransformParameters() method. In our
case, we simply fill the parameter array with null values.
DMEstimationType::ParametersType initialParameters(transform->GetNumberOfParameters());
initialParameters[0] = 0.0;
initialParameters[1] = 0.0;
dmestimator->SetInitialTransformParameters(initialParameters);
Now we can set the inputs of the disparity map estimation filter. The fixed image can be set using the
SetFixedImage() method, the moving image using SetMovingImage(), and the input point set using the
SetPointSet() method.
dmestimator->SetFixedImage(fixedReader->GetOutput());
dmestimator->SetMovingImage(movingReader->GetOutput());
dmestimator->SetPointSet(nodes);
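Purely as an illustrative aside (not in the original example, which lets the downstream writers trigger the
pipeline), the estimated disparities can be inspected by updating the estimator and iterating over the point
data, whose layout is described above:
dmestimator->Update();
PointSetType::PointDataContainer::Iterator dataIt = dmestimator->GetOutput()->GetPointData()->Begin();
for (; dataIt != dmestimator->GetOutput()->GetPointData()->End(); ++dataIt)
  {
  const ParametersType& data = dataIt.Value();
  // data[0]: final metric value; data[1], data[2]: deformation along the two image directions
  std::cout << "metric = " << data[0] << ", dx = " << data[1] << ", dy = " << data[2] << std::endl;
  }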
Once the estimation has been performed by the otb::DisparityMapEstimationMethod , one can
generate the associated deformation field (that is, the translation in the first and second image directions).
It will be represented as an otb::VectorImage .
typedef otb::VectorImage<PixelType, Dimension> DisplacementFieldType;
For the deformation field estimation, we will use the otb::BSplinesInterpolateDisplacementFieldGenerator .
This filter interpolates the deformation values from the point set data using B-splines.
typedef otb::BSplinesInterpolateDisplacementFieldGenerator<PointSetType,
DisplacementFieldType> GeneratorType;
The deformation field generator is then instantiated.
GeneratorType::Pointer generator = GeneratorType::New();
We must then specify the input point set using the SetPointSet() method.
generator->SetPointSet(dmestimator->GetOutput());
One must also specify the origin, size and spacing of the output deformation field.
generator->SetOutputOrigin(fixedReader->GetOutput()->GetOrigin());
generator->SetOutputSpacing(fixedReader->GetOutput()->GetSignedSpacing());
generator->SetOutputSize(fixedReader->GetOutput()->GetLargestPossibleRegion().GetSize());
The local registration process can lead to wrong deformation values and transform parameters. To
select only those points for which the registration process was successful, one can set a
threshold on the final metric value: points for which the absolute final metric value is below
this threshold will be discarded. This threshold can be set with the SetMetricThreshold()
method.
generator->SetMetricThreshold(atof(argv[11]));
Other deformation field generators, such as the otb::NearestPointDisplacementFieldGenerator mentioned above, provide similar functionality.
Now we can warp the moving image according to the estimated deformation field. This is performed by
the itk::WarpImageFilter . First, we define this filter.
typedef itk::WarpImageFilter<ImageType, ImageType, DisplacementFieldType> ImageWarperType;
Then we instantiate it.
ImageWarperType::Pointer warper = ImageWarperType::New();
We set the input image to warp using the SetInput() method, and the deformation field using the
SetDisplacementField() method.
warper->SetInput(movingReader->GetOutput());
warper->SetDisplacementField(generator->GetOutput());
warper->SetOutputOrigin(fixedReader->GetOutput()->GetOrigin());
warper->SetOutputSpacing(fixedReader->GetOutput()->GetSignedSpacing());
In order to write the result to a PNG file, we rescale it to a proper range.
typedef itk::RescaleIntensityImageFilter<ImageType, OutputImageType> RescalerType;
RescalerType::Pointer outputRescaler = RescalerType::New();
outputRescaler->SetInput(warper->GetOutput());
outputRescaler->SetOutputMaximum(255);
outputRescaler->SetOutputMinimum(0);
We can now write the image to a file. The filters are executed by invoking the Update() method.
typedef otb::ImageFileWriter<OutputImageType> WriterType;
WriterType::Pointer outputWriter = WriterType::New();
outputWriter->SetInput(outputRescaler->GetOutput());
outputWriter->SetFileName(argv[4]);
outputWriter->Update();
We also want to write the deformation field along the first direction to a file. To achieve this we will use the
otb::MultiToMonoChannelExtractROI filter.
typedef otb::MultiToMonoChannelExtractROI<PixelType, PixelType> ChannelExtractionFilterType;
ChannelExtractionFilterType::Pointer channelExtractor = ChannelExtractionFilterType::New();
channelExtractor->SetInput(generator->GetOutput());
channelExtractor->SetChannel(1);
RescalerType::Pointer fieldRescaler = RescalerType::New();
fieldRescaler->SetInput(channelExtractor->GetOutput());
fieldRescaler->SetOutputMaximum(255);
fieldRescaler->SetOutputMinimum(0);
WriterType::Pointer fieldWriter = WriterType::New();
fieldWriter->SetInput(fieldRescaler->GetOutput());
fieldWriter->SetFileName(argv[3]);
fieldWriter->Update();
Figure 10.3 shows the result of applying disparity map estimation on a stereo pair using a regular point set,
followed by deformation field estimation using B-splines and image resampling.
The source code for this example can be found in the file
Examples/DisparityMap/StereoReconstructionExample.cxx.
This example demonstrates the use of the stereo reconstruction chain from an image pair. The images are
assumed to come from the same sensor but with different positions. The approach presented here has the
following steps:
- Epipolar resampling of the image pair
- Dense disparity map estimation
- Projection of the disparities on an existing Digital Elevation Model (DEM)
It is important to note that this method requires the sensor models with a pose estimate for each
image.
#include "otbStereorectificationDisplacementFieldSource.h"
#include "otbStreamingWarpImageFilter.h" #include "otbBandMathImageFilter.h"
#include "otbSubPixelDisparityImageFilter.h" #include "otbDisparityMapMedianFilter.h"
#include "otbDisparityMapToDEMFilter.h" #include "otbImageFileReader.h"
#include "otbImageFileWriter.h" #include "otbBCOInterpolateImageFunction.h"
#include "itkUnaryFunctorImageFilter.h" #include "itkVectorCastImageFilter.h"
#include "otbImageList.h" #include "otbImageListToVectorImageFilter.h"
#include "itkRescaleIntensityImageFilter.h" #include "otbDEMHandler.h"
The example relies on the following filter types:
typedef otb::StereorectificationDisplacementFieldSource<FloatImageType, FloatVectorImageType> DisplacementFieldSourceType;
typedef itk::Vector<double, 2> DisplacementType;
typedef otb::Image<DisplacementType> DisplacementFieldType;
typedef itk::VectorCastImageFilter<FloatVectorImageType, DisplacementFieldType> DisplacementFieldCastFilterType;
typedef otb::StreamingWarpImageFilter<FloatImageType, FloatImageType, DisplacementFieldType> WarpFilterType;
typedef otb::BCOInterpolateImageFunction<FloatImageType> BCOInterpolationType;
typedef otb::Functor::NCCBlockMatching<FloatImageType, FloatImageType> NCCBlockMatchingFunctorType;
typedef otb::PixelWiseBlockMatchingImageFilter<FloatImageType, FloatImageType, FloatImageType, FloatImageType, NCCBlockMatchingFunctorType> NCCBlockMatchingFilterType;
typedef otb::BandMathImageFilter<FloatImageType> BandMathFilterType;
typedef otb::SubPixelDisparityImageFilter<FloatImageType, FloatImageType, FloatImageType, FloatImageType, NCCBlockMatchingFunctorType> NCCSubPixelDisparityFilterType;
typedef otb::DisparityMapMedianFilter<FloatImageType, FloatImageType, FloatImageType> MedianFilterType;
typedef otb::DisparityMapToDEMFilter<FloatImageType, FloatImageType, FloatImageType, FloatVectorImageType, FloatImageType> DisparityToElevationFilterType;
The image pair is supposed to be in sensor geometry. From two images covering nearly the same area, one can
estimate a common epipolar geometry. In this geometry, an altitude variation corresponds to a horizontal
shift between the two images. The filter otb::StereorectificationDisplacementFieldSource
computes the deformation grids for each image.
These grids are sampled in epipolar geometry. They have two bands, containing the position offset (in
physical space units) between the current epipolar point and the corresponding sensor point in horizontal
and vertical direction. They can be computed at a lower resolution than sensor resolution. The application
StereoRectificationGridGenerator also provides a simple tool to generate the epipolar grids for your
image pair.
DisplacementFieldSourceType::Pointer m_DisplacementFieldSource = DisplacementFieldSourceType::New();
m_DisplacementFieldSource->SetLeftImage(leftReader->GetOutput());
m_DisplacementFieldSource->SetRightImage(rightReader->GetOutput());
m_DisplacementFieldSource->SetGridStep(4);
m_DisplacementFieldSource->SetScale(1.0);
//m_DisplacementFieldSource->SetAverageElevation(avgElevation);
m_DisplacementFieldSource->Update();
Then, the sensor images can be resampled in epipolar geometry, using the otb::StreamingWarpImageFilter .
The application GridBasedImageResampling also gives an easy access to this filter. The user can choose
the epipolar region to resample, as well as the resampling step and the interpolator.
Note that the epipolar image size can be retrieved from the stereo rectification grid filter.
FloatImageType::SpacingType epipolarSpacing;
epipolarSpacing[0] = 1.0;
epipolarSpacing[1] = 1.0;
FloatImageType::SizeType epipolarSize;
epipolarSize = m_DisplacementFieldSource->GetRectifiedImageSize();
FloatImageType::PointType epipolarOrigin;
epipolarOrigin[0] = 0.0;
epipolarOrigin[1] = 0.0;
FloatImageType::PixelType defaultValue = 0;
The deformation grids are cast into deformation fields, then the left and right sensor images are
resampled.
DisplacementFieldCastFilterType::Pointer m_LeftDisplacementFieldCaster = DisplacementFieldCastFilterType::New();
m_LeftDisplacementFieldCaster->SetInput(m_DisplacementFieldSource->GetLeftDisplacementFieldOutput());
m_LeftDisplacementFieldCaster->GetOutput()->UpdateOutputInformation();
BCOInterpolationType::Pointer leftInterpolator = BCOInterpolationType::New();
leftInterpolator->SetRadius(2);
WarpFilterType::Pointer m_LeftWarpImageFilter = WarpFilterType::New();
m_LeftWarpImageFilter->SetInput(leftReader->GetOutput());
m_LeftWarpImageFilter->SetDisplacementField(m_LeftDisplacementFieldCaster->GetOutput());
m_LeftWarpImageFilter->SetInterpolator(leftInterpolator);
m_LeftWarpImageFilter->SetOutputSize(epipolarSize);
m_LeftWarpImageFilter->SetOutputSpacing(epipolarSpacing);
m_LeftWarpImageFilter->SetOutputOrigin(epipolarOrigin);
m_LeftWarpImageFilter->SetEdgePaddingValue(defaultValue);
DisplacementFieldCastFilterType::Pointer m_RightDisplacementFieldCaster = DisplacementFieldCastFilterType::New();
m_RightDisplacementFieldCaster->SetInput(m_DisplacementFieldSource->GetRightDisplacementFieldOutput());
m_RightDisplacementFieldCaster->GetOutput()->UpdateOutputInformation();
BCOInterpolationType::Pointer rightInterpolator = BCOInterpolationType::New();
rightInterpolator->SetRadius(2);
WarpFilterType::Pointer m_RightWarpImageFilter = WarpFilterType::New();
m_RightWarpImageFilter->SetInput(rightReader->GetOutput());
m_RightWarpImageFilter->SetDisplacementField(m_RightDisplacementFieldCaster->GetOutput());
m_RightWarpImageFilter->SetInterpolator(rightInterpolator);
m_RightWarpImageFilter->SetOutputSize(epipolarSize);
m_RightWarpImageFilter->SetOutputSpacing(epipolarSpacing);
m_RightWarpImageFilter->SetOutputOrigin(epipolarOrigin);
m_RightWarpImageFilter->SetEdgePaddingValue(defaultValue);
Since the resampling produces black regions around the image, it is useless to estimate disparities in these
no-data regions. We use an otb::BandMathImageFilter to produce a mask on the left and right epipolar
images.
BandMathFilterType::Pointer m_LBandMathFilter = BandMathFilterType::New();
m_LBandMathFilter->SetNthInput(0, m_LeftWarpImageFilter->GetOutput(), "inleft");
#ifdef OTB_MUPARSER_HAS_CXX_LOGICAL_OPERATORS
std::string leftExpr = "inleft != 0 ? 255 : 0";
#else
std::string leftExpr = "if(inleft != 0,255,0)";
#endif
m_LBandMathFilter->SetExpression(leftExpr);
BandMathFilterType::Pointer m_RBandMathFilter = BandMathFilterType::New();
m_RBandMathFilter->SetNthInput(0, m_RightWarpImageFilter->GetOutput(), "inright");
#ifdef OTB_MUPARSER_HAS_CXX_LOGICAL_OPERATORS
std::string rightExpr = "inright != 0 ? 255 : 0";
#else
std::string rightExpr = "if(inright != 0,255,0)";
#endif
m_RBandMathFilter->SetExpression(rightExpr);
Once the two sensor images have been resampled in epipolar geometry, the disparity map can be computed.
The approach presented here is a 2D matching based on a pixel-wise metric optimization. This approach
doesn’t give the best results compared to global optimization methods, but it is suitable for streaming and
threading on large images.
The major filter used for this step is the otb::PixelWiseBlockMatchingImageFilter . The metric is
computed on a window centered around the tested epipolar position. It performs a pixel-to-pixel matching
between the two epipolar images. The output disparities are given as index offsets from the left to the right
position.
The following features are available in this filter:
- Available metrics: SSD, NCC and Lp pseudo-norm (computed over a square window)
- Rectangular disparity exploration area.
- Input masks for left and right images (optional).
- Output metric values (optional).
- Possibility to use input disparity estimate (as a uniform value or a full map) and an exploration
radius around these values to reduce the size of the exploration area (optional).
NCCBlockMatchingFilterType::Pointer m_NCCBlockMatcher = NCCBlockMatchingFilterType::New();
m_NCCBlockMatcher->SetLeftInput(m_LeftWarpImageFilter->GetOutput());
m_NCCBlockMatcher->SetRightInput(m_RightWarpImageFilter->GetOutput());
m_NCCBlockMatcher->SetRadius(3);
m_NCCBlockMatcher->SetMinimumHorizontalDisparity(-24);
m_NCCBlockMatcher->SetMaximumHorizontalDisparity(0);
m_NCCBlockMatcher->SetMinimumVerticalDisparity(0);
m_NCCBlockMatcher->SetMaximumVerticalDisparity(0);
m_NCCBlockMatcher->MinimizeOff();
m_NCCBlockMatcher->SetLeftMaskInput(m_LBandMathFilter->GetOutput());
m_NCCBlockMatcher->SetRightMaskInput(m_RBandMathFilter->GetOutput());
Some other filters have been added to enhance these pixel-to-pixel disparities. The filter
otb::SubPixelDisparityImageFilter can estimate the disparities with sub-pixel precision.
Several refinement methods can be used: parabolic fit, triangular fit, and dichotomy search.
NCCSubPixelDisparityFilterType::Pointer m_NCCSubPixFilter = NCCSubPixelDisparityFilterType::New();
m_NCCSubPixFilter->SetInputsFromBlockMatchingFilter(m_NCCBlockMatcher);
m_NCCSubPixFilter->SetRefineMethod(NCCSubPixelDisparityFilterType::DICHOTOMY);
The filter otb::DisparityMapMedianFilter can be used to remove outliers. It has two
parameters:
- The radius of the local neighborhood to compute the median
- An incoherence threshold, used to reject disparities whose distance to the local median exceeds
the threshold.
MedianFilterType::Pointer m_HMedianFilter = MedianFilterType::New();
m_HMedianFilter->SetInput(m_NCCSubPixFilter->GetHorizontalDisparityOutput());
m_HMedianFilter->SetRadius(2);
m_HMedianFilter->SetIncoherenceThreshold(2.0);
m_HMedianFilter->SetMaskInput(m_LBandMathFilter->GetOutput());
MedianFilterType::Pointer m_VMedianFilter = MedianFilterType::New();
m_VMedianFilter->SetInput(m_NCCSubPixFilter->GetVerticalDisparityOutput());
m_VMedianFilter->SetRadius(2);
m_VMedianFilter->SetIncoherenceThreshold(2.0);
m_VMedianFilter->SetMaskInput(m_LBandMathFilter->GetOutput());
The application PixelWiseBlockMatching contains all these filters and provides a single interface to
compute your disparity maps.
The disparity map obtained with the previous step usually gives a good idea of the altitude
profile. However, it is more useful to study altitude with a DEM (Digital Elevation Model)
representation.
The filter otb::DisparityMapToDEMFilter performs this last step. Its behavior is as follows:
- Compute the DEM extent from the left sensor image envelope (spacing is set by the user)
- Compute the left and right rays corresponding to each valid disparity
- Compute the intersection with the mid-point method
- If the 3D point falls inside a DEM cell and has a greater elevation than the current height, the
cell height is updated
The rule of keeping the highest elevation makes sense for buildings seen from the side because the roof
edges elevation has to be kept. However this rule is not suited for noisy disparities.
The application DisparityMapToElevationMap also gives an example of use.
DisparityToElevationFilterType::Pointer m_DispToElev = DisparityToElevationFilterType::New();
m_DispToElev->SetHorizontalDisparityMapInput(m_HMedianFilter->GetOutput());
m_DispToElev->SetVerticalDisparityMapInput(m_VMedianFilter->GetOutput());
m_DispToElev->SetLeftInput(leftReader->GetOutput());
m_DispToElev->SetRightInput(rightReader->GetOutput());
m_DispToElev->SetLeftEpipolarGridInput(m_DisplacementFieldSource->GetLeftDisplacementFieldOutput());
m_DispToElev->SetRightEpipolarGridInput(m_DisplacementFieldSource->GetRightDisplacementFieldOutput());
m_DispToElev->SetElevationMin(avgElevation - 10.0);
m_DispToElev->SetElevationMax(avgElevation + 80.0);
m_DispToElev->SetDEMGridStep(2.5);
m_DispToElev->SetDisparityMaskInput(m_LBandMathFilter->GetOutput());
//m_DispToElev->SetAverageElevation(avgElevation);
WriterType::Pointer m_DEMWriter = WriterType::New();
m_DEMWriter->SetInput(m_DispToElev->GetOutput());
m_DEMWriter->SetFileName(argv[3]);
m_DEMWriter->Update();
RescalerType::Pointer fieldRescaler = RescalerType::New();
fieldRescaler->SetInput(m_DispToElev->GetOutput());
fieldRescaler->SetOutputMaximum(255);
fieldRescaler->SetOutputMinimum(0);
OutputWriterType::Pointer fieldWriter = OutputWriterType::New();
fieldWriter->SetInput(fieldRescaler->GetOutput());
fieldWriter->SetFileName(argv[4]);
fieldWriter->Update();
Figure 10.4 shows the result of applying terrain reconstruction using pixel-wise block matching,
sub-pixel interpolation and DEM estimation on a pair of Pleiades images over the stadium in Toulouse,
France.