Processing stereo pairs to derive an elevation map is a very frequent feature request from our users. The otb::FineRegistrationImageFilter has been a first step toward it but it and allows to estimates disparities in a precise way (see for instance this previous post). Yet, there are several steps missing to achieve an elevation (not disparity) map using this tool. In the mean time, nowadays sensors allow to acquire Very High Resolution stereo-pairs along-track (from the same orbit, and near simultaneous acquisition). The future Pleiades sensor will even be able to acquire triplets of stereo-images. All these new sensors make use of the RPC modelling to provide highly accurate sensor-to-ground transformation.
Classical scheme
The classical scheme to compute elevation maps from these kind of data is to first resample both images of the pair into the epipolar geometry (with optional refinement of the sensor-to-ground function using GPCs and tie points) , where displacement related to height only occurs in the horizontal direction. A block-matching algorithm (just like otb::FineRegistrationImageFilter does) is then applied to retrieve the disparities. Finally, disparities are casted back to height values using sensor modelling and the elevation map is resampled to a cartographic projection. Several post-processing may be needed to make the produced elevation map usable, which are out of the scope of this post.
However, resampling to epipolar geometry is an heavy process that must be done for both images, and in the case of push-broom acquisition system, this resampling may be tricky. Moreover, this resampling is often done with an average elevation hypothesis, which leads to a wide range of disparities to explore in case of highly varying relief over the scene.
Implicit exploration of epipolar lines
In Orfeo ToolBox, the otb::GenericRSTransform allows to map points from sensor geometry to ground, but it also allows to map points between two sensors geometries : we can therefore use it to build a function that will transform a point of the first image of our stereo pair to its homologous point in the second image of the stereo pair, according to both RPC modelling (possibly refined). This function also uses an elevation value as a parameter (either average or drawn from a Digital Elevation Model), so that we in fact have a function which associate a point with a candidate elevation to its homologous point in the other image of the stereo pair. If we transform the same point with a range of candidate elevation, we are then implicitly exploring its associated epipolar line in the other image (a similar approach is performed by the MicMac software).
A new OTB filter
Using this principle, we developped a new OTB filter called otb::StereoSensorModelToElevationFilter. This filter takes a pair of along-track stereo images, and behave as follows. For each pixel of the first (reference) image:
- Extract the neighborood of the pixel in the reference image according to the user-defined radius,
- For candidate elevations in a user-defined range:
- Transform the patch grid into the secondary image with the sensor-to-sensor function and the candidate height
- Interpolate the corresponding secondary patch
- Compute cross-correlation of reference and secondary patches
- If best correlation is high enough, keep the corresponding best elevation candidate as the final elevation.
The output of the algorithm is directly an elevation map in meters, in the same geomety and grid as the reference image of the stereo pair. There is no need to resample both image in epipolar geometry, we only need to interpolate necessary locations along the epipolar lines in the secondary image. One can then orthorectify the elevation map in any desired cartographic projection.
We can see that a difficult part of this algorithm lies in the definition of the appropriate elevation exploration range. First, it requires to know about the average elevation over the area, and even with this information, elevation might be varying a lot over the scene, so this range has to be pretty wide, and lead to high computation time and lots of matching errors. Therefore, the filter allows to draw the initial elevation from a DEM (like SRTM for instance). The algorithm will then explore elevation values around this intiial local elevation, which allows to define a much more narrow range.
Results
Our first result will be a pair of Worldview 2 images over the city of Toulouse. The following results were obtained on a detail of the multi-spectral images (2.3 meters resolution) over the city stadium.
Applying the same algorithm to the panchromatic images (0.5 meters resolution) leads to noisier results, but we can still clearly distinguish the shape of the stadium. Image on the left is the correlation mask associated with the result (and produced by the filter). Dark values correspond to low correlation, brighter ones correspond to higher correlation. Please note that in this case, initial height has been subtracted from the result (filter option).
Next results were obtained from a pair of WorldView 2 multi-spectral images (2.3 meters) over the area of Aurignac, in France, an area of agricultural and natural landscape with lots of valley and hills. The two images were first properly down-sampled at a resolution of around 11.5 meters before the process was applied. The upper-left corner is more blur because it is an area which is only viewed in one of the image, and thus contains only the initial SRTM elevation.
The two images bellow shows the difference between the SRTM (on the left) and the estimated elevation (on the right).
If one does not want to use an initial DEM, the filter can start from a user-defined average elevation. Next image shows the result of using an average elevation on the Aurignac scene. Red pixels are those for which correlation fails. The result looks worse than the one using an initial DEM because in the DEM case pixels for which correlation fails are given the intial DEM value, which leads to a smooth result.
Conclusion
Of course, this block-matching algorithm is still very simple, and we do not claim that these results are better, faster or even valid with respect to those one can produce with similar tools. For instance, it is obvious that the results suffers from the well known fattening effect, and no particular filtering is performed to reduce it. The same holds for occlusions, incorrect matches or stereoscopic effects. Using some work from Ipol would be of great interest in this purpose. Still, this filter provides a very simple tool to perform elevation map estimation from along-track stereo pairs which makes use of the generic implementation of sensor-to-ground and ground-to-sensor transformation in Orfeo ToolBox to avoid the need for explicit epipolar geometry. Built-in access to DEM allows to initialise the height map with sound values and thus to narrow the search space. One can also make use of the filter streaming capability to produce only part of the image, which is more delicate with the standard scheme. Future work is to speed-up the computation, since the filter currently makes an intensive use of the sensor to sensor transform and of interpolation. Reviewers are welcome to read and comment the code which can be found here, whereas users can give it a try using the new otbStereoSensorModelToElevationMap-cli application from the latest revision.