An Image to Image Translation Model for Road Network Extraction in Remotely Sensed Images
Final Project for Remote Sensing
Read the report here

The goal of this project was to train a Generative Adversarial Network (GAN) to extract road networks from satellite imagery and to assess its performance relative to pixel-wise classification methods. The data I used was 4-band (Red, Green, Blue, Near IR) imagery with a 3 m resolution from Planet Labs. Throughout the Remote Sensing course, we often used indices such as NDVI or NDWI for land cover classification. With indices like these, each pixel gets a value computed solely from that pixel's own band values. This value can then be thresholded to classify a certain type of land cover. A shortcoming of this approach is that it ignores local context: neighboring pixels play no part in classifying a given pixel. By using a discriminator with convolutional layers, I hoped to outperform pixel-wise classification by incorporating some of this local context into the classification process.
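For reference, here is a minimal sketch of pixel-wise index classification. It assumes the bands are stored in the order listed above (Red, Green, Blue, Near IR; actual Planet products may store them in a different order), and the thresholds are arbitrary illustrative values, not ones used in this project. Each output pixel depends only on that pixel's own band values, which is exactly the missing local context described above.

```python
import numpy as np

def normalized_difference(a, b):
    """Per-pixel (a - b) / (a + b); pixels where a + b == 0 are set to 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def index_masks(image, ndvi_threshold=0.3, ndwi_threshold=0.0):
    """Threshold NDVI and NDWI for an (H, W, 4) array in R, G, B, NIR order.

    Band order and both thresholds are illustrative assumptions.
    """
    red, green, nir = image[..., 0], image[..., 1], image[..., 3]
    ndvi = normalized_difference(nir, red)    # vegetation index
    ndwi = normalized_difference(green, nir)  # McFeeters water index
    return ndvi > ndvi_threshold, ndwi > ndwi_threshold
```

Because every pixel is handled in isolation, shuffling the pixels of a scene would simply shuffle the output mask the same way; no spatial pattern (such as the long, thin shape of a road) can influence the result.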


The first challenge I faced was producing training data. Acquiring 4-band imagery from Planet was relatively simple, but producing accurate road labels was more difficult. I ended up using the OpenStreetMap API to retrieve all of the roads within each Planet scene, and then drew binary road masks for each scene.
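One way to turn road geometries into a binary mask is to mark every pixel whose center lies within some distance of a road segment. The sketch below is not necessarily how the masks in this project were drawn: it assumes the OpenStreetMap road polylines have already been converted from geographic coordinates into (row, col) pixel coordinates using the scene's georeferencing, and `rasterize_roads` and its `half_width_px` road-width parameter are hypothetical. For georeferenced shapes, a library routine such as `rasterio.features.rasterize` can do this step directly.

```python
import numpy as np

def rasterize_roads(polylines, shape, half_width_px=1.0):
    """Burn road polylines into a binary mask (hypothetical helper).

    polylines: list of polylines, each a list of (row, col) pixel coordinates.
    A pixel is marked as road if its center lies within half_width_px of
    any polyline segment.
    """
    rows, cols = np.indices(shape, dtype=float)
    mask = np.zeros(shape, dtype=bool)
    for line in polylines:
        pts = np.asarray(line, dtype=float)
        for (r0, c0), (r1, c1) in zip(pts[:-1], pts[1:]):
            dr, dc = r1 - r0, c1 - c0
            seg_len2 = dr * dr + dc * dc
            if seg_len2 == 0:
                t = np.zeros(shape)  # degenerate segment: distance to a point
            else:
                # Project each pixel center onto the segment, clamped to its ends
                t = np.clip(((rows - r0) * dr + (cols - c0) * dc) / seg_len2, 0.0, 1.0)
            dist2 = (rows - (r0 + t * dr)) ** 2 + (cols - (c0 + t * dc)) ** 2
            mask |= dist2 <= half_width_px ** 2
    return mask.astype(np.uint8)
```

For example, `rasterize_roads([[(2, 0), (2, 9)]], (5, 10), half_width_px=0.5)` marks exactly the ten pixels of row 2. Since OpenStreetMap stores roads as centerlines, the chosen width directly controls how many pixels count as road in the labels.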

[Figures: Satellite image of Western Massachusetts (left) | Road mask of Western Massachusetts (right)]

The image on the left is a Planet image of Western Massachusetts. The image on the right is a mask of the roads in the Planet scene, constructed from OpenStreetMap data.

The architecture I used for my GAN was based on the Pix2Pix model. The generator is a U-Net-style network with skip connections between the downsampling and upsampling layers, while the discriminator is a convolutional network that classifies each patch of its input separately as either real or generated.
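To make "patches" concrete: in the standard Pix2Pix configuration (an assumption here, since RoadGAN's exact layer settings aren't listed), the discriminator is a stack of 4x4 convolutions, and each of its output scores depends on a 70x70 pixel region of the input. The short calculation below computes the receptive field and output grid size of such a stack; the layer list `PATCHGAN_70` is illustrative.

```python
def conv_stack_geometry(layers, input_size):
    """Return (receptive_field, output_size) for a stack of square conv layers.

    layers: list of (kernel, stride, padding) tuples, ordered from the input.
    """
    rf, jump, size = 1, 1, input_size
    for kernel, stride, padding in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= stride             # distance in input pixels between adjacent outputs
        size = (size + 2 * padding - kernel) // stride + 1
    return rf, size

# Assumed Pix2Pix-style 70x70 PatchGAN: three stride-2 4x4 convs, one stride-1
# 4x4 conv, then a final stride-1 4x4 conv giving one real/fake score per patch.
PATCHGAN_70 = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 1), (4, 1, 1)]

print(conv_stack_geometry(PATCHGAN_70, 256))  # (70, 30)
```

Under these assumptions, a 256x256 tile yields a 30x30 grid of real/fake scores, each judging a 70x70 pixel neighborhood, which is roughly 210 m on a side at Planet's 3 m resolution.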

For comparison, I also trained an SVM with an RBF kernel to classify each pixel as road or not road, using that pixel's four band values as inputs. Since the RBF kernel implicitly maps the 4-band data into a much higher-dimensional feature space, the SVM should give a rough upper bound on the performance that can be expected from a classifier that considers only pixel band values and not scene geometry.
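The per-pixel setup, and the reason classifying a whole scene is slow, can be sketched with NumPy. This is a minimal illustration rather than the code used in the project: the function names are hypothetical, and the support vectors, coefficients, intercept, and `gamma` would come from a trained SVM (for example, scikit-learn's `SVC` exposes `support_vectors_`, `dual_coef_`, and `intercept_`; for a binary classifier, `dual_coef_[0]` holds the coefficients).

```python
import numpy as np

def image_to_pixel_features(image):
    """Flatten an (H, W, 4) band array into an (H*W, 4) matrix, one row per pixel."""
    h, w, bands = image.shape
    return image.reshape(h * w, bands).astype(float)

def rbf_kernel(X, Y, gamma):
    """K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    sq_dists = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Y ** 2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clamp tiny negative rounding

def svm_decision(X, support_vectors, dual_coef, intercept, gamma):
    """Kernel SVM decision function f(x) = sum_i dual_coef[i] * K(sv_i, x) + b.

    Every pixel needs one kernel evaluation per support vector, so the cost of
    classifying a scene grows with (number of pixels) x (number of support vectors).
    """
    return rbf_kernel(X, support_vectors, gamma) @ dual_coef + intercept
```

A road mask would then be `(svm_decision(X, ...) > 0).reshape(h, w)`. For a full scene, the pixel-by-support-vector kernel matrix is very large, so in practice the pixels would be processed in chunks.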


[Figures: Road mask generated by RoadGAN (left) | Road mask generated by a trained SVM (right)]

The image on the left shows the road mask generated by RoadGAN for the Planet image above. The image on the right shows the road mask generated by the SVM.

RoadGAN and the SVM performed very differently. On Western Mass. 1, the only scene where both were evaluated, RoadGAN had higher precision than the SVM (0.287 vs. 0.114) but lower recall (0.176 vs. 0.257). However, when the F1 scores (the harmonic mean of precision and recall) are compared, RoadGAN outperforms the SVM (0.218 vs. 0.158). The SVM also took significantly longer to classify a scene, since it had to classify each pixel independently. Due to limited time and computational resources, I'm not confident that RoadGAN was trained to convergence; with additional training and data, it would likely perform better.

RoadGAN

Scene                  Precision  Recall  F1 Score
Western Mass. 1        0.287      0.176   0.218
Western Mass. 2        0.595      0.182   0.279
Western Mass. 3        0.474      0.192   0.274
Western Rhode Island   0.347      0.254   0.293
Average                0.480      0.189   0.271

SVM

Scene                  Precision  Recall  F1 Score
Western Mass. 1        0.114      0.257   0.158

These tables show the performance of the RoadGAN model and the pixel-wise SVM classifier. The SVM was only run on one scene because it was very slow.
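The scores above can be computed by comparing a predicted mask with the OpenStreetMap mask pixel by pixel. The sketch below assumes plain pixel-wise counting (the exact evaluation protocol, such as any tolerance buffer around roads, isn't described here), and `mask_scores` and `f1_from` are hypothetical helper names. It also checks that the reported F1 values follow from the reported precision and recall.

```python
import numpy as np

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def mask_scores(pred, truth):
    """Pixel-wise precision, recall, and F1 for binary road masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)   # road pixels predicted as road
    fp = np.count_nonzero(pred & ~truth)  # background pixels predicted as road
    fn = np.count_nonzero(~pred & truth)  # road pixels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)

# Western Mass. 1, using the reported precision and recall for each model
print(round(f1_from(0.287, 0.176), 3))  # RoadGAN: 0.218
print(round(f1_from(0.114, 0.257), 3))  # SVM: 0.158
```

Because F1 is a harmonic mean, it is pulled toward the weaker of the two scores, which is why RoadGAN's much higher precision outweighs its somewhat lower recall.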