Research ideas in AV: Domain Adaptation
2020 was a fruitful year for research in autonomous vehicles. While the self-driving car industry has shown a lot of progress over the last two years, there still remain a number of difficult problems that need to be solved to deploy AVs at scale. The ML research community plays an important role in developing innovative solutions to these problems and this blog series will deep-dive into some of the more interesting ideas being explored. In this post, we will take a look at a domain adaptation paper and summarize the takeaways.
Dense urban environments can vary a lot from city to city. From specialized street signs to unique driver and pedestrian behavior, even humans have to adapt their visual cues and driving style when driving in a new place. Furthermore, changing conditions such as weather or time of day pose additional challenges to vision-based systems. While a large amount of data is necessary for reliability, it is not sufficient if the data doesn’t cover all the conditions the system will be exposed to during inference. This is where domain adaptation is useful: it allows the system to transfer what it has learnt in one source domain (e.g. sunny weather) to a target domain (e.g. rainy weather) where data coverage might be sparse. This helps the system learn a representation that works across all the relevant conditions. The ability to adapt to novel domains also matters in a dynamic world where the environment is constantly changing, which makes domain adaptation essential for achieving reliability at scale. The paper we will discuss in this post offers a novel approach to open compound domain adaptation for the task of semantic segmentation.
Discover, Hallucinate, and Adapt: Open Compound Domain Adaptation for Semantic Segmentation - Park et al.
This paper comes out of KAIST and was published at NeurIPS 2020. It addresses the important problem of open compound domain adaptation for semantic segmentation. While most domain adaptation approaches focus on a single-source, single-target setting, this paper investigates adaptation to multiple targets. In open compound domain adaptation (OCDA), the training target is a set of domains without domain labels (the compound target domain), and a domain that is never seen during training (the open target domain) is used to test performance on novel domains.
Their approach consists of three parts (illustrated in the figure above):
Discover - Cluster the target data to discover multiple latent domains.
Hallucinate - Hallucinate multiple latent target domains in the source data.
Adapt - Learn translations between domains adversarially.
The Discover step aims to make implicit target domains explicit by applying K-means clustering on the mean and standard deviation of the convolutional features (referred to as style) of the target images. This results in K latent target domains.
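To make the Discover step more concrete, here is a minimal NumPy sketch of the idea. It is not the authors' code: it assumes the convolutional features have already been extracted as (C, H, W) arrays, the helper names (`style_vector`, `farthest_point_init`, `kmeans`) are our own, the farthest-point initialization is just a convenient deterministic choice, and the feature maps are random stand-ins rather than real image features.

```python
import numpy as np

def style_vector(features):
    """Concatenate the channel-wise mean and std of a (C, H, W) feature map."""
    flat = features.reshape(features.shape[0], -1)
    return np.concatenate([flat.mean(axis=1), flat.std(axis=1)])

def farthest_point_init(points, k):
    """Deterministic init (our assumption): start at the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    for _ in range(k - 1):
        dists = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centroids], axis=0
        )
        centroids.append(points[dists.argmax()])
    return np.stack(centroids)

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = np.stack([
            # Keep the old centroid if a cluster happens to be empty.
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Toy stand-in for target-image features: three "styles", 20 images each.
rng = np.random.default_rng(0)
fake_features = [
    rng.normal(loc=m, scale=s, size=(8, 16, 16))
    for m, s in [(0.0, 1.0), (3.0, 0.5), (-3.0, 2.0)]
    for _ in range(20)
]
styles = np.stack([style_vector(f) for f in fake_features])
centroids, labels = kmeans(styles, k=3)
```

Each entry of `labels` plays the role of a discovered latent domain index, and each row of `centroids` is the average style of that latent domain.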
Next, in the Hallucinate step, an image-translation network is applied to the source domain images, with the styles from the Discover step as input. This results in K hallucinated images for each source image. The hallucinated images are meant to preserve the content of the source image while being stylistically similar to each of the latent target domains.
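The paper uses a learned image-translation network for this step, which is too involved to reproduce here. As a rough intuition for "keep the content, swap the style", the sketch below substitutes a much simpler, non-learned technique: adaptive instance normalization (AdaIN), which re-normalizes a feature map to a given per-channel mean and standard deviation. The function names and the toy styles are our own assumptions.

```python
import numpy as np

def adain(content, style_mean, style_std, eps=1e-5):
    """Re-style a (C, H, W) feature map to the given per-channel mean/std.

    This is AdaIN, used here as a stand-in for the paper's learned
    image-translation network.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    normalized = (content - c_mean) / (c_std + eps)
    return normalized * style_std[:, None, None] + style_mean[:, None, None]

def hallucinate(content, latent_styles):
    """Produce one re-styled copy of `content` per latent target style."""
    return [adain(content, mean, std) for mean, std in latent_styles]

# Toy example: one source feature map and K = 2 made-up latent styles.
rng = np.random.default_rng(0)
content = rng.normal(size=(4, 8, 8))
latent_styles = [
    (np.full(4, 2.0), np.full(4, 0.5)),
    (np.full(4, -1.0), np.full(4, 3.0)),
]
hallucinated = hallucinate(content, latent_styles)
```

Each output keeps the spatial layout of `content` but takes on the statistics of one latent style, which mirrors the "K hallucinated images per source image" idea at the feature level.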
Finally, in the Adapt step, these hallucinated images and the corresponding target images are passed through a single segmentation model, and K discriminators (one per latent domain) attempt to identify the real target image from the segmentation outputs. The discriminators are then trained adversarially with the segmentation network and the style transfer network. The translated source data is also added to the training set to improve the model’s ability to function in all K domains. A minor note here is that while it is possible to train these networks end-to-end, in their experiments the authors trained the hallucination and adaptation steps separately.
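The routing logic of the Adapt step can be sketched as follows. This is a generic GAN-style objective under our own assumptions, not the paper's exact losses: the discriminators are hypothetical callables that map a batch of segmentation outputs to logits, and the direction of the adversarial term (pushing target outputs to be mistaken for hallucinated-source outputs) is borrowed from common output-space adaptation setups.

```python
import numpy as np

def bce_with_logits(logits, label):
    """Numerically stable binary cross-entropy on raw logits, averaged."""
    logits = np.asarray(logits, dtype=float)
    per_sample = (
        np.maximum(logits, 0) - logits * label + np.log1p(np.exp(-np.abs(logits)))
    )
    return float(per_sample.mean())

def adapt_losses(discriminators, target_out, target_dom, halluc_out, halluc_dom):
    """Average discriminator and adversarial losses over the K latent domains.

    Sample i is only ever shown to the discriminator of its own latent domain.
    """
    d_loss, adv_loss, used = 0.0, 0.0, 0
    for k, disc in enumerate(discriminators):
        real = target_out[target_dom == k]   # outputs for real target images
        fake = halluc_out[halluc_dom == k]   # outputs for hallucinated source
        if len(real) == 0 or len(fake) == 0:
            continue  # nothing to compare for this domain in this batch
        # Discriminator k: real target -> 1, hallucinated source -> 0.
        d_loss += bce_with_logits(disc(real), 1.0) + bce_with_logits(disc(fake), 0.0)
        # Segmentation network (assumed direction): make target outputs
        # look like hallucinated-source outputs to discriminator k.
        adv_loss += bce_with_logits(disc(real), 0.0)
        used += 1
    return d_loss / max(used, 1), adv_loss / max(used, 1)

# Toy example: K = 2 untrained linear discriminators on 5-dim "outputs".
rng = np.random.default_rng(0)
weights = [np.zeros(5), np.zeros(5)]
discriminators = [lambda x, w=w: x @ w for w in weights]
target_out = rng.normal(size=(6, 5))
halluc_out = rng.normal(size=(6, 5))
domains = np.array([0, 0, 0, 1, 1, 1])
d_loss, adv_loss = adapt_losses(
    discriminators, target_out, domains, halluc_out, domains
)
```

With untrained (all-zero) discriminators, every logit is 0, so each cross-entropy term equals log 2. In a real training loop the two losses would be minimized in alternation by the discriminators and the segmentation network.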
This figure from the paper illustrates the learnt domains and the utility of the hallucinated source images. On the left are samples of images from the target domain, arranged in columns. On the right are examples of what the source image looks like after being translated to the style of an exemplar target image. Although the domains look very diverse visually, the style transfer network is still able to produce realistic images.
The authors demonstrate results on the C-Driving dataset, a subset of BDD100K that contains real-world examples from several domains such as cloudy, rainy, snowy and overcast conditions. The source dataset they use is the synthetic GTA5 dataset. For evaluation, they treat the rainy, snowy and cloudy domains as the compound set, while the overcast domain serves as the open domain. In addition to showing results for their approach, they provide comparisons to several unsupervised domain adaptation and open compound domain adaptation approaches. As shown in the figure above, the approach showed an absolute average gain of around 6% on the compound and open domains.
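Segmentation results like these are usually reported as mean intersection-over-union (mIoU), and we assume that metric here. The sketch below shows how a per-domain mIoU and its average over domains can be computed. The helper names, the `ignore_index` of 255, and the tiny label maps are illustrative assumptions, not data from the paper.

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes, ignore_index=255):
    """Rows are ground-truth classes, columns are predicted classes."""
    mask = gt != ignore_index
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(conf):
    """Mean IoU over the classes that appear in the prediction or the labels."""
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    valid = union > 0
    return float((tp[valid] / union[valid]).mean())

# Toy 2x2 label maps for two made-up domains (not real results).
toy_eval = {
    "rainy": (np.array([[0, 1], [1, 1]]), np.array([[0, 0], [1, 1]])),  # (pred, gt)
    "snowy": (np.array([[0, 0], [1, 1]]), np.array([[0, 0], [1, 1]])),
}
per_domain = {
    name: mean_iou(confusion_matrix(pred, gt, num_classes=2))
    for name, (pred, gt) in toy_eval.items()
}
average = sum(per_domain.values()) / len(per_domain)
```

An "absolute average gain" then simply means the difference between two such averages, one per method.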
The authors also measure the alignment bias of the method. Traditional unsupervised domain adaptation methods tend to suffer from biased alignment, where target domains close to the source domain perform better than those further away. For example, if the source domain consists of daytime images, daytime images from the cloudy domain are conceptually and visually closer to the source domain than nighttime images. To test for biased alignment, they compare results on the cloudy-daytime, snowy-daytime and rainy-daytime domains to results on the dawn and nighttime domains. They show that their approach performs consistently well on all these domains compared to the baseline methods.
This paper introduces a novel approach for training a single model to perform well across multiple domains, and its results on real driving data suggest that it holds up in real-world use cases. Furthermore, the setup uses synthetic data as its source domain, which illustrates how synthetic data can be used in conjunction with domain adaptation to build a larger effective training dataset. In addition, this approach pairs well with location-specific networks, as the compound domains can be customized to better fit the distribution observed at a given location while still allowing for distribution shift (the open domain).