Shadow Removal - Part 1


In a photogrammetry project that I was working on, one consistent issue that I came across was that shadows present in images were responsible for innacurate feature matching. So I started to see if there are shadow removal models, and as usual its a subfield of its own in image enhancement techniques!

I briefly went over 2 papers - RASM and ShadowFormer currently top performing models on Papers with code benchmarks.

Both the models take as input an image with shadow, mask of the shadow and output a shadow-free image. Both being transformer based models, concentrate on how to effectively use attention between shadow regions and non-shadow regions. In short, ShadowFormer divides the feature map (obtained by using channel attention modules over the image) into non-overlapping regions and add an attention mechanism over these regions to act as global attention (it simply means we have a relatively large receptive field for each of the regions) - idea is to enable information sharing across the entire image, to remove the shadow. And RASM argues that instead of having such a ‘global’ attention, there should be a higher weight to attention with neighbouring non-shadow areas. It uses concepts from another paper Neighbourhood Attention Transformer to find the attention map. (I’m not yet sure about implementation details but it seems it is an attention mechanism for each pixel with another pixel within a region r).

At this point I wanted to see what would be the simplest baseline approach. And I didnt want to use a shadow mask as input. Lets learn both together. So, one approach I could think of - I need a strong feature extraction model, so DINOv2 and we need to be able to do dense prediction, so lets go with a DPT head and see how it performs. The idea is simple, we will predict both a mask as well as a residual image that has to be added to remove the shadow. Lets keep our loss simple - so just the Photometric reconstruction loss (not even using Perceptual loss 😂)

After training for around 10,000 steps on a single A40 GPU, I get the following results - Training loss Output samples

Well, it isnt great. I can clearly see the shadow still but considering that it ran only for 10,000 steps without any extra mechanism for shadow removal, its interesting. I also found out about the Shadow Removal challenge, even though I’m barely familiar with this topic, maybe I can give it a go, if I get time 🤞.