Monday, April 20, 2020

Perceptual Loss / VGG Loss Function - Is This the Magic Behind Magic Pony Technology's Hundred-Million-Dollar Acquisition?

MagicPony is a deep-learning startup that was acquired by Twitter back in 2016. Not much was written about what they did to be valued at $150 million, except that it had something to do with better video compression technology. From poking around online and looking at the background of Zehan Wang -- MagicPony's CTO -- it seems Deep Learning-based Super Resolution is at the heart of their technology. This seems to be further confirmed by the paper published after the acquisition: Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.

Deep learning-based Super Resolution is not a novel technique. Several papers had already been published on it that I read back in 2013. But there were several challenges, such as the slow inference speed (low FPS) and results that weren't much better than conventional bicubic interpolation. What MagicPony seems to have achieved is a way to do it both faster and much better.

The paper published by MagicPony describes a GAN-based technique for Super Resolution. One of the most interesting aspects of the paper is how they use a Perceptual Loss / VGG Loss to compute the loss between the original high-resolution image and the deep learning-upscaled image. In the paper they mention it as one of the key factors in getting a significantly better result. So what is it? It is a loss computed on the feature activations of one of the convolutional layers taken from a VGG-16 model pre-trained on the ImageNet dataset. There was another paper that dissects and tries to understand the different layers of the VGG-16 model by visualizing each of them. It found that a particular layer can be leveraged as a better way to perceptually compare two images (versus using a naive pixel-based MSE loss function).
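
To make that concrete, here is a minimal sketch of such a perceptual loss in PyTorch. The specific layer index and the assumption that inputs are already ImageNet-normalized are my own choices for illustration, not necessarily what the MagicPony paper uses:

import torch
import torch.nn as nn
from torchvision import models

class VGGPerceptualLoss(nn.Module):
    """Compares two images in VGG feature space instead of raw pixel space."""
    def __init__(self, layer_index=16):
        super().__init__()
        # Truncate a pre-trained VGG-16 at a chosen layer and freeze it.
        # layer_index=16 (roughly relu3_3) is my own guess; papers pick different layers.
        vgg = models.vgg16(pretrained=True).features
        self.features = nn.Sequential(*list(vgg.children())[:layer_index]).eval()
        for p in self.features.parameters():
            p.requires_grad = False
        self.mse = nn.MSELoss()

    def forward(self, upscaled, original_hr):
        # Inputs are assumed to already be normalized with ImageNet mean/std.
        # The loss is still plain MSE, but measured between feature activations
        # rather than between raw pixels.
        return self.mse(self.features(upscaled), self.features(original_hr))

# Usage: swap a pixel-wise nn.MSELoss for this in the training loop.
# criterion = VGGPerceptualLoss()
# loss = criterion(sr_output, hr_target)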

I can't help but wonder about the following:
1. Would a more traditional CNN-based Super Resolution technique (e.g. SRCNN) yield as good a result by simply swapping its loss function for the VGG loss function?