Researchers have used a speech-recognition system to teach an artificial intelligence to read lips, a task that remains a real challenge for both computers and humans.
Lip reading is difficult for humans and artificial intelligence alike. In 2016, Google's DeepMind AI managed to outperform a professional lip reader, but it still reached only a 46.8% success rate, compared with 12.4% for humans under the same conditions.
A team of researchers from Zhejiang University in China, the Stevens Institute of Technology in the United States, and Alibaba has developed a new approach that uses speech recognition to improve the training of artificial intelligence. The system, called Lip by Speech (LIBS), allows the AI to learn to recognize much subtler cues in lip movements.
Speech recognition to train lip reading
The researchers used existing databases: LRS2, with 45,000 sentences in English from the BBC, and CMLR, with more than 100,000 sentences in Mandarin. Despite an error rate of around 10%, speech recognition allows a fine-grained analysis of the videos, which trains the LIBS system both at the level of whole sequences, or sentences, and frame by frame.
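To make the idea of training at both granularities more concrete, here is a minimal, hypothetical sketch in PyTorch of a distillation loss in which a lip-reading "student" is pushed to match features from a speech-recognition "teacher" both frame by frame and over the whole sequence. The class and variable names are illustrative assumptions and are not taken from the LIBS paper or its code.

```python
# Hypothetical sketch of multi-granularity distillation: a pretrained speech
# recognizer acts as the "teacher" and the lip reader is the "student".
import torch
import torch.nn as nn


class MultiGranularityDistillationLoss(nn.Module):
    """Combines a sequence-level and a frame-level feature-matching loss.

    All names here are illustrative; they do not come from the LIBS code.
    """

    def __init__(self, seq_weight: float = 1.0, frame_weight: float = 1.0):
        super().__init__()
        self.seq_weight = seq_weight
        self.frame_weight = frame_weight
        self.mse = nn.MSELoss()

    def forward(self, student_frames, teacher_frames):
        # Both tensors: (batch, time, feature_dim), assumed already aligned
        # to a common length and feature dimension.
        # Frame-level term: match the features frame by frame.
        frame_loss = self.mse(student_frames, teacher_frames)
        # Sequence-level term: match a pooled summary of the whole utterance.
        seq_loss = self.mse(student_frames.mean(dim=1),
                            teacher_frames.mean(dim=1))
        return self.frame_weight * frame_loss + self.seq_weight * seq_loss


if __name__ == "__main__":
    # Random tensors standing in for the two encoders' outputs.
    student = torch.randn(8, 50, 256)  # lip-reading encoder output
    teacher = torch.randn(8, 50, 256)  # speech-recognizer encoder output
    loss = MultiGranularityDistillationLoss()(student, teacher)
    print(loss.item())
```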
This new approach reduced the error rate by 7.66% in Chinese and 2.75% in English compared with previous methods. The improvement is even more noticeable when training data is limited. The researchers now plan to apply this method to teach an AI to interpret sign language.