Lip-reading computer raises privacy fears
Automated system can read lips with 76 per cent accuracy – and more powerful machines are on the way
A Jordanian scientist has created an automated lip-reading system that can decipher speech with an average success rate of 76 per cent. The findings, in conjunction with recent advances in the fields of computer vision, pattern recognition, and signal processing, suggest that computers will soon be able to read lips accurately enough to raise questions about privacy and security.
How long have humans read lips?
According to Technology Review, "almost everybody uses lip-reading to a certain extent". In ordinary conversation, even people without hearing difficulties rely to an extent on visual information. This explains why it is possible to understand people in loud environments, and on television with the sound muted.
The art of lip-reading is believed to date back to 1500 AD. The first recorded lip-reading teacher was a Benedictine monk named Pietro Ponce, who died in 1588. The German Samuel Heinecke created a school for lip-reading in Leipzig in 1787, and the first known lip-reading conference was held at Chautauqua, USA, in 1894.
How does lip-reading work?
Spoken communication can be broken down into speech sounds, or phonemes, and the facial and mouth positions that accompany them, or visemes. Lip readers attempt to interpret speech from visemes alone, which is difficult because there are many more phonemes (between 45 and 53) than visemes (between 10 and 14). Several different sounds therefore share the same mouth shape, so some words cannot be told apart from visual information alone.
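The many-to-one problem can be illustrated with a short Python sketch. The mapping table, the viseme labels and the word list below are simplified assumptions made up for illustration; they are not taken from Hassanat's system or from any standard viseme set. The sketch groups words by the sequence of mouth shapes they would produce.

```python
from collections import defaultdict

# Toy phoneme-to-viseme table. This is an illustrative assumption:
# real viseme inventories are larger and differ between researchers.
PHONEME_TO_VISEME = {
    "p": "lips-closed", "b": "lips-closed", "m": "lips-closed",
    "f": "lip-to-teeth", "v": "lip-to-teeth",
    "t": "tongue-to-ridge", "d": "tongue-to-ridge", "n": "tongue-to-ridge",
    "a": "open-vowel",
}

# Hypothetical example words, spelled as simplified phoneme sequences.
WORDS = {
    "pat": ["p", "a", "t"],
    "bat": ["b", "a", "t"],
    "mat": ["m", "a", "t"],
    "bad": ["b", "a", "d"],
    "man": ["m", "a", "n"],
    "fat": ["f", "a", "t"],
    "van": ["v", "a", "n"],
}


def viseme_sequence(phonemes):
    """Translate a phoneme sequence into the mouth shapes a viewer would see."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)


def group_by_mouth_shape(words):
    """Collect words that produce an identical sequence of visemes."""
    groups = defaultdict(list)
    for word, phonemes in words.items():
        groups[viseme_sequence(phonemes)].append(word)
    return dict(groups)


groups = group_by_mouth_shape(WORDS)
for shapes, lookalikes in groups.items():
    print(" + ".join(shapes), "->", ", ".join(lookalikes))
```

Under these assumptions the seven words collapse into just two visual patterns, which is why a lip reader, human or automated, has to lean on context to decide which word was actually spoken.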
How might technology be able to assist?
Researcher Ahmad Hassanat from Mu’tah University in Jordan says that automated lip-reading has improved enormously in recent years, but there are still challenges in making software that can accurately connect visemes with phonemes. He says human lip readers perform best when they have an idea about the context of a conversation and a good grasp of grammar, idioms and common turns of phrase. Making a computer program that can accurately recognise these will take time, Hassanat says.
Why could lip-reading technology be useful?
Technology that can read lips has a broad range of potential uses in human-computer interaction (a discipline that helps design new input systems to make it easier for people to control their devices), speaker recognition, sign language, and video surveillance.
Hassanat proposes that lip-reading technology could be used to help protect data by creating a "visual password", whereby users speak a string of words into their device's camera to help verify their identity online.
However, the idea of using lip-reading technology in surveillance "raises a whole set of privacy-related issues", Technology Review suggests. "For example, it may be that videos of conversations without sound are impossible to interpret now but may be easy to interpret in future. How might politicians, business leaders and popular figures fare under that kind of future analysis?"
Hassanat concedes that it will be many years yet before visual speech recognition software is able to interpret speech with significantly greater accuracy than at present.
Image from Christine Roth