AI that enables the blind to ‘see’
By Anushia Kandasivam November 27, 2017
- AI tells users what and who is around them and reads text in real time
- Technology inclusiveness is important, as is humans and AI working together
SAQIB Shaikh, a software developer at Microsoft based in London, England, has developed an artificial intelligence (AI) that, leveraging visual recognition, speech intelligence, advanced machine learning and other leading-edge technologies, enables visually impaired people to ‘see’ the world.
Called Seeing AI, the AI has been built into a smartphone app using APIs from Microsoft Cognitive Services.
The user points their smartphone camera at the person, object or scene they want to ‘see’ (the AI directs the user on how to line up the camera accurately); the AI then snaps a photo and describes in detail what it sees.
The user can snap a photo of a person sitting in front of them and the AI will describe the person in terms of age, gender, appearance (hair colour) and perceived facial expression – happy, sad, surprised and so on.
The user can tag the photo with the person’s name so that the AI can identify the person by name the next time, which is particularly useful for identifying family members or friends, or when searching for a particular person in a crowded space or in a park.
The app can also read text and documents in real time. Technology that reads text has been around for quite some time, but this app does not require the user to take a photo of the text, a task that can be quite challenging for a visually impaired or blind person.
Instead, the AI directs the user on the positioning of the camera by detecting the edges of the document and then, when it detects that the camera has been correctly lined up, it snaps the photo itself, processes the image instantly and starts to read the text.
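The article does not describe how Seeing AI implements this guidance, so the following is only a minimal sketch of the general idea, assuming a separate detector has already located the four corners of the document in the camera frame. The function name `guidance`, the `margin` parameter and the hint strings are all hypothetical and are not taken from the app or from Microsoft Cognitive Services.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixels, origin at the top-left of the frame


def guidance(corners: Optional[Sequence[Point]],
             width: int, height: int, margin: float = 0.05) -> str:
    """Return a hint for the user, or "capture" once the whole page is in view.

    Hypothetical helper: `corners` is assumed to come from some edge/corner
    detector, which may report estimated corners that lie outside the frame.
    """
    if not corners or len(corners) != 4:
        return "no document detected"

    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    mx, my = width * margin, height * margin

    # A corner at or beyond a border means the page is cut off on that side.
    cut_left = min(xs) < mx
    cut_right = max(xs) > width - mx
    cut_top = min(ys) < my
    cut_bottom = max(ys) > height - my

    # Cut off on two opposite sides: the page does not fit, so back away.
    if (cut_left and cut_right) or (cut_top and cut_bottom):
        return "move farther away"
    # Otherwise move the camera toward the side that is cut off.
    if cut_left:
        return "move left"
    if cut_right:
        return "move right"
    if cut_top:
        return "move up"
    if cut_bottom:
        return "move down"
    return "capture"
```

In a loop over camera frames, an app along these lines would speak each hint aloud and take the photo automatically on the first frame that returns "capture", so the user never has to press a shutter button.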
The first step
Shaikh, who is himself blind, having lost his sight at the age of seven, says he has liked making things since he was young.
“Having a visual impairment meant that I wondered whether technology could do the same thing that my friend would do when we were walking around together,” he says, adding that this idea was with him for many years before he decided to do something with it.
His chance to create the technology came at one of Microsoft’s annual hackathons, where Shaikh usually spent time working on personal projects with his colleagues.
He realised that as Microsoft had developed various AI building blocks, he could put them together in an interesting way that, while it would not tell him exactly what was going on around him, would be a first step to getting to that point.
The app is still in its first phases. Users are currently unable to teach the AI themselves. This does limit the AI somewhat, and it is not always accurate; it sometimes does say “I don’t know.”
However, Shaikh says that teaching the AI may be a possibility in the future. “The cool thing is that there are really talented scientists around the world working on this technology to make it even better.”
According to Shaikh, one of the more interesting problems he and his colleagues had to solve when first building the app was how to have the AI direct a blind user, to whom using a camera was a new concept, on how to line up the camera accurately.
Shaikh says it is difficult to quantify how much code went into making this powerful AI but explains that creating AI means solving two parts of every problem: “We have to make an AI that can do things and that can work together with humans. This world is a human world. If you can get machines and people working together, that combination is even more powerful,” he says.
When it comes to the combination of Seeing AI and the people using it, the different ways in which different people use it continue to surprise Shaikh, from a blind person using it to watch a foreign movie by having it read the subtitles to a sighted person using it to read hard-to-access serial numbers.
Shaikh is a strong believer in the inclusiveness of technology, saying that when creating software, it is important to not only think about who you are making the software for but also who won’t be able to use it if you do it the way you are doing it.
“If you recognise your exclusions, you can embrace diversity, learn from that person [you are excluding] and make something that everyone can use. It’s really important to talk to people who will be using the software before you make it. I wish this was something everyone would think about.”
Shaikh has an interesting way of thinking about disabilities. He explains that a disability may not be something to do with the person but with the environment. “There are people who are disabled just for a while, like when you break your arm. Or just for a situation, like a person with a baby and a dozen carrier bags or a person on a work site using a drill with two hands and it’s so noisy you can’t hear.”
“There are different ways of thinking about disability that you should just include everyone, and that's going to help people in ways you never thought. I want all tools to be inclusive. That is my dream.”
As for inclusiveness in terms of the future of work, Shaikh does not believe that technology by itself will drive inclusiveness and equality; rather, it is the combination of technology, environment and people that will create a diverse workforce.
“We can hope that the future will be more equal. There are many good things we can do with technology; it’s about how you best use it.”
When asked what technology he is working on to inform this future, all he says is “We are always working on new ideas.”
Seeing AI is currently available for free on the Apple App Store in nine countries, including the United States, Canada, India, Hong Kong, New Zealand and Singapore.