Singapore embraces AI with open source libraries and talent development
By Kiran Kaur Sidhu February 6, 2019
- Open-sourced speech corpus and sensing toolbox to accelerate AI development
- Essential to upskill and prepare Singapore citizens to work within the AI realm
THE economy of Singapore thrives on the back of the nation’s efficient services industry, which makes up 72% of the country’s gross domestic product and 74% of national employment. With the benefits of automation widely embraced, Singapore has identified artificial intelligence (AI) as one of the frontier technologies to power its digital economy.
The Infocomm Media Development Authority (IMDA) of Singapore has announced several initiatives to spearhead AI development, while the AI Singapore agenda focuses on developing talent in the field.
The National Speech Corpus V1.0 and Intelligent Sensing Toolbox
The National Speech Corpus (NSC) was released on Nov 22, 2018 after it was first announced a year prior. The corpus contains 2,000 hours of locally accented audio along with its corresponding text transcriptions, and over 40,000 unique words which include local words such as “Tanjong Pagar”, “ice kacang” and “nasi lemak”.
Through this effort, IMDA hopes that local companies can grab a slice of the burgeoning global speech and voice recognition market, which has been tipped to be worth US$18.3 billion (S$25 billion) by 2023. Instead of each company allocating its own budget to collect data, the initiative gives local companies fertile ground on which to experiment with and develop voice technology.
“The National Speech Corpus has been released as part of an effort to improve the accuracy of Automatic Speech Recognition (ASR) technologies for Singapore. With the National Speech Corpus, speech engine developers will not need to collect voice data themselves,” said a spokesperson from IMDA.
Since its release, IMDA said, companies and entities whose main business is developing speech engines have downloaded the NSC to run trials on their engines. “They may aim to develop models specifically for Singapore accented English, or develop General English models that are adapted and includes the Singapore English Accent. Once such models are made available to the industry, industry specific speech use cases can begin to proliferate.”
When asked if machine learning would render the speech corpus redundant sooner or later, IMDA commented: “Without this basic dataset – which is what a speech corpus can be equated to – it would be more difficult for higher-level software – such as Siri or Alexa – to ever ‘learn’ the right accents. It is thus critical that the National Speech Corpus be created to enable them to learn the local accent.”
“Finally, by creating an open corpus, we enable more companies to innovate without having to cross the data hurdle repeatedly. Generating a quality speech corpus is costly in terms of manpower, resources for recordings, and other factors. These can be alleviated if a central open corpus is available.”
IMDA has also introduced the Intelligent Sensing Toolbox (IST), a suite of open-source AI tools and technologies for Big Data analysis. The sense-making toolbox offers businesses plug-and-play open-source code that can be adapted and layered on top of their existing data analytics systems to help them make better decisions.
The founder and chief executive officer of The Centre of Applied Data Science (CADS), Sharala Axyrd, said this was to be expected as more and more companies develop drag-and-drop solutions. “This solution is not a surprise. My only guess is that it would take a longer time to achieve the perfection we expect because an AI tool is only as good as how much data it gets.”
The speech corpus data is made available under the Singapore Open Data Licence, and interested parties can request a copy. The algorithm for the Intelligent Sensing Toolbox is also publicly available.
The AI Singapore programmes for talent development
IMDA believes Singapore’s AI market has the potential to be worth US$960 million in 2022. With AI identified as a major tenet of the nation’s digital economy, it is essential that preparatory and upskilling programmes are in place for Singaporean citizens.
Under the AI Singapore (AISG) national programme by the National Research Foundation, the AI for Everyone (AI4E) and AI for Industry (AI4I) programmes were unveiled in August 2018. Beyond that, AISG has also introduced its 100 AI Experiments (100E) programme for companies with a significant industry problem and the AI Apprenticeship Programme (AIAP).
The director of AI Industry Innovation, Laurence Liew, shared: “Our 100 Experiments is a programme where we match the industry that has an AI problem to solve, equipped with datasets and manpower, as in their own IT or software engineers. If we approve the project, we will assign a professor from within our six partner universities and assemble an engineering team.”
Through the 100E programme, an organisation can propose problem statements for which no commercial off-the-shelf AI solution exists, but which can potentially be solved by Singapore’s ecosystem of researchers and AI Singapore’s engineering team within nine to 18 months.
The 100E programme also enables participants of the AIAP to receive on-the-job training by working on real-world AI problems. The AIAP is a nine-month, full-time programme that combines classroom and online training. It aims to train up to 200 AI professionals over the next three years, in batches of 20 to 30 trainees per intake.
To be eligible for the AIAP, a trainee must be a Singapore citizen holding a degree qualification in ICT, STEM or related disciplines; or a fresh professional within three years of graduation.
“These AI apprentices are Singaporeans that may have anywhere from one to 15 years of experience. Most of them will not have had formal education in AI or machine learning but would have studied the subject matter on their own over the past few years,” Liew explained.