Addressing the data scientist glut
By Dr Shawn Tan December 22, 2014
- Ensure there is work for returning data scientists, faculty structure needs revamp
- Government should lead by example – too much data is classified ‘secret’
BIG data seems to be the buzzword of the year. Governments and corporations everywhere are rushing either to address this issue or to capitalise on it.
Even ordinary folk know that big data is going to have a big impact on how we make decisions and choose to run our lives in the future.
My attention was recently brought to the Malaysian Government’s rollout of its big data framework. Among the things highlighted is the issue of talent. According to the report, our Government has taken certain steps to ensure that we will have the necessary talent to deal with big data.
First is the policy to “provide scholarships for government officers to take up postgraduate courses in data sciences at reputable universities so that they can be equipped as data scientists in the country.”
This long-standing strategy has been used to develop talent in fields that are critical to our nation.
However, there are a couple of problems with it. First, some of these people do not return, as their skills are equally valuable elsewhere. We should not stand in the way of personal progress, as long as they refund the tax dollars spent on their education.
The second problem is that those who do return to serve are not put in a position to do what they were trained to do. Government scholars have returned with PhDs from top universities, only to be assigned to areas where they could not contribute most effectively.
While sending our people to the best institutions to study data science is a step in the right direction, the real challenge is ensuring that their skills are put to proper use afterwards. This is particularly true for a government that often pays mere lip service to the brain drain, which has already seen more than a million Malaysians move on to greener pastures.
We will need to institutionalise systems that encourage professional development so that these returning scholars continue to flourish in Malaysia. Otherwise, they will leave just as quickly as they returned, out of frustration.
Unfortunately, our track record shows that we do not always give scholars the freedom to go where their scholarship takes them.
Big data is going to reveal a lot of things that we may or may not want revealed. That is the true power of big data: to draw inferences and uncover relationships from seemingly unrelated data.
We have all read of what happened to one professor who merely worked with surveys and polls, not big data.
Second, the Government is “ensuring universities in Malaysia start offering data science degrees, and not just as electives within their computer science courses.”
My concern is that while we may have good lecturers, they may not be in a position to teach these courses.
Existing data science curricula are multi-disciplinary, spanning mathematics, computer science and even engineering. These subjects sit in separate faculties within a university, so internal structure and politics may be a problem.
But let's put aside the internal politics.
Accreditation often requires lecturers who teach in a faculty to have a degree in that field. For example, one needs an engineering degree to teach in an engineering faculty, and so on.
This has already resulted in some rather creative hiring practices at universities.
Some lecturers are caught in limbo. While they teach in one faculty, they are formally parked under another. As a result, their work in the first faculty risks not being recognised when it comes to promotions and reviews under the second.
We might need to establish a new faculty for data sciences, or to amalgamate these different faculties together.
But the simpler solution would be to change accreditation requirements so that faculty members do not need to hold a degree in any specific field.
Since jobs are becoming increasingly multi- and cross-disciplinary, taking in faculty members from various fields may encourage the cross-pollination of ideas and hybridisation of skills that may just produce inventive results.
Thirdly, the Government wants everyone to “adopt big-data as a first-mover advantage in their own respective industries.”
The Government should really lead by example. While one can make the argument that corporations produce and consume a lot of data, governments produce the most data and can benefit most from said data.
Therefore, our Government needs to make data readily available, and that is where the biggest challenge lies. Currently, much government data is classified as secret by default, or released only after it has been analysed.
Ideally, the raw data should be made available, which would allow the private sector to turn the data into useful products and services.
I hope that our new chief data scientist, whoever he or she turns out to be, will make it a priority to mobilise our Government, at all levels, to produce and release all available data in a machine-friendly format, rather than as mere printed matter or pretty PDF downloads.
Otherwise, my concern is that when all these government scholars return to serve, and the local graduates from these shiny new programmes hit the market, there may not actually be any available big data to crunch.
Dr Shawn Tan is a chartered engineer who has been programming since the late 1980s. A former lecturer and research fellow, he minds his own business at Aeste while reading Law. He designs open-source microprocessors for fun. He can be reached via Twitter as @sybreon.