Hadoop likely to become de facto standard for analytics in big data industry
Still some way to go in SEA; lack of skills but vendor tools appearing to help
BIG data has been in the news in the past year and for good reason too. According to IDC, the big data technology and services market in Asia Pacific (excluding Japan) is expected to grow from US$258.5 million in 2011 to US$1.76 billion in 2016, on the back of a 46.8% five-year compound annual growth rate (CAGR).
Defined as huge sets of data comprising both unstructured data (not neatly classified into rows and columns) and structured data, big data cannot be processed by traditional databases and legacy software.
Recent advancement in database technology however means that big data can be tackled with newer software, with Hadoop being one of the most prominently headlined in the past year.
Hadoop is an open source software framework that supports data-intensive distributed applications, and has its roots in Google’s MapReduce and Google File System research papers.
The Java-based Hadoop is designed to run on a large number of machines (as opposed to a single large server) that don’t share any memory or disks. When large data sets need to be processed, Hadoop breaks up these complex data sets and spreads them over many servers, which can be architected using commodity-based components.
One of the main advantages of Hadoop is its ability to work in a distributed nature and keep track of where all the data is, thus giving it the power to query complicated computational loads. Because of its parallel processing nature, Hadoop is able to index the data, send out codes to process the data on dozens of servers, and return the results back as a unified whole.
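That map-and-reduce flow can be sketched in miniature. The following is a hypothetical, in-memory Python illustration of the pattern (it does not use Hadoop's actual API; the function and variable names are invented): input is split into shards as if spread across nodes that share no memory, each shard is counted independently, and the partial results are merged back into a unified whole.

```python
from collections import Counter
from functools import reduce

def map_phase(shard):
    """Each simulated worker counts words in its own shard,
    with no shared memory between workers."""
    return Counter(shard.split())

def reduce_phase(partials):
    """Merge the per-shard counts back into one unified result."""
    return reduce(lambda a, b: a + b, partials, Counter())

# Two shards, standing in for data spread over two commodity servers.
shards = [
    "big data big analytics",   # shard on node 1
    "data analytics at scale",  # shard on node 2
]

totals = reduce_phase(map_phase(s) for s in shards)
# totals now holds word counts aggregated across both "nodes",
# e.g. totals["big"] == 2 and totals["data"] == 2
```

In real Hadoop the shards live on different machines and the framework handles scheduling, data locality and fault tolerance, but the map-then-reduce shape of the computation is the same.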
While Hadoop shows much promise to enterprises grappling with their big data needs, its benefits to these companies, especially in South-East Asia, are not yet apparent.
But according to Miamisburg, Ohio-based data warehouse and analytics software player Teradata, one industry vertical that has made some progress, at least experimentally, is the retail industry.
Cesar Rojas, solutions marketing director for data science and Hadoop at Teradata Labs, says retailers generate large amounts of in-store, online and social networking data, and Hadoop has the ability to easily capture and refine online logs and social networking sentiment data sets, creating new data structures with a simpler and more readable schema.
“These new data structures are readable by analytical platforms that interface with Hadoop,” he says in an email interview. “The platforms can combine data gleaned not only from online and social sources but also from in-store structured data, giving users richer information about the buying patterns and behaviours of their retail customers.”
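As a rough illustration of the kind of log “refinement” Rojas describes, turning free-form clickstream lines into a flat, readable schema that analytical platforms can consume, consider this hypothetical Python sketch (the log format and field names here are invented for illustration, not any Teradata or Hadoop schema):

```python
import re

# Invented clickstream format: "<user> <action> <sku>"
LOG_PATTERN = re.compile(r"(?P<user>\S+) (?P<action>\S+) (?P<sku>\S+)")

def refine(raw_lines):
    """Turn free-form log lines into flat records with a simple,
    readable schema; lines that don't parse are dropped."""
    records = []
    for line in raw_lines:
        m = LOG_PATTERN.match(line)
        if m:
            records.append(m.groupdict())
    return records

refined = refine(["u42 view sku-991", "u42 buy sku-991", "malformed"])
# refined holds two clean records; the malformed line is discarded
```

At scale, this parse-and-flatten step is exactly the sort of job Hadoop distributes across many servers before handing the cleaned records to a downstream analytics platform.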
Impediments to adoption
One of the biggest challenges for enterprises wanting to adopt Hadoop as part of their advanced analytics needs is the lack of skill sets and understanding needed to make implementation work.
Tony Baer (pic), principal analyst of software for enterprise solutions at London-based research firm Ovum, notes that there are remarkable parallels between the emergence of Hadoop markets and that of business intelligence (BI) and data warehousing (DW) before it, with one exception.
In a research note, he says, “BI and DW emerged because client/server architectures freed data formerly locked away in applications, which in turn spawned appetite for new ways to use the data.
“Conversely, Hadoop emerged because there was a need for new technology to solve a growing problem – existing data platforms couldn’t adequately support issues such as advertisement placement, search optimisation, or similar processes for Internet companies.”
Baer believes the trajectories of market emergence are otherwise similar: vendor ecosystems form ahead of mainstream enterprises’ ability to implement those technologies. Such ecosystems are necessary to make it safe for enterprises to invest, he notes.
Additionally, Baer says there is a lack of skills or understanding of how to handle and work with data in Hadoop; the same was true for BI and DW back around 1996.
“BI/DW markets didn’t take off until a vendor community formed to deliver products for consuming data warehouse data (e.g., Business Objects, Cognos, Hyperion, and MicroStrategy), and transforming and integrating it (e.g., Informatica).
“[Also needed was] a consulting and training community, for example in systems integrators, for implementation,” he says.