Big data’s need for integrative thinkers
By Bernard Sia March 4, 2013
- Knowing English, knowing mathematics and learning how to draw are a must … yes, you read that right!
- There is a need for highly integrative thinkers and multi-disciplinary capabilities to unleash the power of big data
I HAVE written previously about elements of data management that should not change despite the zeitgeist of big data. Anchoring them are “Big Decisions”: businesses have to know that they’re making the right moves, fueled by the right information at the right time.
What is less obvious is that businesses must first be able to articulate and define those moves; failing that, you risk routines such as two-hour meetings spent going through 50 inane reports without any outcome.
Today, we’re going to talk about the massive changes required to ensure successful big data implementations.
Knowing English, knowing mathematics, learning how to draw
You’ve got to be kidding me, right? Well, not exactly. By English, I mean the weltanschauung (worldview) and lingua franca of the trade that you’re in.
I didn’t write German and Latin above just for fun: industry-speak may as well be Pig Latin to a German. Consider, for example, how different the data elements required for Petroleum Econometrics decisions are from those required for Clinical Informatics decisions.
Why is this important? Because big data exposes a wealth of unstructured information, i.e., data strewn across documents that do not conform to the column-and-row structure of traditional relational databases.
Although the technology behind big data can crunch this information, you almost always have to begin with semantic mapping and clean-up: someone needs to state, for instance, that Cherry, Grape, Roma and Plum tomatoes are all varieties of a single species, Solanum lycopersicum.
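To make "semantic mapping and clean-up" concrete, here is a minimal sketch, assuming a hand-built synonym dictionary (the CANONICAL map and the normalise function are hypothetical names for illustration). It tidies raw labels pulled from documents and maps each known variant to one canonical term, leaving unknown labels cleaned but otherwise untouched.

```python
# Hypothetical synonym map: raw labels found in documents -> canonical term.
CANONICAL = {
    "cherry tomato": "Solanum lycopersicum",
    "grape tomato": "Solanum lycopersicum",
    "roma tomato": "Solanum lycopersicum",
    "plum tomato": "Solanum lycopersicum",
}

def normalise(label: str) -> str:
    """Clean up a raw label, then map it to its canonical term if one is known."""
    key = " ".join(label.lower().split())  # case-fold and collapse stray whitespace
    return CANONICAL.get(key, key)         # unknown labels fall back to the cleaned text

raw = ["Cherry Tomato", "  roma   tomato", "Plum tomato", "potato"]
print([normalise(r) for r in raw])
```

In practice the mapping is the hard part: someone who speaks the trade's "English" has to write and maintain it, which is exactly the point above.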
By mathematics, I mean statistics, the science of drawing inferences from numerical data. All this information, once extracted, needs to be sieved for correlations, and those correlations then examined to determine causality. You can’t do the math if you don’t know the English.
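The gap between "sieving correlation" and "determining causality" can be shown with a small sketch. It computes the Pearson correlation coefficient by hand on hypothetical monthly figures (the numbers are invented for illustration). Both series rise together, so r comes out close to 1, yet neither causes the other; a plausible shared driver is hot weather.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)

# Hypothetical monthly figures: both series climb as the weather gets hotter.
ice_cream_sales = [20, 25, 40, 55, 70, 80]
drowning_cases = [2, 3, 5, 6, 8, 9]
r = pearson(ice_cream_sales, drowning_cases)
print(f"r = {r:.2f}")
```

A high r is where the analysis starts, not where it ends: knowing the business is what tells you to look for the hidden third factor.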
Finally, by drawing I mean the ability to represent information graphically to ease business decisions, without the prejudices that can creep into how data is presented; for example, arguing that we should ignore Facebook crime reports because, based on official police data, crime rates have fallen in Malaysia. [Coff, coff – ED]
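One well-known way presentation introduces prejudice is a bar chart whose value axis does not start at zero. This sketch, using hypothetical values and a hypothetical helper name (bar_ratio), computes how many times taller one bar looks than another for a given axis baseline.

```python
def bar_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """How many times taller bar b looks than bar a when the axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must sit below both values")
    return (b - baseline) / (a - baseline)

# Hypothetical values: b is only 5% larger than a.
a, b = 100.0, 105.0
print(bar_ratio(a, b))        # 1.05: axis from zero, bars look about the same
print(bar_ratio(a, b, 95.0))  # 2.0: axis starting at 95, b looks twice as tall
```

The data is identical in both cases; only the drawing changes the story.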
The transition to Non-Relational thinking (aka NoSQL)
Big data databases that allow for massive horizontal scaling typically store information as non-relational key-value pairs.
Okay, now in English.
As mentioned, relational databases store information in tables, which are like the columns and rows of an Excel sheet, and keys link multiple tables together. For example, you might have one sheet for Address and another for User Details, with the UserID as the key linking them.
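The Excel analogy maps directly onto SQL. This sketch uses Python's built-in sqlite3 module with an in-memory database; the table names, column names and sample rows are hypothetical. The two "sheets" are separate tables, and the UserID key is what lets a query join them back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two "sheets": User Details and Address, linked by user_id.
cur.execute("CREATE TABLE user_details (user_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE address (user_id INTEGER REFERENCES user_details(user_id), city TEXT)"
)
cur.executemany("INSERT INTO user_details VALUES (?, ?)", [(1, "Aminah"), (2, "Wei Ling")])
cur.executemany("INSERT INTO address VALUES (?, ?)", [(1, "Petaling Jaya"), (2, "Penang")])

# The shared key lets a single query stitch the two tables back together.
rows = cur.execute(
    "SELECT u.name, a.city FROM user_details u "
    "JOIN address a ON u.user_id = a.user_id ORDER BY u.user_id"
).fetchall()
print(rows)
conn.close()
```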
A key-value store has no such linkages: each value is fetched by its key alone, so related information tends to be kept together, much like a single object. To model information this way requires a more ‘business-oriented’ thinking hat and encoding in a very object-oriented style of script. The question is, can your database administrators dust off their programming skills and put on object-oriented lenses to make the transition?
And can they also bring the “English, Maths and Drawing” skills needed to model and represent information for business decisions?
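For contrast with the relational version, here is a minimal sketch of the object-oriented, key-value way of modelling the same user. A plain Python dict stands in for the key-value store, and the User and Address classes and the "user:<id>" key format are hypothetical; real stores each have their own APIs. The whole object, address included, is serialized to JSON and stored under one key, so reading it back needs no join.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class Address:
    city: str

@dataclass
class User:
    user_id: int
    name: str
    addresses: list[Address] = field(default_factory=list)

# A plain dict stands in for a key-value store (hypothetical, for illustration only).
store: dict[str, str] = {}

def put_user(user: User) -> None:
    """Serialize the whole user object, nested addresses included, under a single key."""
    store[f"user:{user.user_id}"] = json.dumps(asdict(user))

def get_user(user_id: int) -> dict:
    """Fetch by key alone: the related data travels together, so no join is needed."""
    return json.loads(store[f"user:{user_id}"])

put_user(User(1, "Aminah", [Address("Petaling Jaya")]))
print(get_user(1))
```

The trade-off is that the shape of the object has to be decided up front around how the business will read it, which is the "business-oriented thinking hat" in practice.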
Managing distributed and massively scalable infra
The name of the big data game is horizontal scaling: the ability to maintain high resiliency by distributing and replicating data across multiple servers, and to handle heavy number-crunching by processing in parallel across those servers. Whenever you require more power, just add more servers.
But here’s the kicker: are you able to manage all these servers and how they relate to the big data technology stack? That gets harder as the server count grows, bringing rapidly mounting networking complexity and the need to pinpoint infrastructure bottlenecks within the stack.
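The idea of distributing, replicating and then processing in parallel can be sketched in a few lines. This is a toy under stated assumptions: the cluster size, replica count and record names are hypothetical, "servers" are just dicts, and threads stand in for separate machines. Each record is hashed to a primary server and also copied to the next server for resiliency; each server then sums only the records it is primary for, and the partial sums are combined.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SERVERS = 4  # hypothetical cluster size
REPLICAS = 2     # hypothetical: each record is kept on two servers

def servers_for(key: str) -> list[int]:
    """Hash the key to a primary server; the next server(s) hold the replicas."""
    primary = zlib.crc32(key.encode()) % NUM_SERVERS
    return [(primary + i) % NUM_SERVERS for i in range(REPLICAS)]

def distribute(records: dict[str, int]) -> list[dict[str, int]]:
    """Copy every record onto its primary server and its replica server(s)."""
    servers: list[dict[str, int]] = [{} for _ in range(NUM_SERVERS)]
    for key, value in records.items():
        for idx in servers_for(key):
            servers[idx][key] = value
    return servers

def parallel_total(servers: list[dict[str, int]]) -> int:
    """Each server sums only the records it is primary for; partial sums are combined."""
    def local_sum(idx: int) -> int:
        return sum(v for k, v in servers[idx].items() if servers_for(k)[0] == idx)
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        return sum(pool.map(local_sum, range(len(servers))))

records = {f"txn-{i}": i for i in range(1, 101)}
servers = distribute(records)
print(parallel_total(servers))  # 5050, the same as sum(range(1, 101))
```

Two caveats: HDFS itself does not hash keys like this (its NameNode tracks where each block and its replicas live), and with simple modulo hashing, "just add more servers" would remap most keys, which is why real systems use smarter placement schemes.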
If you have a sufficiently enthusiastic IT team, you could download the open-source Apache Hadoop toolset and APIs (the best-known technology behind big data), set up HDFS (Hadoop Distributed File System) across the servers, download Apache ZooKeeper to manage server synchronization, load up Apache Hive since you’re more familiar with SQL, and so on …
… suffice it to say, it is not for the fainthearted.
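To give a feel for what that toolset actually runs, here is a pure-Python sketch of the MapReduce programming model that Hadoop popularized, using the classic word-count example. It is not Hadoop's API (Hadoop's native API is Java); the function names are hypothetical and everything runs in one process. The framework's job is to run many mappers and reducers across the servers and to perform the shuffle between them; Hive spares SQL-minded teams from writing these phases by hand, letting them express the same count as a GROUP BY query.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word.strip(".,!?"), 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all the counts emitted for one word."""
    return key, sum(values)

documents = ["Big data, big decisions.", "Big data needs integrative thinkers."]
intermediate = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts["big"], counts["data"])
```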
I haven’t used Cloudera personally, but the premise sounds good: it was one of the earliest start-ups to build on Apache Hadoop, presumably readying it for the enterprise with the required management tools and consultants to ease you into big data implementations.
But don’t take my word for it; try it for yourself. Almost all the major vendors have also built atop Apache Hadoop to provide management, visualization and reporting tools, as well as integration with their respective data products.
For example, I was pleasantly surprised to find that Microsoft also has an implementation of Apache Hadoop, called HDInsight.
The Galactic Empire using Rebel technology – nice!
Summary and closing
We’ve gone through the three major paradigm shifts; some might carp that there are really just two, because the need to know the business is a given.
I say it is not only a ‘given,’ it is likely the only determinant of success with big data! Bjarne Stroustrup used to say that in C++ it is harder to shoot yourself in the foot, but when you do, you blow off your whole leg. With big data, you blow off your career: Big Data = Big Decisions.
Lastly, I believe that there’s a place for every technology. For now, big data will not replace plain old relational databases: online transaction processing (OLTP) has a 30-year head start in handling insertions and updates at blinding speed, while big data solutions are meant for massive retrieval and analysis. To argue that one is better than the other is moot.
What is true though is the need for highly integrative thinkers and multi-disciplinary capabilities to unleash the power of Big Data for the enterprise.
Perhaps it’s time to go back to school ....
Bernard Sia is head of strategy at Mesiniaga Alliances Sdn Bhd. His opinions here do not necessarily reflect the views of Mesiniaga.
For more technology news and the latest updates, follow @dnewsasia on Twitter or Like us on Facebook.