95% of data is ‘out there’ and unstructured
Big data allows businesses to find hidden relationships and act on them
IF 2011 was the year of the cloud in the vendor space in Malaysia, with both Microsoft and Google launching major campaigns to get Malaysian businesses on Office 365 and Google Apps respectively, this year the focus may shift to “big data.”
Or at least, Oracle Corp hopes so, since this latest technology trend – or buzzword, if you’re the cynical type – falls firmly into the database company’s area of speciality.
The analyst firms are all on it too. McKinsey describes it as “the next frontier for innovation, competition and productivity,” while IDC says it “will earn its place as the next ‘must have’ competency in 2012.”
Speaking of its benefits, Gartner says that “organizations integrating high-value, diverse, new information types and sources into a coherent information management infrastructure will outperform their industry peers financially by more than 20%.”
As with cloud computing, every vendor has its own definition of big data, depending on the kind of solutions it offers. Oracle’s short definition is “all that data you do not have in your data warehouse,” says Andrew Lim, its vice president of Technology Sales for Asean. “It’s most of the data that is ‘out there’.”
This includes spheres that the IT departments in businesses generally ignore: weblogs, social networks (Facebook and Twitter), video and images, and online forums where customers may be ranting or raving about your products.
Big data, at least in Oracle’s universe, is defined by three Vs: Volume (extremely large quantities of data), Velocity (very fast streams of data – blink, and that tweet is gone), and Variety (very diverse types).
Large enterprises, especially those with large IT departments and budgets, have been making good use of structured data: The stuff in their data warehouses, like sales and billing information that is well known and hardly changes, increasing in only incremental amounts.
But by Oracle’s estimates, 95% of data is of the “out there,” unstructured variety, information outside your boundaries. Analyst firm Forrester puts it at 80%, but whatever the difference, there is a lot of it. And it is growing at a tremendous pace, with estimates that between 2009 and 2020, the size of the “Digital Universe” will increase 44-fold. That’s a 41% increase in capacity every year.
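The annual figure follows from compounding: a 44-fold increase over the 11 years from 2009 to 2020 implies a yearly growth factor of 44 to the power of 1/11. A minimal sketch of that arithmetic, assuming steady compound growth over the whole period (the function name is illustrative and does not come from any of the firms quoted):

```python
def annual_growth_rate(total_multiple: float, years: int) -> float:
    """Return the constant yearly growth rate that compounds to total_multiple."""
    return total_multiple ** (1.0 / years) - 1.0


# 44-fold growth between 2009 and 2020, i.e. over 11 years
rate = annual_growth_rate(44, 2020 - 2009)
print(f"{rate:.0%}")  # prints 41%
```

Real-world growth would not be this even from year to year; the 41% is the average rate that produces the 44-fold total.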
“It’s not just there is a lot of data out there, but that a lot of it has little or no value. The challenge is filtering it and finding the data that is of value to your business, then making use of it,” says Lim. “You need to find hidden relationships between the data and be able to monetise or act on it.”
“Finding hidden relationships” may sound very much like what data-mining vendors touted years ago, but that doesn’t make it any less relevant today.
Big money in big data
To make use of the unstructured data, organizations must be able to integrate it with their structured, data warehouse type of information.
Oracle believes it has just the ticket with the Oracle Big Data Appliance, an engineered system that it says combines optimized hardware with a comprehensive software stack developed to deliver a complete, easy-to-deploy solution for acquiring, organizing and analyzing big data.
The system was “architected to rapidly turn around ad hoc information requests, which otherwise take weeks, thus making businesses agile and helping customers control IT costs by pre-integrating all hardware and software components into a single big data solution that complements enterprise data warehouses,” says Lim.
This approach relieves customers of the integration work involved in assembling a suitable set of hardware and software components to create a big data architecture, he adds.
Oracle Big Data Appliance incorporates Cloudera’s Distribution including Apache Hadoop with Cloudera Manager, plus an open source distribution of R that has been enhanced to work with the Oracle database. Running on Oracle Linux, the system also features Oracle NoSQL Database and Oracle HotSpot Java Virtual Machine.
Oracle has been doing R&D on “engineered systems” for years, but this took on a new dimension when the software company agreed to acquire one-time server and workstation hardware leader Sun Microsystems in 2009, a deal that closed in early 2010.
While that acquisition has become an area of scrutiny and contention in on-going legal action between Oracle and Hewlett-Packard in the United States – HP earlier this week released documents in which Oracle sales executives described Sun hardware as “pig in lipstick at best” – others believe that the Sun acquisition and the resultant tight integration between hardware and software accelerated important engineered systems developments within the company.
One of them is Christopher G. Chelliah, vice president of Exadata and Strategic Solutions at Oracle Asia Pacific.
Engineered systems include appliances such as the Big Data Appliance above: Preconfigured hardware and software systems that help to shorten time to value and reduce the total cost of ownership.
The Exadata systems, first announced by Oracle founder and CEO Larry Ellison in 2008, “consist of pre-integrated servers, network fabric and storage which are rated and engineered for specific workloads,” he says.
“That ‘specific workloads’ aspect is very important, and a key differentiator for us,” he claims. “Typical appliances are like your Big Mac Meal, consisting of standard items in a package. Exadata systems are like your cordon bleu meal, with every item optimized.”
Exadata works by putting more intelligence into the storage space so that much of the processing can be done there, removing a lot of the back-and-forth that characterizes the typical database query.
“The traditional way was very much like looking for a needle in a haystack every time you made a database query, unnecessarily using up computing resources. Intelligent storage flips the equation, and makes Exadata faster than anything else out there,” he claims.
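The idea of doing the filtering where the data lives can be illustrated with a toy model. This is a sketch under stated assumptions, not Oracle's implementation: the StorageCell class, its methods and the sample rows are all hypothetical, and the only thing measured is how many rows leave storage for the database server.

```python
class StorageCell:
    """Hypothetical storage node holding rows; not an Oracle API."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.rows_shipped = 0  # rows sent from storage to the database server

    def read_all(self):
        """Traditional path: ship every row and let the server filter."""
        self.rows_shipped += len(self.rows)
        return list(self.rows)

    def scan(self, predicate):
        """'Intelligent storage' path: filter here and ship only the matches."""
        matches = [row for row in self.rows if predicate(row)]
        self.rows_shipped += len(matches)
        return matches


def is_malaysia(row):
    return row["region"] == "MY"


# Made-up sample data: 1 row in every 100 belongs to the region of interest
rows = [{"id": i, "region": "MY" if i % 100 == 0 else "other"}
        for i in range(10_000)]

traditional = StorageCell(rows)
result_a = [row for row in traditional.read_all() if is_malaysia(row)]

smart = StorageCell(rows)
result_b = smart.scan(is_malaysia)

print(traditional.rows_shipped, smart.rows_shipped)  # prints 10000 100
```

Both paths return the same 100 rows; the difference is that the second moves 100 rows instead of 10,000, which is the “back-and-forth” the storage-side approach is meant to remove.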
Exadata has since become the fastest-selling product in Oracle’s history, and big data is going to be a key driver for it going forward, says Chelliah.
“Our approach to big data has four key stages: Acquire, Organize, Analyze and Decide,” he says. “We’ve got them covered.”
The Oracle Big Data Appliance takes the user from the Acquire to the Organize stage, Oracle Exadata Database Machine takes the user from the Organize to the Analyze stage, and the Oracle Exalytics In-Memory Machine takes the user from the Analyze to the Decide stage.
Big data here
Malaysian businesses aren’t laggards when it comes to this “next big thing,” the two Oracle executives say.
“I am surprised at the level of interest Malaysian companies have in big data, mirroring that of more advanced economies in Asia Pacific,” says Chelliah. “We don’t have to go in trying to sell big data, they’re coming to us and demanding to know what our big data strategy is.”
“My discussions with customers can go straight to the point: What are you doing with your data now? What kind of data would you like to have? If you had that latter kind of data, what would you be able to do?
“Previously, database discussions were mainly about how to save you money; now they’re largely about how to make you money,” he says.
Neither he nor Lim was willing to put numbers on how big the big data market in Malaysia can be for Oracle, but both agreed that early adopters and drivers would hail from the telecommunications, public sector and financial services areas.
“However, in the end it will depend on the level of customer readiness,” says Lim. “The Big Three in the telco space, for instance, have already invested tremendous amounts in the data warehousing space, and would be able to move to big data relatively quickly and easily.”
“It also depends on the chief information officer (CIO) in an organization,” he adds. “Many, I’m happy to say, look at big data as a way they can contribute to the business by analysing how data can affect bottom lines.”
Chelliah notes that McKinsey believes the initial uptake for big data will come from telecommunications companies, which can use churn predictions to quickly optimise special offers and services for customers.