IBM makes major commitment to Open Source BDA platform Apache Spark

By Digital News Asia, June 19, 2015

Joins Spark community, plans to educate more than 1 million data scientists
To embed Spark into its analytics and commerce platforms, offer it on IBM Cloud

IBM Corp has announced what it said was a major commitment to Apache Spark, describing Spark as the most important new open source project in a decade.

The company said it plans to embed Spark into its analytics and commerce platforms, and to offer Spark as a service on IBM Cloud.

In a statement, it said it will also:

Put more than 3,500 researchers and developers to work on Spark-related projects at more than a dozen labs worldwide;

Donate its IBM SystemML machine learning technology to the Spark open source ecosystem; and

Educate more than one million data scientists and data engineers on Spark.

IBM is one of four founding members of the UC Berkeley AMPLab, where Spark was created in 2009. As a result, it participates in multi-day research retreats, provides advice and real-world insight, and works closely with AMPLab researchers on projects of mutual interest.

As data and analytics are embedded into the fabric of business and society – from popular apps to the Internet of Things (IoT) – Spark brings essential advances to large-scale data processing, IBM said in its statement.

First, it dramatically improves the performance of data-dependent apps. Second, it radically simplifies the process of developing intelligent apps, which are fuelled by data.

“We believe strongly in the power of open source as the basis to build value for clients, and are fully committed to Spark as a foundational technology platform for accelerating innovation and driving analytics across every business in a fundamental way,” said Beth Smith, general manager, Analytics Platform, IBM Analytics.

“Our clients will benefit as we help them embrace Spark to advance their own data strategies to drive business transformation and competitive differentiation,” she added.

As an example, IBM, NASA (National Aeronautics and Space Administration), and the SETI Institute are collaborating to analyse terabytes of complex deep-space radio signals using Spark’s machine learning capabilities, in a hunt for patterns that might betray the presence of intelligent extraterrestrial life.

“With Spark as a Service on Bluemix, we’ll be able to work with IBM to develop promising new ways to analyse signal data as we hunt for evidence of intelligence elsewhere in the cosmos,” said Dr Seth Shostak, senior astronomer and director of the Centre for SETI Research.

Spark is agile, fast and easy to use, IBM said. And because it is open source, it is improved continuously by a worldwide community.

Over the course of the next few months, IBM scientists and engineers will work with the Apache Spark open community to rapidly accelerate access to advanced machine learning capabilities and help drive speed-to-innovation in the development of smart business apps.

By contributing SystemML, IBM will help data scientists iterate faster to address the changing needs of business, and enable a growing ecosystem of app developers to apply deep intelligence to everything, the company said.
