Big Data is going to become important.

Big Data is getting a lot of coverage at the moment. Like the Internet before the web, it sounds great but few people can see how they can use it, especially here in Australia. I believe that is going to change quite soon.

Few would dispute that Big Data has great potential for unlocking information from raw data that, until recently, was too large to maintain actively. Even so, I imagine few existing corporations can see uses for Big Data in their own organisations. The exception is the link between social media and marketing: collecting feedback and targeting specific customer groups. Social media seems to be the poster child of Big Data.
Unless corporations take a step back to look at their data with a view to finding new ways to utilise it, or are forced to reconsider their data because of new compliance requirements, they are unlikely to see an immediate need for Big Data solutions.

I have followed Big Data for the past 12 months, and I believe it is being sold short. I see Big Data, and Hadoop in particular, as a platform that lets organisations adopt a more mature approach to data, not as a solution to a single problem. Such a platform makes a single source of truth possible, lends itself to integration with existing and future systems, and adds the potential of Big Data analytics to an organisation.

Cloudera are working hard to package Hadoop and its ecosystem in an Enterprise-ready distribution. Their upcoming CDH4 release will have improved security and availability, a proper management tool and the pick of the available ecosystem.

In theory, all data processing or storage applications could run on such a platform, but the overhead (in time) of coordinating distributed processing could prove too high a barrier for many low-latency applications.

Latency is one of the first arguments against using Hadoop, closely followed by security and ease of use. I think Hadoop is maturing enough to answer these arguments with Enterprise-ready solutions.

The initial response to latency concerns is to use an architecture similar to the one Oracle would have you buy: an integration of a data warehouse, a NoSQL database and a Hadoop cluster. This combines low-latency responses for known questions with a platform for batch processing of new or unknown questions.
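
The split between known and unknown questions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Oracle or Cloudera implement it: `QueryRouter`, `precompute` and `ask` are hypothetical names, the dictionary stands in for the warehouse or NoSQL layer, and a plain scan over a list stands in for a Hadoop batch job.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Job = Callable[[List[Record]], object]


class QueryRouter:
    """Toy model: serve known questions from a stored answer, scan everything otherwise."""

    def __init__(self, raw_records: List[Record]) -> None:
        self.raw_records = raw_records            # stands in for data held in the Hadoop cluster
        self.precomputed: Dict[str, object] = {}  # stands in for the warehouse / NoSQL layer

    def precompute(self, question: str, job: Job) -> None:
        # Run the slow batch job once and keep the answer for fast lookups.
        self.precomputed[question] = job(self.raw_records)

    def ask(self, question: str, job: Job) -> Tuple[object, str]:
        # Known question: return the stored answer without touching the raw data.
        if question in self.precomputed:
            return self.precomputed[question], "low-latency"
        # New or unknown question: fall back to a full scan of the raw data.
        return job(self.raw_records), "batch"


def total_by_customer(records: List[Record]) -> Dict[str, int]:
    totals: Counter = Counter()
    for record in records:
        totals[record["customer"]] += record["amount"]
    return dict(totals)


def largest_payment(records: List[Record]) -> int:
    return max(record["amount"] for record in records)


# Hypothetical sample data, made up for this sketch.
payments = [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 5},
    {"customer": "a", "amount": 7},
]

router = QueryRouter(payments)
router.precompute("total_by_customer", total_by_customer)

known = router.ask("total_by_customer", total_by_customer)  # served from the stored answer
unknown = router.ask("largest_payment", largest_payment)    # needs a scan of all records
```

The trade-off is visible even at this size: the stored answer is fast but only as fresh as the last batch run, while the scan is always current but pays the full cost every time.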

I believe Cloudera are also working on adding low-latency processing to the robust, reliable platform that is already available for batch processing. I base this on four observations: CDH4 includes support for alternative processors to MapReduce; Eli Collins's interview on InfoQ (http://www.infoq.com/interviews/collins-hadoop); Cloudera's reluctance at a recent event to answer direct questions on low-latency processing; and the increasing community interest in Storm (https://github.com/nathanmarz/storm#readme), a stream processing framework that Twitter acquired along with BackType and then open-sourced.

When Hadoop can support low-latency stream processing as well as batch processing of all data for Enterprise solutions, the idea of replacing existing architectures with ones built on a Hadoop platform becomes a real option.

Next time you are deciding what architecture to adopt, why not ask yourself whether a relatively cheap Hadoop platform might be your solution?

Why trust Cloudera?

Cloudera’s offering is at the heart of both Oracle’s Big Data and IBM’s Big Insights solutions. Doug Cutting, the creator of Hadoop, is Cloudera’s chief architect.

Why build applications on Hadoop?

Imagine a bank with all its data in one shared store. Systems no longer need to be integrated just to meet an individual application’s data requirements, so applications can focus on integrating data. Nor does each application have to manage its own data storage, archiving and disaster recovery.

The one shared store I refer to is HDFS, a scalable, low-cost, fault-tolerant storage system that maintains itself. Pairing that storage with distributed processing power is the key to Hadoop’s ability to scale.
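
To make the idea of storage with processing power concrete, here is a minimal sketch in plain Python. It does not use any Hadoop or HDFS API. The function names and the tiny block size are assumptions chosen for illustration, and real HDFS blocks are far larger and spread across machines. The point it tries to show is that each block is processed where it lives and only the small partial results are combined.

```python
from collections import Counter
from typing import List


def split_into_blocks(lines: List[str], block_size: int) -> List[List[str]]:
    # Stand-in for the way a large file is stored as separate blocks.
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block: List[str]) -> Counter:
    # In a real cluster this step would run on the node that stores the block.
    counts: Counter = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials: List[Counter]) -> Counter:
    # Only the small per-block summaries are brought together and merged.
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical sample input, made up for this sketch.
lines = ["big data", "big cluster", "data data"]

blocks = split_into_blocks(lines, block_size=2)
partials = [map_block(block) for block in blocks]
word_counts = reduce_counts(partials)
```

Adding more data means adding more blocks, and more blocks can be handled by adding more nodes, which is why the work per node stays roughly constant as the store grows.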

Melbourne’s Big Data community is active. What’s happening?
Big Data Analytics Conference Melbourne, 28 – 29 Aug 2012
Big Data Analytics MeetUp Group
Cloudera Developer Training for Apache Hadoop Class, Sep 25 in Melbourne
