Pivotal Hadoop & Python Map-Reduce Tutorial

First of all, thanks to Michael G. Noll for his blog post, which can be found here: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python

The following is a slightly modified version of that post, adapted to run on Pivotal Hadoop.

Step 1.

Download the Pivotal Hadoop VM from here:

http://www.gopivotal.com/big-data/pivotal-hd

Find the downloads button. Download the Pivotal HD Single Node VM. At this point it may ask you to create an account with Pivotal.

Why Australia Has A Big Data Problem

…and why it is all going to change very soon.


Australia is a long way from a lot of things. This creates a unique environment where we tend to alternate between leading and following technology trends. Take cloud and virtualization, for example. Australia has had one of the fastest adoptions of virtualization technology in the world, with over 60% of physical servers now virtualized. This is well above the global average, which is somewhere around 30-40%. (Interestingly, NZ is upward of 90% virtualized.)

The same is true for cloud. When I benchmark Australian telcos and cloud providers against their US and European counterparts, there is definite early adoption of cloud locally. The shift towards new consumption models in Australia has been extremely fast.

I often hear from colleagues in the US that we are “the canary in the coal mine”, meaning that technology trends tend to evolve early and quickly in our marketplace. Leaders in the US generally look to Australia to see how IT will evolve for them.

The same is not true, however, for big data. None of my US or Asian colleagues have ever said to me, “tell me how your Australian customers have become so innovative with Big Data and Analytics”. Usually it is me asking them.

The early adopter scenario that exists with cloud and virtualization doesn’t extend to big data, data analytics and other trends that involve gathering and combining lots of internal and external data sources and doing something smart with them.

Maybe it is that Australians tend to be more skeptical or more risk averse, or maybe we haven’t yet had enough exposure to the potential of the technology to adopt it. Whatever it is, we are definitely lagging.

Why do I say this?

There are a few common threads that I am seeing.

– Firstly, there are plenty of projects going on in Europe, Asia and the US across many different industries and segments. Telco and retail organizations have been particularly fast to adopt new ideas around big data.

– Our Australian-based “Big Data Scientists” are spending more of their time offshore due to skills shortages there, with few projects going on here.

– While some big data projects were implemented by various companies in Australia in 2012-2013, it is difficult to find examples where these have had a major impact on revenue or market share.

– One Australian organization that I spoke to that was doing some pretty cool stuff around big data was doing its R&D in the US “because that is where the skills are”.

Recently I had a conversation with one of our Data Scientists, who said that when he does do work here, he usually refers to himself as “a statistician or analyst, because Australians are skeptical when they hear the term big data”.

So that is the current situation, but in my opinion this is all about to change. In the US, companies from Walmart to Netflix to Target to eBay (successfully) use analytics and data mining to drive customer recommendations. This is driving Australian businesses to rethink their adoption of the technology.

Towards the back half of 2013 we saw Australian retailers start to ramp up their investment and capabilities in big data (e.g. Woolworths’ investment in Quantium). In the finance sector, a number of banks started outlining their big data strategies, highlighting how better use of customer information and improved risk management will become a competitive weapon. See one cool example of a big data project here from UBank: http://www.peoplelikeu.com.au. From a telecommunications perspective, organizations here are now looking at how they can use platforms like Hadoop to reduce call dropouts, improve network quality and raise customer NPS.

While Australia might be behind, one thing I have found is that we are quick to catch up. 2014 will be an interesting year as these strategies unfold and Australian companies either innovate or face increasing competitive pressure from major offshore rivals. Many of those rivals will not be traditional competitors but “cashed up” social networking organizations that decide to expand into new markets. This will be a catalyst for big data strategies and for greater adoption, among Australian companies, of new technologies that can process larger pools of information.