Apr 19, 2010

Calling MapReduce/Hadoop experts

After experimenting with MapReduce (Hadoop, Pig) last year, we recently ran some tests to check if it's worth pursuing further for large-scale data analytics.

Test details:
- Environment: We ran these tests on Amazon's cloud (quick, cheap, no hassles :-)
- Test data: 500 million and 1 billion rows of simple observations (two columns: Customer_ID and Amount)
- Computation: Simple (group_by and summation)
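
To make the computation concrete, here is a minimal single-machine sketch of the group_by-and-sum job expressed as map, shuffle and reduce steps. It is plain Python, not the Hadoop or Pig job we actually ran; the function names and the sample rows are made up for illustration.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit one (customer_id, amount) key-value pair per input row."""
    for customer_id, amount in rows:
        yield customer_id, amount


def shuffle(pairs):
    """Collect all values that share a key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the amounts for each customer."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical sample rows (Customer_ID, Amount), not taken from the test data.
rows = [("C1", 10.0), ("C2", 5.0), ("C1", 2.5), ("C3", 1.0), ("C2", 4.0)]

totals = reduce_phase(shuffle(map_phase(rows)))
print(totals)  # {'C1': 12.5, 'C2': 9.0, 'C3': 1.0}
```

On a cluster the map and reduce steps run on many nodes at once, and the shuffle moves data between them over the network.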


Here's what we found:
1) Hadoop scales well: Even when we doubled the data volume, processing time did not increase proportionately (notice the vertical distance between the curves)
2) MapReduce gains flatten out after a certain point: Beyond a certain number of nodes, there are no more savings in computation time (notice the flattening of the curves). Scaling up infinitely won't drop computation time to seconds :-|
3) Pay more to save more time: Processing time is a function of the number of nodes. We can easily decide how much we're willing to pay based on the time savings (RoI). For example, a high-priced expert might choose to pay more to save more time, compared to what an analyst would choose.
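
The trade-off in point 3 is simple arithmetic, sketched below. The timings and the price per node-hour are placeholder values, not our measurements, and the sketch assumes cost is pro-rated by the minute (real billing may round up to whole hours).

```python
def cost_table(timings, price_per_node_hour):
    """Return (nodes, minutes, cost) tuples, sorted by node count.

    timings maps node count -> runtime in minutes.
    Cost is assumed to be nodes * hours * price, pro-rated by the minute.
    """
    return [
        (nodes, minutes, nodes * (minutes / 60.0) * price_per_node_hour)
        for nodes, minutes in sorted(timings.items())
    ]


# Placeholder numbers for illustration only, NOT results from our tests.
timings = {2: 60.0, 4: 35.0, 8: 25.0, 16: 22.0}
price = 0.10  # hypothetical price per node-hour

for nodes, minutes, cost in cost_table(timings, price):
    print(f"{nodes:>2} nodes: {minutes:5.1f} min, cost {cost:.2f}")
```

Comparing the extra cost of each step up in nodes against the value of the minutes saved gives the point where adding nodes stops paying off.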


Now that Hadoop has passed the "sniff" test, we plan to run a real-life computation on it. I'm also looking for experts to drive this forward now. If there's anyone interested, please leave a comment.

We need to keep Amdahl's law in mind to estimate the max. savings expected from parallelization.
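
As a rough sketch of what that means: Amdahl's law says that if a fraction p of the job can be parallelized, the speedup on n nodes is 1 / ((1 - p) + p / n), which can never exceed 1 / (1 - p). The value p = 0.9 below is an assumption for illustration; we have not measured the parallel fraction of our job.

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Speedup predicted by Amdahl's law for a given parallel fraction and node count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / nodes)


def max_speedup(parallel_fraction):
    """Upper bound on speedup as the number of nodes grows without limit."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


p = 0.9  # assumed parallel fraction, not measured
for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"{n:>2} nodes: speedup {amdahl_speedup(p, n):.2f}")
print(f"limit: {max_speedup(p):.2f}")
```

The speedup climbs quickly at first and then levels off towards the limit, which matches the flattening we saw in the curves.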

Note: I'm also interested in running R on MapReduce sometime in the future.

Comments:

  1. Yahoo has done extensive work on Hadoop. Maybe you should check out the links below:

    http://developer.yahoo.net/blogs/hadoop/

    http://developer.yahoo.com/hadoop/
