May 30, 2011

May 26, 2011

Cricket fever again


England take on Sri Lanka in a test series starting today. 

After watching the 2011 Cricket World Cup and witnessing India play the quarter-final, semi-final and final in stadiums across India, I had little appetite for any of the IPL 2011 matches. 

But now, I'm excited again. This Test series should be a thrilling contest, considering the two teams are more or less equal:
  • England are at #3 and Sri Lanka are at #4 in the ICC Test rankings
  • England play well against Sri Lanka at home, and Sri Lanka have an edge in their home conditions. The overall record is well balanced





Sri Lanka have won 2 of the last 3 Test series against England, with 1 drawn, so they clearly have the better recent record. However, it is difficult to predict the winner, as England have home advantage and momentum (the Ashes tour and other Tests in the last 18 months). Sri Lanka have a new captain in Dilshan, who will be under pressure. England will be under pressure too, given the recent rise of English cricket and their desire to prove their Test status. 



Let the games begin!

May 20, 2011

10 reasons why you should learn R


10. Can't crack that hard Sudoku problem?? Use R!


9. Want to pick a skill that will give you an early adopter advantage?? Learn R! It is the leading open source statistical and data analysis programming language, and is heating up! 


8. Need to run statistical calculations in your software application?? Deploy R! It integrates with many programming languages like Java, Ruby, C++, Python


7. Looking for reusable libraries to solve a complex problem?? Get R! It has 2000+ free libraries to use in areas of finance, natural language processing, cluster analysis, optimization, prediction, high performance computing etc. 


6. No Windows, No Doors - R runs on all the major platforms. Just name it and you've got it!! Windows PC, Mac, Linux to name a few


5. Did you know how much fun stats can be?? Try R!!


4. Are you up to date with the current trends?? Leading firms like NY Times, Google, Facebook, Bank of America, Pfizer, Merck are all using R, where are you??


3. Need to run your own analysis?? Need to solve an optimization problem?? Struggling with Excel or SQL in your model?? The answer is just a few statements away - Try R!! 


2. Want to create a compelling chart?? Try R! 


1. Want the coolest job in 2014?? Learn Statistics. It is the future. Data Scientists will be the sexy job in 2018

May 18, 2011

Vehicle Routing Problem

This is a follow-up to a previous question on VRP. I investigated R libraries and several other options to solve VRP and decided to build a custom desktop application using open source libraries from COIN-OR. Screenshots attached below.







Leave a comment if you're interested. I will contact you directly.


Team: Prasoon, Khaled, James

May 2, 2011

Introducing R in the Enterprise



We've introduced R in the organization!

It is running alongside the heavyweights of statistical analysis such as SAS, SPSS and Matlab. Here's what we did and how we did it...




HOW DID IT START?
I started learning R last year and loved its simplicity and power. After using it primarily for personal projects, I came across a business problem for which R looked like a good fit.

BUSINESS PROBLEM
The business need was to build a web-based tool for marketing budget optimization (marketing RoI, Return on Investment): how should a company with multiple advertising channels allocate its marketing budget across them to maximize profit, customer loyalty or customer lifetime value (LTV)?

1) Input: The input to the analysis is the company's historical marketing budget allocation, profit, customer loyalty and LTV. 

2) Analysis: This analysis is done in 2 steps.
- Step 1) Our experts create a formula that relates the given inputs to RoI, LTV, etc., using econometric techniques.
- Step 2) The formula is optimized when the user conducts what-if analysis by varying the total budget and/or the spend across individual channels to see the effect on RoI and LTV. The desktop optimization model is written in Excel using a commercial Excel plugin.

3) Output: Optimized spend across advertising channels and ability to evaluate multiple scenarios to determine optimum marketing mix
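To make the what-if idea concrete, here is a minimal sketch in Python (chosen only for illustration; it is not the Excel model or the formula from step 1). The channel names, the numbers and the diminishing-returns curve `scale * (1 - exp(-rate * spend))` are all assumptions invented for this example. Each budget increment goes to the channel with the highest marginal gain, which works well when every curve has diminishing returns.

```python
import math

# Hypothetical channels and response-curve parameters, invented for illustration.
# Assumed curve: profit(spend) = scale * (1 - exp(-rate * spend)), i.e. diminishing returns.
CHANNELS = {
    "tv": (120.0, 0.010),
    "search": (80.0, 0.030),
    "print": (40.0, 0.020),
}


def channel_profit(name, spend):
    """Profit from one channel under the assumed response curve."""
    scale, rate = CHANNELS[name]
    return scale * (1.0 - math.exp(-rate * spend))


def total_profit(allocation):
    """Total profit for a {channel: spend} allocation."""
    return sum(channel_profit(name, spend) for name, spend in allocation.items())


def allocate(total_budget, step=1.0):
    """Greedy allocation: each increment of size `step` goes to the channel
    with the largest marginal gain. Any remainder smaller than `step` is left unspent."""
    allocation = {name: 0.0 for name in CHANNELS}

    def gain(name):
        current = allocation[name]
        return channel_profit(name, current + step) - channel_profit(name, current)

    for _ in range(int(total_budget // step)):
        best = max(CHANNELS, key=gain)
        allocation[best] += step
    return allocation


if __name__ == "__main__":
    # What-if analysis: compare a few total budgets.
    for budget in (50, 100, 200):
        plan = allocate(budget)
        print(budget, plan, round(total_profit(plan), 2))
```

Running the loop at the bottom with different budgets is the scenario comparison described in the output step: each scenario yields a spend per channel and the resulting total.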

The initial version of the tool runs as an Excel model using a commercial Excel plugin. The business objective was to transform this Excel-based single-user application into a multi-user web-based application.



TECHNICAL SOLUTION



A) Web application: The web forms needed to allow users to input data and run scenarios were simple. We develop web applications using Ruby on Rails on LAMP internally. Ruby on Rails gives us an agile environment to develop software by taking care of routine web application tasks like database connectivity. 

B) Optimization: Since the Excel model uses a commercial plugin for step 2, the stakeholders started with the hypothesis that the web application would use the server version of the same commercial plugin for optimization.

For this we had to prove a couple of things:
1) Optimization of formula from step 1
2) Integration with web application

Option 1: Commercial optimization engine
We did a quick spike to test optimization with the server version of the commercial plugin, as well as its integration with a Ruby on Rails web application, and it was successful. We had to use JRuby to integrate Ruby with the plugin's server edition, as it provides only Java and .NET APIs.

Option 2: R (Open source)
In parallel, we checked whether R could be used. R is a leading open source statistical environment.
- To solve the optimization problem in R, we found many R optimization packages and started testing packages like BB, since the formula (from step 1) was non-linear and had constraints and conditions. We tested BB's SPG function and also tried other generic algorithms. We got good optimization results from R (similar to or better than the commercial optimization engine).
- Now we had to check how to integrate R with our web application written in Ruby. We found a number of options like integrating R with Apache (rApache) or integrating R directly with Ruby (rsruby). We decided to use rsruby.
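For readers unfamiliar with the method behind BB's SPG function (spectral projected gradient), the following is a rough Python sketch of the idea, not the R code from the project. It uses a simplified monotone line search rather than the full algorithm, and the response curves, their parameters and the fixed-total-budget constraint are hypothetical stand-ins for the real formula.

```python
import math

# Hypothetical (scale, rate) pairs, one per advertising channel.
PARAMS = [(120.0, 0.010), (80.0, 0.030), (40.0, 0.020)]


def profit(x):
    """Assumed objective: sum of diminishing-returns curves."""
    return sum(a * (1.0 - math.exp(-b * xi)) for (a, b), xi in zip(PARAMS, x))


def gradient(x):
    """Marginal profit of each channel at spend x."""
    return [a * b * math.exp(-b * xi) for (a, b), xi in zip(PARAMS, x)]


def project_to_budget(v, budget):
    """Euclidean projection of v onto {x : x >= 0, sum(x) == budget}."""
    cumulative, theta = 0.0, 0.0
    for k, value in enumerate(sorted(v, reverse=True), start=1):
        cumulative += value
        candidate = (cumulative - budget) / k
        if value - candidate > 0:
            theta = candidate
    return [max(vi - theta, 0.0) for vi in v]


def spg_maximize(f, grad, x0, budget, iterations=200, tol=1e-10):
    """Projected gradient ascent with a Barzilai-Borwein (spectral) step length."""
    x = project_to_budget(x0, budget)
    g = grad(x)
    alpha = 1.0
    for _ in range(iterations):
        trial = project_to_budget([xi + alpha * gi for xi, gi in zip(x, g)], budget)
        d = [ti - xi for ti, xi in zip(trial, x)]
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope <= tol:  # no feasible ascent direction left
            break
        # Backtrack until the step gives a sufficient increase.
        fx, lam = f(x), 1.0
        while lam > 1e-12 and f([xi + lam * di for xi, di in zip(x, d)]) < fx + 1e-4 * lam * slope:
            lam *= 0.5
        x_new = [xi + lam * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        ss = sum(si * si for si in s)
        sy = -sum(si * yi for si, yi in zip(s, y))  # sign flipped because we maximize
        alpha = min(max(ss / sy, 1e-6), 1e4) if sy > 0 else 1e4
        x, g = x_new, g_new
    return x


if __name__ == "__main__":
    best = spg_maximize(profit, gradient, [100 / 3] * 3, 100.0)
    print([round(v, 2) for v in best], round(profit(best), 2))
```

At the solution, the marginal profit of every channel that receives spend is roughly equal, which is a handy sanity check when comparing results against another optimization engine.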

We ran a number of proofs of concept with R and shared the results with stakeholders. The results were positive in terms of both performance and optimization quality... So we got better results, and for free! 



LESSONS LEARNED
Technical
  1. Be careful when running R in a shared environment, where it can use all your CPU and memory if a job runs for long
  2. Don't forget to write unit tests using RUnit for your R code
  3. Capture exceptions from R and handle them properly (show an appropriate message to users)
  4. rsruby installation documentation is good, but installation may take a few tries depending on your Linux distribution
  5. rsruby does not run on Windows (wasn't a problem for us as we run our web applications on LAMP)
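Lessons 1 and 3 can be sketched together. The following is a minimal Python illustration, not the project's Ruby/rsruby code: it runs an analysis as a separate process with a time limit and turns failures into messages fit for end users. The function name `run_analysis` and the message wording are hypothetical.

```python
import subprocess
import sys


def run_analysis(command, timeout_seconds=30):
    """Run an external analysis command and return (ok, message_for_user).

    The child process is killed if it exceeds the time limit, so a runaway
    job cannot hold CPU and memory on a shared server indefinitely.
    """
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        return False, "The analysis took too long and was stopped. Please try a smaller scenario."
    except OSError:
        return False, "The analysis engine could not be started."
    if completed.returncode != 0:
        # completed.stderr holds the technical details; log it for developers
        # rather than showing it to the user.
        return False, "The analysis failed. Please check your inputs and try again."
    return True, completed.stdout.strip()


if __name__ == "__main__":
    # Stand-in command: the Python interpreter itself. In practice this would be
    # whatever launches the analysis script.
    print(run_analysis([sys.executable, "-c", "print(6 * 7)"]))
```

Keeping the technical error in a log and showing the user a short, actionable message is the design choice here; the time limit is what protects the shared server.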
Process
  1. User acceptance testing: If you are transforming an Excel-based model into a web version, it is critical to have a fully working example of the Excel model so that its results can be replicated in R or another statistical package
  2. Overcoming the challenges of using new open source software in the enterprise: Like most enterprise IT shops, we are used to commercial software, and open source software is trusted for serious work only when it is one of the most popular frameworks, such as Drupal, Ruby on Rails or Linux. We positioned R as an add-on to our LAMP environment and got a separate virtual server dedicated to it, as it is memory hungry.