Mar 1, 2011

Why learn R?

I'm introducing R to a few colleagues this week and want to share why learning software like R is important... Here are a few articles that explain it well... Other reasons?

Importance of data science
- A couple of years ago, Google's Chief Economist Hal Varian said that the sexy job in the next ten years will be statisticians. Read the full article (requires registration)
The ability to take data - to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it - that's going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complementary scarce factor is the ability to understand that data and extract value from it.
I think statisticians are part of it, but it's just a part. You also want to be able to visualize the data, communicate the data, and utilize it effectively. But I do think those skills - of being able to access, understand, and communicate the insights you get from data analysis - are going to be extremely important. Managers need to be able to access and understand the data themselves.
- Rise of data scientists

- Becoming a data scientist

- Essential skills for a data scientist

Where does R fit?
R provides a single environment for all the tools needed for data science (see the data science process below from Benjamin Fry's thesis).




- R is ideal for small data analysis, i.e. data that fits in a computer's RAM (e.g. data < 10GB). SQL and search techniques seem good for larger data sets that still fit on one machine, whereas techniques like Hadoop are good for BIG data sets that cannot fit on one machine.

- NY Times article on R: "R you ready for R?"

- NY Times article on R

- R is becoming popular

6 comments:

  1. I do agree, but it is difficult to communicate the value of R.

    Just yesterday I started to work as a business analyst and the main tool is... Excel, which is used well past the point it is supposed to be used. The main issue for me is that I can't see a script to understand how they get the tables.

    Now, if you are able to offer the power of R with the interactivity and ease of Excel, you may have a really strong product right there.

    Do you have any ideas how to "humanize" R? I was wondering about using HTML for reporting. I mean, I need something interactive (i.e. with Excel-like filters) and dynamic (OK, with R dynamic is not a problem).

    Alessio

  2. Look here:
    http://www.knime.org/

  3. Thanks for the post!

    @Alessio, you've got a point: it is difficult to communicate the value of R. Most of my graduate studies were based on this programming language, and now that I'm out on the job market, it's hard to find a job based on R. They do show up, but not as frequently as job postings for SAS, SPSS and others in this field of predictive analytics.

    Make no mistake, R is the future and the future is already here. R is growing fast, its popularity is on the rise, and there is a vast amount of resources, all free. Because of all these factors, it has already beaten MATLAB in terms of popularity and usability.

    And to the author, please don't forget to mention a great IDE to your colleagues: the beta release of RStudio, among others.

    Thanks for the post again and now, I have something sexy to point at when talking about R ;)

    iThink, iAct!

  4. I believe one of the great things about R is that there is no barrier to entry into the world of statistics and analytics. You don't need a seat license to do some of the most advanced mathematical and statistical analysis in the world. This is far beyond glorified spreadsheets. This is right up there with enterprise ready business analytics and optimization.

  5. Thank you all for sharing your thoughts.

    I will mention RStudio to the IDE lovers in my team (I'm a textpad/textmate and command prompt kind-of-guy :)

    Re: Excel and humanizing R, here's my $0.02...

    Excel's ease of use makes it compelling in the corporate world and we should work together to make R easy to learn and use. I work at a management consulting firm and BlackBerry, PowerPoint and Excel are the most popular tools here.

    I don't think R needs to (or does) compete with Excel. Excel has its own place for spreadsheets with:
    - small datasets
    - simple data models
    - analytics with simple to medium complexity

    In fact, I use Excel often and quite effectively for quick analysis.

    R and other tools like Matlab, Stata, SQL are useful when any of the above criteria isn't met... So picking the right tool for the job is key!

    So, if you find yourself struggling with Excel because of large data volume or complex data model needs or complex analytics, it might be the right time to explore alternatives. And R is a good option!

    ---------------------------------

    Humanizing R is important (R has a steep learning curve) and all of us should do our bit. Many of us blog, write books and software (plugins, IDEs etc.), compare it with other software, and share what we've learned at meetups/conferences. The community behind R is fantastic and mostly academically-oriented. Maybe we need to engage content and user interface designers to make the curriculum and tools easier to use.

  6. Seems like there's some material to graduate from Excel to R... Check out http://blog.fosstrading.com/2011/03/moving-from-excel-to-r.html
