Mar 20, 2012

Data Analysis Training

I'm training some of my colleagues on Big'ish data analysis this week. Here's how I'm running the class. Would love your ideas to make it better.


After completion of the course, you will be able to:
  • Understand the concepts of data science, the related processes, tools and techniques, and the path to building expertise
  • Use Unix command line tools for file processing (awk, sort, paste, join, gunzip, gzip)
  • Use Excel to do basic analysis and plots
  • Write and understand R code (data structures, functions, packages, etc.)
  • Explore a new dataset with ease (visualize it, summarize it, slice and dice it, answer questions about it)
  • Plot charts on a dataset using R


Prerequisites:
  • Good knowledge of basic statistics (min, max, avg, sd, variance, factors, quantiles/deciles, etc.)
  • Familiarity with Unix OS
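
As a quick self-check on the statistics listed above, here is a minimal sketch in Python (standard library only). It is an illustration, not course material: the class itself uses R, and the sample numbers below are invented.

```python
import statistics

# Hypothetical sample: daily order counts (made-up numbers for illustration).
orders = [12, 15, 11, 19, 15, 22, 18, 15, 30, 13]

summary = {
    "min": min(orders),
    "max": max(orders),
    "avg": statistics.mean(orders),
    "sd": statistics.stdev(orders),            # sample standard deviation
    "variance": statistics.variance(orders),   # sample variance
    "quartiles": statistics.quantiles(orders, n=4),   # 3 cut points
    "deciles": statistics.quantiles(orders, n=10),    # 9 cut points
}

for name, value in summary.items():
    print(name, value)
```

If every line of this output makes sense to you, you are probably comfortable enough with the statistics side.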


A) Intro to data science
  • Explain data science and its importance. Examples of data-driven business functions: MROI, mix optimization, IPL teams / fantasy teams, predictions
  • Big data
    - Definition: Data sets that no longer fit on a single disk and therefore require compute clusters plus the corresponding software and algorithms (e.g. MapReduce running on Hadoop).
    - Real big data problems: parallel computing, distributed computing, cloud, Hadoop, Cassandra
    - Most analysis isn't Big Data. Business apps often deal with datasets that fit in Excel/Access
  • Products: desktop tools (Excel with Solver and What-If, Access), SQL, SPSS, Stata, R, SAS, programming languages (Ruby, Python, Java) and their stats libraries, BI tools, etc.

B) Steps in data science
  1. Acquire data: "obtaining the data"... databases, log files... exports, surveys, web scraping etc.
  2. Verify data
  3. Cleanse and transform data: outliers, missing values, dedupe, merge
  4. Explore data: The first step with a new data set should be exploratory: what is actually in the data set? Summarize it and visually inspect the entire data set.
    - What does the data look like? summaries, cross-tabulation
    - What does knowing one thing tell me about another? Relationships between data elements
    - What the heck is going on?
  5. Visualize data
  6. Interact with data (not covered here): BI tools, custom dashboards, other tools (ggobi etc.)
  7. Archive data (not covered here)
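
To make steps 1, 3 and 4 concrete, here is a minimal sketch in Python (standard library only). It is an illustration under stated assumptions, not the course's method: the class uses R, the column names and values are invented, and treating exact duplicate rows as errors is an assumption that will not hold for every data set.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical raw export; column names and values are invented for illustration.
RAW = """region,channel,revenue
North,Online,120
North,Store,95
South,Online,
South,Store,80
North,Online,120
South,Online,110
"""

def load_rows(text):
    """Acquire: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def cleanse(rows):
    """Cleanse: drop rows with a missing revenue and exact duplicate rows."""
    seen = set()
    clean = []
    for row in rows:
        if not row["revenue"].strip():
            continue  # missing value
        key = (row["region"], row["channel"], row["revenue"])
        if key in seen:
            continue  # duplicate record (assumed to be an export error)
        seen.add(key)
        clean.append({**row, "revenue": float(row["revenue"])})
    return clean

def cross_tab(rows, row_key, col_key):
    """Explore: count records for each (row_key, col_key) combination."""
    return Counter((r[row_key], r[col_key]) for r in rows)

rows = load_rows(RAW)
clean = cleanse(rows)
revenues = [r["revenue"] for r in clean]
counts = cross_tab(clean, "region", "channel")

print("rows read:", len(rows), "rows kept:", len(clean))
print("mean revenue:", statistics.mean(revenues))
for (region, channel), n in sorted(counts.items()):
    print(region, channel, n)
```

Comparing rows read with rows kept is a cheap way to verify that the cleansing step did what you expected before you start exploring.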

C) Skills needed for data science
  • Statistics: Concepts, approach, techniques
  • Databasing: SQL
  • Scripting language: Ruby, Python
  • RegEx
  • Visual design: Storytelling with charts
  • File handling: Unix preferred. awk, gzip, gunzip, paste, sort etc.
  • Office tools: Excel (plugins like Solver, What If)
  • Statistical tools: R, SAS, SPSS, Stata, MATLAB, etc.
  • BI tools: Qlikview, Tableau
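
The scripting, RegEx and file-handling skills above often show up together. Here is a minimal sketch in Python (standard library only) of the kind of job you might otherwise do with gunzip, awk and sort: read a compressed log, pick out fields with a regular expression, and count. The log format is invented for illustration, and the data is compressed in memory so the sketch needs no files on disk.

```python
import gzip
import io
import re
from collections import Counter

# Hypothetical log lines; the format is invented for illustration.
LOG_TEXT = """2012-03-20 10:01:02 GET /home 200
2012-03-20 10:01:05 GET /report 500
2012-03-20 10:02:11 POST /login 200
malformed line
2012-03-20 10:03:40 GET /home 200
"""

# Compress in memory so the sketch is self-contained.
compressed = gzip.compress(LOG_TEXT.encode("utf-8"))

LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})$"
)

def count_paths(gz_bytes):
    """Decompress, keep lines matching LINE_RE, and count requests per path."""
    hits = Counter()
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as handle:
        for line in handle:
            match = LINE_RE.match(line.rstrip("\n"))
            if match:
                hits[match.group("path")] += 1
    return hits

hits = count_paths(compressed)
for path, n in hits.most_common():
    print(n, path)
```

Lines that do not match the pattern are skipped silently here; on real data you would probably want to count them too, since a high reject rate usually means the pattern is wrong.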

D) Learning R

We will use one tool to learn the concepts of data science: R, a leading open-source stats package. See also: "Why I started learning data science and picked R".

Curriculum for Intro to R (R has a steep learning curve; the purpose of this discussion is to get you started)

E) Where to go from here?
  • Learn advanced techniques: sampling, predictions. Books, conferences
  • Analyse your favorite dataset: e.g. cricket data analysis
  • Compete (Kaggle)
  • Learn other tools (Excel Solver, SAS etc.)



  • TBD


  1. This looks great...wish I could take the class!

  2. Very good summary,

    One minor comment: after B.4) do you plan to give an intro to basic modeling techniques (like regression etc. using R)?

  3. Thanks Erin and Daniel. I decided to keep clustering, decision trees and regressions for the next class... Stay tuned...

  4. Question for you - what BI tools will you cover?

  5. @mike, we discussed mid-tier BI tools like Qlikview and Tableau in passing (no details, just introduced the area). Briefly discussed enterprise-grade tools like Cognos and Business Objects too...

  6. Hi prasoonsharma, do you plan to take on students online? I'd love to join your class and learn more about R. Would you? (:

  7. Hi Prasoon,

    I do not have the privilege to attend your class...just that I have a fairly good system and my father
    is a mathematician...request you to tell me how can I learn big data...I want to be an expert in two


  8. Hi Sujeet, perhaps the best way to start is to sign up for the Stats 202 class. Stanford teaches stats using the Intro to Data Mining book and R software. Check it out.

    Buy the book on Amazon:
    Read more about the book on Kumar's page:
