Tuesday, 5 June 2012

Data exploration


An analyst has to deal with lots of data, but an analyst never starts the analysis straight away on the heaps of data provided. First, the data needs to be understood, and we need to find out what fields it contains. If the data is in bad shape, it has to be arranged so that it can be used in the further steps. Once the data is arranged, it needs to be separated into good data and bad data. This step is critical because it removes the noise from the dataset. All of this only turns a difficult dataset into a manageable one; what we really want is to analyse it and find the actionable insight that everybody struggling with data is looking for.
In the last session, Sarita Digumarti, Co-founder of Jigsaw Academy, explained the process of data exploration to us. I always feel excited to see data and I crave to understand what it tells me, but I always wondered where I should begin. This problem was solved in the last session, when Sarita explained that, after cleaning the dataset, making a summary of it solves most of the problems we face while working with large chunks of data. This summary includes finding the definition of every unfamiliar term that appears in the dataset.
I was very excited, and I actually used this approach when I came across a dataset from an industry that is in no way related to my work, an industry I had never given much importance to and so had never tried to learn anything about. I started by cleaning the dataset. I arranged it in a way that made me comfortable, looked up the definitions of the terms used (and learned some new ones), and then started making the summary sheet. In my summary sheet I started with:
1. Number of observations for each field
2. Number of missing values
3. Mean
4. Maximum
5. Minimum
6. Standard deviation
Trust me, all of the above gave me a brief understanding of what actually happens in the horse racing industry and how much effort goes into each and every race. Another thing I got to know is that if you are lucky you can make a fortune in it, and if not you will lose everything.
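
For anyone who wants to try the same thing, here is a rough sketch of how a summary sheet with those six measures could be built. It is written in Python using only the standard library, and it is just one possible way of doing it, not the method from the session. The function name `summarize`, the field names `distance_m` and `prize`, and all the values are made up for illustration; they are not from the dataset I worked with. I am also assuming that a missing value is stored as `None`.

```python
import statistics

def summarize(rows, fields):
    """Build a summary sheet: one dictionary of measures per field."""
    summary = {}
    for field in fields:
        values = [row.get(field) for row in rows]
        # Assumption: a missing value is represented as None.
        present = [v for v in values if v is not None]
        summary[field] = {
            "count": len(present),
            "missing": len(values) - len(present),
            "mean": statistics.mean(present) if present else None,
            "max": max(present) if present else None,
            "min": min(present) if present else None,
            # The sample standard deviation needs at least two values.
            "std": statistics.stdev(present) if len(present) > 1 else None,
        }
    return summary

# Hypothetical rows, invented purely for this example.
rows = [
    {"distance_m": 1200, "prize": 5000.0},
    {"distance_m": 1600, "prize": None},
    {"distance_m": 2000, "prize": 7000.0},
    {"distance_m": 1600, "prize": 6000.0},
]

sheet = summarize(rows, ["distance_m", "prize"])
for field, measures in sheet.items():
    print(field, measures)
```

Counting the missing values separately from the observations matters here, because the mean and standard deviation are computed only on the values that are actually present.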

Monday, 28 May 2012

The world of R

Being an analyst, I have always wanted to do the kind of analysis that helps in taking really crucial decisions and is easy to understand, but I always wondered how to deal with huge data. So I thought of trying a statistical tool, and here I go. These days I am going through training in the R programming language for statisticians, provided by Jigsaw Academy.

R is a programming language used for statistical computing and graphics. I was scared when I got to know that I would have to do programming to use this tool efficiently, but it’s not at all complicated. It is very user friendly, and it has help for each and every package and function that can be used for calculations or for any kind of analysis.

In the first session we got to know how we can install different packages and then use them. One thing that really amazed me was that R provides a lot of options for creating different kinds of visually appealing graphs. Being an analyst, I always look for a graph that will help me reach a conclusion, and this feature of R is making me explore more of it and apply it to the kind of data I come across every day on the corporate front.