Knowledge Base for Descriptive Statistics

What is Descriptive Statistics?

Raw data on its own is hard to interpret and needs to be converted into information. Descriptive statistics is the set of methods that performs this conversion: it summarizes the data and tells the story of what the data represents.


What is Frequency Distribution?

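A frequency distribution is a summary table in which raw data is arranged into classes, together with the frequency (count) of observations in each class. It shows the pattern of the distribution. A histogram is the usual graphical form of such a table.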
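As a rough illustration of arranging raw data into classes, the sketch below counts observations per class. The scores, the class width of 10 and the helper name frequency_table are assumptions made up for this example.

```python
from collections import Counter

def frequency_table(values, class_width, start):
    """Count how many values fall into each class [lower, lower + class_width)."""
    counts = Counter()
    for v in values:
        # Index of the class this value belongs to.
        k = int((v - start) // class_width)
        lower = start + k * class_width
        counts[(lower, lower + class_width)] += 1
    return dict(sorted(counts.items()))

# Hypothetical exam scores, made up for illustration.
scores = [52, 57, 61, 64, 68, 70, 71, 75, 78, 83, 88, 94]
table = frequency_table(scores, class_width=10, start=50)

for (lower, upper), freq in table.items():
    # A row of '*' per class gives a crude text histogram.
    print(f"{lower}-{upper - 1}: {'*' * freq} ({freq})")
```

The class boundaries are half-open here (50 up to but not including 60); other conventions are equally valid as long as every observation lands in exactly one class.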


What is Cumulative Frequency Distribution?

The cumulative frequency of a class is the sum of its own frequency and the frequencies of all classes below it in a frequency distribution. In other words, you add up a value and all of the values that came before it.
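The running total described above can be sketched with itertools.accumulate. The class labels and frequencies are hypothetical values chosen for illustration.

```python
from itertools import accumulate

# Hypothetical classes and their frequencies, made up for illustration.
classes = ["50-59", "60-69", "70-79", "80-89", "90-99"]
frequencies = [2, 3, 4, 2, 1]

# Each entry is the frequency of its class plus all classes before it.
cumulative = list(accumulate(frequencies))

for label, freq, cum in zip(classes, frequencies, cumulative):
    print(f"{label}: frequency={freq}, cumulative={cum}")
```

The last cumulative value always equals the total number of observations, which is a quick sanity check on the table.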



What is the Cumulative Distribution Function (or Ogive Curve)?

This function tracks what share of the observations is less than or equal to a certain number (an ogive plots the corresponding cumulative counts). It is another way to see what shape the distribution has.
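A minimal sketch of an empirical version of this function, assuming it is expressed as a proportion rather than a count. The helper name ecdf and the observations are made up for this example.

```python
from bisect import bisect_right

def ecdf(values):
    """Return a function F where F(x) is the share of values <= x."""
    data = sorted(values)
    n = len(data)

    def F(x):
        # bisect_right gives the number of sorted values that are <= x.
        return bisect_right(data, x) / n

    return F

# Hypothetical observations.
observations = [3, 7, 7, 10, 15]
F = ecdf(observations)

for x in (2, 7, 15):
    print(f"F({x}) = {F(x)}")
```

Multiplying F(x) by the number of observations gives the cumulative count that an ogive would plot.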


What are the Measures of Central Tendency?

These are also known as Measures of Location or Statistical Averages. There are three main measures: Mean, Median and Mode.

  • Arithmetic Mean
It is the sum of all values in the data set divided by the total number of observations (the sample size). It is uniquely defined and requires measurement of all observations.
Mean = (sum of all observations) / (number of observations)
The arithmetic mean is affected by extreme values or fluctuations.

  • Median (50th percentile)
It is the middle value of all observations arranged in ascending order. It depends only on the order of the observations, not on the exact size of the extreme ones, so it is resistant to outliers.
With an odd number of observations n, it is the ((n+1)/2)th value. With an even number there is no single middle value, so by convention the average of the two middle values is taken.

  • Mode
It is the value which occurs most often. It is resistant to outliers. It does not require measurement of all observations, and it is not uniquely defined when the data is multi-modal (several values tie for the highest frequency).
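The three measures above can be computed with Python's standard statistics module. The data is hypothetical, with one deliberately extreme value to show how differently the mean and the median react.

```python
import statistics

# Hypothetical observations; 40 is a deliberately extreme value.
data = [2, 3, 3, 5, 7, 10, 40]

mean = statistics.mean(data)        # sum of values / number of values
median = statistics.median(data)    # middle value of the sorted data
modes = statistics.multimode(data)  # list of all most frequent values

# Dropping the extreme value moves the mean far more than the median.
trimmed = data[:-1]
trimmed_mean = statistics.mean(trimmed)
trimmed_median = statistics.median(trimmed)

print("with outlier:   ", mean, median, modes)
print("without outlier:", trimmed_mean, trimmed_median)
```

statistics.multimode returns a list, so a multi-modal data set simply yields more than one value.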


What are the Measures of Dispersion(Variation)?

These indicate how widely the data is spread around the central tendency. There are three common measures:
  • Range- Difference between the maximum value and the minimum value (Max-Min). If Max=Min, all observations are equal, so the range and the dispersion are both 0.
The range is affected by extreme values.

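  • Inter-Quartile Range (IQR)- Difference between the upper quartile (Q3, the value below which 75% of the data lies) and the lower quartile (Q1, the value below which 25% of the data lies). It covers the middle 50% of the data. The data has to be arranged in ascending order.
IQR=Q3-Q1, Q3=Upper Quartile, Q1=Lower Quartile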

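  • Standard Deviation- It measures the typical deviation of the observations from the mean, and is the square root of the variance.
Formula:- Variance = (sum of squared deviations from the mean) / (number of observations - 1)
Std Dev = square root of Variance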
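A short sketch of the three dispersion measures above, using made-up data. Quartile conventions differ between textbooks and tools; this example assumes the default method of statistics.quantiles, so Q1 and Q3 may differ slightly from values computed by hand or by other software.

```python
import math
import statistics

# Hypothetical observations.
data = [2, 4, 4, 5, 7, 9, 11]

# Range: maximum minus minimum.
value_range = max(data) - min(data)

# IQR: upper quartile minus lower quartile.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Sample variance: squared deviations from the mean, divided by n - 1.
mean = sum(data) / len(data)
sum_of_squares = sum((x - mean) ** 2 for x in data)
variance = sum_of_squares / (len(data) - 1)
std_dev = math.sqrt(variance)

print(f"range={value_range}, IQR={iqr}, variance={variance}, std dev={std_dev:.4f}")
```

The hand-written variance matches statistics.variance, which also divides by n - 1.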

What is the Coefficient of Variation?

It is the ratio of the standard deviation to the mean, often expressed as a percentage. When analysing data, one should not get carried away by averages; variation should be considered as well.



The table above shows that although the average of Sales Person 2 is better, he or she also has a higher standard deviation. Hence the coefficient of variation plays an important role in decision making.
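To make this kind of comparison concrete, the sketch below uses hypothetical sales figures for two sales people. The numbers are made up for illustration and are not taken from the table; the helper name coefficient_of_variation is likewise an assumption.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical sales figures, made up for illustration.
sales_person_1 = [48, 50, 52, 49, 51]
sales_person_2 = [30, 80, 55, 20, 90]

for name, sales in (("Person 1", sales_person_1), ("Person 2", sales_person_2)):
    mean = statistics.mean(sales)
    cv = coefficient_of_variation(sales)
    print(f"{name}: mean={mean}, CV={cv:.1%}")
```

In this made-up data Person 2 has the higher average but also a much larger coefficient of variation, so the results are far less consistent.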

What is the Empirical Rule?

This rule approximates the variation of data in a bell-shaped distribution (normal distribution).
It says that about 68% of the data lies within 1 std dev of the mean.
It also says about 95% of the data lies within 2 std dev of the mean.
It also says about 99.7% of the data lies within 3 std dev of the mean.
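The three percentages can be checked against the standard normal distribution using statistics.NormalDist. The helper name share_within is an assumption for this sketch.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Share of a normal distribution lying within k std dev of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} std dev: {share_within(k):.2%}")
```

The exact figures are slightly different from the rounded 68%, 95% and 99.7%, which is why the rule is described as an approximation.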


What is the Chebyshev Rule?

The Chebyshev Rule applies to any distribution, so it is typically used when the data is not bell shaped.
It says that, regardless of how the data is distributed, at least (1 - 1/k^2) * 100% of the values will fall within k std dev of the mean (for k>1).
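A small sketch of the bound, compared against a deliberately skewed, made-up data set. The helper names are assumptions, and the population standard deviation (statistics.pstdev) is used for the check.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum share of values within k std dev of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def observed_share_within(values, k):
    """Share of values actually lying within k std dev of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    inside = [x for x in values if abs(x - mean) <= k * sd]
    return len(inside) / len(values)

# Hypothetical right-skewed data, clearly not bell shaped.
data = [1, 1, 1, 2, 2, 3, 4, 6, 9, 25]

for k in (2, 3):
    bound = chebyshev_lower_bound(k)
    actual = observed_share_within(data, k)
    print(f"k={k}: at least {bound:.1%}, observed {actual:.1%}")
```

The bound is conservative: for k=2 it guarantees only 75%, far less than the roughly 95% the Empirical Rule gives for bell-shaped data.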


What is the Five Number Summary?

The five number summary includes the smallest value, the first quartile, median, the third quartile and the largest value
Here First Quartile, Median and Third Quartile are measures of location.
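A minimal sketch of computing the five numbers, with hypothetical data. The quartiles assume the default method of statistics.quantiles; other quartile conventions can give slightly different values.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

# Hypothetical observations.
data = [1, 3, 4, 6, 7, 8, 10, 12, 15, 18, 21]
summary = five_number_summary(data)

labels = ("min", "Q1", "median", "Q3", "max")
for label, value in zip(labels, summary):
    print(f"{label}: {value}")
```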


What is Skewness of Data?

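Skewness is defined as the asymmetry of the distribution. A distribution is either right (positively) skewed, with a longer tail towards high values, left (negatively) skewed, with a longer tail towards low values, or symmetric, with zero skewness.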
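One common way to put a number on skewness is the moment coefficient: the third central moment divided by the second central moment raised to the power 1.5. The sketch below assumes that definition (other definitions exist) and uses made-up data.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 10]     # long tail towards high values
left_skewed = [-10, -3, -2, -2, -1]  # long tail towards low values

for name, data in (("symmetric", symmetric),
                   ("right", right_skewed),
                   ("left", left_skewed)):
    print(f"{name}: {skewness(data):+.3f}")
```

The sign of the result gives the direction of the skew; a value near zero indicates a roughly symmetric distribution.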

What is Box and Whisker Plot?
A Box and Whisker Plot can be vertical or horizontal. It displays the five number summary of the data.
It also gives an idea of the outliers in the dataset. Observations more than 1.5 * Inter-Quartile Range below Q1 or above Q3 are considered to be outliers.
A boxplot loses information about the mode. Instead it lets us see whether the distribution is negatively or positively skewed.
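The 1.5 * IQR rule can be sketched as follows. The helper name iqr_outliers and the data are assumptions for this example, and the quartiles depend on the default method of statistics.quantiles.

```python
import statistics

def iqr_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Anything outside the fences is flagged as an outlier.
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

# Hypothetical observations with one extreme value.
data = [1, 3, 4, 6, 7, 8, 10, 12, 15, 18, 60]
lower, upper, outliers = iqr_outliers(data)

print(f"fences: [{lower}, {upper}], outliers: {outliers}")
```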

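What is a Quantile Plot and a Quantile-Quantile Plot (Q-Q)?

A Quantile Plot displays the sorted data against the fraction of observations at or below each value, so percentiles can be read directly from it.
A Quantile-Quantile Plot plots the quantiles of one distribution against the quantiles of another. If the two distributions have the same shape, the points fall roughly on a straight line.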
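As a sketch of what a Q-Q plot compares, the code below pairs each sorted observation with the matching quantile of a standard normal distribution. The plotting position (i - 0.5) / n is one common convention among several, and the helper name and sample are made up; the pairs could then be passed to any plotting tool.

```python
from statistics import NormalDist

def normal_qq_pairs(values):
    """Pair each sorted value with the matching standard normal quantile."""
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()
    pairs = []
    for i, x in enumerate(data, start=1):
        p = (i - 0.5) / n  # assumed plotting position for the i-th value
        pairs.append((std_normal.inv_cdf(p), x))
    return pairs

# Hypothetical sample.
sample = [12, 15, 9, 20, 14]
pairs = normal_qq_pairs(sample)

for theoretical, observed in pairs:
    print(f"theoretical={theoretical:+.3f}, observed={observed}")
```

If the observed values came from a normal distribution, the points would lie close to a straight line when plotted.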

What is a Scatter Plot?

A Scatter Plot displays bivariate data as points, with one variable on each axis, making relationships, clusters of points, outliers etc. visible.


What is Correlation Analysis?

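It measures the relationship between two numerical variables. It describes how the second variable changes when the first one increases. The raw measure of this joint variation is the covariance; the correlation coefficient is the covariance divided by the product of the two standard deviations, which puts it on a scale from -1 to +1.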
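A minimal sketch of both quantities, assuming the sample covariance (dividing by n - 1) and the Pearson correlation coefficient. The helper names and paired observations are made up for this example.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    # covariance(xs, xs) is the sample variance of xs.
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(f"covariance={covariance(x, y)}, correlation={correlation(x, y):.4f}")
```

Unlike the covariance, the correlation does not change when either variable is rescaled, which makes it easier to compare across data sets.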










           







