Knowledge Base for Descriptive Statistics
What is Descriptive Statistics?
Raw data by itself is not enough; it needs to be converted into information.
Descriptive statistics is that conversion: it summarizes the data and tells
the story of what the data represents.
What is Frequency Distribution?
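A frequency distribution is a summary table in which raw data is arranged
into classes, together with the frequency (count) of each class. It also
reveals the pattern of the distribution. A histogram, for example, is the
graphical form of a frequency distribution.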
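As a rough sketch of how such a table can be built, the Python snippet below groups some made-up exam scores into classes of width 10 and counts each class. The scores, the class width and the helper name class_label are assumptions chosen purely for illustration.

```python
from collections import Counter

# Hypothetical exam scores, invented for illustration.
scores = [52, 67, 71, 58, 89, 93, 75, 64, 78, 81, 55, 69]


def class_label(value, width=10):
    """Return the class interval (for example '50-59') a value falls into."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


# Count how many observations land in each class.
frequency = Counter(class_label(s) for s in scores)

for label in sorted(frequency):
    # One '*' per observation gives a crude text histogram.
    print(f"{label}: {frequency[label]} {'*' * frequency[label]}")
```

The row of stars next to each class is the same information a histogram would show as bar heights.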
What is Cumulative Frequency Distribution?
The cumulative frequency of a class is the sum of its own frequency and the
frequencies of all classes below it in a frequency distribution. In other
words, you add up a value and all of the values that came before it.
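The running total can be sketched in a few lines of Python. The class labels and frequencies below are hypothetical values used only to show the mechanics.

```python
from itertools import accumulate

# Hypothetical class frequencies, ordered from the lowest class to the highest.
classes = ["50-59", "60-69", "70-79", "80-89", "90-99"]
frequencies = [3, 3, 3, 2, 1]

# Each entry is its own frequency plus the frequencies of all earlier classes.
cumulative = list(accumulate(frequencies))

for label, freq, cum in zip(classes, frequencies, cumulative):
    print(f"{label}: frequency={freq}, cumulative={cum}")
```

The last cumulative value always equals the total number of observations, which is a handy sanity check.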
What is the Cumulative Distribution Function (or Ogive Curve)?
This function tracks how many observations, as a count or as a proportion,
are less than or equal to a certain number. Plotted as a curve it is called
an ogive, and it is another way to see the shape of the distribution.
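A minimal sketch of this idea for a small sample is the empirical version of the function, which returns the proportion of observations at or below a given number. The data values and the function name ecdf are assumptions for illustration.

```python
from bisect import bisect_right

# Hypothetical observations, sorted so that bisect can be used.
data = sorted([4, 8, 15, 16, 23, 42])


def ecdf(x):
    """Proportion of observations that are less than or equal to x."""
    # bisect_right counts how many sorted values are <= x.
    return bisect_right(data, x) / len(data)


for x in (3, 15, 42):
    print(f"Proportion of observations <= {x}: {ecdf(x):.2f}")
```

Evaluating ecdf at each observed value and joining the points gives the ogive for this sample.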
What are the Measures of Central Tendency?
These are also known as Measures of Location or Statistical Averages. The
three common measures are the Mean, the Median and the Mode.
- Arithmetic Mean
It is the sum of all values in the data set divided by the total number of
observations (the sample size). It is uniquely defined and requires
measurement of all observations.
Mean = (sum of all observations) / (number of observations)
The arithmetic mean is affected by extreme values (outliers).
- Median (50th percentile)
It is the middle value of all observations arranged in ascending order. It
does not require exact measurement of every observation, because only the
ordering of the values matters.
With an odd number of observations it is the ((n+1)/2)th value. With an even
number of observations the definition alone does not determine it uniquely,
so by convention it is taken as the average of the two middle values.
- Mode
It is the value that occurs most often. It is resistant to outliers. It does
not require measurement of all observations and is not uniquely defined when
the data is multi-modal.
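The three measures can be compared with Python's standard statistics module. The sketch below uses invented numbers and then adds one extreme value to show how differently the mean and the median react.

```python
import statistics

# Hypothetical observations, invented for illustration.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # sum of observations / number of observations
median = statistics.median(values)  # even count, so the average of the two middle values
mode = statistics.mode(values)      # the most frequent value

print(f"mean={mean}, median={median}, mode={mode}")

# Adding one extreme value pulls the mean up sharply but barely moves the median.
with_outlier = values + [100]
print(f"mean with outlier={statistics.mean(with_outlier):.2f}")
print(f"median with outlier={statistics.median(with_outlier)}")

# multimode (Python 3.8+) returns every mode when the data is multi-modal.
modes = statistics.multimode([1, 1, 2, 2, 3])
print(f"modes={modes}")
```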
What are the Measures of Dispersion (Variation)?
These indicate how widely the data is spread around the central tendency.
There are three common measures:
- Range- The difference between the maximum value and the minimum value
(Max - Min). If Max = Min, all observations are equal, so the range and the
dispersion are both 0. The range is affected by extreme values.
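- Inter-Quartile Range (IQR)- The difference between the third quartile (Q3,
the value below which 75% of the data lies) and the first quartile (Q1, the
value below which 25% of the data lies). The data has to be arranged in
ascending order.
IQR = Q3 - Q1, where Q3 = Upper Quartile and Q1 = Lower Quartile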
- Standard Deviation- It measures the typical deviation of the observations
from the mean of the data.
Formula: Variance = (sum of squared deviations from the mean) / (number of observations - 1)
Standard Deviation = square root of the Variance
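The three measures can be worked through on a small invented data set. This is only a sketch: quartiles can be computed under several conventions, and the one used here is the default of Python's statistics.quantiles, which may differ slightly from other tools.

```python
import statistics

# Hypothetical observations, invented for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

# Range: maximum minus minimum.
data_range = max(data) - min(data)

# Quartiles with the default ("exclusive") method of statistics.quantiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Sample variance: sum of squared deviations from the mean / (n - 1).
mean = statistics.mean(data)
sum_of_squares = sum((x - mean) ** 2 for x in data)
variance = sum_of_squares / (len(data) - 1)
std_dev = variance ** 0.5

print(f"range={data_range}, Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"variance={variance}, standard deviation={std_dev:.4f}")
```

The hand-computed standard deviation agrees with statistics.stdev(data), which uses the same n - 1 divisor.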
What is the Coefficient of Variation?
It is the ratio of the Standard Deviation to the Mean. During analysis one
should not get carried away by averages alone; the variation should be
considered as well.
The table above shows that although Sales Person 2 has the better average,
he or she also has a higher standard deviation. This is why the coefficient
of variation plays an important role in decision making.
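To make the idea concrete, here is a small sketch with invented monthly sales for two hypothetical sales people. These figures are assumptions for illustration only and are not the values from the table.

```python
import statistics

# Invented monthly sales figures; NOT the values from the table.
person_a = [48, 50, 52, 49, 51]
person_b = [30, 90, 45, 80, 55]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


cv_a = coefficient_of_variation(person_a)
cv_b = coefficient_of_variation(person_b)

print(f"A: mean={statistics.mean(person_a)}, CV={cv_a:.3f}")
print(f"B: mean={statistics.mean(person_b)}, CV={cv_b:.3f}")
```

In this made-up case the second person has the higher average but also a much larger coefficient of variation, so the higher average comes with far less consistency.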
What is the Empirical Rule?
This rule approximates the variation of data in a bell-shaped (normal)
distribution. It says that:
- about 68% of the data lies within 1 standard deviation of the mean
- about 95% of the data lies within 2 standard deviations of the mean
- about 99.7% of the data lies within 3 standard deviations of the mean
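These percentages can be checked against the normal distribution itself. The sketch below uses NormalDist from Python's standard statistics module (Python 3.8+); the helper name share_within is an assumption for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k) * 100:.2f}%")
```

The exact figures are slightly different from the rounded 68%, 95% and 99.7%, which is why the rule is described as an approximation.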
What is the Chebyshev Rule?
The Chebyshev Rule applies to any distribution, so it is useful when the
data is not bell shaped.
It says that, regardless of how the data is distributed, at least
(1 - 1/k^2) * 100% of the values fall within k standard deviations of the
mean (for k > 1).
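The bound, 1 - 1/k^2, is easy to evaluate and to check against a deliberately lopsided sample. The data and the function names below are assumptions for illustration; the population standard deviation is used so that the guarantee applies exactly to this data set.

```python
import statistics


def chebyshev_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


def share_within(values, k):
    """Observed share of values within k population standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    inside = [x for x in values if abs(x - mean) <= k * sd]
    return len(inside) / len(values)


# Hypothetical, strongly right-skewed data.
skewed = [1, 1, 1, 2, 2, 3, 4, 20]

for k in (2, 3):
    print(f"k={k}: bound={chebyshev_bound(k):.3f}, observed={share_within(skewed, k):.3f}")
```

The observed share is never below the bound, although it is often well above it, since the rule is a worst-case guarantee.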
What is the Five Number Summary?
The five number summary consists of the smallest value, the first quartile,
the median, the third quartile and the largest value.
Here the First Quartile, the Median and the Third Quartile are measures of
location.
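A short sketch of the summary in Python follows. The data is invented, and the quartiles use the default method of statistics.quantiles, which is one of several accepted conventions.

```python
import statistics

# Hypothetical observations, invented for illustration.
data = [40, 7, 36, 15, 42, 39, 41]

# statistics.quantiles sorts the data internally and returns Q1, median, Q3 for n=4.
q1, median, q3 = statistics.quantiles(data, n=4)

five_number_summary = (min(data), q1, median, q3, max(data))
print(five_number_summary)
```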
What is Skewness of Data?
Skewness is the asymmetry of a distribution. A distribution is either right
(positively) skewed, left (negatively) skewed, or symmetric (not skewed).
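One common way to put a number on this is the moment coefficient of skewness, the third central moment divided by the 1.5th power of the second. The sketch below assumes that particular definition (other skewness measures exist) and uses invented data.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical samples: long tail to the right, long tail to the left, symmetric.
right_skewed = [1, 2, 2, 3, 3, 3, 10]
left_skewed = [1, 8, 8, 8, 9, 9, 10]
symmetric = [1, 2, 3, 4, 5]

for name, sample in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(f"{name}: skewness={skewness(sample):.3f}")
```

A positive result indicates right skew, a negative result indicates left skew, and a value of zero indicates symmetry.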
What is a Box and Whisker Plot?
A box and whisker plot (boxplot) displays the five number summary
graphically. It also gives an idea of the outliers in the dataset:
observations more than 1.5 * Inter-Quartile Range below Q1 or above Q3 are
considered outliers.
A boxplot loses information about the mode. Instead, it shows whether the
distribution is negatively or positively skewed.
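The outlier rule can be sketched without drawing the plot. The snippet assumes the usual convention of fences placed 1.5 * IQR below Q1 and above Q3, uses the default quartile method of statistics.quantiles, and works on invented data.

```python
import statistics

# Hypothetical observations with one suspiciously large value.
data = [10, 12, 13, 14, 15, 16, 40]

q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Fences: anything outside [lower_fence, upper_fence] is flagged as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(f"fences=({lower_fence}, {upper_fence}), outliers={outliers}")
```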
A Quantile Plot displays the sorted data against evenly spaced quantile
positions, so each value can be read against its percentile.
A Quantile-Quantile Plot plots the quantiles of one distribution against the
quantiles of another, which makes it possible to compare the two
distributions.
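The points behind both plots can be computed before any drawing is done. The sketch below assumes the plotting position (i + 0.5) / n for the quantile plot, which is one of several conventions, and compares two invented samples by their deciles for the quantile-quantile plot.

```python
import statistics

# Hypothetical samples; sample_b is simply sample_a doubled.
sample_a = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
sample_b = [2 * x for x in sample_a]

# Quantile plot: each sorted value paired with its quantile position.
n = len(sample_a)
quantile_plot = [((i + 0.5) / n, x) for i, x in enumerate(sorted(sample_a))]

# Quantile-quantile plot: deciles of one sample paired with deciles of the other.
deciles_a = statistics.quantiles(sample_a, n=10)
deciles_b = statistics.quantiles(sample_b, n=10)
qq_points = list(zip(deciles_a, deciles_b))

print(quantile_plot[:3])
print(qq_points[:3])
```

Because the second sample is a rescaled copy of the first, its quantile-quantile points fall on a straight line, which is the pattern that signals two distributions of the same shape.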
What is a Scatter Plot?
A Scatter Plot is used with bivariate data to reveal clusters of points,
outliers and similar patterns.
What is Correlation Analysis?
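It studies the relationship between two numerical variables: how the second
variable changes when the first one increases. The direction of this joint
variation is measured by the covariance, and the correlation coefficient is
the covariance scaled by the two standard deviations, so that it always lies
between -1 and 1.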
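Both quantities can be computed by hand in a few lines. The sketch below uses the sample covariance (n - 1 divisor) and the Pearson correlation coefficient; the study-hours and score figures are invented for illustration.

```python
def covariance(xs, ys):
    """Sample covariance: average product of deviations, with an n - 1 divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    std_x = covariance(xs, xs) ** 0.5  # covariance of a variable with itself is its variance
    std_y = covariance(ys, ys) ** 0.5
    return covariance(xs, ys) / (std_x * std_y)


# Hypothetical paired observations.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 55, 61, 64, 70]

print(f"covariance={covariance(hours_studied, exam_scores):.2f}")
print(f"correlation={correlation(hours_studied, exam_scores):.4f}")
```

The covariance depends on the units of the two variables, while the correlation is unit-free, which makes it easier to compare across data sets.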