Descriptive Statistics and Inferential Statistics

Statistics has played an important role since the early days of human civilization. People have used it for many purposes, including commerce and social and political life. In the age of data science and artificial intelligence, however, its importance has grown many times over. So what is statistics, and why has it become so important?

Statistics is about collecting data from different sources, organizing it, performing mathematical and logical calculations on it, and presenting the insights drawn from it. Typical tasks include data collection, data cleaning, data mining, numerical inference, data visualization, and statistical testing. In this blog we will learn statistics through examples.

Descriptive statistics and inferential statistics are the two fundamental branches of statistical analysis. Each serves a different purpose.

Descriptive Statistics

Descriptive statistics summarize and describe the basic features of a sample dataset, giving an overview of the data’s central tendency, variability, and distribution. Common descriptive statistics include:

1. Measures of Central Tendency:

i. Mean

There are three measures of the central location of data. The first is the arithmetic mean, popularly known as the ‘average’. The mean is calculated by summing all the observations and dividing by the total number of observations.

Suppose the observations are x1, x2, x3, ..., xn, where x1 is the first observation, xn is the last observation, and n is the total number of observations.

Mean = (x1 + x2 + x3 + ... + xn)/n

Let’s take an example for clarity. Some students were asked how many hours they spent on the internet in a week. The results are recorded below:

12 08 11 10 15 11 09 12 15 16

Ten students shared the number of hours they spent on the internet. Let’s find the mean number of hours the students spent on the internet.

The mean is the sum of all observations divided by the number of observations n. Here n is 10.

Mean (X̄) = (12 + 8 + 11 + 10 + 15 + 11 + 9 + 12 + 15 + 16)/10 = 119/10 = 11.9

So, on average, each student spends 11.9 hours on the internet per week.

For population data, the mean is denoted by μ (mu); for sample data, it is denoted by X̄ (x bar).
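The calculation above can be reproduced in a few lines of Python. This is a minimal sketch: the variable names (`hours`, `manual_mean`, `library_mean`) are illustrative and not part of the original example, and `statistics.mean` from the standard library is used only as a cross-check on the manual formula.

```python
from statistics import mean

# Hours spent on the internet in a week by the ten students in the example
hours = [12, 8, 11, 10, 15, 11, 9, 12, 15, 16]

# Mean = sum of all observations / number of observations
manual_mean = sum(hours) / len(hours)

# The standard library gives the same result
library_mean = mean(hours)

print(manual_mean)   # 11.9
print(library_mean)  # 11.9
```

Both approaches apply the same formula; the library function is simply more convenient for larger datasets.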

ii. Median

The median is the second measure of central location. To calculate it, first arrange all the observations in ascending or descending order. The observation that falls in the middle is the median.

Example: Below are the heights of 11 students (in cm).

143 167 139 138 155 147 167 144 156 162 140

Arranging the heights in ascending order gives:

138 139 140 143 144 147 155 156 162 167 167

The height 147 is in the sixth position, which is the middle of the 11 values. Therefore, the median height of the students is 147 cm.

If the number of observations is even, the median is the average of the two middle values.

Suppose I want to calculate the median age of our students. Below are the ages of 10 students:

13 16 10 13 14 12 11 09 15 07

Arrange the ages in ascending order:

07 09 10 11 12 13 13 14 15 16

The ages at the 5th and 6th positions are 12 and 13.

The average of 12 and 13 is (12 + 13)/2 = 12.5

Therefore, the median age of the 10 students is 12.5.
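Both median cases (odd and even number of observations) can be sketched in Python as follows. The helper `median_manual` is a hypothetical function written for illustration; `statistics.median` from the standard library follows the same rule and is shown as a cross-check.

```python
from statistics import median

def median_manual(values):
    """Return the median: sort, then take the middle value (or average the two middle values)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle observation
        return ordered[mid]
    # Even count: average of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2

# Heights of 11 students (odd number of observations)
heights = [143, 167, 139, 138, 155, 147, 167, 144, 156, 162, 140]
# Ages of 10 students (even number of observations)
ages = [13, 16, 10, 13, 14, 12, 11, 9, 15, 7]

print(median_manual(heights))  # 147
print(median_manual(ages))     # 12.5

# Cross-check with the standard library
print(median(heights))  # 147
print(median(ages))     # 12.5
```

Sorting a copy with `sorted` leaves the original lists unchanged, which is useful if the raw order of the data matters elsewhere.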

iii. Mode

The mode is the third and last measure of central location. In a dataset, the value that occurs with the highest frequency is the mode of that dataset.

For example, below are the marks obtained by students in a computer exam:

87 76 86 87 75 87 98 67 87

In this data, 87 occurs 4 times. Since 87 has the highest frequency, it is the mode of the data.
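The frequency count behind the mode can be sketched in Python as well. The variable names here are illustrative, and the example assumes Python 3.8 or later, since `statistics.multimode` was added in that version. `multimode` is included because a dataset can have more than one value tied for the highest frequency.

```python
from collections import Counter
from statistics import mode, multimode

# Marks obtained by students in the computer exam
marks = [87, 76, 86, 87, 75, 87, 98, 67, 87]

# Count how often each mark occurs, then pick the most frequent one
counts = Counter(marks)
most_frequent_mark, frequency = counts.most_common(1)[0]

print(most_frequent_mark, frequency)  # 87 4

# Cross-check with the standard library
print(mode(marks))       # 87
print(multimode(marks))  # [87]
```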

2. Measures of Variability

i. Range

ii. Variance

iii. Standard Deviation

3. Data Visualization

i. Histograms

ii. Bar Charts

iii. Scatter plot

Inferential Statistics

i. Hypothesis Testing

ii. Confidence Intervals

iii. Regression Analysis

iv. Statistical Significance testing