Statistics has played an important role since the beginning of human civilization. People have used it for many purposes, including commerce and social and political life. In the era of data science and artificial intelligence, however, its importance has grown many times over. So what is statistics, and why has it become so important?
Descriptive statistics and inferential statistics are the two fundamental branches of statistical analysis, and each serves a different purpose.
Descriptive Statistics
Descriptive statistics summarize and describe the basic features of a sample dataset, giving an overview of the data’s central tendency, variability, and distribution. Common descriptive statistics include:
1. Measures of Central Tendency
i. Mean
There are three measures of the central location of data. The first is the arithmetic mean, also known as the ‘average’. The mean is calculated by summing all the observations and dividing by the total number of observations.
Suppose the observations are x1, x2, x3, ..., xn, where x1 is the first observation, xn is the last observation, and n is the total number of observations.
Mean = (x1 + x2 + x3 + ... + xn)/n
Let’s take an example for clarity. Some students were asked how many hours they spent on the internet in a week. The results are recorded below:
12 08 11 10 15 11 09 12 15 16
Ten students shared the number of hours they spent on the internet. Now find the mean number of hours the students spent on the internet.
Mean = sum of all observations divided by the number of observations n. Here n is 10.
Mean (X̄) = (12 + 08 + 11 + 10 + 15 + 11 + 09 + 12 + 15 + 16)/10 = 119/10 = 11.9
So, on average, each student spends 11.9 hours on the internet in a week.
For population data the mean is denoted by μ (mu), and for sample data it is denoted by X̄ (x bar).
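The calculation above can be reproduced in a few lines of Python. This is a minimal sketch, not part of the original example: the variable names (`hours`, `manual_mean`, `library_mean`) are illustrative assumptions, and `statistics.mean` from the Python standard library is used only as a cross-check of the hand calculation.

```python
from statistics import mean

# Weekly internet hours reported by the ten students (from the example above).
hours = [12, 8, 11, 10, 15, 11, 9, 12, 15, 16]

# Mean = sum of all observations divided by the number of observations.
manual_mean = sum(hours) / len(hours)

# Cross-check with the standard library.
library_mean = mean(hours)

print(manual_mean, library_mean)
```

Both values should agree with the 11.9 hours worked out by hand.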
ii. Median
The median is the second most popular measure of central location. Before calculating the median, all the observations are arranged in ascending or descending order. The observation that falls in the middle is the median.
Example: Below are the heights of 11 students (in cm)
143 167 139 138 155 147 167 144 156 162 140
Arrange the values in ascending order:
138 139 140 143 144 147 155 156 162 167 167
The height 147 is at the sixth position, which is the middle value. Therefore, the median height of the students is 147 cm.
If the number of observations is even, we take the average of the two middle values as the median.
Suppose we want to calculate the median age of a group of students. Below are the ages of 10 students:
13 16 10 13 14 12 11 09 15 07
Arrange the ages in ascending order:
07 09 10 11 12 13 13 14 15 16
The ages at the 5th and 6th positions are 12 and 13.
The average of 12 and 13 is (12 + 13)/2 = 12.5
Therefore, the median age of the 10 students is 12.5.
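The odd and even cases described above can be sketched in Python as follows. The helper name `median_by_hand` is a hypothetical name introduced for illustration; `statistics.median` from the standard library is used only to confirm the results.

```python
from statistics import median

# Data from the two examples above.
heights_cm = [143, 167, 139, 138, 155, 147, 167, 144, 156, 162, 140]
ages = [13, 16, 10, 13, 14, 12, 11, 9, 15, 7]


def median_by_hand(values):
    """Sort the values, then pick the middle one (odd count)
    or average the two middle ones (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


height_median = median_by_hand(heights_cm)
age_median = median_by_hand(ages)

print(height_median, median(heights_cm))
print(age_median, median(ages))
```

With 11 heights the function returns the single middle value, and with 10 ages it averages the 5th and 6th values, matching the two worked examples.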
iii. Mode
The mode is the third and last measure of central location. In a given dataset, the value that occurs with the highest frequency is the mode of that dataset.
For example, below are the marks obtained by students in a computer exam:
87 76 86 87 75 87 98 67 87
In the given numbers, 87 occurs 4 times. Since 87 has the highest frequency, it is the mode of the data.
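A frequency count makes the same point in code. This is a small sketch under the assumption that the data has a single most frequent value, as in the example above; the variable names are illustrative, and `collections.Counter` and `statistics.mode` come from the Python standard library.

```python
from collections import Counter
from statistics import mode

# Exam marks from the example above.
marks = [87, 76, 86, 87, 75, 87, 98, 67, 87]

# Count how often each mark occurs and take the most frequent one.
counts = Counter(marks)
most_common_mark, frequency = counts.most_common(1)[0]

# Cross-check with the standard library.
library_mode = mode(marks)

print(most_common_mark, frequency, library_mode)
```

If several values tied for the highest frequency, the dataset would have more than one mode, and `statistics.multimode` would be the more suitable call.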
2. Measures of Variability
i. Range
ii. Variance
iii. Standard Deviation
3. Data Visualization
i. Histograms
ii. Bar Charts
iii. Scatter Plots
Inferential Statistics
i. Hypothesis Testing
ii. Confidence Intervals
iii. Regression Analysis
iv. Statistical Significance Testing