Descriptive Statistics and Inferential Statistics


Statistics has played an important role since the beginning of human civilization. People have used it in commerce, in social and political life, and in many other areas. In the age of data science and artificial intelligence, however, its importance has grown many times over. So what is statistics, and why has it become so important?

Statistics is about collecting data from different sources, organizing it, performing mathematical and logical calculations on it, and presenting the insights drawn from it. Typical tasks include data collection, data cleaning, data mining, numerical inference, data visualization, and statistical testing. In this blog we will learn statistics with examples.

Descriptive statistics and inferential statistics are the two fundamental branches of statistical analysis. Each serves a different purpose.

Descriptive Statistics

Descriptive statistics aim to summarize and describe the basic features of a sample dataset, providing an overview of the data’s central tendency, variability, and distribution. Common descriptive statistics include:

    1. Measures of Central Tendency

    i. Mean

    There are three measures of the central location of data. The first is the arithmetic mean, also commonly known as the ‘average’. The mean is calculated by summing all the observations and dividing by the total number of observations.

    Suppose the observations are x1, x2, x3, ..., xn, where x1 is the first observation, xn is the last observation, and n is the total number of observations. Then:

    Mean = (x1 + x2 + x3 + ... + xn) / n

    Let’s take an example for clarity. Some students were asked how many hours they spent on the internet in a week. The results are recorded below:

    12   08   11   10   15   11   09   12   15   16

    Ten students shared the number of hours they spent on the internet. Now let us find the mean number of hours.

    The mean is the sum of all observations divided by the number of observations n. Here n is 10.

    Mean = (12 + 08 + 11 + 10 + 15 + 11 + 09 + 12 + 15 + 16) / 10 = 119 / 10 = 11.9

    So, on average, each student spends 11.9 hours on the internet in a week.

    For population data, the mean is denoted by μ (mu); for sample data, it is denoted by x̄ (x bar).
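The mean calculation above can be checked with a few lines of Python. This is a minimal sketch, not part of the original example: the variable names (hours, manual_mean, library_mean) are chosen here for illustration, and mean comes from Python's standard statistics module.

```python
from statistics import mean

# Weekly internet hours reported by the ten students in the example.
hours = [12, 8, 11, 10, 15, 11, 9, 12, 15, 16]

# Mean by definition: sum of all observations divided by their count n.
manual_mean = sum(hours) / len(hours)

# The same value from the standard library, as a cross-check.
library_mean = mean(hours)

print(manual_mean, library_mean)
```

Both values should agree with the 11.9 hours worked out by hand.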


    ii. Median

    The median is the second most popular measure of central location. Before calculating the median, all observations are arranged in ascending or descending order. The observation that falls in the middle is the median.

    Example: Below are the heights of 11 students (in cm):

    143   167   139   138   155   147   167   144   156   162   140

    Arranging these values in ascending order gives:

    138   139   140   143   144   147   155   156   162   167   167

    The height 147 is in the sixth position, which is the middle of the 11 values. Therefore, the median height of these students is 147 cm.

    If the number of observations is even, there is no single middle value, so we take the average of the two middle values as the median.

    Suppose I want to calculate the median age of our students. Below are the ages of 10 students:

    13   16   10   13   14   12   11   09   15   07

    Arranging the ages in ascending order gives:

    07   09   10   11   12   13   13   14   15   16

    The ages in the 5th and 6th positions are 12 and 13.

    The average of 12 and 13 is (12 + 13) / 2 = 12.5

    Therefore, the median age of the 10 students is 12.5.
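The two median cases (odd and even number of observations) can be sketched in Python as follows. The helper name median_by_hand is hypothetical and introduced only for this illustration; median is the standard library function from the statistics module, used here as a cross-check.

```python
from statistics import median

def median_by_hand(values):
    """Sort the values, then take the middle one (odd count)
    or the average of the two middle ones (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Heights of 11 students (odd count) and ages of 10 students (even count).
heights = [143, 167, 139, 138, 155, 147, 167, 144, 156, 162, 140]
ages = [13, 16, 10, 13, 14, 12, 11, 9, 15, 7]

print(median_by_hand(heights), median(heights))
print(median_by_hand(ages), median(ages))
```

For the heights the result should match the sixth sorted value, and for the ages it should match the average of the 5th and 6th sorted values.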

    iii. Mode
    The mode is the third and last measure of central location. In a given dataset, the value that occurs with the highest frequency is the mode of that dataset.
    For example, below are the marks obtained by students in a computer exam:
    87   76   86   87   75   87   98   67   87

    Here 87 occurs 4 times, and every other mark occurs only once. Since 87 has the highest frequency, it is the mode of the data.
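Counting frequencies by hand gets tedious for larger datasets, so here is a small Python sketch of the same idea. The variable names are illustrative; Counter (from collections) and multimode (from statistics, available in Python 3.8 and later) are standard library tools. multimode is used because it returns every value that ties for the highest frequency.

```python
from collections import Counter
from statistics import multimode

# Marks obtained by the students in the computer exam.
marks = [87, 76, 86, 87, 75, 87, 98, 67, 87]

# Count how often each mark occurs, then pick the most frequent one.
counts = Counter(marks)
mode_value, mode_frequency = counts.most_common(1)[0]

# multimode lists all values sharing the highest frequency.
all_modes = multimode(marks)

print(mode_value, mode_frequency, all_modes)
```

If two marks were tied for the highest frequency, all_modes would contain both, which is worth checking before reporting a single mode.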
     


    2. Measures of Variability

    i. Range

    ii. Variance

    iii. Standard Deviation

    3. Data Visualization

    i. Histograms

    ii. Bar Charts

    iii. Scatter plot

Inferential Statistics

    i. Hypothesis Testing

    ii. Confidence Intervals

    iii. Regression Analysis

    iv. Statistical Significance testing
