Descriptive Statistics: Concepts for Analysing and Summarizing Data

April 2, 2024

Statistics studies data. A major branch of applied mathematics, statistics is multidisciplinary and permeates nearly every sector and domain. The ubiquity of data and the essential role of data analysis have made statistics nearly indispensable. With the recent rise of data science, analytics, and AI, the subject has taken centre stage in academia and the job market.


There are two primary divisions of the subject: descriptive and inferential statistics. Inferential statistics uses probability to draw conclusions about a wider population from sample data. Descriptive statistics is concerned with describing the nature and characteristics of datasets, allowing one to explore and make better sense of the available data.

This article focuses specifically on descriptive statistics and revises some of its most crucial concepts. Do read on if you wish to strengthen your fundamentals; this write-up comes to you straight from the statistics assignment help experts of MyAssignmentHelp UK, one of the oldest academic service providers in the United Kingdom.

Let’s get started.

What is Descriptive Statistics?

This is the branch of statistics concerned with describing, summarising, and organising data for better comprehension. Clarity in presenting and communicating data is what descriptive statistics aims for. A clear description and understanding of the data is generally a preliminary step to further analysis and subsequent inference.

Descriptive statistics can be applied to both quantitative and qualitative data. It defines certain measures for accurately describing the features, characteristics, and nature of given data. They are:

Measures of Position (also referred to as measures of central tendency or location measures)
Measures of Spread (also referred to as measures of variability or dispersion)
Measures of Shape

The position, spread, and shape of a data distribution may change individually or in tandem when change-inducing factors affect the processes generating that data.

Let’s have a look at each of these three measures one by one. But before we do that, are you looking for urgent help with a statistics assignment? Then do get in touch with the experts at MyAssignmentHelp in the UK.

Measures of Central Tendency

A measure of central tendency is a descriptive statistic that identifies a single value that best represents a distribution: the value most typical of the collected data. The central tendency of a dataset or distribution is the tendency of its values to cluster around such a central value.

Mean, median, and mode are the three measures of central tendency as defined by descriptive statistics.

When presented with any dataset or distribution, one of the most common and convenient ways to describe or summarise it is to identify the centre of the distribution.

Mean: Also known as the average, it is found by adding up all the elements in a distribution or set and dividing the sum by the total number of elements.

Median: The middle value of the set once all elements are sorted in ascending (or descending) order. If there is an even number of elements, the median is the average of the two middle values.

Mode: The most frequently occurring value.
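The three definitions above can be sketched in plain Python as follows. The function names `mean`, `median`, and `mode` are illustrative helpers written for this article, not a particular library's API; note that this `mode` returns only the first most frequent value when there is a tie.

```python
from collections import Counter


def mean(values):
    """Sum of all elements divided by the number of elements."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    """Most frequently occurring value (the first one encountered if there is a tie)."""
    return Counter(values).most_common(1)[0][0]


sample = [3, 7, 7, 2, 9, 4]
print(mean(sample))    # 32 / 6, about 5.33
print(median(sample))  # sorted: 2 3 4 7 7 9 -> (4 + 7) / 2 = 5.5
print(mode(sample))    # 7
```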

Some values in a distribution may lie far from the central tendency measures; such values are known as outliers. The mean is heavily affected by outliers, whereas the median and mode are much less sensitive to them.

Let’s look at an example.

The following is a set of measured weights of passenger luggage on an airline:

18, 23, 20, 21, 24, 23, 20, 20, 15, 19, 24

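Mean: (18 + 23 + 20 + 21 + 24 + 23 + 20 + 20 + 15 + 19 + 24) / 11 = 227 / 11 ≈ 20.64

Median: sorted, the values are 15, 18, 19, 20, 20, 20, 21, 23, 23, 24, 24; the middle (sixth) of these 11 values is 20

Mode: 20 (it occurs three times)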
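To double-check the worked example, Python's standard-library `statistics` module provides `mean`, `median`, and `mode` functions. This is only a verification sketch; the variable name `luggage_weights` is our own.

```python
import statistics

# Luggage weights from the example above
luggage_weights = [18, 23, 20, 21, 24, 23, 20, 20, 15, 19, 24]

print(sorted(luggage_weights))                     # [15, 18, 19, 20, 20, 20, 21, 23, 23, 24, 24]
print(round(statistics.mean(luggage_weights), 2))  # 20.64  (227 / 11)
print(statistics.median(luggage_weights))          # 20  (6th of 11 sorted values)
print(statistics.mode(luggage_weights))            # 20  (appears three times)
```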

As mentioned, central tendency describes the tendency of data or observations to cluster around a specific value. Which measure you choose to describe the central tendency of a distribution depends on the nature of the distribution and the data, as well as on your objectives.

Consider the following example.

Below is a table of the types of cargo carried by a freighter:

Type	Quantity
Metal Alloys	5
PCB & Microcontrollers	4
Carbon Fibre	3
Heavy Machinery	1
Fabric	4

If we are interested in determining which type of cargo occurs most often on the manifest, then Metal Alloys, with a quantity of 5, is the modal category.

If we are instead interested in the average quantity carried per cargo type, then we need to calculate the mean. From the table above, the total quantity is 5 + 4 + 3 + 1 + 4 = 17, so the mean is 17 / 5 = 3.4.
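The freighter example can be sketched the same way. Storing the table as a dictionary (an illustrative choice; the names are our own) makes the distinction clear: the modal category is the key with the largest quantity, while the mean is taken over the quantities themselves.

```python
# Cargo manifest from the table above: type -> quantity
cargo = {
    "Metal Alloys": 5,
    "PCB & Microcontrollers": 4,
    "Carbon Fibre": 3,
    "Heavy Machinery": 1,
    "Fabric": 4,
}

# Modal category: the cargo type with the highest quantity
modal_category = max(cargo, key=cargo.get)
print(modal_category, cargo[modal_category])  # Metal Alloys 5

# Mean quantity per cargo type: total quantity / number of types
total = sum(cargo.values())
mean_quantity = total / len(cargo)
print(total, mean_quantity)  # 17 3.4
```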

Choosing the mean as the measure of central tendency has certain advantages. First, every observed value in a dataset is used to calculate the mean, unlike the median, which depends only on the middle value or the middle two values. Second, the mean tends to remain relatively stable from one sample to another.

The main downside of the mean is its sensitivity to outliers, especially those with extreme values.
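The effect is easy to see by adding a single extreme value to the luggage data. The value 60 below is hypothetical, chosen only to act as an outlier; it is not part of the original example.

```python
import statistics

luggage_weights = [18, 23, 20, 21, 24, 23, 20, 20, 15, 19, 24]
with_outlier = luggage_weights + [60]  # hypothetical extreme value

for label, data in [("original", luggage_weights), ("with outlier", with_outlier)]:
    print(label,
          round(statistics.mean(data), 2),  # mean shifts noticeably
          statistics.median(data))          # median barely moves
# original 20.64 20
# with outlier 23.92 20.5
```

One extreme value moves the mean by more than three units, while the median shifts by only half a unit.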

Measures of Spread

Spread measures how far the values in a data distribution deviate from a measure of position such as the mean. Measures of spread indicate the variability intrinsic to a distribution and are a good indicator of quality and overall process variability.

Range, standard deviation, and variance are the three key measures used to determine spread & dispersion.

The range is the difference between the largest and smallest values in a dataset or distribution. Although it is the simplest measure of variability, it uses only two values and ignores the rest of the dataset. It can be misleading when the distribution is skewed or contains extreme outliers.
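A sketch of the range for the luggage data, with the same hypothetical outlier of 60 appended to show how a single extreme value dominates this measure. The helper name `data_range` is our own.

```python
luggage_weights = [18, 23, 20, 21, 24, 23, 20, 20, 15, 19, 24]


def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


print(data_range(luggage_weights))         # 24 - 15 = 9
print(data_range(luggage_weights + [60]))  # 60 - 15 = 45 (hypothetical outlier)
```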

Standard deviation measures how far the data points in a distribution typically lie from its mean. Strictly speaking, it is not the average absolute deviation from the mean: it is the square root of the average squared deviation. Informally, though, it can be read as a typical distance of the data points from the mean.

A low standard deviation indicates that data points cluster closely around the mean, while a high one indicates that they are more widely scattered. A sample standard deviation, computed from a subset of a population, will generally differ from the population standard deviation, and its formula divides by n − 1 rather than n.

While many consider it a bit tricky, standard deviation is a very useful way to quantify dispersion in a dataset. The sample standard deviation is calculated as:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$

Where $\bar{x}$ is the mean, $x_i$ is a data point, and $n$ is the total size of the data set.
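The sample standard deviation formula translates directly into code. This sketch implements it by hand (the helper name `sample_std` is ours) and compares the result with `statistics.stdev`, which also uses the n − 1 denominator.

```python
import math
import statistics

luggage_weights = [18, 23, 20, 21, 24, 23, 20, 20, 15, 19, 24]


def sample_std(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


print(round(sample_std(luggage_weights), 2))        # 2.77
print(round(statistics.stdev(luggage_weights), 2))  # 2.77
```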

Variance is calculated as the square of the standard deviation.
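Because variance is the square of the standard deviation, the two can be checked against each other. This sketch also contrasts `statistics.variance` (sample variance, dividing by n − 1) with `statistics.pvariance` (population variance, dividing by n), which is why sample and population figures differ for the same numbers.

```python
import math
import statistics

luggage_weights = [18, 23, 20, 21, 24, 23, 20, 20, 15, 19, 24]

s = statistics.stdev(luggage_weights)
sample_var = statistics.variance(luggage_weights)       # divides by n - 1
population_var = statistics.pvariance(luggage_weights)  # divides by n

print(math.isclose(sample_var, s ** 2))  # True
print(round(sample_var, 2))              # 7.65  (842 / 110)
print(round(population_var, 2))          # 6.96  (842 / 121)
```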

If the median is chosen as the measure of central tendency, then quartiles and the interquartile range come into play as useful measures of dispersion or spread. The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1), and it covers the middle 50% of the data.
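A sketch of quartiles and the interquartile range for the luggage data, using `statistics.quantiles` with `n=4`, which returns the three quartile cut points. Several conventions exist for computing quartiles; this relies on the module's default `'exclusive'` method, so other tools or textbooks may give slightly different values.

```python
import statistics

luggage_weights = [18, 23, 20, 21, 24, 23, 20, 20, 15, 19, 24]

q1, q2, q3 = statistics.quantiles(luggage_weights, n=4)  # default method='exclusive'
iqr = q3 - q1

print(q1, q2, q3)  # 19.0 20.0 23.0
print(iqr)         # 4.0
```

The middle cut point, Q2, coincides with the median of 20 found earlier.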

We wrap things up with a look at the measures of shape in descriptive statistics.

Measures of Shape

Measures of central tendency and dispersion describe where data points cluster and how widely they are spread. Measures of shape describe the form of a data distribution. There are two statistical measures for describing the shape of a distribution: skewness and kurtosis.

Skewness

Skewness is a statistic that indicates whether a distribution is symmetric. In a symmetric distribution, the left and right halves mirror each other.

Symmetric distributions have a skewness value of 0. Normal or Gaussian distributions are symmetric, have zero skewness, and their mean, median & mode are equal.

Distributions with skewness greater than zero are right-skewed (positively skewed); that is, their right tail is longer than the left tail. In such cases, typically mode < median < mean, because the long right tail pulls the mean upwards.

Skewness less than zero denotes a left-skewed (negatively skewed) distribution, where the left tail is longer than the right tail. In such cases, typically mean < median < mode.

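One common formula for the sample skewness (the moment coefficient of skewness) is:

$$ g_1 = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3}{\left(\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{3/2}} $$

Where $x_i$ is a data point, $\bar{x}$ is the mean, and $n$ is the number of data points.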
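A minimal sketch of the moment-based skewness coefficient (often written g1), which is one common definition; statistical packages may instead apply a small-sample adjustment and report somewhat different values. The helper name `skewness` and the example datasets are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5, using 1/n moments about the mean."""
    n = len(values)
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n
    m3 = sum((x - x_bar) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]        # long right tail
left_skewed = [-x for x in right_skewed]  # mirror image: long left tail

print(skewness(symmetric))         # 0.0
print(skewness(right_skewed) > 0)  # True
print(skewness(left_skewed) < 0)   # True
```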

Kurtosis

This measure of shape compares a distribution with a normal distribution. Most precisely, it measures how heavy the distribution's tails are, and it is often described informally in terms of whether the distribution looks more sharply peaked or flatter than a normal distribution. Using excess kurtosis (kurtosis minus 3, the value for a normal distribution), a value of zero indicates a distribution similar to a normal distribution. Values greater than zero indicate heavier tails and, typically, a sharper peak than a normal distribution; values lower than zero indicate lighter tails and, typically, a flatter peak.

Kurtosis is used to classify distributions into three types:

- Leptokurtic: sharply peaked with heavy tails (excess kurtosis > 0)
- Mesokurtic: similar to a normal distribution, with a medium peak (excess kurtosis = 0)
- Platykurtic: flat-peaked with light tails (excess kurtosis < 0)

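One common formula for the sample excess kurtosis is:

$$ g_2 = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^4}{\left(\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{2}} - 3 $$

Subtracting 3 (the kurtosis of a normal distribution) makes a normal distribution score 0, matching the convention used above.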
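Likewise, a minimal sketch of moment-based excess kurtosis (m4 / m2² minus 3, so that a normal distribution scores about 0). As with skewness, this is one common definition; some software reports non-excess kurtosis (normal = 3) or applies bias corrections. The helper name `excess_kurtosis` and the example data are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3, using 1/n moments about the mean."""
    n = len(values)
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n
    m4 = sum((x - x_bar) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]          # evenly spread values
heavy_tailed = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]  # mostly central, with two extremes

print(excess_kurtosis(flat) < 0)      # True (platykurtic)
print(excess_kurtosis(heavy_tailed))  # 2.0 (leptokurtic)
```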

Well, that’s all the space we have for today. We hope this write-up was a good refresher on descriptive statistics concepts. Practise often to sharpen your skills and, if need be, get expert help from a reputable academic service.



Shahab Khattak