Skewness

Skewness is a statistical measure of the degree of asymmetry in the distribution of data. It quantifies how far a distribution departs from symmetry about its mean; a perfectly symmetrical distribution, such as the normal distribution, has a skewness of zero.

In simple terms, skewness shows whether the longer tail of the data stretches to the left or to the right of the mean.

Types of Skewness:

  1. Positive Skew (Right Skew):

    • In a positively skewed distribution, the right tail (larger values) is longer or more spread out than the left tail.
    • Most of the data points are concentrated on the left side of the distribution, with a few larger values stretching out the right tail.
    • The mean is typically greater than the median because the larger values pull the mean toward the right.
    • Example: The distribution of income in a population where most people earn a lower income, but a few earn exceptionally high salaries.
  2. Negative Skew (Left Skew):

    • In a negatively skewed distribution, the left tail (smaller values) is longer or more spread out than the right tail.
    • Most of the data points are concentrated on the right side of the distribution, with a few smaller values stretching out the left tail.
    • The mean is typically less than the median because the smaller values pull the mean toward the left.
    • Example: Age at retirement in some countries where most people retire around a certain age, but a few retire much earlier.
  3. Zero Skew (Symmetrical Distribution):

    • A symmetrical distribution has a skewness of zero: the left and right tails mirror each other. A normal distribution is an example of a symmetrical distribution. The converse does not always hold, since an asymmetrical distribution can also have zero skewness.
    • In a symmetrical distribution, the mean and median are equal.

How to Calculate Skewness:

There are several methods to calculate skewness. A common one is the adjusted Fisher-Pearson coefficient, which is based on the third central moment:

  1. Formula for Skewness (using third central moment):

    \text{Skewness} = \frac{n}{(n-1)(n-2)} \times \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{\sigma^3}

    Where:

    • n is the number of data points.
    • X_i is each individual data point.
    • \bar{X} is the mean of the data set.
    • \sigma is the sample standard deviation of the data set.
  2. Interpreting Skewness:

    • Positive skew: Skewness > 0
    • Negative skew: Skewness < 0
    • No skew (symmetrical distribution): Skewness ≈ 0
    • As a common rule of thumb, skewness values between -0.5 and +0.5 indicate a fairly symmetrical distribution.
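
The formula and the interpretation rules above can be sketched in Python using only the standard library. This is a minimal sketch, not a definitive implementation: the function names adjusted_skewness and describe_skew are made up for illustration, σ is assumed to be the sample standard deviation (n - 1 in the denominator), and the 0.5 cutoff is the rule of thumb mentioned above.

```python
import math


def adjusted_skewness(values):
    """Skewness = n / ((n-1)(n-2)) * sum((x - mean)^3) / s^3."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 data points")
    mean = sum(values) / n
    # Assumption: sigma is the sample standard deviation (divide by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    cubed_deviations = sum((x - mean) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed_deviations / s ** 3


def describe_skew(skewness, threshold=0.5):
    """Label a skewness value using the -0.5 to +0.5 rule of thumb."""
    if abs(skewness) <= threshold:
        return "fairly symmetrical"
    return "positive skew" if skewness > 0 else "negative skew"


symmetric = [1, 2, 3, 4, 5]       # deviations cancel out, so skewness is 0
right_tailed = [1, 2, 3, 4, 10]   # one large value stretches the right tail

print(adjusted_skewness(symmetric), describe_skew(adjusted_skewness(symmetric)))
print(adjusted_skewness(right_tailed), describe_skew(adjusted_skewness(right_tailed)))
```

Statistical packages may use a slightly different definition (for example, without the small-sample adjustment), so values from other tools can differ somewhat for small data sets.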

Importance of Skewness:

  1. Identifying Data Distribution: Skewness helps in judging whether a data set is roughly symmetrical, as a normal distribution is, or skewed. This is important when choosing the right statistical tools or tests. For example, many statistical techniques assume data to be normally distributed.

  2. Decision Making: In finance, skewness helps investors understand the distribution of returns on investments. A positively skewed return distribution means frequent small losses or modest gains with an occasional large gain, while a negatively skewed distribution means frequent small gains with an occasional large loss.

  3. Adjustments for Skewed Data: If a data set is highly skewed, a transformation may be applied to make the distribution more symmetrical. For positive, right-skewed data, the logarithm is a common choice. A more symmetrical distribution makes it easier to apply statistical tests that assume normality.
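
The effect of a logarithmic transformation can be checked with a short Python sketch. It assumes the third-central-moment formula with a sample standard deviation, uses a helper named skewness that exists only for this illustration, and borrows the sample data set from this post's example. The natural logarithm requires all values to be positive.

```python
import math


def skewness(values):
    """Adjusted skewness based on the third central moment (illustrative helper)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum((x - mean) ** 3 for x in values) / s ** 3


raw = [3, 5, 7, 8, 9, 9, 10, 10, 12, 15, 50]

# The log compresses large values more than small ones, pulling in the right tail.
logged = [math.log(x) for x in raw]

print("skewness before log transform:", skewness(raw))
print("skewness after log transform: ", skewness(logged))
```

For this data the transformed values are still positively skewed, only less so; a log transform reduces right skew but does not guarantee a symmetrical result.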

Example:

Consider the following data set:

  • 3, 5, 7, 8, 9, 9, 10, 10, 12, 15, 50

This data is positively skewed because the value "50" is much higher than the rest of the data points, pulling the mean (about 12.5) to the right of the median (9).
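
As a rough check of this example, the mean, median, and skewness can be computed with Python's standard statistics module. This is a sketch that assumes the third-central-moment formula with the sample standard deviation; the variable names are illustrative.

```python
import statistics

data = [3, 5, 7, 8, 9, 9, 10, 10, 12, 15, 50]
n = len(data)

mean = statistics.mean(data)      # arithmetic mean, pulled upward by the value 50
median = statistics.median(data)  # middle value of the sorted data
s = statistics.stdev(data)        # sample standard deviation (n - 1 denominator)

# Sum of cubed deviations from the mean (n times the third central moment).
cubed_deviations = sum((x - mean) ** 3 for x in data)
skew = n / ((n - 1) * (n - 2)) * cubed_deviations / s ** 3

print("mean:", round(mean, 2))
print("median:", median)
print("skewness:", round(skew, 2))
```

The mean lies above the median and the skewness is well above +0.5, which agrees with the description of a positive skew.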

Conclusion:

Skewness is a valuable measure in statistics that helps in identifying the asymmetry of a data distribution. Understanding skewness can aid in selecting the right statistical tools and transformations, ensuring that the data analysis is accurate and reliable.

Graphical Representation:

[Figure: plots of a positively skewed, a negatively skewed, and a normal distribution]

The figure shows three types of distributions:

  1. Positive Skew (Right Skew): The distribution has a longer right tail, with most data points concentrated on the left side. The mean is greater than the median.

  2. Negative Skew (Left Skew): The distribution has a longer left tail, with most data points concentrated on the right side. The mean is less than the median.

  3. Normal Distribution (Zero Skew): This distribution is symmetrical with equal tails on both sides, and the mean and median are equal.

These graphs help visualize the skewness in data. 
