Skewness
Skewness is a statistical measure of the degree of asymmetry in the distribution of data. It quantifies how far a distribution departs from symmetry about its mean; a perfectly symmetrical distribution, such as the normal distribution, has a skewness of zero.
In simple terms, skewness indicates whether the tail of the data stretches further to the left or to the right of the mean.
Types of Skewness:
Positive Skew (Right Skew):
- In a positively skewed distribution, the right tail (larger values) is longer or more spread out than the left tail.
- Most of the data points are concentrated on the left side of the distribution, with a few larger values stretching out the right tail.
- The mean is typically greater than the median because the larger values pull the mean toward the right.
- Example: The distribution of income in a population where most people earn a lower income, but a few earn exceptionally high salaries.
Negative Skew (Left Skew):
- In a negatively skewed distribution, the left tail (smaller values) is longer or more spread out than the right tail.
- Most of the data points are concentrated on the right side of the distribution, with a few smaller values stretching out the left tail.
- The mean is typically less than the median because the smaller values pull the mean toward the left.
- Example: Age at retirement in some countries where most people retire around a certain age, but a few retire much earlier.
Zero Skew (Symmetrical Distribution):
- In a symmetrical distribution, the left and right tails mirror each other and the skewness is zero. A normal distribution is an example of a symmetrical distribution. Note that the reverse does not always hold: a skewness of zero does not by itself guarantee perfect symmetry.
- In this case, the mean and median are equal.
How to Calculate Skewness:
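There are several measures of skewness, but the most common is the moment coefficient of skewness (also called the Fisher-Pearson coefficient), which is based on the third central moment:
Formula for Skewness (using third central moment):
$$\text{Skewness} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{s^3}$$
Where:
- $n$ is the number of data points.
- $x_i$ is each individual data point.
- $\bar{x}$ is the mean of the data set.
- $s$ is the standard deviation of the data set.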
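The moment-based calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `moment_skewness` and the two sample lists are made up for this example, and the standard deviation is assumed to be the population form (dividing by n). Statistical packages often apply a small-sample correction, so their results may differ slightly.

```python
def moment_skewness(data):
    """Moment coefficient of skewness: third central moment divided by s**3.

    Assumption: both moments divide by n (population form).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance, s**2
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5  # m2 ** 1.5 is s**3


# Hypothetical sample data, chosen only to show the sign of the result.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 3, 4, 20]

print(moment_skewness(symmetric))     # zero for a symmetrical data set
print(moment_skewness(right_tailed))  # positive: the long tail is on the right
```

Cubing the deviations keeps their sign, which is why a few large values above the mean push the result positive and a few small values below it push the result negative.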
Interpreting Skewness:
- Positive skew: Skewness > 0
- Negative skew: Skewness < 0
- No skew (symmetrical distribution): Skewness ≈ 0
- As a rule of thumb, skewness values between -0.5 and +0.5 indicate a fairly symmetrical distribution.
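The interpretation rules above can be written as a small helper. This is only a sketch: `describe_skew` is a hypothetical name, the 0.5 cutoff is the rule of thumb from the list rather than a formal threshold, and treating the boundary values as symmetrical is an assumption made here.

```python
def describe_skew(skewness, threshold=0.5):
    """Label a skewness value using the +/-0.5 rule of thumb.

    Assumption: values exactly at the threshold count as fairly symmetrical.
    """
    if abs(skewness) <= threshold:
        return "fairly symmetrical"
    if skewness > 0:
        return "positively skewed"
    return "negatively skewed"


for value in (0.1, 1.8, -0.9):
    print(value, "->", describe_skew(value))
```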
Importance of Skewness:
Identifying Data Distribution: Skewness helps in checking whether a data set is roughly symmetrical, as a normal distribution is, or skewed. This is important when choosing the right statistical tools or tests. For example, many statistical techniques assume the data to be normally distributed.
Decision Making: In finance, skewness helps investors understand the distribution of returns on investments. A positively skewed return distribution typically means frequent small losses or modest gains with an occasional large profit, while a negatively skewed distribution typically means frequent small gains with an occasional large loss.
Adjustments for Skewed Data: If a data set is highly positively skewed, transformations such as the logarithm (for positive-valued data) may be applied to make the distribution more symmetrical, making it easier to apply various statistical tests.
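As a rough illustration of the adjustment described above, the sketch below compares skewness before and after a natural-log transformation. The data values are hypothetical, the helper `moment_skewness` is an illustrative name, and the population (divide by n) form of the standard deviation is assumed. A log transformation only applies to strictly positive values.

```python
import math


def moment_skewness(data):
    """Third central moment divided by s**3 (population form, an assumption)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed data; every value must be > 0 for math.log.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
logged = [math.log(x) for x in raw]

skew_before = moment_skewness(raw)
skew_after = moment_skewness(logged)

print(f"before log transform: {skew_before:.2f}")
print(f"after log transform:  {skew_after:.2f}")
```

For this sample the transformed data is still somewhat right-skewed, just less so; a log transformation reduces positive skew but does not guarantee symmetry.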
Example:
Consider the following data set:
- 3, 5, 7, 8, 9, 9, 10, 10, 12, 15, 50
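This data is positively skewed because the value "50" is much higher than the rest of the data points, pulling the mean to the right. The mean is 138 / 11 ≈ 12.55, while the median (the sixth of the eleven sorted values) is 9, so the mean sits well above the median.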
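To see how much the single value 50 drives this result, the mean and median can be compared with and without it. The snippet below is a small sketch using Python's standard `statistics` module; the variable names are chosen for this example only.

```python
import statistics

data = [3, 5, 7, 8, 9, 9, 10, 10, 12, 15, 50]
without_outlier = [x for x in data if x != 50]

mean_all = statistics.mean(data)
median_all = statistics.median(data)
mean_trimmed = statistics.mean(without_outlier)
median_trimmed = statistics.median(without_outlier)

print(f"with 50:    mean={mean_all:.2f}, median={median_all}")
print(f"without 50: mean={mean_trimmed:.2f}, median={median_trimmed}")
```

With 50 included, the mean is well above the median. Without it, the mean drops to 8.8 while the median stays at 9, which shows that the median is far less sensitive to an extreme value than the mean.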
Conclusion:
Skewness is a valuable measure in statistics that helps in identifying the asymmetry of a data distribution. Understanding skewness can aid in selecting the right statistical tools and transformations, ensuring that the data analysis is accurate and reliable.