Category: Statistics 📊
Published on: October 18, 2023
# What are Non-Parametric tests?

They are statistical tests that don't assume a specific distribution for the data. They are useful when you can't meet the assumptions for parametric tests, like normality or homogeneity of variances.

## Common non-parametric tests

1. Mann-Whitney U Test: Compares two independent samples. Non-parametric equivalent of the independent t-test for non-normally distributed data.
2. Wilcoxon Signed-Rank Test: Compares two paired samples. Non-parametric alternative to the paired t-test.
3. Kruskal-Wallis H Test: Compares more than two independent samples. Non-parametric counterpart of one-way ANOVA.
4. Spearman's Rank Correlation: Measures the strength and direction of the monotonic relationship between two ranked variables. Alternative to Pearson's correlation.
5. Chi-Square Test: Tests the association between two categorical variables.
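As a quick reference, each of these tests is available in `scipy.stats`. A minimal sketch on hypothetical data (the arrays `group_a`, `group_b`, `group_c` and the contingency table are made up for illustration):

```python
import numpy as np
from scipy.stats import (mannwhitneyu, wilcoxon, kruskal,
                         spearmanr, chi2_contingency)

rng = np.random.default_rng(42)
group_a = rng.normal(0.0, 1.0, 30)  # hypothetical samples
group_b = rng.normal(0.5, 1.0, 30)
group_c = rng.normal(1.0, 1.0, 30)

mannwhitneyu(group_a, group_b)      # two independent samples
wilcoxon(group_a, group_b)          # two paired samples (equal length)
kruskal(group_a, group_b, group_c)  # more than two independent samples
spearmanr(group_a, group_b)         # rank correlation of two variables

table = np.array([[10, 20],         # hypothetical 2x2 contingency table
                  [20, 10]])
chi2_contingency(table)             # association of two categorical variables
```

Each call returns a result object carrying the test statistic and the p-value.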

# When to use non-parametric tests

## ➕ Pros

• Fewer assumptions, especially about the underlying distributions
• More robust to outliers
• Suitable for ordinal, nominal, and other non-continuous data types
• Can be more reliable when sample sizes are small

## ➖ Cons

• Generally less statistically powerful than parametric tests when the parametric assumptions do hold
• Can be more computationally intensive
• Provide fewer distributional insights (no direct estimates of means, standard deviations, or confidence intervals)

# Mann-Whitney U

The Mann-Whitney U test is a non-parametric test used to compare two independent groups. Because it does not assume normality, it is well suited to data with outliers or skewed, irregular shapes. When the two distributions have the same shape, it can additionally be interpreted as a comparison of medians between groups.

Here’s how it works:

1. Pool the observations from both groups and rank them from smallest to largest: rank 1 for the smallest observation, rank 2 for the next smallest, and so on (tied observations receive the average of their ranks).
2. Calculate the Mann-Whitney U statistic using the following formula:

   $U = T_1 - \frac{n_1(n_1+1)}{2}$

   where:
   - $T_1$ is the sum of the ranks of the observations in the first group
   - $n_1$ is the number of observations in the first group
3. Look up the critical value for the smaller of the two groups' U statistics (given the sample sizes and significance level) in a Mann-Whitney U table to check for significance.
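The steps above can be sketched by hand with `scipy.stats.rankdata`, which assigns average ranks to ties. The data here is hypothetical:

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)
data_a = rng.normal(size=20)  # hypothetical samples
data_b = rng.normal(size=25)

# Step 1: rank the pooled observations (ties get average ranks)
ranks = rankdata(np.concatenate([data_a, data_b]))
n1, n2 = len(data_a), len(data_b)

# Step 2: U = T1 - n1(n1+1)/2, with T1 the rank sum of the first group
t1 = ranks[:n1].sum()
u1 = t1 - n1 * (n1 + 1) / 2
u2 = n1 * n2 - u1  # the two groups' U statistics always sum to n1*n2

# Step 3: the smaller U is the one compared against the critical value
u = min(u1, u2)
```

Depending on the SciPy version and options, `mannwhitneyu` reports either the first group's U or the smaller of the two, so it will match one of `u1` and `u2`.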

In Python, it can be implemented very easily with SciPy:

```python
from scipy.stats import mannwhitneyu

mannwhitneyu(data_a, data_b)
# MannwhitneyuResult(statistic=13651.5, pvalue=0.64364)
```

# Kolmogorov-Smirnov

The Kolmogorov-Smirnov test (or K-S test) is a non-parametric test of the equality of one-dimensional probability distributions. It can be used to compare a sample with a reference probability distribution (one-sample K-S test), or to compare the distributions of two samples (two-sample K-S test).

Here is how it works step by step:

1. Calculate the empirical distribution function (EDF) for each group.
2. Calculate the maximum absolute difference between the two EDFs. This is the D-statistic.
3. Look up the critical value of D in a Kolmogorov-Smirnov table, to check for significance.

As for Mann-Whitney, the implementation with SciPy is straightforward:

```python
from scipy.stats import ks_2samp

ks_2samp(data_a, data_b)
# KstestResult(statistic=0.07889, pvalue=0.65037)
```
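The D-statistic itself can also be recomputed from the two empirical distribution functions, following the steps above. A sketch on hypothetical data:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
data_a = rng.normal(size=200)  # hypothetical samples
data_b = rng.normal(size=150)

# Step 1: evaluate each group's EDF at every pooled sample value
pooled = np.sort(np.concatenate([data_a, data_b]))
edf_a = np.searchsorted(np.sort(data_a), pooled, side="right") / len(data_a)
edf_b = np.searchsorted(np.sort(data_b), pooled, side="right") / len(data_b)

# Step 2: D is the maximum absolute difference between the two EDFs
d_stat = np.abs(edf_a - edf_b).max()
```

Because the EDFs only change at sample points, evaluating them on the pooled values is enough to find the maximum difference, and `d_stat` matches the statistic returned by `ks_2samp`.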