Understanding One-Way ANOVA
In the realm of statistical analysis, ANOVA (Analysis of Variance) is a powerful tool that allows researchers to uncover significant differences between multiple groups or treatments. Whether you're conducting a scientific study, analyzing market research data, or simply curious about statistical methodologies, understanding ANOVA can provide valuable insights into the relationships and patterns within your data.
What is ANOVA?
ANOVA is a statistical technique that compares the means of two or more groups to determine if there are any significant differences among them. It assesses the variability within and between the groups to draw meaningful conclusions about the factors influencing the observed differences. ANOVA is particularly useful when you want to test the null hypothesis that there are no significant differences between the groups.
Types of ANOVA
There are various types of ANOVA, each suited to different scenarios. The most common types are one-way ANOVA, two-way ANOVA, and repeated measures ANOVA. One-way ANOVA analyzes the effects of a single factor, while two-way ANOVA examines the influence of two factors simultaneously. Repeated measures ANOVA is used when data is collected from the same subjects over multiple time points or conditions.
```mermaid
graph TD
A[ANOVA] --> B{One-Way ANOVA};
A --> C{Two-Way ANOVA};
A --> D{Repeated Measures ANOVA};
```
F-Distribution
Before starting with ANOVA, we have to learn about the F-distribution, because these statistical tests are based on it. The F-distribution is a probability distribution that arises in the context of statistical inference, particularly in analysis of variance (ANOVA) and regression analysis. It is named after the statistician Sir Ronald Fisher, who developed many fundamental concepts in statistics.
This continuous probability distribution is constructed from the Chi-Square distribution \((\chi^2)\), and every Chi-Square distribution has one parameter: its degrees of freedom (\(df\)).
Let's assume we have 2 Chi-Square distributions \(\chi^2_1\) and \(\chi^2_2\), and their degrees of freedom are \(df_1\) and \(df_2\) respectively. Then

$$ \large F = \frac{\chi^2_1 / df_1}{\chi^2_2 / df_2} $$
Because the Chi-Square distribution takes only non-negative values, the F distribution also takes only non-negative values. It is positively skewed and has different shapes depending on the degrees of freedom associated with it. It is commonly used to test hypotheses about the equality of two variances in different samples or populations. The F distribution is widely used in various fields of research, including psychology, education, economics, and the natural and social sciences, for hypothesis testing and model comparison.
The F-statistic is calculated as the ratio of two sample variances, or of two mean squares from an ANOVA table. This value is then compared to critical values from the F-distribution to determine statistical significance.
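As a quick sketch of that comparison in SciPy (the degrees of freedom and the observed F below are made-up illustrative values):

```python
import scipy.stats as stats

df1, df2 = 2, 6   # example degrees of freedom (numerator, denominator)
f_observed = 4.5  # a hypothetical observed F-statistic

# critical value: the F beyond which only 5% of the distribution lies under H0
f_critical = stats.f.ppf(0.95, df1, df2)

# P-value: probability of an F at least this large under H0
p_value = stats.f.sf(f_observed, df1, df2)

print(f"critical value = {f_critical:.3f}, p = {p_value:.3f}")
```

Here the observed F falls below the 5% critical value (equivalently, the P-value exceeds 0.05), so this hypothetical result would not be significant at \(\alpha = 0.05\).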
One-Way ANOVA
One-way ANOVA (Analysis of Variance) is a statistical technique that allows researchers to compare the means of three or more groups to determine if there are any significant differences among them. This powerful tool is widely used in various fields, from social sciences and psychology to business and healthcare. Understanding the basics of one-way ANOVA can help unravel hidden patterns and group distinctions within your data.
One-way ANOVA is a parametric statistical test that examines the variability between multiple groups based on a single independent variable, also known as a factor. The goal is to determine whether the observed differences in means across groups are statistically significant or simply due to chance. This type of ANOVA is particularly useful when you want to compare more than two groups simultaneously.
Working principle of One-way ANOVA
One-way ANOVA works by decomposing the total variability in the data into two components: the variability between groups and the variability within groups. It then calculates an F-statistic as the ratio of between-group variability to within-group variability. The F-statistic follows an F distribution, which is used to assess the statistical significance of the group differences.
Steps to calculate One-Way ANOVA
- Define the null (\(H_0\)) and alternative (\(H_1\)) hypotheses. The null hypothesis is that all the group means are equal. The \(H_1\) is that at least one group mean is significantly different from the others.
- Calculate the overall mean (grand mean) of all the groups combined, and the mean of each group individually.
- Calculate the "between-group" and "within-group" sums of squares (SS) and their respective degrees of freedom.
- Calculate the "between-group" and "within-group" mean squares (MS) by dividing the respective sums of squares by their degrees of freedom.
- Calculate the F-statistic by dividing the "between-group" mean square by the "within-group" mean square.
- Using the F-statistic, calculate the P-value.
- By comparing the P-value with your threshold (\(\alpha\)), either reject or fail to reject the null hypothesis.
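The steps above can be sketched end-to-end in plain Python (the three groups below are made-up numbers, purely for illustration):

```python
import numpy as np
import scipy.stats as stats

# arbitrary example data: one factor with three levels (groups)
groups = [np.array([10.0, 12.0, 11.0]),
          np.array([9.0, 8.0, 10.0]),
          np.array([14.0, 13.0, 15.0])]

k = len(groups)                      # number of groups
N = sum(len(g) for g in groups)      # total number of observations
grand_mean = np.concatenate(groups).mean()

# between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# mean squares and the F-statistic
ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
f_stat = ms_between / ms_within

# P-value from the F-distribution with (k-1, N-k) degrees of freedom
p_value = stats.f.sf(f_stat, k - 1, N - k)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```

With these particular numbers the P-value comes out well below 0.05, so the null hypothesis of equal means would be rejected.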
This calculation can be summarized in the ANOVA table:
| Source of variance | Sum of Squares | Degrees of freedom | Mean square | F-statistic | P-value |
| --- | --- | --- | --- | --- | --- |
| Between groups | \(SS_{b}\) | \(k-1\) | \(MS_{b}\) | \(MS_b/MS_w\) | p |
| Within groups | \(SS_{w}\) | \(N-k\) | \(MS_{w}\) | | |
| Total | \(SS_{t}\) | \(N-1\) | | | |
Hands-on Example
Let's say we have a dataset like the one below. We have 3 groups (A, B and C), and each group has different numerical values.
| A | B | C |
| --- | --- | --- |
| 3 | 1 | 8 |
| 6 | 8 | 6 |
| 3 | 9 | 10 |
You can think of the "Pclass" and "Age" columns from the famous Titanic dataset: "Pclass" has 3 categories (class 1, class 2 and class 3), and the numerical values are the "Age" column values. "Pclass" is called the factor, and its categories/groups are called levels. Every level can have a different number of values, and each value within a level is called an individual.
- Step 1: Define the hypotheses.

$$ H_0: \mu_A = \mu_B = \mu_C \quad \text{(all group means are equal)} $$

$$ H_1: \text{at least one mean is significantly different from the others} $$
- Step 2: Now, for every group we have to find the mean. So,

$$ \overline{X}_A = \frac{3+6+3}{3} = 4, \quad \overline{X}_B = \frac{1+8+9}{3} = 6, \quad \overline{X}_C = \frac{8+6+10}{3} = 8 $$

And the grand mean is

$$ \overline{X} = \frac{3+6+3+1+8+9+8+6+10}{9} = 6 $$
- Step 3: Now it is time to calculate the SS (Sum of Squares). The total sum of squares is

$$ \large SS_{total} = \sum_{i=1}^{N}(X_i - \overline{X})^2 = 76 $$

and the degrees of freedom is

$$ df_{total} = N - 1 = 9 - 1 = 8 $$

where \(N\) is the total number of individuals.
Now we will calculate the Sum of Squares Between.

$$ \large SS_{between} = \sum_{j=1}^{k}(\overline{X_j} - \overline{X})^2 \cdot n_j = 3(4-6)^2 + 3(6-6)^2 + 3(8-6)^2 = 24 $$

where \(k\) is the number of levels and \(n_j\) is the number of individuals in level \(j\). Its degrees of freedom is \(df_{between} = k - 1 = 2\).
We can also calculate the Sum of Squares Within.

$$ \large SS_{within} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(X_{ij} - \overline{X_j})^2 = 52 $$

with degrees of freedom \(df_{within} = N - k = 9 - 3 = 6\).
You can notice that

$$ SS_{total} = SS_{between} + SS_{within} = 24 + 52 = 76 $$
- Step 4: Now it is time to calculate the Mean Squares from the Sums of Squares.

$$ MS_{between} = \frac{SS_{between}}{df_{between}} = \frac{24}{2} = 12, \quad MS_{within} = \frac{SS_{within}}{df_{within}} = \frac{52}{6} \approx 8.67 $$
- Step 5: Now it is time to calculate the F-statistic.

$$ F = \frac{MS_{between}}{MS_{within}} = \frac{12}{8.67} \approx 1.384 $$
- Step 6: Using the F-statistic value, we will calculate the P-value. For that, you can use the Python code below.
```python
import scipy.stats as stats

f_statistic = 1.384  # the F-statistic value you've calculated
df1 = 2  # degrees of freedom for the numerator (between groups)
df2 = 6  # degrees of freedom for the denominator (within groups)

p_value = stats.f.sf(f_statistic, df1, df2)
print("P-value:", p_value)
```
- Step 7: The P-value (≈ 0.32) is much greater than the significance level (\(\alpha = 0.05\)), so we can't find enough evidence to reject the null hypothesis. That's why we can say the groups/levels could have been drawn from the same population.
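As a sanity check, SciPy's built-in one-way ANOVA reproduces the hand-calculated values on the same three groups:

```python
import scipy.stats as stats

a = [3, 6, 3]
b = [1, 8, 9]
c = [8, 6, 10]

# f_oneway returns the F-statistic and the P-value in one call
f_stat, p_value = stats.f_oneway(a, b, c)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")  # F = 1.385, p = 0.320
```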
Assumptions of One-Way ANOVA

- Independence: The observations within and between groups should be independent of each other. This means that the outcome of one observation should not influence the outcome of another. Independence is typically achieved through random sampling or random assignment of subjects to groups.

- Normality: The data within each group should be approximately normally distributed. While one-way ANOVA is considered robust to moderate violations of normality, severe deviations may affect the accuracy of the test results. Normality can be checked with the Shapiro-Wilk test; if it is in doubt, non-parametric alternatives like the Kruskal-Wallis test can be considered.

- Homogeneity of variances: The variances of the populations from which the samples are drawn should be equal, or at least approximately so. This assumption is known as homoscedasticity. If the variances are substantially different, the accuracy of the test results may be compromised. Levene's test or Bartlett's test can be used to assess the homogeneity of variance. If this assumption is violated, alternative tests such as Welch's ANOVA can be used.
Practical example with code
```python
# import the necessary libraries
import seaborn as sns
import pingouin as pg  # pip install pingouin

# load the famous Titanic dataset
df = sns.load_dataset('titanic')

# calculate the ANOVA table
anova_table = pg.anova(data=df, dv='age', between='pclass', detailed=True)
print(anova_table)
```
The result of the above code looks like this:
| Source | SS | DF | MS | F | p-unc | np2 |
| --- | --- | --- | --- | --- | --- | --- |
| pclass | 20929.627754 | 2 | 10464.813877 | 57.443484 | 7.487984e-24 | 0.139107 |
| Within | 129527.008190 | 711 | 182.175820 | NaN | NaN | NaN |
The p-unc column is the P-value of this ANOVA test. You can see that the P-value is much smaller than the significance level (\(\alpha = 0.05\)), so we have enough evidence to reject the Null Hypothesis (\(H_0\)). That means at least one category of pclass is significantly different from the others. But the ANOVA test can't tell which group(s) is/are significantly different from the others. To find out, we have to perform a post-hoc test.
Post-Hoc Test
While ANOVA determines whether there are significant differences between groups, it does not reveal which specific pairs of groups differ. Post-hoc tests address this limitation by allowing researchers to make multiple pairwise comparisons. These tests provide a more detailed understanding of the group differences, enabling researchers to draw more precise conclusions. Two common choices are:
- Bonferroni correction
- Tukey's HSD (Honestly Significant Difference) Test
Bonferroni correction
In this approach, you perform all the possible pairwise t-tests (or a chosen subset of them) and then compare the results. The Bonferroni correction is a conservative method that adjusts the significance level for each comparison to maintain the overall family-wise error rate: the desired alpha level (\(\alpha\)) is divided by the number of pairwise comparisons being made. While effective in controlling Type I error, the Bonferroni correction may be overly stringent and increase the chance of Type II error.
```python
import scipy.stats as stats

# pairwise t-tests between the three passenger classes
for class1, class2 in [(1, 2), (2, 3), (3, 1)]:
    print(f"Class {class1} vs Class {class2}")
    print(stats.ttest_ind(df[df['pclass'] == class1]['age'].dropna(),
                          df[df['pclass'] == class2]['age'].dropna()))
    print()
```
And the result is like this:

```
Class 1 vs Class 2
Ttest_indResult(statistic=5.485187676773201, pvalue=7.835568991415144e-08)

Class 2 vs Class 3
Ttest_indResult(statistic=3.927800191020872, pvalue=9.715078600777852e-05)

Class 3 vs Class 1
Ttest_indResult(statistic=-10.849122601201033, pvalue=6.134470007830625e-25)
```
Looking at the results, the Class 1 vs Class 2 and Class 3 vs Class 1 tests are extremely significant, so Class 1 of "pclass" is significantly different from the others. And even with the Bonferroni-corrected significance level \(\alpha = \frac{0.05}{3} \approx 0.0167\), all the groups are significantly different from each other.
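Instead of dividing \(\alpha\) by hand, the same Bonferroni adjustment can be applied with statsmodels (using the three P-values reported above):

```python
from statsmodels.stats.multitest import multipletests

# P-values from the three pairwise t-tests above
pvals = [7.835568991415144e-08, 9.715078600777852e-05, 6.134470007830625e-25]

# method='bonferroni' multiplies each P-value by the number of comparisons
# (capped at 1) and compares the result against the original alpha
reject, pvals_corrected, _, _ = multipletests(pvals, alpha=0.05, method='bonferroni')
print(reject)            # whether each comparison stays significant
print(pvals_corrected)   # the Bonferroni-adjusted P-values
```

All three comparisons remain significant after the correction, matching the hand calculation.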
Tukey's HSD Test
Tukey's HSD test is widely used and provides simultaneous confidence intervals for all possible pairwise comparisons. It controls the familywise error rate, making it a popular choice. Tukey's HSD test is suitable for equal and unequal sample sizes.
```python
# import the required libraries
from statsmodels.stats.multicomp import pairwise_tukeyhsd
import matplotlib.pyplot as plt
import pandas as pd

# select our columns and drop the NaN values
required_df = df[['age', 'pclass']].dropna()

tukey = pairwise_tukeyhsd(endog=required_df['age'], groups=required_df['pclass'], alpha=0.05)
tukey.plot_simultaneous()
plt.show()
```
The result of the above code is a plot of simultaneous confidence intervals for the mean age of each class.
You can also see the summary of this test by running the code below.

```python
pd.DataFrame(tukey.summary())
```

The result is a table listing, for each pair of classes, the mean difference, the adjusted P-value, the confidence bounds, and whether the null hypothesis is rejected.
Disadvantages of One-Way ANOVA
While one-way ANOVA is a useful statistical technique, it also has certain drawbacks that researchers should be aware of. Here are some common drawbacks of one-way ANOVA:
Assumption of Homogeneity of Variances: One-way ANOVA assumes that the variances across all groups are equal. Violation of this assumption, known as heteroscedasticity, can affect the accuracy and validity of the results. It may lead to inflated or deflated Type I error rates, making it challenging to draw accurate conclusions.
Sensitivity to Outliers: One-way ANOVA can be sensitive to outliers, particularly when the sample sizes are small. Outliers can significantly impact the mean and variance, potentially influencing the results and leading to incorrect interpretations.
Assumption of Normality: One-way ANOVA assumes that the data within each group follows a normal distribution. If this assumption is violated, it can affect the validity of the results. However, one-way ANOVA is often robust to violations of normality, especially when the sample sizes are large.
Limited to One Independent Variable: As the name suggests, one-way ANOVA is designed to analyze the effects of a single independent variable, or factor, on a dependent variable. It is not suitable for investigating the influence of multiple independent variables or complex relationships between variables. In such cases, other statistical techniques like factorial ANOVA or regression analysis may be more appropriate.
Post-hoc Interpretation: While one-way ANOVA can identify significant group differences, it does not provide specific information about which groups differ from each other. Post-hoc tests are often necessary to make pairwise comparisons, but these tests increase the chances of Type I errors. Multiple post-hoc tests also require careful consideration of family-wise error rate control methods.
Lack of Causal Inference: One-way ANOVA examines associations and differences between groups, but it does not establish causality. It only provides evidence for the existence of group differences, not the underlying causes or mechanisms responsible for those differences. Establishing causal relationships may require additional experimental designs or more sophisticated statistical analyses.
Sample Size Considerations: One-way ANOVA performs better with larger sample sizes. Small sample sizes can limit the statistical power of the analysis, making it difficult to detect significant group differences accurately. It is crucial to ensure an adequate sample size to increase the reliability and generalizability of the findings.
Applications of ANOVA
Hyperparameter tuning: When selecting the best hyperparameters for a machine learning model, one-way ANOVA can be used to compare the performance of models with different hyperparameter settings. By treating each hyperparameter setting as a group, you can perform one-way ANOVA to determine if there are any significant differences in performance across the various settings.
Feature selection: One-way ANOVA can be used as a univariate feature selection method to identify features that are significantly associated with the target variable, especially when the target variable is categorical with more than two levels. In this context, a one-way ANOVA is performed for each feature, and features with low P-values are considered more relevant for prediction.
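A minimal sketch of this idea (the dataset below is synthetic and made up purely for illustration; scikit-learn's `f_classif`/`SelectKBest` package the same computation):

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)

# synthetic data: 90 samples, 3 features, 3 classes (all made up)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 3))
X[:, 0] += 2.0 * y  # feature 0 shifts with the class label, so it should score well

# one-way ANOVA per feature: the groups are the samples of each class
pvals = []
for j in range(X.shape[1]):
    samples = [X[y == c, j] for c in (0, 1, 2)]
    f_stat, p = stats.f_oneway(*samples)
    pvals.append(p)
    print(f"feature {j}: F = {f_stat:.2f}, p = {p:.2e}")
```

Feature 0, which actually depends on the class, gets a far smaller P-value than the two pure-noise features, so a P-value (or F-score) ranking would select it first.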
Algorithm comparison: When comparing the performance of different machine learning algorithms, one-way ANOVA can be used to determine if there are any significant differences in their performance metrics (e.g., accuracy, F1 score, etc.) across multiple runs or cross-validation folds. This can help you decide which algorithm is the most suitable for a specific problem.
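A minimal sketch of such a comparison (the per-fold accuracy numbers below are invented, not from real experiments):

```python
import scipy.stats as stats

# hypothetical 5-fold cross-validation accuracies for three algorithms
acc_logreg = [0.81, 0.79, 0.83, 0.80, 0.82]
acc_forest = [0.85, 0.86, 0.84, 0.88, 0.85]
acc_svm    = [0.80, 0.78, 0.82, 0.79, 0.81]

# one-way ANOVA: each algorithm's fold scores form one group
f_stat, p_value = stats.f_oneway(acc_logreg, acc_forest, acc_svm)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("At least one algorithm's mean accuracy differs significantly")
```

A significant result would then motivate pairwise post-hoc comparisons to find which algorithm stands out, just as in the Titanic example above.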
Model stability assessment: One-way ANOVA can be used to assess the stability of a machine learning model by comparing its performance across different random seeds or initializations. If the model's performance varies significantly between initializations, it may indicate that the model is unstable or highly sensitive to the choice of initial conditions.