Comparing GPAs of Days of the Week at UW Madison since 2006

This notebook runs a SQL query against the UW Madison database and, for each day of the week, performs a hypothesis test comparing the GPAs of sections that meet on that day with the GPAs of sections that do not.

For each day:

$H_0$: There is no difference between the mean GPA of sections that meet on this day and the mean GPA of sections that do not

$H_A$: There is a difference between the mean GPA of sections that meet on this day and the mean GPA of sections that do not
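The test applied later in the notebook (`stats.ttest_ind` with `equal_var=False`) is Welch's two-sample t-test, which turns these hypotheses into statements about means. For a day $d$, write $\bar{x}_d$, $s_d^2$ and $n_d$ for the sample mean, sample variance and sample size of that day's GPAs ($n_d = 70$ here), and use the subscript $\neg d$ for the same quantities from the other days. Under the usual Welch formulation:

$$
H_0: \mu_d = \mu_{\neg d} \qquad H_A: \mu_d \neq \mu_{\neg d}
$$

$$
t = \frac{\bar{x}_d - \bar{x}_{\neg d}}{\sqrt{\dfrac{s_d^2}{n_d} + \dfrac{s_{\neg d}^2}{n_{\neg d}}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_d^2}{n_d} + \dfrac{s_{\neg d}^2}{n_{\neg d}}\right)^2}{\dfrac{\left(s_d^2 / n_d\right)^2}{n_d - 1} + \dfrac{\left(s_{\neg d}^2 / n_{\neg d}\right)^2}{n_{\neg d} - 1}},
\qquad
p = 2\,P\!\left(T_\nu \geq |t|\right)
$$

A positive $t$ means the day's sample mean is higher than the other days' sample mean, and a negative $t$ means it is lower.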

Import necessary Python libraries and modules

from sqlalchemy import create_engine
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from statsmodels.stats.multitest import multipletests

Query the database to create one overall pandas dataframe, then filter out any sections whose GPA is exactly 4.0

engine = create_engine('postgresql:///uw_madison')

query = """SELECT * FROM days_gpas;"""

all_days_df = pd.read_sql(query, engine)
all_days_df.describe()

all_days_df.head()

# Drop sections with a perfect 4.0 section GPA
all_days_df = all_days_df.loc[all_days_df['section_gpa'] != 4.0]
all_days_df.describe()
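The filtering step, and the string-typed day columns visible in the `head()` output, can be illustrated on a tiny dataframe. This is a sketch only: the rows are made up and merely mimic the `days_gpas` layout, and converting the `'true'`/`'false'` strings to booleans is an optional convenience that the original notebook does not do.

```python
import pandas as pd

# Hypothetical rows mimicking the days_gpas layout: the day columns hold
# the strings 'true'/'false', not real booleans.
df = pd.DataFrame({
    "mon": ["true", "false", "true", "false"],
    "sat": ["false", "true", "false", "false"],
    "section_gpa": [3.12, 4.0, 3.82, 3.43],
})

# Drop sections whose GPA is exactly 4.0, as the notebook does.
filtered = df.loc[df["section_gpa"] != 4.0]
print(f"removed {len(df) - len(filtered)} of {len(df)} rows")

# Optional: turn the 'true'/'false' strings into booleans so a mask can be
# written as filtered["mon"] instead of filtered["mon"] == 'true'.
day_cols = ["mon", "sat"]
as_bool = filtered.assign(**{c: filtered[c].eq("true") for c in day_cols})
print(as_bool.dtypes)
```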

For each day, split the overall dataframe into the sections that meet on that day and the sections that do not (e.g., a Monday dataframe and a dataframe of all sections that do not meet on Monday). Then randomly pick 70 GPAs from each, without replacement, to use for the plots and hypothesis tests.

days = ['mon', 'tues', 'wed', 'thurs', 'fri', 'sat', 'sun']

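gpas = []          # sampled GPAs for each day and its complement, used for plotting
p_vals = {}        # full t-test result (statistic, pvalue) keyed by day
one_d_p_vals = []  # p-values only, in the same order as `days`, for multipletests

for day in days:
    # GPAs of sections that meet on this day (day columns hold the strings 'true'/'false')
    day_df = all_days_df.loc[all_days_df[day] == 'true', 'section_gpa']
    day_choice = np.random.choice(day_df, size=70, replace=False)

    # GPAs of sections that do not meet on this day
    not_day_df = all_days_df.loc[all_days_df[day] != 'true', 'section_gpa']
    not_day_choice = np.random.choice(not_day_df, size=70, replace=False)
    gpas.append({"day": day_choice, "not_day": not_day_choice})

    # Welch's t-test (unequal variances), computed once and stored in both containers
    result = stats.ttest_ind(day_choice, not_day_choice, equal_var=False)
    p_vals[day] = result
    one_d_p_vals.append(result.pvalue)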
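The loop above relies on `stats.ttest_ind(..., equal_var=False)`, i.e. Welch's t-test. As a sanity check, the following sketch recomputes the statistic, the Welch–Satterthwaite degrees of freedom and the two-sided p-value by hand and compares them with SciPy. The helper `welch_t_test` and the normally distributed samples are hypothetical stand-ins for `day_choice` and `not_day_choice`.

```python
import numpy as np
from scipy import stats

def welch_t_test(a, b):
    """Two-sided Welch's t-test computed by hand; returns (t, dof, p)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Variance of each sample mean, using the sample variance (ddof=1).
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch–Satterthwaite approximation to the degrees of freedom.
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    # Two-sided p-value from the t distribution's survival function.
    p = 2 * stats.t.sf(abs(t), dof)
    return t, dof, p

# Hypothetical GPA samples standing in for day_choice / not_day_choice.
rng = np.random.default_rng(0)
day_sample = rng.normal(3.45, 0.30, size=70)
other_sample = rng.normal(3.40, 0.36, size=70)

t, dof, p = welch_t_test(day_sample, other_sample)
ref = stats.ttest_ind(day_sample, other_sample, equal_var=False)
print(f"by hand: t={t:.4f}, dof={dof:.1f}, p={p:.4f}")
print(f"scipy:   t={ref.statistic:.4f}, p={ref.pvalue:.4f}")
```

The two lines should agree; if they do not, the samples passed to `ttest_ind` are probably not what the loop intended.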

Generate plots comparing each day against all other days

full_days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

fig, ax = plt.subplots(nrows=7, ncols=1, figsize=(15, 20))

# One panel per day: that day's sampled GPAs (blue) vs. the other days' sampled GPAs (orange).
# Note: range=(2.5, 4.0) leaves out any sampled GPAs below 2.5.
for row, day_name in enumerate(full_days):
    sample = gpas[row]
    ax[row].hist(sample["day"], alpha=0.5, bins=25, range=(2.5, 4.0), label=f'{day_name} GPAs')
    ax[row].axvline(sample["day"].mean(), color='#1f77b4', alpha=1, linestyle='dashed', label=f'{day_name} Mean')
    ax[row].hist(sample["not_day"], color='#ff7f0e', alpha=0.5, bins=25, range=(2.5, 4.0), label=f'GPAs from Days Other than {day_name}')
    ax[row].axvline(sample["not_day"].mean(), color='#ff7f0e', alpha=1, linestyle='dashed', label=f'Days Other than {day_name} Mean')
    ax[row].legend()
    ax[row].set_xlabel('GPA')

fig.tight_layout()

Display the t statistic and p-value for each day against all other days

p_vals

{'mon': Ttest_indResult(statistic=0.3861586905726627, pvalue=0.6999807413618497),
 'tues': Ttest_indResult(statistic=1.158900732637778, pvalue=0.24868242363225904),
 'wed': Ttest_indResult(statistic=0.1702067405316272, pvalue=0.865101860131468),
 'thurs': Ttest_indResult(statistic=-0.7744049316301642, pvalue=0.4400190061996502),
 'fri': Ttest_indResult(statistic=-2.8518213298163033, pvalue=0.0050304608883447715),
 'sat': Ttest_indResult(statistic=4.211019878637128, pvalue=4.8332157044897395e-05),
 'sun': Ttest_indResult(statistic=5.624870922850782, pvalue=1.1384256658117605e-07)}

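In the plots and t statistics from these random samples, Monday through Thursday show no statistically significant difference: their p-values (0.25 to 0.87) are well above the conventional $\alpha = 0.05$, so we fail to reject the null hypothesis for those days. Friday is the exception among the weekdays. Its p-value (about 0.005) is below 0.05, and its negative t statistic means the Friday sample has a lower mean GPA than the sample from the other days, so we reject the null hypothesis for Friday.

The classes that meet on weekends differ more clearly still. Saturday and Sunday have large positive t statistics (4.21 and 5.62, with p-values of about $4.8 \times 10^{-5}$ and $1.1 \times 10^{-7}$), meaning their sampled GPAs are higher on average than those from the other days. For Saturday and Sunday, we reject the null hypothesis.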
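The decision rule behind these conclusions can be written out explicitly. This sketch copies the `(statistic, pvalue)` pairs from the `p_vals` output, assumes a significance level of $\alpha = 0.05$ (the notebook does not state one), and reads the direction of each difference from the sign of $t$. At this level the rule flags Friday as well as the weekend days.

```python
# (t statistic, p-value) pairs copied from the p_vals output above.
reported = {
    "mon": (0.3861586905726627, 0.6999807413618497),
    "tues": (1.158900732637778, 0.24868242363225904),
    "wed": (0.1702067405316272, 0.865101860131468),
    "thurs": (-0.7744049316301642, 0.4400190061996502),
    "fri": (-2.8518213298163033, 0.0050304608883447715),
    "sat": (4.211019878637128, 4.8332157044897395e-05),
    "sun": (5.624870922850782, 1.1384256658117605e-07),
}

ALPHA = 0.05  # assumed significance level (not stated in the original notebook)

decisions = {}
for day, (t, p) in reported.items():
    if p < ALPHA:
        # ttest_ind(day, not_day) is positive when the day's sample mean is higher.
        direction = "higher" if t > 0 else "lower"
        decisions[day] = f"reject H0 ({direction} mean GPA)"
    else:
        decisions[day] = "fail to reject H0"

for day, decision in decisions.items():
    print(f"{day:>5}: {decision}")
```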

Correct for Multiple Comparisons (Holm–Šidák)

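# method='hs' (Holm–Šidák) is the statsmodels default; pass method='bonferroni' for a plain Bonferroni correction
multipletests(one_d_p_vals, alpha=0.05, method='hs')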

(array([False, False, False, False,  True,  True,  True]),
 array([9.92773724e-01, 8.64354734e-01, 8.64354734e-01, 8.64354734e-01,
        3.30240674e-03, 2.19425913e-06, 5.30000743e-06]),
 0.007300831979014655,
 0.0071428571428571435)

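After adjusting for the seven tests with the Holm–Šidák method (the `multipletests` default), we fail to reject the null hypothesis for Monday through Thursday and reject it for Friday through Sunday. The four returned values are the reject flags, the adjusted p-values, and the Šidák- and Bonferroni-corrected per-test significance levels for $\alpha = 0.05$ ($1 - 0.95^{1/7} \approx 0.0073$ and $0.05 / 7 \approx 0.0071$).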
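To see what `multipletests` is doing, here is a sketch of both a plain Bonferroni correction and the step-down Holm–Šidák procedure (`method='hs'`) written with NumPy. The functions `bonferroni` and `holm_sidak` are hypothetical helpers, not part of statsmodels, and the input p-values are copied from the `p_vals` output rather than from the run that produced the `multipletests` output.

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Plain Bonferroni: multiply each p-value by the number of tests, cap at 1."""
    p = np.asarray(pvals, dtype=float)
    adjusted = np.minimum(p * p.size, 1.0)
    return adjusted <= alpha, adjusted

def holm_sidak(pvals, alpha=0.05):
    """Step-down Holm–Šidák adjustment."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The k-th smallest p-value becomes 1 - (1 - p)^(m - k + 1),
    # computed with log1p/expm1 for accuracy when p is tiny ...
    raw = -np.expm1(np.arange(m, 0, -1) * np.log1p(-p[order]))
    # ... and a running maximum keeps the adjusted values monotone.
    stepped = np.minimum(np.maximum.accumulate(raw), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = stepped  # restore the original (mon ... sun) order
    return adjusted <= alpha, adjusted

# Unadjusted p-values copied from the p_vals output above (mon ... sun).
p_raw = [0.6999807413618497, 0.24868242363225904, 0.865101860131468,
         0.4400190061996502, 0.0050304608883447715, 4.8332157044897395e-05,
         1.1384256658117605e-07]

for name, fn in [("Bonferroni", bonferroni), ("Holm-Sidak", holm_sidak)]:
    reject, adjusted = fn(p_raw)
    print(name, reject.tolist(), np.round(adjusted, 5).tolist())
```

With these p-values both methods reject for Friday, Saturday and Sunday; Friday's adjusted p-value is about 0.035 under Bonferroni and about 0.025 under Holm–Šidák. Holm–Šidák adjusted p-values are never larger than the Bonferroni ones, so it rejects at least as often.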

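Because the classes are sampled at random without a fixed seed, these results can change each time the notebook runs. The outputs above appear to come from different runs: the adjusted Saturday p-value in the `multipletests` output (about $2.2 \times 10^{-6}$) is smaller than the unadjusted Saturday p-value in the `p_vals` output (about $4.8 \times 10^{-5}$), which a multiple-comparison correction cannot produce.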
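One way to gauge how much the conclusions depend on the particular draw is to repeat the sample-then-test procedure many times with a seeded generator and record how often each day comes out significant. This is a sketch, not part of the original analysis: `rejection_rates` is a hypothetical helper, $\alpha = 0.05$ is assumed, and the dataframe is synthetic. On the real data you would pass `all_days_df` and `days` instead.

```python
import numpy as np
import pandas as pd
from scipy import stats

def rejection_rates(df, days, n=70, n_runs=200, alpha=0.05, seed=0):
    """Repeat the notebook's sample-then-test procedure n_runs times and return,
    for each day, the fraction of runs where Welch's t-test gives p < alpha."""
    rng = np.random.default_rng(seed)  # fixed seed makes the whole study reproducible
    rates = {}
    for day in days:
        on_day = df.loc[df[day] == "true", "section_gpa"].to_numpy()
        off_day = df.loc[df[day] != "true", "section_gpa"].to_numpy()
        hits = 0
        for _ in range(n_runs):
            a = rng.choice(on_day, size=n, replace=False)
            b = rng.choice(off_day, size=n, replace=False)
            if stats.ttest_ind(a, b, equal_var=False).pvalue < alpha:
                hits += 1
        rates[day] = hits / n_runs
    return rates

# Hypothetical stand-in for all_days_df: two day columns and 1,000 sections.
rng = np.random.default_rng(1)
fake = pd.DataFrame({
    "mon": rng.choice(["true", "false"], size=1000),
    "sat": rng.choice(["true", "false"], size=1000, p=[0.2, 0.8]),
    "section_gpa": rng.normal(3.4, 0.35, size=1000).clip(0, 3.99),
})
print(rejection_rates(fake, ["mon", "sat"], n_runs=50))
```

A day whose rate stays near 1 across runs is robustly different in this sampling scheme; a rate near $\alpha$ is what pure chance would produce.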

Output of `all_days_df.describe()`:

	section_number	section_gpa
count	175962.000000	175962.000000
mean	5.318677	3.414629
std	11.613988	0.363838
min	1.000000	0.000000
25%	1.000000	3.154255
50%	1.000000	3.447619
75%	4.000000	3.716216
max	234.000000	3.998418

Output of `all_days_df.head()` (the day columns hold the strings 'true' and 'false'):

	uuid	name	mon	tues	wed	thurs	fri	sat	sun	subject_name	section_number	section_gpa
0	00015734-b612-3152-bf5f-7f6855e1c0c0	Fundamentals of Analytical Sci	false	false	true	false	false	false	false	Chemistry	1	3.117647
1	00015734-b612-3152-bf5f-7f6855e1c0c0	Fundamentals of Analytical Sci	false	true	false	true	false	false	false	Chemistry	1	3.117647
2	0002389b-0bda-3f47-b5e7-e9d8973cb2e9	Corporate&Special Info Serv	true	false	false	false	false	false	false	Library and Information Studies	1	3.818182
3	00028b06-6e42-3a3e-b484-69fd61baf978	Spectrochemical Measuremts	true	false	true	false	true	false	false	Chemistry	1	3.425926
4	0006b3f8-3403-35d5-bda2-9489e3c84434	Consultatn&Suprvsn-Soc Work	false	true	false	false	false	false	false	Social Work	1	3.730769
