This notebook queries the UW Madison database with SQL and runs a two-sample hypothesis test comparing the section GPAs of morning and afternoon classes.
$H_0$: There is no difference between the mean GPAs of morning and afternoon classes.
$H_A$: There is a difference between the mean GPAs of morning and afternoon classes.
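The same two-sided hypotheses can be written in terms of population means. The symbols $\mu_{\text{morning}}$ and $\mu_{\text{afternoon}}$ are introduced here for illustration and stand for the mean section GPA of each group:

$$H_0: \mu_{\text{morning}} = \mu_{\text{afternoon}} \qquad H_A: \mu_{\text{morning}} \neq \mu_{\text{afternoon}}$$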
from sqlalchemy import create_engine
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
engine = create_engine('postgresql:///uw_madison')
morning_df = pd.read_sql('SELECT * FROM morning_classes_gpas;', engine)
morning_df.describe()
afternoon_df = pd.read_sql('SELECT * FROM afternoon_classes_gpas;', engine)
afternoon_df.describe()
# Drop sections with a perfect 4.0 GPA and keep only the GPA column.
# A single .loc call avoids chained indexing.
morning_minus_four_os = morning_df.loc[morning_df['section_gpa'] != 4.0, 'section_gpa']
morning_minus_four_os.describe()
afternoon_minus_four_os = afternoon_df.loc[afternoon_df['section_gpa'] != 4.0, 'section_gpa']
afternoon_minus_four_os.describe()
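# Draw 2000 sections from each group without replacement.
# No seed is set, so the samples (and the test result) vary between runs.
morning_choice = np.random.choice(morning_minus_four_os, size=2000, replace=False)
afternoon_choice = np.random.choice(afternoon_minus_four_os, size=2000, replace=False)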
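Because the draws above are unseeded, rerunning the notebook gives a different sample each time. A minimal sketch of a repeatable draw follows, assuming NumPy's `default_rng` generator is acceptable here. The `gpas` array is a synthetic stand-in for a filtered GPA column, and the seed value 42 is arbitrary; neither comes from the original analysis.

```python
import numpy as np

# Synthetic stand-in for a filtered GPA column (not real UW Madison data).
gpas = np.linspace(1.0, 3.99, 5000)

# A seeded generator makes the sample repeatable; the seed value is arbitrary.
rng = np.random.default_rng(42)
sample = rng.choice(gpas, size=2000, replace=False)

# Re-creating the generator with the same seed reproduces the same draw.
rng_again = np.random.default_rng(42)
sample_again = rng_again.choice(gpas, size=2000, replace=False)

print(sample.mean(), sample_again.mean())
```

With a pandas Series such as `morning_minus_four_os`, passing `series.to_numpy()` to `rng.choice` should behave the same way.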
fig, ax = plt.subplots()
ax.hist(morning_choice, alpha=0.5, color='#1f77b4', bins=50, range=(1.0, 4.0), label='Morning Classes')
ax.axvline(morning_choice.mean(), alpha=1, color='#1f77b4', linestyle='dashed', label='Morning Class Mean')
ax.hist(afternoon_choice, alpha=0.5, bins=50, color='#ff7f0e', range=(1.0, 4.0), label='Afternoon Classes')
ax.axvline(afternoon_choice.mean(), alpha=1, color='#ff7f0e', linestyle='dashed', label='Afternoon Class Mean')
ax.legend()
ax.set_xlabel('GPA');
# Welch's t-test: equal_var=False does not assume the two groups share a variance.
stats.ttest_ind(morning_choice, afternoon_choice, equal_var=False)
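The t-test shows no statistically significant difference between the mean GPAs of the sampled morning and afternoon classes (sections with a 4.0 GPA were excluded from both samples).
Thus, we fail to reject the null hypothesis that grades are the same between the morning and afternoon. Failing to reject is not proof that the means are identical; it only means this sample gives no evidence of a difference.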
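The decision rule behind this conclusion can be sketched as follows. The two samples are synthetic placeholders rather than the notebook's data, and the significance level `alpha = 0.05` is an assumed convention, since the notebook does not state one. The manual calculation shows what `equal_var=False` computes for the test statistic.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder samples (not the notebook's GPA data).
rng = np.random.default_rng(0)
a = rng.normal(loc=3.4, scale=0.3, size=2000)
b = rng.normal(loc=3.4, scale=0.3, size=2000)

alpha = 0.05  # assumed significance level; not stated in the notebook

# Welch's t-test, as in the notebook's call with equal_var=False.
result = stats.ttest_ind(a, b, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue

# The same statistic by hand: mean difference over the unpooled standard error.
std_err = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
welch_t = (a.mean() - b.mean()) / std_err

# Reject H_0 only when the p-value falls below alpha.
reject_null = p_value < alpha
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, reject H_0: {reject_null}")
```

Reporting the statistic and p-value alongside the conclusion lets a reader judge how close the result is to the threshold.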