CMPS3160 Project: Delays & Declines
Happy Herold, Syed Mehran
This project is a collaboration between Happy Herold and Syed Mehran for CMPS-3160: Intro to Data Science. The authors intend to investigate how airline operational performance can predict stock prices. To investigate, the authors have compiled data from the last 6 months on airline financials and performance.
Project Goals:
The authors hope to learn how operational inefficiency contributes to stock prices.
The authors would also like to learn how soon the market responds to inefficiencies.
The authors want to investigate how one could use this data to develop a model that predicts price shifts due to inefficiency.
Milestone 1:
The authors intend to investigate how airline operational performance can predict stock prices. The data sets for this project are stock market data obtained from Yahoo Finance and domestic flight data from the Bureau of Transportation Statistics TranStats Marketing Carrier On-Time Performance table. To investigate, the authors have compiled the last 6 months of available data (01/01/2023 to 05/31/2023) on airline financials and performance. With the exception of the aviation index (JETS), the 9 major airlines in the stock market data correspond to the market carriers in the flight data.
Since the aim of this project is to predict stock prices based on airline performance, the stock market data provides the actual values of the stock prices. One question the project aims to answer is whether the performance of a stock can be predicted from the performance of an airline, and whether the performance of multiple airlines impacts the stock price of a single airline. As part of the ETL process, the daily returns for each stock were calculated and used to create a correlation matrix of daily returns. Time series of the trading volume, open price, and closing price were also plotted.
Airline performance in this project is measured with flight data from the Marketing Carrier On-Time Performance table. The flight data contains flight characteristics including date, market carrier, operating carrier, tail number, origin and destination, scheduled departure time, actual departure time, scheduled arrival time, actual arrival time, and canceled and diverted flights, as well as other key features of the flight. The flight data does not directly include airline performance metrics. One question the project aims to answer is whether an airline's flight data can be used to create metrics that quantify the performance of an airline, and whether those metrics can be used to compare different airlines and predict the stock price. During the ETL process, the daily number of flights, the daily number of cancellations, and the proportion of flights operated by the market carrier were calculated from the flight data and plotted.
Data Set Websites:
Collaboration Plan:
A private project git repository has been created to store the data and the Colab Jupyter notebook. The authors intend to meet once a week to discuss weekly project goals, raise questions, and develop the final product. Meetings will take place over Zoom or in person, and the pair will communicate using text messaging. The weekly goals and project to-do list are kept in a shared OneNote document. Up until Milestone 2, the authors will continue to calculate various airline performance metrics from the flight data and begin predicting stock prices from these metrics with different scikit-learn regression models (a preview sketch follows below). After completing Milestone 2, the authors plan on gathering the data visualizations and the model to develop a final website and presentation that clearly explains the aim of the project, the steps and tools utilized, and the findings from the analysis and the model created.
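As a preview of that Milestone 2 modeling step, here is a minimal sketch. It assumes a merged per-airline DataFrame like the ones built at the end of this notebook; fit_baseline and the feature column names are hypothetical placeholders, not part of the project code yet.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

def fit_baseline(df, feature_cols, target_col='Close'):
    # drop days with missing metrics so features and target stay aligned by date
    data = df.dropna(subset=feature_cols + [target_col])
    X, y = data[feature_cols], data[target_col]
    # shuffle=False keeps the split chronological, which matters for time series
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
    model = LinearRegression().fit(X_train, y_train)
    return model, r2_score(y_test, model.predict(X_test))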
!git clone https://github.com/syedmehran/cmps3160-project.git
%cd cmps3160-project/
Cloning into 'cmps3160-project'... remote: Enumerating objects: 570, done. remote: Counting objects: 100% (165/165), done. remote: Compressing objects: 100% (93/93), done. remote: Total 570 (delta 101), reused 131 (delta 72), pack-reused 405 Receiving objects: 100% (570/570), 688.84 MiB | 18.42 MiB/s, done. Resolving deltas: 100% (331/331), done. Updating files: 100% (353/353), done. /content/cmps3160-project
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import glob
import yfinance as yf
import datetime
Stock data ETL
#Load stock data in
financ_UAL = yf.download('UAL', start = '2015-04-30') # united
financ_ALK = yf.download('ALK', start = '2015-04-30') # alaskan
financ_AAL = yf.download("AAL", start = '2015-04-30') # american
financ_ALGT = yf.download("ALGT", start = '2015-04-30') # allegiant
financ_HA = yf.download("HA", start = '2015-04-30') # hawaiian
financ_LUV = yf.download("LUV", start = '2015-04-30') # southwest
financ_SAVE = yf.download("SAVE", start = '2015-04-30') # spirit
financ_JBLU = yf.download("JBLU", start = '2015-04-30') # jetblue
financ_DAL = yf.download("DAL", start = '2015-04-30') # delta
financ_JETS = yf.download("JETS", start = '2015-04-30') # JETS etf index
financ_FRONTIER = yf.download("ULCC", start='2015-04-30') # frontier
financ_OIL = yf.download('CL=F', start = '2015-04-30') # WTI crude oil futures
financ_airlines = {
'United': financ_UAL,
'Alaskan': financ_ALK,
'American': financ_AAL,
'Allegiant': financ_ALGT,
'JetBlue': financ_JBLU,
'Hawaiian': financ_HA,
'Southwest': financ_LUV,
'Spirit': financ_SAVE,
'Delta': financ_DAL,
'Frontier': financ_FRONTIER,
'JETS' : financ_JETS
}
airline_colors = {
'United': '#1f77b4', # muted blue
'Alaskan': '#2ca02c', # cooked asparagus green
'American': '#d62728', # brick red
'Allegiant': '#ff7f0e', # safety orange
'JetBlue': '#9467bd', # muted purple
'Hawaiian': '#e377c2', # raspberry yogurt pink
'Southwest': '#7f7f7f', # middle gray
'Spirit': '#bcbd22', # curry yellow-green
'Delta': '#17becf', # blue-teal
'JETS': '#8c564b', # chestnut brown
'Frontier': '#c7c7c7' # silver gray
}
# Load in airline financial data
financials_2023 = pd.read_csv('../cmps3160-project/finance-data/T_F41SCHEDULE_P12_2023.csv')
financials_2022 = pd.read_csv('../cmps3160-project/finance-data/T_F41SCHEDULE_P12_2022.csv')
financials_2021 = pd.read_csv('../cmps3160-project/finance-data/T_F41SCHEDULE_P12_2021.csv')
financials_2020 = pd.read_csv('../cmps3160-project/finance-data/T_F41SCHEDULE_P12_2020.csv')
financials_2020.tail()
NET_INCOME | OP_PROFIT_LOSS | TRANS_REV_PAX | TOTAL_PROPERTY | PROP_FREIGHT | PROP_BAG | TOTAL_CHARTER | CHARTER_PAX | CHARTER_PROP | ... | UNIQUE_CARRIER | UNIQUE_CARRIER_NAME | CARRIER | CARRIER_NAME | UNIQUE_CARRIER_ENTITY | REGION | CARRIER_GROUP_NEW | CARRIER_GROUP | YEAR | QUARTER | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
408 | 262685.0 | -597883.0 | NaN | 726.0 | NaN | 444323.0 | NaN | NaN | NaN | 7892.0 | ... | FX | Federal Express Corporation | FX | Federal Express Corporation | 16196 | A | 3 | 3 | 2020 | 3 |
409 | 688988.0 | 1395899.0 | NaN | 0.0 | NaN | 2785917.0 | NaN | NaN | NaN | 7598.0 | ... | FX | Federal Express Corporation | FX | Federal Express Corporation | 06200 | D | 3 | 3 | 2020 | 3 |
410 | 908014.0 | 1130748.0 | NaN | 394.0 | NaN | 2566955.0 | NaN | NaN | NaN | 1195.0 | ... | FX | Federal Express Corporation | FX | Federal Express Corporation | 06200 | D | 3 | 3 | 2020 | 1 |
411 | 926627.0 | 1363349.0 | NaN | 302.0 | NaN | 2502021.0 | NaN | NaN | NaN | 33790.0 | ... | FX | Federal Express Corporation | FX | Federal Express Corporation | 06200 | D | 3 | 3 | 2020 | 2 |
412 | 1188167.0 | 1624892.0 | NaN | 45.0 | NaN | 2986833.0 | NaN | NaN | NaN | 6047.0 | ... | FX | Federal Express Corporation | FX | Federal Express Corporation | 06200 | D | 3 | 3 | 2020 | 4 |
5 rows × 52 columns
financials = pd.concat([financials_2023, financials_2022, financials_2021, financials_2020], ignore_index=True)
for airline, df in financ_airlines.items():
print(f"{airline} DataFrame size: {df.shape}")
print(f"Null values in {airline}:")
null_counts = df.isnull().sum() # counts of null values in each column
print(null_counts[null_counts > 0]) # print columns with null values
print("\n")
print(financ_OIL.isnull().values.any())
financ_OIL.shape
United DataFrame size: (2173, 6) Null values in United: Series([], dtype: int64) Alaskan DataFrame size: (2173, 6) Null values in Alaskan: Series([], dtype: int64) American DataFrame size: (2173, 6) Null values in American: Series([], dtype: int64) Allegiant DataFrame size: (2173, 6) Null values in Allegiant: Series([], dtype: int64) JetBlue DataFrame size: (2173, 6) Null values in JetBlue: Series([], dtype: int64) Hawaiian DataFrame size: (2173, 6) Null values in Hawaiian: Series([], dtype: int64) Southwest DataFrame size: (2173, 6) Null values in Southwest: Series([], dtype: int64) Spirit DataFrame size: (2173, 6) Null values in Spirit: Series([], dtype: int64) Delta DataFrame size: (2173, 6) Null values in Delta: Series([], dtype: int64) Frontier DataFrame size: (682, 6) Null values in Frontier: Series([], dtype: int64) JETS DataFrame size: (2173, 6) Null values in JETS: Series([], dtype: int64) False
(2172, 6)
This indicates that the stock data downloaded from Yahoo Finance is clean and has no missing values that need to be handled. Note that the Frontier (ULCC) DataFrame is much smaller because Frontier only began trading publicly in April 2021.
financ_airline_codes = {'United': 'UAL', 'Alaskan': 'ALK', 'American': 'AAL',
'Allegiant': 'ALGT', 'JetBlue': 'JBLU', 'Hawaiian': 'HA',
'Southwest': 'LUV', 'Spirit': 'SAVE', 'Delta': 'DAL',
'JETS': 'JETS'}
flight_airline_codes = {'American':'AA', 'Delta':'DL', 'Southwest':'WN',
'United':'UA', 'Alaskan':'AS', 'Spirit': 'NK',
'Hawaiian':'HA', 'Allegiant':'G4', 'JetBlue':'B6',
'Frontier':'F9'}
financ_JETS.head() # displaying JETS ETF to check data and columns
Open | High | Low | Close | Adj Close | Volume | |
---|---|---|---|---|---|---|
Date | ||||||
2015-04-30 | 24.493999 | 24.579000 | 24.129999 | 24.188000 | 23.045496 | 7300 |
2015-05-01 | 24.700001 | 24.969000 | 24.379999 | 24.870001 | 23.695282 | 182800 |
2015-05-04 | 25.100000 | 25.100000 | 24.600000 | 24.660000 | 23.495201 | 209000 |
2015-05-05 | 24.480000 | 24.480000 | 24.000000 | 24.000000 | 22.866375 | 75400 |
2015-05-06 | 24.010000 | 24.110001 | 23.590000 | 23.980000 | 22.847322 | 36900 |
plt.figure(figsize=(14,7))
for airline, df in financ_airlines.items():
    plt.plot(df['Close'], label=airline, color=airline_colors[airline])
# plot crude oil as well, since the commentary below discusses its dip below zero
plt.plot(financ_OIL['Close'], label='Crude Oil', color='black')
plt.title('Time Series Graph of Closing Prices ($)')
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.legend()
plt.show()
A curiosity to note here is the y-axis range (-50 to 300). The graph indicates there was a time when oil prices went negative. Upon further research, on 4/20/20 WTI crude futures reached a low of about -$37 per barrel.
Correlational Analysis
Now, we want to calculate the daily returns on each stock. We can use a for loop here to iterate through our airlines dictionary from earlier. Since pct_change produces a NaN for the first date (there is no previous day to compare against), that row is dropped later when the returns are combined. We want to investigate daily returns as a percentage change from the previous day, so leaving the first day as NaN makes sense.
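Formally, the daily return on day $t$ is $$ \text{Daily Return}_t = \frac{\text{Close}_t - \text{Close}_{t-1}}{\text{Close}_{t-1}} $$ which is exactly what pct_change computes.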
This will allow us to calculate how correlated each stock's movement is with one another.
for airline, df in financ_airlines.items():
    # pct_change leaves a NaN in the first row; those rows are dropped after combining
    df['Daily Return'] = df['Close'].pct_change()
financ_OIL['Daily Return'] = financ_OIL['Close'].pct_change()
combined_returns = pd.DataFrame()
for airline, df in financ_airlines.items():
combined_returns[airline] = df['Daily Return']
mean_returns = combined_returns.mean()
cov_returns = combined_returns.cov()
std_returns = combined_returns.std()
combined_returns.dropna(inplace=True)
combined_returns.head()
United | Alaskan | American | Allegiant | JetBlue | Hawaiian | Southwest | Spirit | Delta | Frontier | JETS | |
---|---|---|---|---|---|---|---|---|---|---|---|
Date | |||||||||||
2021-04-05 | 0.025424 | 0.023025 | 0.015507 | 0.021900 | 0.039273 | 0.005968 | 0.027243 | 0.003480 | 0.029080 | 0.014854 | 0.020702 |
2021-04-06 | 0.003205 | 0.037277 | -0.007016 | 0.007277 | 0.003779 | 0.002596 | 0.017945 | 0.021339 | 0.027861 | 0.031887 | 0.008330 |
2021-04-07 | 0.001513 | -0.018714 | -0.005403 | -0.005557 | -0.009412 | -0.026257 | -0.009048 | -0.019849 | -0.028074 | 0.042553 | -0.009698 |
2021-04-08 | -0.002350 | 0.004561 | -0.011701 | 0.001916 | 0.001900 | -0.003038 | -0.000945 | 0.003730 | -0.007769 | 0.032556 | -0.002902 |
2021-04-09 | -0.016827 | -0.000825 | -0.004651 | 0.007211 | -0.001423 | -0.006476 | -0.000630 | -0.011680 | -0.010841 | 0.005647 | -0.006912 |
Creating a correlation matrix so we can examine the relationships between the daily returns of all companies.
correlation_matrix = combined_returns.corr()
import seaborn as sns
#create correlation matrix
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap = 'Blues', vmin=-1, vmax=1,
xticklabels=correlation_matrix.columns,
yticklabels=correlation_matrix.columns)
plt.title('Correlation Matrix of Daily Returns')
plt.show()
The matrix suggests that the airline industry's stocks move together to a large extent. A natural next step is to analyze these correlations over different time lags or during specific events, such as oil price shocks or global events affecting travel, to see if and how these relationships change.
# combine oil returns with airline returns
financ_OIL['Daily Return'] = financ_OIL['Close'].pct_change()
combined_corr_data = financ_OIL[['Daily Return']].rename(columns={'Daily Return': 'Oil Return'})
for airline, df in financ_airlines.items():
combined_corr_data[airline] = df['Close'].pct_change()
combined_corr_data.dropna(inplace = True)
combined_corr_data.head()
Oil Return | United | Alaskan | American | Allegiant | JetBlue | Hawaiian | Southwest | Spirit | Delta | Frontier | JETS | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Date | ||||||||||||
2021-04-05 | -0.045565 | 0.025424 | 0.023025 | 0.015507 | 0.021900 | 0.039273 | 0.005968 | 0.027243 | 0.003480 | 0.029080 | 0.014854 | 0.020702 |
2021-04-06 | 0.011594 | 0.003205 | 0.037277 | -0.007016 | 0.007277 | 0.003779 | 0.002596 | 0.017945 | 0.021339 | 0.027861 | 0.031887 | 0.008330 |
2021-04-07 | 0.007416 | 0.001513 | -0.018714 | -0.005403 | -0.005557 | -0.009412 | -0.026257 | -0.009048 | -0.019849 | -0.028074 | 0.042553 | -0.009698 |
2021-04-08 | -0.002844 | -0.002350 | 0.004561 | -0.011701 | 0.001916 | 0.001900 | -0.003038 | -0.000945 | 0.003730 | -0.007769 | 0.032556 | -0.002902 |
2021-04-09 | -0.004698 | -0.016827 | -0.000825 | -0.004651 | 0.007211 | -0.001423 | -0.006476 | -0.000630 | -0.011680 | -0.010841 | 0.005647 | -0.006912 |
delays = [7, 14, 30, 60, 90, 180, 365]
airline_names = ['United', 'Alaskan', 'American', 'Allegiant', 'JetBlue', 'Hawaiian', 'Southwest', 'Spirit', 'Delta', 'JETS']
correlation_results = pd.DataFrame(index=airline_names, columns=delays)
for delay in delays:
for airline in airline_names:
shifted_returns = combined_corr_data[airline].shift(-delay)
correlation = combined_corr_data['Oil Return'].corr(shifted_returns)
correlation_results.at[airline, delay] = correlation
correlation_results.head(10)
7 | 14 | 30 | 60 | 90 | 180 | 365 | |
---|---|---|---|---|---|---|---|
United | 0.008827 | 0.001552 | -0.012239 | 0.025134 | -0.063869 | 0.002089 | -0.03858 |
Alaskan | -0.023877 | -0.000322 | 0.021835 | -0.010173 | -0.05357 | 0.036831 | -0.015939 |
American | -0.017728 | 0.001487 | 0.006128 | 0.044039 | -0.058847 | 0.016144 | -0.029939 |
Allegiant | -0.010919 | -0.022085 | 0.015095 | 0.049887 | -0.068738 | 0.003521 | -0.03009 |
JetBlue | -0.023745 | 0.018836 | 0.042487 | 0.029154 | -0.052595 | 0.075384 | -0.042281 |
Hawaiian | -0.026035 | 0.021067 | 0.001887 | 0.019371 | 0.009924 | -0.023589 | -0.073234 |
Southwest | -0.031236 | -0.004643 | 0.020085 | 0.010116 | 0.002631 | 0.000623 | -0.040111 |
Spirit | 0.027841 | 0.004721 | 0.003411 | 0.003366 | -0.008387 | 0.056605 | -0.043658 |
Delta | 0.000773 | -0.013916 | 0.00983 | 0.015437 | -0.042934 | 0.012249 | -0.018705 |
JETS | -0.021892 | 0.001033 | 0.005418 | 0.02165 | -0.051936 | 0.026966 | -0.056063 |
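To judge whether these correlations are statistically distinguishable from zero, each correlation $r$ computed from $n$ observations is converted to a t-statistic: $$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} $$ Under the null hypothesis of no correlation this follows a Student's $t$ distribution with $n-2$ degrees of freedom, and the code below uses it to compute two-sided p-values.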
import scipy.stats as stats
significance_results = {}
n = 2160 # sample size assumed for the t-test
correlations = {
'United': [-0.049228, -0.030873, -0.025656, -0.079161, -0.012919, -0.016999, -0.004876],
'Alaskan': [-0.041154, -0.019935, -0.021779, -0.066841, -0.004358, 0.001154, -0.000124],
'American': [-0.056931, -0.016334, 0.00937, -0.070561, -0.003926, -0.005263, 0.007494],
'Allegiant': [-0.030043, -0.073951, -0.046508, -0.029428, -0.025044, -0.018937, 0.008539],
'JetBlue': [-0.061789, -0.036663, -0.01862, -0.05643, -0.004846, 0.003824, 0.003574],
'Hawaiian': [-0.058751, -0.029893, -0.040023, -0.045276, 0.003318, -0.037189, 0.001015],
'Southwest': [-0.042116, -0.020714, -0.035552, -0.045155, 0.006958, -0.007177, 0.000559],
'Spirit': [-0.029643, 0.01245, -0.048053, -0.058677, -0.006752, 0.001502, 0.004846],
'Delta': [-0.068438, -0.004271, -0.015327, -0.060281, -0.007192, -0.008411, -0.003624],
'JETS': [-0.061341, -0.02731, -0.032695, -0.061719, -0.007318, -0.005128, -0.003519]
}
for airline, corr_values in correlations.items():
    significance_results[airline] = {}
    for delay_index, r in enumerate(corr_values):
        dof = n - 2 # degrees of freedom for the correlation t-test
        t_stat = r * (dof ** 0.5) / ((1 - r ** 2) ** 0.5)
        p_value = 2 * (1 - stats.t.cdf(abs(t_stat), dof))
        significance_results[airline][delays[delay_index]] = p_value
# Display the p-values
for airline in significance_results:
print(f"Airline: {airline}")
for delay in significance_results[airline]:
print(f"Delay {delay} days: p-value = {significance_results[airline][delay]:.4f}")
Airline: United Delay 7 days: p-value = 0.0221 Delay 14 days: p-value = 0.1515 Delay 30 days: p-value = 0.2333 Delay 60 days: p-value = 0.0002 Delay 90 days: p-value = 0.5484 Delay 180 days: p-value = 0.4297 Delay 365 days: p-value = 0.8208 Airline: Alaskan Delay 7 days: p-value = 0.0558 Delay 14 days: p-value = 0.3544 Delay 30 days: p-value = 0.3117 Delay 60 days: p-value = 0.0019 Delay 90 days: p-value = 0.8396 Delay 180 days: p-value = 0.9573 Delay 365 days: p-value = 0.9954 Airline: American Delay 7 days: p-value = 0.0081 Delay 14 days: p-value = 0.4480 Delay 30 days: p-value = 0.6634 Delay 60 days: p-value = 0.0010 Delay 90 days: p-value = 0.8553 Delay 180 days: p-value = 0.8069 Delay 365 days: p-value = 0.7278 Airline: Allegiant Delay 7 days: p-value = 0.1628 Delay 14 days: p-value = 0.0006 Delay 30 days: p-value = 0.0307 Delay 60 days: p-value = 0.1716 Delay 90 days: p-value = 0.2446 Delay 180 days: p-value = 0.3790 Delay 365 days: p-value = 0.6916 Airline: JetBlue Delay 7 days: p-value = 0.0041 Delay 14 days: p-value = 0.0885 Delay 30 days: p-value = 0.3871 Delay 60 days: p-value = 0.0087 Delay 90 days: p-value = 0.8219 Delay 180 days: p-value = 0.8590 Delay 365 days: p-value = 0.8681 Airline: Hawaiian Delay 7 days: p-value = 0.0063 Delay 14 days: p-value = 0.1649 Delay 30 days: p-value = 0.0629 Delay 60 days: p-value = 0.0354 Delay 90 days: p-value = 0.8775 Delay 180 days: p-value = 0.0840 Delay 365 days: p-value = 0.9624 Airline: Southwest Delay 7 days: p-value = 0.0503 Delay 14 days: p-value = 0.3359 Delay 30 days: p-value = 0.0986 Delay 60 days: p-value = 0.0359 Delay 90 days: p-value = 0.7465 Delay 180 days: p-value = 0.7389 Delay 365 days: p-value = 0.9793 Airline: Spirit Delay 7 days: p-value = 0.1685 Delay 14 days: p-value = 0.5631 Delay 30 days: p-value = 0.0255 Delay 60 days: p-value = 0.0064 Delay 90 days: p-value = 0.7538 Delay 180 days: p-value = 0.9444 Delay 365 days: p-value = 0.8219 Airline: Delta Delay 7 days: p-value = 0.0015 Delay 14 days: p-value = 0.8427 Delay 30 days: p-value = 0.4765 Delay 60 days: p-value = 0.0051 Delay 90 days: p-value = 0.7383 Delay 180 days: p-value = 0.6960 Delay 365 days: p-value = 0.8663 Airline: JETS Delay 7 days: p-value = 0.0043 Delay 14 days: p-value = 0.2045 Delay 30 days: p-value = 0.1287 Delay 60 days: p-value = 0.0041 Delay 90 days: p-value = 0.7339 Delay 180 days: p-value = 0.8117 Delay 365 days: p-value = 0.8702
# create combined line chart for all airlines with significance stars for correlations
p_values_data = {
'United': [0.0221, 0.1515, 0.2333, 0.0002, 0.5484, 0.4297, 0.8208],
'Alaskan': [0.0558, 0.3544, 0.3117, 0.0019, 0.8396, 0.9573, 0.9954],
'American': [0.0081, 0.4480, 0.6634, 0.0010, 0.8553, 0.8069, 0.7278],
'Allegiant': [0.1628, 0.0006, 0.0307, 0.1716, 0.2446, 0.3790, 0.6916],
'JetBlue': [0.0041, 0.0885, 0.3871, 0.0087, 0.8219, 0.8590, 0.8681],
'Hawaiian': [0.0063, 0.1649, 0.0629, 0.0354, 0.8775, 0.0840, 0.9624],
'Southwest': [0.0503, 0.3359, 0.0986, 0.0359, 0.7465, 0.7389, 0.9793],
'Spirit': [0.1685, 0.5631, 0.0255, 0.0064, 0.7538, 0.9444, 0.8219],
'Delta': [0.0015, 0.8427, 0.4765, 0.0051, 0.7383, 0.6960, 0.8663],
'JETS': [0.0043, 0.2045, 0.1287, 0.0041, 0.7339, 0.8117, 0.8702]
}
# Initialize the plot
plt.figure(figsize=(14, 10))
# Plotting the correlation values for each airline
for airline, corr_values in correlations.items():
p_values = p_values_data[airline]
plt.plot(delays, corr_values, marker='o', label=airline)
# Adding stars for significant correlations
for delay, p_value, corr in zip(delays, p_values, corr_values):
if p_value < 0.05:
plt.plot(delay, corr, 'r*', markersize=12)
plt.title('Correlation of Oil Returns with Airline Returns Over Various Time Delays')
plt.xlabel('Time Delay (days)')
plt.ylabel('Correlation Coefficient')
plt.legend()
plt.grid(False)
plt.show()
The above plot shows the correlation coefficient between daily oil returns and airline returns at each of the time delays tested. We see the highest number of significant correlations (starred) at 60 days: the daily return of airline stocks is correlated with oil returns from 60 days prior.
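As a quick sanity check on that claim, a small sketch that tallies the hardcoded p-values above to count how many airlines are significant at each delay:
# Count how many airlines have a significant (p < 0.05) correlation at each delay
significant_counts = {
    delay: sum(p_values_data[airline][i] < 0.05 for airline in p_values_data)
    for i, delay in enumerate(delays)
}
print(significant_counts) # the 60-day delay has the most airlines below 0.05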
Load in Data about operational metrics
#filter by carriers we are analyzing
financials = financials[financials['CARRIER'].isin(['AA', 'DL', 'WN', 'UA', 'AS', 'F9', 'HA', 'G4', 'B6', 'NK'])]
financials
NET_INCOME | OP_PROFIT_LOSS | TRANS_REV_PAX | TOTAL_PROPERTY | PROP_FREIGHT | PROP_BAG | TOTAL_CHARTER | CHARTER_PAX | CHARTER_PROP | ... | UNIQUE_CARRIER | UNIQUE_CARRIER_NAME | CARRIER | CARRIER_NAME | UNIQUE_CARRIER_ENTITY | REGION | CARRIER_GROUP_NEW | CARRIER_GROUP | YEAR | QUARTER | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | -294047.62 | -294047.62 | 895106.59 | 7515.70 | NaN | 124382.84 | 8503.14 | NaN | 0.00 | 0.0 | ... | AA | American Airlines Inc. | AA | American Airlines Inc. | 10049 | A | 3 | 3 | 2023 | 1 |
3 | -258801.65 | -234102.19 | 5015655.46 | 10224.02 | NaN | 68087.22 | 210466.52 | NaN | 17355.33 | NaN | ... | UA | United Air Lines Inc. | UA | United Air Lines Inc. | 0A875 | D | 3 | 3 | 2023 | 1 |
4 | -233930.65 | -295070.82 | 1464226.29 | 0.00 | NaN | 101.75 | 104368.02 | NaN | 224.13 | 0.0 | ... | B6 | JetBlue Airways | B6 | JetBlue Airways | 06673 | D | 3 | 3 | 2023 | 1 |
5 | -185853.00 | 437900.00 | 5914695.00 | 9697.00 | NaN | 60430.00 | 172908.00 | NaN | 77822.00 | 85.0 | ... | DL | Delta Air Lines Inc. | DL | Delta Air Lines Inc. | 01260 | D | 3 | 3 | 2023 | 1 |
6 | -153850.00 | -274205.00 | 4697111.00 | 587.00 | NaN | 38321.00 | 15128.00 | NaN | 5787.00 | 0.0 | ... | WN | Southwest Airlines Co. | WN | Southwest Airlines Co. | 06725 | D | 3 | 3 | 2023 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1336 | 6542.00 | 22237.00 | 637215.00 | 0.00 | NaN | 9743.00 | 26125.00 | NaN | 3724.00 | 0.0 | ... | DL | Delta Air Lines Inc. | DL | Delta Air Lines Inc. | 10260 | L | 3 | 3 | 2020 | 1 |
1350 | 9127.98 | 14530.18 | 30357.19 | 0.00 | NaN | 0.00 | 4681.78 | NaN | 0.00 | 0.0 | ... | F9 | Frontier Airlines Inc. | F9 | Frontier Airlines Inc. | 16461 | L | 3 | 3 | 2020 | 1 |
1356 | 11141.00 | 30232.00 | 744195.00 | 5898.00 | NaN | 58152.00 | 16292.00 | NaN | 8705.00 | 0.0 | ... | DL | Delta Air Lines Inc. | DL | Delta Air Lines Inc. | 10261 | A | 3 | 3 | 2020 | 1 |
1366 | 17296.17 | -80572.10 | 130274.24 | 0.00 | NaN | 0.00 | 17308.07 | NaN | 0.00 | 0.0 | ... | F9 | Frontier Airlines Inc. | F9 | Frontier Airlines Inc. | 06460 | D | 3 | 3 | 2020 | 2 |
1384 | 29778.84 | 38659.28 | 299862.94 | 0.00 | NaN | 0.00 | 57336.92 | NaN | 8918.84 | 0.0 | ... | G4 | Allegiant Air | G4 | Allegiant Air | 06035 | D | 3 | 3 | 2020 | 1 |
358 rows × 52 columns
The original dataframe contained carriers we are not analyzing, so they were filtered out above. We then group operational profits/losses by unique carrier, year, and quarter.
grouped_income = financials.groupby(['UNIQUE_CARRIER', 'YEAR', 'QUARTER'])['OP_PROFIT_LOSS'].sum()
grouped_income_df = grouped_income.reset_index()
grouped_income_df_new = grouped_income.reset_index()
# set back quarter by 1 since these operating losses aren't known until the future
# we want to use current operating losses (posted at end of quarter) to predict declines
grouped_income_df_new['QUARTER'] = grouped_income_df_new['QUARTER'] - 1
for i in grouped_income_df_new.index:
if grouped_income_df_new.at[i, 'QUARTER'] == 0:
grouped_income_df_new.at[i, 'QUARTER'] = 4
grouped_income_df_new.at[i, 'YEAR'] -= 1
grouped_income_df_new
UNIQUE_CARRIER | YEAR | QUARTER | OP_PROFIT_LOSS | |
---|---|---|---|---|
0 | AA | 2019 | 4 | -2535836.17 |
1 | AA | 2020 | 1 | -4459137.82 |
2 | AA | 2020 | 2 | -4945286.70 |
3 | AA | 2020 | 3 | -2438663.13 |
4 | AA | 2020 | 4 | -3439576.16 |
... | ... | ... | ... | ... |
135 | WN | 2022 | 1 | 1158160.00 |
136 | WN | 2022 | 2 | 395148.00 |
137 | WN | 2022 | 3 | -385541.00 |
138 | WN | 2022 | 4 | -284229.00 |
139 | WN | 2023 | 1 | 794871.00 |
140 rows × 4 columns
plt.figure(figsize=(10, 6))
for carrier in grouped_income_df['UNIQUE_CARRIER'].unique():
subset = grouped_income_df[grouped_income_df['UNIQUE_CARRIER'] == carrier]
plt.plot(subset['YEAR'].astype(str) + ' Q' + subset['QUARTER'].astype(str),
subset['OP_PROFIT_LOSS'], label=carrier, marker='o')
plt.title('Operational Profit/Loss by Quarter for Market Carriers (2020-2023)')
plt.xlabel('Year and Quarter')
plt.ylabel('Operational Profit/Loss')
plt.xticks(rotation=45)
plt.legend()
plt.grid(False)
plt.tight_layout()
plt.show()
All carriers experienced fluctuations in their operational profits/losses across the quarters. Most carriers had a downward trend starting in Q1 2021, with losses peaking in Q2 or Q3 of the same year. A general recovery seems to occur from Q3 2021 onwards, with many lines showing an upward trend. However, in the first two quarters of 2023, there appears to be a significant split with some carriers showing sharp increases in profits while others show decreases or remain relatively stable.
Stock Price Trend Transition Probabilities
Process Overview
For each stock (airline), calculate the daily price change: $$ \text{Price Change} = \text{Close}_{t} - \text{Close}_{t-1} $$
Label the trend based on the price change: $$ \text{Trend} = \begin{cases} \text{'up'} & \text{if Price Change} > 0 \\ \text{'down'} & \text{if Price Change} < 0 \\ \text{'no change'} & \text{if Price Change} = 0 \end{cases} $$
Calculate the transition probabilities:
- Define the previous day's trend: $$ \text{Prev Trend} = \text{Trend}_{t-1} $$
- Count the transitions (up-up, down-down, up-down, down-up).
- Calculate the probabilities of each transition: $$ P(\text{Transition}) = \frac{\text{Count of Transition}}{\text{Total Transitions}} $$
Transition Matrix
The transition matrix is formed as follows: $$ \begin{array}{|c|c|c|c|c|} \hline \text{Airline} & \text{up-up} & \text{up-down} & \text{down-up} & \text{down-down} \\ \hline \text{Airline 1} & P_{\text{up-up}} & P_{\text{up-down}} & P_{\text{down-up}} & P_{\text{down-down}} \\ \text{Airline 2} & \dots & \dots & \dots & \dots \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ \hline \end{array} $$
Here, $P_\text{transition type}$ represents the probability of a specific transition type for an airline.
for airline, df in financ_airlines.items():
    # calculate the day-over-day difference in 'Close' price
df['Price Change'] = df['Close'].diff()
# label if price went up or down
df['Trend'] = df['Price Change'].apply(lambda x: 'up' if x > 0 else ('down' if x < 0 else 'no change'))
def calculate_transition_probabilities(df):
# shifted column to compare each day's trend with the previous day's
df['Prev Trend'] = df['Trend'].shift(1)
# transition state
transitions = ['up-up', 'down-down', 'up-down', 'down-up']
# initialize dictionary to hold the counts of each transition
transition_counts = {transition: 0 for transition in transitions}
# loop through and count the transitions
for prev, current in zip(df['Prev Trend'][1:], df['Trend'][1:]): # Skip the first entry (NaN)
if prev == 'up' and current == 'up':
transition_counts['up-up'] += 1
elif prev == 'down' and current == 'down':
transition_counts['down-down'] += 1
elif prev == 'up' and current == 'down':
transition_counts['up-down'] += 1
elif prev == 'down' and current == 'up':
transition_counts['down-up'] += 1
# sum the total number of transitions and calculate probabilities
total_transitions = sum(transition_counts.values())
transition_probabilities = {transition: count / total_transitions for transition, count in transition_counts.items()}
return transition_probabilities
transitions = ['up-up', 'up-down', 'down-up', 'down-down']
transition_matrix = pd.DataFrame(columns=transitions)
for airline, df in financ_airlines.items():
    transition_probs = calculate_transition_probabilities(df)
    # index by transition name so the dictionary's insertion order cannot misalign the columns
    transition_matrix.loc[airline] = [transition_probs[t] for t in transitions]
print(transition_matrix)
up-up up-down down-up down-down United 0.266790 0.239463 0.247337 0.246410 Alaskan 0.250348 0.238737 0.255922 0.254993 American 0.250117 0.264610 0.242637 0.242637 Allegiant 0.236793 0.264133 0.249305 0.249768 JetBlue 0.225991 0.255614 0.258958 0.259436 Hawaiian 0.233882 0.252235 0.257412 0.256471 Southwest 0.252666 0.242930 0.252202 0.252202 Spirit 0.215377 0.263085 0.261232 0.260306 Delta 0.260159 0.231667 0.255021 0.253153 Frontier 0.203927 0.324773 0.234139 0.237160 JETS 0.264720 0.235045 0.249647 0.250589
plt.figure(figsize=(10, 8))
sns.heatmap(transition_matrix, annot=True, cmap='Blues', fmt=".3f")
plt.title('Transition Probability Heatmap')
plt.ylabel('Airlines')
plt.xlabel('Transition Type')
plt.show()
The heatmap of trend transition probabilities suggests the following:
- Stocks with higher 'up-up' and 'down-down' probabilities might show stronger trend persistence, meaning that once they start moving in a certain direction, they are more likely to continue in that direction for at least another day.
- Conversely, stocks with higher 'up-down' and 'down-up' probabilities tend to have more frequent trend reversals, suggesting higher volatility and possibly less predictable movements.
- The relative size of these probabilities can inform investors about the nature of the stock's price movements, which can be important for trading strategies that depend on momentum or mean reversion.
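One way to act on these observations is to convert the joint transition frequencies into conditional next-day probabilities, where values above 0.5 suggest momentum and values below 0.5 suggest mean reversion. A minimal sketch using the transition_matrix computed above (the cond_probs name is just for illustration):
# P(up tomorrow | up today) = P(up-up) / (P(up-up) + P(up-down)); similarly for down days
tm = transition_matrix.astype(float)
cond_probs = pd.DataFrame({
    'P(up|up)': tm['up-up'] / (tm['up-up'] + tm['up-down']),
    'P(down|down)': tm['down-down'] / (tm['down-down'] + tm['down-up']),
})
print(cond_probs.round(3))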
Marketing Carrier On-Time Performance Data
Marketing Carrier On-Time Performance Data ETL
This flight data was obtained from the Bureau of Transportation Statistics Airline On-Time Performance database. The data table used is Marketing Carrier On-Time Performance (Beginning January 2018).
On-Time : Marketing Carrier On-Time Performance (Beginning January 2018)
# Load in flight Data
flight_cols = ['MKT_UNIQUE_CARRIER', 'OP_UNIQUE_CARRIER','YEAR', 'QUARTER', 'MONTH', 'DAY_OF_MONTH',
'DAY_OF_WEEK', 'FL_DATE', 'MKT_CARRIER','OP_CARRIER', 'TAIL_NUM',
'ORIGIN', 'DEST','CRS_DEP_TIME', 'DEP_TIME', 'DEP_DELAY',
'DEP_DELAY_NEW', 'DEP_DEL15','DEP_DELAY_GROUP',
'TAXI_OUT', 'WHEELS_OFF','WHEELS_ON', 'TAXI_IN', 'CRS_ARR_TIME',
'ARR_TIME', 'ARR_DELAY','ARR_DELAY_NEW', 'ARR_DEL15', 'ARR_DELAY_GROUP',
'CANCELLED', 'CANCELLATION_CODE', 'DIVERTED']
airline_code = ['AA', 'DL', 'WN', 'UA', 'AS', 'F9', 'HA', 'G4', 'B6', 'NK']
file_dict = {}
for airline in airline_code:
file_dict[airline] = glob.glob('../cmps3160-project/202*-*/{}*.csv'.format(airline))
Due to upload size restrictions on GitHub, the original files had to be broken into multiple smaller files. The function below combines all files with the same market carrier into one DataFrame.
dtype_cols = {'MKT_UNIQUE_CARRIER':'str', 'OP_UNIQUE_CARRIER':'str','FL_DATE':'str', 'MKT_CARRIER':'str','OP_CARRIER':'str', 'TAIL_NUM':'str',
'ORIGIN':'str', 'DEST':'str','CANCELLED':'int', 'CANCELLATION_CODE':'str', 'DIVERTED':'int'}
def df_airline(input):
frame = pd.read_csv(file_dict[input][0], usecols = flight_cols, dtype = dtype_cols)
for file in file_dict[input][1:]:
frame2 = pd.read_csv(file, usecols = flight_cols, dtype = dtype_cols)
frame = pd.concat([frame, frame2], ignore_index=True)
    frame['FL_DATE'] = frame['FL_DATE'].apply(lambda x: x[:-12]) # drop the 12-character time suffix so only the date remains
frame['FL_DATE'] = pd.to_datetime(frame['FL_DATE'])
frame = pd.get_dummies(frame, columns=['CANCELLATION_CODE'])
can_cols_names = {'CANCELLATION_CODE_A':'CANCEL_CODE_CARRIER', 'CANCELLATION_CODE_B':'CANCEL_CODE_WEATHER', 'CANCELLATION_CODE_C':'CANCEL_CODE_NAS', 'CANCELLATION_CODE_D':'CANCEL_CODE_SECURITY'}
frame.rename(columns=can_cols_names, inplace=True)
return frame
df_AA = df_airline('AA') # american
df_DL = df_airline('DL') # delta
df_WN = df_airline('WN') # southwest
df_UA = df_airline('UA') # united
df_AS = df_airline('AS') # alaskan
df_NK = df_airline('NK') # spirit
df_HA = df_airline('HA') # hawaiian
df_G4 = df_airline('G4') # allegiant
df_B6 = df_airline('B6') # jetblue
df_F9 = df_airline('F9') # frontier
flight_airlines = {'AA':df_AA, 'DL':df_DL, 'WN':df_WN,
'UA':df_UA, 'AS':df_AS, 'NK': df_NK,
'HA':df_HA, 'G4':df_G4, 'B6':df_B6,
'F9':df_F9}
The following is a DataFrame with all flights that took place from June 2022 to June 2023. Each row of the DataFrame corresponds to one flight.
flight_data = pd.concat([df_AA, df_DL, df_WN, df_UA, df_AS, df_NK, df_HA, df_G4, df_B6, df_F9], axis=0)
flight_data.head()
MKT_UNIQUE_CARRIER | YEAR | QUARTER | MONTH | DAY_OF_MONTH | DAY_OF_WEEK | FL_DATE | MKT_CARRIER | OP_UNIQUE_CARRIER | OP_CARRIER | ... | ARR_DELAY | ARR_DELAY_NEW | ARR_DEL15 | ARR_DELAY_GROUP | CANCELLED | DIVERTED | CANCEL_CODE_CARRIER | CANCEL_CODE_WEATHER | CANCEL_CODE_NAS | CANCEL_CODE_SECURITY | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AA | 2023 | 2 | 5 | 16 | 2 | 2023-05-16 | AA | AA | AA | ... | -32.0 | 0.0 | 0.0 | -2.0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 |
1 | AA | 2023 | 2 | 5 | 16 | 2 | 2023-05-16 | AA | AA | AA | ... | 4.0 | 4.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 |
2 | AA | 2023 | 2 | 5 | 16 | 2 | 2023-05-16 | AA | AA | AA | ... | 12.0 | 12.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 |
3 | AA | 2023 | 2 | 5 | 16 | 2 | 2023-05-16 | AA | AA | AA | ... | -9.0 | 0.0 | 0.0 | -1.0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 |
4 | AA | 2023 | 2 | 5 | 16 | 2 | 2023-05-16 | AA | AA | AA | ... | -28.0 | 0.0 | 0.0 | -2.0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 |
5 rows × 35 columns
The function delays below calculates the number of delays that the market carrier airline has each day. It can be used to count either arrival delays or departure delays. The FAA considers a flight delayed if it is more than 15 minutes late. The columns DEP_DEL15 and ARR_DEL15 are binary, with 0 representing not delayed and 1 representing delayed.
# Calculate number of delays using the binary column DEP_DEL15 or ARR_DEL15
def delays(DataFrame, airline, column):
df = DataFrame.groupby('FL_DATE')[column].sum()
df.rename('{a}_NUM_{b}'.format(a=airline, b=column), inplace=True)
return df
The following function counts the number of flights an airline flies as the market carrier and the number of cancellations the airline has as the market carrier on a given day. Two series are created: one that counts the number of flights each day and one that counts the number of cancellations each day.
# Calculates the number of flights per day and number of cancellations per day
def num_flights(DataFrame, airline):
cancelled = DataFrame.groupby('CANCELLED')
df_FL = cancelled.get_group(0)['FL_DATE'].value_counts()
df_FL.rename('{}_NUM_FL'.format(airline), inplace=True)
df_CAN = cancelled.get_group(1)['FL_DATE'].value_counts()
df_CAN.rename('{}_NUM_CAN'.format(airline), inplace=True)
return df_FL, df_CAN
# Calculates the different types of cancellations
def cancel_type(DataFrame, airline):
cancelled = DataFrame.groupby('CANCELLED')
df_CAN = cancelled.get_group(1)
df_CAN = df_CAN.drop(['MKT_UNIQUE_CARRIER', 'YEAR', 'QUARTER', 'MONTH', 'DAY_OF_MONTH',
'DAY_OF_WEEK', 'MKT_CARRIER', 'OP_UNIQUE_CARRIER',
'OP_CARRIER', 'TAIL_NUM', 'ORIGIN', 'DEST', 'CRS_DEP_TIME', 'DEP_TIME',
'DEP_DELAY', 'DEP_DELAY_NEW', 'DEP_DEL15', 'DEP_DELAY_GROUP',
'TAXI_OUT', 'WHEELS_OFF', 'WHEELS_ON', 'TAXI_IN', 'CRS_ARR_TIME',
'ARR_TIME', 'ARR_DELAY', 'ARR_DELAY_NEW', 'ARR_DEL15',
'ARR_DELAY_GROUP', 'CANCELLED', 'DIVERTED'], axis=1)
df_CAN = df_CAN.groupby('FL_DATE').sum()
return df_CAN
Using the functions above, a DataFrame is created with each market carrier airline's series as a column. Each DataFrame below has the flight date as the index, and the values indicated in the comments for each airline are the columns.
# Number of Departure Delays
df_DEP_DEL = delays(df_AA, 'AA', 'DEP_DEL15')
# Number of Arrival Delays
df_ARR_DEL = delays(df_AA, 'AA', 'ARR_DEL15')
# DataFrames for Number of Flights and Number of Cancellations
df_NUM_FL, df_NUM_CAN = num_flights(df_AA, 'AA')
# Proportion of ALL flights that are canceled
df_PROP_CAN_FL = pd.DataFrame()
df_PROP_CAN_FL['AA_PROP_CAN_FL'] = df_NUM_CAN/df_NUM_FL
for airline in airline_code[1:]:
df_DEP_DEL = pd.concat([df_DEP_DEL, delays(flight_airlines[airline], airline, 'DEP_DEL15')], axis=1)
df_ARR_DEL = pd.concat([df_ARR_DEL, delays(flight_airlines[airline], airline,'ARR_DEL15')], axis=1)
num_fl, num_can = num_flights(flight_airlines[airline], airline)
prop = pd.DataFrame()
prop['{}_PROP_CAN_FL'.format(airline)] = num_can / num_fl
df_NUM_FL = pd.concat([df_NUM_FL, num_fl], axis=1)
df_NUM_CAN = pd.concat([df_NUM_CAN, num_can], axis=1)
df_PROP_CAN_FL = pd.concat([df_PROP_CAN_FL, prop], axis=1)
df_DEP_DEL = df_DEP_DEL.sort_index()
df_ARR_DEL = df_ARR_DEL.sort_index()
df_NUM_FL = df_NUM_FL.sort_index()
df_NUM_CAN = df_NUM_CAN.sort_index()
df_PROP_CAN_FL = df_PROP_CAN_FL.sort_index()
Flight Data EDA
The bar plot below shows the total daily departure delays for all market carrier airlines. It is a stacked bar plot, with colors indicating each airline's number of delays. Since departure delays often result in arrival delays, the two plots look very similar.
# Plotting Total Daily Departure Delays
fig, ax = plt.subplots(figsize=(14,7))
bottom = np.zeros(len(df_DEP_DEL))
for col in df_DEP_DEL.columns:
    p = ax.bar(df_DEP_DEL.index.strftime('%Y-%m-%d'), df_DEP_DEL[col], label = col[0:2], bottom = bottom)
    bottom += df_DEP_DEL[col]
plt.title('Total Daily Departure Delays')
plt.xlabel('Date')
plt.ylabel('Number of Departure Delays')
ax.set_xticks(ax.get_xticks()[::7])
plt.xticks(rotation=90)
ax.legend(title = 'Market Carrier', loc='upper right')
plt.show()
# Plotting Total Daily Arrival Delays
fig, ax = plt.subplots(figsize=(14,7))
bottom = np.zeros(len(df_ARR_DEL))
for col in df_ARR_DEL.columns:
p = ax.bar(df_ARR_DEL.index.strftime('%Y-%m-%d'), df_ARR_DEL[col], label = col[0:2], bottom = bottom)
bottom += df_ARR_DEL[col]
plt.title('Total Arrival Delays')
plt.xlabel('Date')
plt.ylabel('Number of Arrival Delays')
ax.set_xticks(ax.get_xticks()[::7])
plt.xticks(rotation=90)
ax.legend(title = 'Market Carrier', loc='upper right')
plt.show()
Another way to plot departure delays is to look at the delays by airline. A line plot was used to show this below.
# Plotting Market Carrier Daily Departure Delays
fig, ax = plt.subplots(figsize=(14,7))
legend_DEP_DEL = {}
for col in df_DEP_DEL.columns:
    plt.plot(df_DEP_DEL.index.strftime('%Y-%m-%d'), df_DEP_DEL[col])
    legend_DEP_DEL[col[0:2]] = col
plt.title('Market Carrier Daily Departure Delays')
plt.xlabel('Date')
plt.ylabel('Number of Departure Delays')
ax.set_xticks(ax.get_xticks()[::7])
plt.xticks(rotation=90)
ax.legend(legend_DEP_DEL, title = 'Market Carrier', loc='upper right')
plt.show()
The next two plots show the number of flights by airline and the number of cancellations by airline.
#Plotting Market Carrier Daily Number of Flights
fig, ax = plt.subplots(figsize=(14,7))
legend_NUM_FL= {}
for col in df_NUM_FL.columns:
plt.plot(df_NUM_FL.index.strftime('%Y-%m-%d'), df_NUM_FL[col])
legend_NUM_FL[col[0:2]] = col
plt.title('Market Carrier Daily Number of Flights')
plt.xlabel('Date')
plt.ylabel('Number of Flights')
ax.set_xticks(ax.get_xticks()[::7])
plt.xticks(rotation=90)
ax.legend(legend_NUM_FL, title = 'Market Carrier', loc='upper right')
plt.show()
#Plotting Market Carrier Daily Number of Cancellations
fig, ax = plt.subplots(figsize=(14,7))
legend_NUM_CAN= {}
for col in df_NUM_CAN.columns:
    plt.plot(df_NUM_CAN.index.strftime('%Y-%m-%d'), df_NUM_CAN[col])
    legend_NUM_CAN[col[0:2]] = col
plt.title('Market Carrier Daily Number of Cancellations')
plt.xlabel('Date')
plt.ylabel('Number of Cancellations')
ax.set_xticks(ax.get_xticks()[::7])
plt.xticks(rotation=90)
ax.legend(legend_NUM_CAN, title = 'Market Carrier', loc='upper right')
plt.show()
# Create a Correlation Matrix for the Proportion of Cancelled Flights by Market Carrier
MKT_code = {'AA':'American', 'DL':'Delta', 'WN':'Southwest',
'UA':'United', 'AS': 'Alaskan', 'NK':'Spirit',
'HA':'Hawaiian', 'G4':'Allegiant', 'B6':'JetBlue',
'F9':'Frontier'}
prop_can_corr_matrix = df_PROP_CAN_FL.corr()
label_names = []
for item in list(prop_can_corr_matrix.columns):
code = item[0:2]
name = MKT_code[code]
label_names.append(name)
plt.figure(figsize=(10, 8))
sns.heatmap(prop_can_corr_matrix, annot=True, cmap = 'Blues', vmin=0, vmax=1,
xticklabels=label_names,
yticklabels=label_names)
plt.title('Correlation Matrix of the Proportion of Cancelled Flights (by Market Carrier)')
plt.show()
Since some market carriers use operating carriers to fulfill some of their flights, looking at the proportion of a market carrier's flights that are not operated by the market carrier itself could help indicate the market carrier's current demand.
def inner_op_mkt(mkt_carrier, airline):
df = mkt_carrier[airline].to_frame().rename(columns={'FL_DATE':'NUM_SCH_FL'}).reset_index()
df = df.set_index('FL_DATE')
op_not_mkt = df[df['OP_UNIQUE_CARRIER']!= airline]
op_not_mkt = op_not_mkt['NUM_SCH_FL'].rename('OP_NOT_MKT').to_frame().reset_index()
op_not_mkt = op_not_mkt.groupby('FL_DATE')['OP_NOT_MKT'].sum()
op_mkt = df[df['OP_UNIQUE_CARRIER'] == airline]
op_mkt = op_mkt['NUM_SCH_FL'].rename('OP_MKT')
df_mkt = op_mkt.to_frame().join(op_not_mkt, on='FL_DATE')
df_mkt['PROP_OP_BY_MKT'] = df_mkt['OP_MKT']/(df_mkt['OP_MKT']+ df_mkt['OP_NOT_MKT'])
df_mkt = df_mkt.drop(['OP_MKT', 'OP_NOT_MKT'], axis=1)
return df_mkt
# status {0, 1}
# Flights = 0
# Canceled = 1
def all_op_mkt(status):
if status == 0 or status == 1:
flights_status = flight_data.groupby('CANCELLED')
fl_status = flights_status.get_group(status)
fl_type = {0:'FL', 1:'CAN'}
else:
return None
airline_code = ['AA', 'DL', 'WN', 'UA', 'AS', 'F9', 'HA', 'G4', 'B6', 'NK']
mkt_carrier = fl_status.groupby(['MKT_UNIQUE_CARRIER', 'OP_UNIQUE_CARRIER'])['FL_DATE'].value_counts()
df_op_mkt = inner_op_mkt(mkt_carrier, airline_code[0]).add_suffix('_{}'.format(airline_code[0]))
df_op_mkt = df_op_mkt.add_prefix('{}_'.format(fl_type[status]))
    for airline in airline_code[1:]:
        df_mkt = inner_op_mkt(mkt_carrier, airline)
        # carriers with no rows for this status get a proportion of 0
        if df_mkt['PROP_OP_BY_MKT'].isnull().all():
            df_mkt['PROP_OP_BY_MKT'] = 0
        df_mkt = df_mkt.add_suffix('_{}'.format(airline))
        df_mkt = df_mkt.add_prefix('{}_'.format(fl_type[status]))
        df_op_mkt = df_op_mkt.join(df_mkt, on='FL_DATE')
df_op_mkt.dropna(how='all', axis=1, inplace=True)
df_op_mkt.fillna(0, inplace=True)
df_op_mkt = df_op_mkt.sort_index()
return df_op_mkt
FL_op_mkt = all_op_mkt(0) # only flights
CAN_op_mkt = all_op_mkt(1) # only cancelled flight
The plot below shows the proportion of cancellations that are operated by the market carrier; an analogous plot can be made for flights using FL_op_mkt.
# Plotting Proportion of cancellations operated by the market carrier
fig, ax = plt.subplots(figsize=(14,7))
bottom = np.zeros(len( CAN_op_mkt))
for col in CAN_op_mkt.replace(0,np.nan).dropna(axis=1,how="all").columns:
p = ax.bar(CAN_op_mkt.index.strftime('%Y-%m-%d'), CAN_op_mkt[col], label = col[-2:], bottom = bottom)
bottom += CAN_op_mkt[col]
plt.title('Proportion of Cancellations Operated by Market Carrier')
plt.xlabel('Date')
plt.ylabel('Proportion of Cancellations')
ax.set_xticks(ax.get_xticks()[::7])
plt.xticks(rotation=90)
ax.legend(title = 'Market Carrier', loc='upper right')
plt.show()
Air Carrier Statistics (Form 41 Traffic) - U.S. Carriers
T-100 Domestic Segment (U.S. Carriers) Data ETL
This flight data was obtained from the Bureau of Transportation Statistics Air Carrier Statistics (Form 41 Traffic) - U.S. Carriers database. The data table used is T-100 Domestic Segment (U.S. Carriers), which contains monthly domestic non-stop segment data reported by U.S. air carriers.
# Load in T-100 Domestic Segment (U.S. Carriers)
T_100_lst = glob.glob('../cmps3160-project/T-100 Domestic Segment (U.S. Carriers)/*.csv')
df_air_stats = pd.read_csv(T_100_lst[0])
for item in T_100_lst[1:]:
frame = pd.read_csv(item)
df_air_stats = pd.concat([df_air_stats, frame], ignore_index=True)
df_air_stats.head()
UNIQUE_CARRIER | DEPARTURES_SCHEDULED | DEPARTURES_PERFORMED | PAYLOAD | SEATS | PASSENGERS | FREIGHT | DISTANCE | RAMP_TO_RAMP | ... | DEST_STATE_NM | DEST_WAC | AIRCRAFT_GROUP | AIRCRAFT_TYPE | AIRCRAFT_CONFIG | YEAR | QUARTER | MONTH | DISTANCE_GROUP | CLASS | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | YV | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 24.0 | 32.0 | ... | Virginia | 38 | 6 | 673 | 1 | 2022 | 3 | 7 | 1 | F |
1 | YV | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 45.0 | 35.0 | ... | Virginia | 38 | 6 | 673 | 1 | 2022 | 3 | 7 | 1 | F |
2 | YV | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 70.0 | 52.0 | ... | Pennsylvania | 23 | 6 | 673 | 1 | 2022 | 1 | 3 | 1 | F |
3 | YV | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 83.0 | 32.0 | ... | Florida | 33 | 6 | 673 | 1 | 2022 | 2 | 5 | 1 | F |
4 | YV | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 83.0 | 52.0 | ... | North Carolina | 36 | 6 | 673 | 1 | 2022 | 2 | 6 | 1 | F |
5 rows × 45 columns
df_stat_groups = df_air_stats.groupby('UNIQUE_CARRIER')
full_seats={}
freight_weight={}
Calculate proportion of full seats
# Proportion of Full Seats (per carrier, per month)
for airline in airline_code:
    inner_dict = {}
    # restrict to this carrier's rows first; grouping the full table would give
    # every carrier the same industry-wide proportion
    carrier_frame = df_stat_groups.get_group(airline)
    yr_month_group = carrier_frame.groupby(['YEAR', 'MONTH'])
    for i in yr_month_group.groups.keys():
        frame = yr_month_group.get_group(i)
        inner_dict[i] = frame['PASSENGERS'].sum() / frame['SEATS'].sum()
    full_seats[airline] = inner_dict
Calculate proportion of weight from freight and mail
# Proportion of Weight from Freight and Mail (per carrier, per month)
for airline in airline_code:
    inner_dict = {}
    carrier_frame = df_stat_groups.get_group(airline)
    yr_month_group = carrier_frame.groupby(['YEAR', 'MONTH'])
    for i in yr_month_group.groups.keys():
        frame = yr_month_group.get_group(i)
        inner_dict[i] = (frame['FREIGHT'].sum() + frame['MAIL'].sum()) / frame['PAYLOAD'].sum()
    freight_weight[airline] = inner_dict
# DataFrames of proportion of full seats and proportion of weight from freight and mail
df_full_seats = pd.DataFrame.from_dict(full_seats).add_suffix('_PROP_FULL_SEATS')
df_freight_weight = pd.DataFrame.from_dict(freight_weight).add_suffix('_PROP_FREIGHT/MAIL')
df_full_seats.index.rename(['YEAR', 'MONTH'], inplace=True)
df_freight_weight.index.rename(['YEAR', 'MONTH'], inplace=True)
dataframe2 = pd.DataFrame()
date_range = pd.date_range(start=df_AA['FL_DATE'].min(),end=df_AA['FL_DATE'].max())
dataframe2['FL_DATE'] = date_range
dataframe2['YEAR'] = pd.DatetimeIndex(dataframe2['FL_DATE']).year
dataframe2['MONTH'] = pd.DatetimeIndex(dataframe2['FL_DATE']).month
df_prop_full_seats = pd.merge(dataframe2, df_full_seats, on=['YEAR', 'MONTH'], validate="many_to_one")
df_prop_full_seats.drop(['YEAR', 'MONTH'], axis=1, inplace=True)
df_prop_full_seats.set_index('FL_DATE', inplace=True)
df_prop_freight_weight = pd.merge(dataframe2, df_freight_weight, on=['YEAR', 'MONTH'], validate="many_to_one")
df_prop_freight_weight.drop(['YEAR', 'MONTH'], axis=1, inplace=True)
df_prop_freight_weight.set_index('FL_DATE', inplace=True)
def T100_stat_by_airline(summ_stat_frame, stat_type, airline, startdate, enddate):
    dataframe2 = pd.DataFrame()
    date_range = pd.date_range(start=startdate, end=enddate)
    dataframe2['FL_DATE'] = date_range
    dataframe2['YEAR'] = pd.DatetimeIndex(dataframe2['FL_DATE']).year
    dataframe2['MONTH'] = pd.DatetimeIndex(dataframe2['FL_DATE']).month
    # take the airline as an explicit argument rather than relying on a global loop variable
    df_stats = summ_stat_frame.loc[airline].add_prefix('{a}_{b}_'.format(a=airline, b=stat_type))
    df_stats = pd.merge(dataframe2, df_stats, on=['YEAR', 'MONTH'], validate="many_to_one")
    df_stats.drop(['YEAR', 'MONTH'], axis=1, inplace=True)
    df_stats.set_index('FL_DATE', inplace=True)
    return df_stats
# Dictionary of all flight metrics DataFrames
flight_metrics = {'Departure' : df_DEP_DEL, 'Arrival' : df_ARR_DEL,
                  'Flight Count' : df_NUM_FL, 'Cancel Count': df_NUM_CAN,
                  'Prop Flights Operated by Market' : FL_op_mkt,
                  'Prop Canceled Operated by Market' : CAN_op_mkt,
                  'Prop Full Seats' : df_full_seats, 'Prop Freight/Mail': df_freight_weight
                  }
carrier_group_dict = {'1':'Regional Carriers (including Large, Medium, Commuter, Small Certified)',
'2':'National Carriers', '3':'Major Carriers'}
T100_carrier_groups = df_air_stats.groupby(['CARRIER_GROUP', 'YEAR', 'MONTH'])
T100_mean = T100_carrier_groups[['DEPARTURES_SCHEDULED', 'DEPARTURES_PERFORMED', 'PAYLOAD', 'SEATS', 'PASSENGERS',
                                 'FREIGHT', 'MAIL', 'DISTANCE', 'RAMP_TO_RAMP', 'AIR_TIME', 'DISTANCE_GROUP']].mean()
Major_Mean = T100_mean.loc[3, :].add_prefix('Major_Carriers_Mean_')
National_Mean = T100_mean.loc[2, :].add_prefix('National_Carriers_Mean_')
Regional_Mean = T100_mean.loc[1, :].add_prefix('Regional_Carriers_Mean_') # carrier group 1 = regional
Combining All Data Sets by the Market Airline
Combining Airline Operations Performance Metrics and Stock data.
Airline Operations Performance Metrics
Airline Operations Performance Metrics were created using the Marketing Carrier On-Time Performance data and the T-100 Domestic Segment (U.S. Carriers) data.
df_AA_op_stat = pd.DataFrame() # american
df_DL_op_stat = pd.DataFrame() # delta
df_WN_op_stat = pd.DataFrame() # southwest
df_UA_op_stat = pd.DataFrame() # united
df_AS_op_stat = pd.DataFrame() # alaskan
df_NK_op_stat = pd.DataFrame() # spirit
df_HA_op_stat = pd.DataFrame() # hawaiian
df_G4_op_stat = pd.DataFrame() # allegiant
df_B6_op_stat = pd.DataFrame() # jetblue
df_F9_op_stat = pd.DataFrame() # frontier
airlines_operations_stats = {'AA':df_AA_op_stat, 'DL':df_DL_op_stat, 'WN':df_WN_op_stat,
'UA':df_UA_op_stat, 'AS':df_AS_op_stat, 'NK': df_NK_op_stat,
'HA':df_HA_op_stat, 'G4':df_G4_op_stat, 'B6':df_B6_op_stat,
'F9':df_F9_op_stat}
startdate = df_AA['FL_DATE'].min()
enddate = df_AA['FL_DATE'].max()
df_airline_stat = df_air_stats.groupby(['UNIQUE_CARRIER', 'YEAR', 'MONTH'])
df_mean_stats = df_airline_stat[['DEPARTURES_SCHEDULED', 'DEPARTURES_PERFORMED', 'PAYLOAD', 'SEATS', 'PASSENGERS','FREIGHT', 'MAIL', 'DISTANCE', 'RAMP_TO_RAMP', 'AIR_TIME']].mean()
df_std_stats = df_airline_stat[['DEPARTURES_SCHEDULED', 'DEPARTURES_PERFORMED', 'PAYLOAD', 'SEATS', 'PASSENGERS','FREIGHT', 'MAIL', 'DISTANCE', 'RAMP_TO_RAMP', 'AIR_TIME']].std()
df_median_stats = df_airline_stat[['DEPARTURES_SCHEDULED', 'DEPARTURES_PERFORMED', 'PAYLOAD', 'SEATS', 'PASSENGERS','FREIGHT', 'MAIL', 'DISTANCE', 'RAMP_TO_RAMP', 'AIR_TIME']].median()
from datetime import datetime
frames_lst = flight_metrics.values()
for frame in frames_lst:
cols = frame.columns
#print(frame)
if frame.index.dtype =='datetime64[ns]':
frame.index.rename('FL_DATE', inplace=True)
for airline in airline_code:
air_cols = cols.str.contains(airline)
frame2 = frame.loc[:, air_cols]
airlines_operations_stats[airline] = pd.concat([airlines_operations_stats[airline], frame2], axis=1)
else:
dataframe = pd.DataFrame()
date_range = pd.date_range(start=startdate,end=enddate)
dataframe['FL_DATE'] = date_range
dataframe['YEAR'] = pd.DatetimeIndex(dataframe['FL_DATE']).year
dataframe['MONTH'] = pd.DatetimeIndex(dataframe['FL_DATE']).month
for airline in airline_code:
air_cols = cols.str.contains(airline)
frame2 = frame.loc[:, air_cols]
frame2 = pd.merge(dataframe, frame2, on=['YEAR', 'MONTH'])
frame2.drop(['YEAR', 'MONTH'], axis=1, inplace=True)
frame2.set_index('FL_DATE', inplace=True)
airlines_operations_stats[airline] = pd.concat([airlines_operations_stats[airline], frame2], axis=1)
for airline in airline_code:
frame_can_type = cancel_type(flight_airlines[airline], airline)
for item in ['CANCEL_CODE_CARRIER', 'CANCEL_CODE_WEATHER', 'CANCEL_CODE_NAS','CANCEL_CODE_SECURITY']:
if item in frame_can_type.columns:
continue
else:
frame_can_type[item] = 0
    frame3 = T100_stat_by_airline(df_mean_stats, 'MEAN', airline, startdate, enddate)
    frame4 = T100_stat_by_airline(df_std_stats, 'STD', airline, startdate, enddate)
    frame5 = T100_stat_by_airline(df_median_stats, 'MEDIAN', airline, startdate, enddate)
airlines_operations_stats[airline] = pd.concat([airlines_operations_stats[airline], frame_can_type,frame3, frame4, frame5], axis=1)
airlines_operations_stats[airline]['AIRLINE'] = airline
Stock and Airline Operations Performance Metrics
Merge the dataframes into one.
df_american = pd.DataFrame()
df_delta = pd.DataFrame()
df_southwest = pd.DataFrame()
df_united = pd.DataFrame()
df_alaskan = pd.DataFrame()
df_spirit = pd.DataFrame()
df_hawaiian = pd.DataFrame()
df_allegiant = pd.DataFrame()
df_jetblue = pd.DataFrame()
df_jets = pd.DataFrame()
df_crude_oil = pd.DataFrame()
delay_decline = {'United': df_united, 'Alaskan': df_alaskan, 'American': df_american,
'Allegiant': df_allegiant, 'JetBlue': df_jetblue, 'Hawaiian': df_hawaiian,
'Southwest': df_southwest, 'Spirit': df_spirit, 'Delta': df_delta,
'JETS' : df_jets}
#crude oil data
crude_frame = financ_OIL.reset_index()
mask = (crude_frame['Date'] >= startdate) & (crude_frame['Date'] <= enddate)
crude_frame = crude_frame.loc[mask]
crude_frame.rename(columns={'Date':'DATE'}, inplace=True)
crude_frame = crude_frame.set_index('DATE')
crude_frame = crude_frame.add_prefix('Crude Oil ')
for airline in delay_decline.keys():
    # stock data
    fin_frame1 = financ_airlines[airline]
    fin_frame = fin_frame1.reset_index()
    mask = (fin_frame['Date'] >= startdate) & (fin_frame['Date'] <= enddate)
    fin_frame = fin_frame.loc[mask]
    fin_frame.rename(columns={'Date': 'DATE'}, inplace=True)
    fin_frame = fin_frame.set_index('DATE')
    if airline == 'JETS':
        # the JETS ETF tracks the whole sector, so combine every carrier's operations data
        fin_frame = fin_frame.add_prefix('JETS ')
        all_airline = pd.DataFrame()
        for code in airline_code:
            air_frame1 = airlines_operations_stats[code]
            air_frame = air_frame1.reset_index()
            air_frame.rename(columns={'FL_DATE': 'DATE'}, inplace=True)
            air_frame = air_frame.set_index('DATE')
            all_airline = pd.concat([all_airline, air_frame], axis=1)  # append this carrier's columns
        air_frame = all_airline  # use the combined frame for the sector ETF
    else:
        # a single carrier's operations data
        code = flight_airline_codes[airline]
        air_frame1 = airlines_operations_stats[code]
        air_frame = air_frame1.reset_index()
        air_frame.rename(columns={'FL_DATE': 'DATE'}, inplace=True)
        air_frame = air_frame.set_index('DATE')
    delay_decline[airline] = pd.concat([air_frame, fin_frame, crude_frame], axis=1)
delay_decline['Delta'].head()
DL_NUM_DEP_DEL15 | DL_NUM_ARR_DEL15 | DL_NUM_FL | DL_NUM_CAN | FL_PROP_OP_BY_MKT_DL | CAN_PROP_OP_BY_MKT_DL | DL_PROP_FULL_SEATS | DL_PROP_FREIGHT/MAIL | CANCEL_CODE_CARRIER | CANCEL_CODE_WEATHER | ... | Price Change | Trend | Prev Trend | Crude Oil Open | Crude Oil High | Crude Oil Low | Crude Oil Close | Crude Oil Adj Close | Crude Oil Volume | Crude Oil Daily Return | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DATE | |||||||||||||||||||||
2021-11-01 | 417.0 | 398.0 | 4167 | 6.0 | 0.572354 | 0.0 | 0.800704 | 0.008555 | 6.0 | 0.0 | ... | 1.289997 | up | down | 83.360001 | 84.879997 | 82.739998 | 84.050003 | 84.050003 | 412309.0 | 0.005744 |
2021-11-02 | 244.0 | 233.0 | 3812 | NaN | 0.577650 | 0.0 | 0.800704 | 0.008555 | NaN | NaN | ... | -0.029999 | down | up | 83.870003 | 84.410004 | 82.919998 | 83.910004 | 83.910004 | 429608.0 | -0.001666 |
2021-11-03 | 300.0 | 341.0 | 3837 | NaN | 0.577013 | 0.0 | 0.800704 | 0.008555 | NaN | NaN | ... | 0.850002 | up | down | 83.070000 | 83.080002 | 79.690002 | 80.860001 | 80.860001 | 546577.0 | -0.036349 |
2021-11-04 | 396.0 | 511.0 | 4247 | NaN | 0.565576 | 0.0 | 0.800704 | 0.008555 | NaN | NaN | ... | -0.250000 | down | up | 80.180000 | 83.419998 | 78.250000 | 78.809998 | 78.809998 | 701119.0 | -0.025352 |
2021-11-05 | 439.0 | 467.0 | 4273 | NaN | 0.564241 | 0.0 | 0.800704 | 0.008555 | NaN | NaN | ... | 3.299999 | up | down | 79.370003 | 81.800003 | 78.959999 | 81.269997 | 81.269997 | 531054.0 | 0.031214 |
5 rows × 60 columns
Combining all Stock and Airline Operations Performance Metrics into one DataFrame¶
financials_drop_cols = ['NET_INCOME', 'MAIL',
'TOTAL_PROPERTY', 'PROP_FREIGHT', 'PROP_BAG', 'TOTAL_CHARTER',
'CHARTER_PAX', 'CHARTER_PROP', 'TOTAL_MISC_REV', 'RES_CANCEL_FEES',
'MISC_OP_REV', 'PUB_SVC_REVENUE', 'TRANS_REVENUE', 'OP_REVENUES',
'FLYING_OPS', 'MAINTENANCE', 'PAX_SERVICE', 'AIRCFT_SERVICES',
'PROMOTION_SALES', 'GENERAL_ADMIN', 'GENERAL_SERVICES', 'DEPREC_AMORT',
'TRANS_EXPENSES', 'OP_EXPENSES', 'INTEREST_LONG_DEBT',
'INTEREST_EXP_OTH', 'FOREIGN_EX_GAINS', 'CAP_GAINS_PROP',
'CAP_GAINS_OTHER', 'OTHER_INCOME_NET', 'NON_OP_INCOME',
'INCOME_PRE_TAX', 'INCOME_TAX', 'INCOME_BEFORE_OTH', 'DISCONT_OPS',
'EXTRA_ITEMS', 'INCOME_TAX_EXTRA', 'INCOME_TAX_CREDITS',
'ACCTG_CHANGES', 'AIRLINE_ID', 'UNIQUE_CARRIER_NAME',
'CARRIER', 'CARRIER_NAME', 'UNIQUE_CARRIER_ENTITY', 'REGION',
'CARRIER_GROUP_NEW', 'CARRIER_GROUP']
fin_rep = financials.drop(financials_drop_cols, axis=1)
fin_rep_groups = fin_rep.groupby('UNIQUE_CARRIER')
grouped_income_df_new_g = grouped_income_df_new.groupby('UNIQUE_CARRIER')
dataframe2 = pd.DataFrame()
date_range = pd.date_range(start=df_AA['FL_DATE'].min(),end=df_AA['FL_DATE'].max())
dataframe2['FL_DATE'] = date_range
dataframe2['YEAR'] = pd.DatetimeIndex(dataframe2['FL_DATE']).year
dataframe2['QUARTER'] = pd.DatetimeIndex(dataframe2['FL_DATE']).quarter
all_fin_rep = pd.DataFrame()
for group in grouped_income_df_new_g.groups.keys():
    fin_rep1 = grouped_income_df_new_g.get_group(group)
    fin_rep1 = fin_rep1.rename(columns={'OP_PROFIT_LOSS': 'Prev Quarter OP_PROFIT_LOSS'})
    # broadcast each quarterly report onto every day of that quarter
    df_fin_rep = pd.merge(dataframe2, fin_rep1, on=['YEAR', 'QUARTER'])
    df_fin_rep.drop(['YEAR', 'QUARTER'], axis=1, inplace=True)
    df_fin_rep.set_index('FL_DATE', inplace=True)
    all_fin_rep = pd.concat([all_fin_rep, df_fin_rep], axis=0)
all_df = pd.DataFrame()
all_fin_rep_groups = all_fin_rep.groupby('UNIQUE_CARRIER')
for item in delay_decline.keys():
    if item == 'JETS':
        continue
    frame1 = all_fin_rep_groups.get_group(flight_airline_codes[item])
    frame = delay_decline[item]
    frame = frame.reset_index()
    frame.rename(columns={'DATE': 'Date'}, inplace=True)
    frame = frame.set_index('Date')
    # strip the carrier-specific prefixes so every airline shares one column schema
    frame = frame.set_axis(['NUM_DEP_DEL15', 'NUM_ARR_DEL15', 'NUM_FL', 'NUM_CAN',
                            'FL_PROP_OP_BY_MKT', 'CAN_PROP_OP_BY_MKT', 'PROP_FULL_SEATS',
                            'PROP_FREIGHT/MAIL', 'CANCEL_CODE_CARRIER', 'CANCEL_CODE_WEATHER',
                            'CANCEL_CODE_NAS', 'CANCEL_CODE_SECURITY', 'MEAN_DEPARTURES_SCHEDULED',
                            'MEAN_DEPARTURES_PERFORMED', 'MEAN_PAYLOAD', 'MEAN_SEATS',
                            'MEAN_PASSENGERS', 'MEAN_FREIGHT', 'MEAN_MAIL',
                            'MEAN_DISTANCE', 'MEAN_RAMP_TO_RAMP', 'MEAN_AIR_TIME',
                            'STD_DEPARTURES_SCHEDULED', 'STD_DEPARTURES_PERFORMED',
                            'STD_PAYLOAD', 'STD_SEATS', 'STD_PASSENGERS', 'STD_FREIGHT',
                            'STD_MAIL', 'STD_DISTANCE', 'STD_RAMP_TO_RAMP',
                            'STD_AIR_TIME', 'MEDIAN_DEPARTURES_SCHEDULED',
                            'MEDIAN_DEPARTURES_PERFORMED', 'MEDIAN_PAYLOAD',
                            'MEDIAN_SEATS', 'MEDIAN_PASSENGERS', 'MEDIAN_FREIGHT',
                            'MEDIAN_MAIL', 'MEDIAN_DISTANCE', 'MEDIAN_RAMP_TO_RAMP',
                            'MEDIAN_AIR_TIME', 'AIRLINE', 'Open', 'High', 'Low', 'Close',
                            'Adj Close', 'Volume', 'Daily Return',
                            'Price Change', 'Trend', 'Prev Trend',
                            'Crude Oil Open', 'Crude Oil High', 'Crude Oil Low',
                            'Crude Oil Close', 'Crude Oil Adj Close', 'Crude Oil Volume',
                            'Crude Oil Daily Return'], axis=1)
    frame_comb = pd.concat([frame, frame1], axis=1)
    frame_comb = frame_comb.reset_index()
    frame_comb = frame_comb.rename(columns={'index': 'Date'})
    all_df = pd.concat([all_df, frame_comb], ignore_index=True, axis=0)
all_df = all_df.drop('UNIQUE_CARRIER', axis=1)
all_df['Trend'] = all_df['Trend'].map({'up': 1, 'down': 0})
all_df['Prev Trend'] = all_df['Prev Trend'].map({'up': 1, 'down': 0})
all_df.head()
Date | NUM_DEP_DEL15 | NUM_ARR_DEL15 | NUM_FL | NUM_CAN | FL_PROP_OP_BY_MKT | CAN_PROP_OP_BY_MKT | PROP_FULL_SEATS | PROP_FREIGHT/MAIL | CANCEL_CODE_CARRIER | ... | Trend | Prev Trend | Crude Oil Open | Crude Oil High | Crude Oil Low | Crude Oil Close | Crude Oil Adj Close | Crude Oil Volume | Crude Oil Daily Return | Prev Quarter OP_PROFIT_LOSS | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2021-11-01 | 845.0 | 791.0 | 3756 | 75.0 | 0.432109 | 0.146667 | 0.800704 | 0.008555 | 56.0 | ... | 1.0 | 1.0 | 83.360001 | 84.879997 | 82.739998 | 84.050003 | 84.050003 | 412309.0 | 0.005744 | -1375235.78 |
1 | 2021-11-02 | 507.0 | 478.0 | 3688 | 44.0 | 0.428688 | 0.181818 | 0.800704 | 0.008555 | 37.0 | ... | 0.0 | 1.0 | 83.870003 | 84.410004 | 82.919998 | 83.910004 | 83.910004 | 429608.0 | -0.001666 | -1375235.78 |
2 | 2021-11-03 | 627.0 | 718.0 | 3726 | 30.0 | 0.424047 | 0.300000 | 0.800704 | 0.008555 | 25.0 | ... | 1.0 | 0.0 | 83.070000 | 83.080002 | 79.690002 | 80.860001 | 80.860001 | 546577.0 | -0.036349 | -1375235.78 |
3 | 2021-11-04 | 540.0 | 609.0 | 3797 | 11.0 | 0.429550 | 0.454545 | 0.800704 | 0.008555 | 7.0 | ... | 1.0 | 1.0 | 80.180000 | 83.419998 | 78.250000 | 78.809998 | 78.809998 | 701119.0 | -0.025352 | -1375235.78 |
4 | 2021-11-05 | 420.0 | 421.0 | 3750 | 15.0 | 0.426667 | 0.266667 | 0.800704 | 0.008555 | 11.0 | ... | 1.0 | 1.0 | 79.370003 | 81.800003 | 78.959999 | 81.269997 | 81.269997 | 531054.0 | 0.031214 | -1375235.78 |
5 rows × 62 columns
Because we are predicting from t to t+1, we need to shift these columns back by one day relative to the original dataframe all_df.
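As a toy illustration of what shift(-1) does (illustrative values, not project data):
import pandas as pd
# shift(-1) moves each value up one row, so the row for day t holds day t+1's value
s = pd.Series([10, 11, 12], index=pd.date_range('2021-11-01', periods=3))
print(s.shift(-1))  # 2021-11-01 -> 11.0, 2021-11-02 -> 12.0, 2021-11-03 -> NaN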
# work on a copy so the assignments below do not raise SettingWithCopyWarning
all_df2 = all_df.dropna().copy()
shift_cols = ['Open', 'High',
              'Low', 'Close', 'Adj Close', 'Volume', 'Daily Return', 'Price Change',
              'Crude Oil Open', 'Crude Oil High',
              'Crude Oil Low', 'Crude Oil Close', 'Crude Oil Adj Close',
              'Crude Oil Volume', 'Crude Oil Daily Return']
for item in shift_cols:
    prev = 'Prev {}'.format(item)
    all_df2[prev] = all_df2[item].shift(-1)
all_df2 = all_df2.drop(shift_cols, axis=1)
all_df2.columns
data = all_df2.drop('Date', axis=1)
data = data.dropna()
data3 = data.drop(['MEAN_DEPARTURES_PERFORMED', 'MEAN_PAYLOAD', 'MEAN_SEATS',
'MEAN_PASSENGERS', 'MEAN_FREIGHT', 'MEAN_MAIL', 'MEAN_DISTANCE',
'MEAN_RAMP_TO_RAMP', 'MEAN_AIR_TIME', 'STD_DEPARTURES_SCHEDULED',
'STD_DEPARTURES_PERFORMED', 'STD_PAYLOAD', 'STD_SEATS',
'STD_PASSENGERS', 'STD_FREIGHT', 'STD_MAIL', 'STD_DISTANCE',
'STD_RAMP_TO_RAMP', 'STD_AIR_TIME', 'MEDIAN_DEPARTURES_SCHEDULED',
'MEDIAN_DEPARTURES_PERFORMED', 'MEDIAN_PAYLOAD', 'MEDIAN_SEATS',
'MEDIAN_PASSENGERS', 'MEDIAN_FREIGHT', 'MEDIAN_MAIL', 'MEDIAN_DISTANCE',
'MEDIAN_RAMP_TO_RAMP', 'MEDIAN_AIR_TIME'], axis=1)
data_groups = data3.groupby('AIRLINE')
X = data_groups.get_group('HA').drop(['AIRLINE', 'Trend'], axis=1)
y = data_groups.get_group('HA')['Trend']
Creating a TSNE Embedding¶
TSNE using only stock and financial data¶
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE
TSNE_fin_data = all_df.drop(['Date','NUM_DEP_DEL15', 'NUM_ARR_DEL15', 'NUM_FL', 'NUM_CAN',
'FL_PROP_OP_BY_MKT', 'CAN_PROP_OP_BY_MKT', 'PROP_FULL_SEATS',
'PROP_FREIGHT/MAIL', 'CANCEL_CODE_CARRIER', 'CANCEL_CODE_WEATHER',
'CANCEL_CODE_NAS', 'CANCEL_CODE_SECURITY', 'MEAN_DEPARTURES_SCHEDULED',
'MEAN_DEPARTURES_PERFORMED', 'MEAN_PAYLOAD', 'MEAN_SEATS',
'MEAN_PASSENGERS', 'MEAN_FREIGHT', 'MEAN_MAIL',
'MEAN_DISTANCE', 'MEAN_RAMP_TO_RAMP', 'MEAN_AIR_TIME',
'STD_DEPARTURES_SCHEDULED', 'STD_DEPARTURES_PERFORMED',
'STD_PAYLOAD', 'STD_SEATS', 'STD_PASSENGERS', 'STD_FREIGHT',
'STD_MAIL', 'STD_DISTANCE', 'STD_RAMP_TO_RAMP',
'STD_AIR_TIME', 'MEDIAN_DEPARTURES_SCHEDULED',
'MEDIAN_DEPARTURES_PERFORMED', 'MEDIAN_PAYLOAD',
'MEDIAN_SEATS', 'MEDIAN_PASSENGERS', 'MEDIAN_FREIGHT',
'MEDIAN_MAIL', 'MEDIAN_DISTANCE', 'MEDIAN_RAMP_TO_RAMP',
'MEDIAN_AIR_TIME'], axis=1)
TSNE_fin_data = TSNE_fin_data.dropna()
X_T = TSNE_fin_data.drop('AIRLINE', axis=1)
y_T = TSNE_fin_data['AIRLINE']
X_T_std = StandardScaler().fit_transform(X_T)
n_components = 2
tsne = TSNE(n_components)
X_tsne = tsne.fit_transform(X_T_std)
X_tsne_data = np.vstack((X_tsne.T, y_T)).T
df_tsne = pd.DataFrame(X_tsne_data, columns=['Dim1', 'Dim2', 'AIRLINE'])
plt.figure(figsize=(8, 8))
sns.scatterplot(data=df_tsne, hue='AIRLINE', x='Dim1', y='Dim2').set(title='TSNE Stock and Financial Data')
plt.show()
TSNE using all data¶
TSNE_data = all_df.drop('Date', axis=1)
TSNE_data = TSNE_data.dropna()
X_T = TSNE_data.drop('AIRLINE', axis=1)
y_T = TSNE_data['AIRLINE']
X_T_std = StandardScaler().fit_transform(X_T)
n_components = 2
tsne = TSNE(n_components)
X_tsne = tsne.fit_transform(X_T_std)
X_tsne_data = np.vstack((X_tsne.T, y_T)).T
df_tsne = pd.DataFrame(X_tsne_data, columns=['Dim1', 'Dim2', 'AIRLINE'])
plt.figure(figsize=(8, 8))
sns.scatterplot(data=df_tsne, hue='AIRLINE', x='Dim1', y='Dim2').set(title='TSNE Flight, Stock, and Financial Data')
plt.show()
Models¶
all_df_airline_groups = all_df.groupby('AIRLINE')
Logistic Regression¶
from sklearn.model_selection import train_test_split
def train_test_split_airline(airline, rs):
    frame = all_df_airline_groups.get_group(airline)
    frame = frame.dropna()
    X = frame.drop(['Trend', 'Date', 'AIRLINE'], axis=1)
    y = frame['Trend']
    # note: the scaler is fit on all rows before splitting, so the test rows
    # influence the standardization
    X_std = StandardScaler().fit_transform(X)
    # 75% train / 25% test split
    X_std_train, X_std_test, y_train, y_test = train_test_split(X_std, y, random_state=rs, test_size=0.25, shuffle=True)
    return X_std_train, X_std_test, y_train, y_test
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_squared_error
df_logistic = pd.DataFrame()
for rs in [42, 104, 35]:
    air_lst = []
    logistic_acc = []
    logistic_mse = []
    for airline in all_df_airline_groups.groups.keys():
        air_lst.append(airline)
        X_std_train, X_std_test, y_train, y_test = train_test_split_airline(airline, rs)
        # all parameters not specified are set to their defaults
        logisticRegr = LogisticRegression()
        # fit model on training data
        logisticRegr.fit(X_std_train, y_train)
        # predict the testing data
        y_pred = logisticRegr.predict(X_std_test)
        # accuracy
        score = logisticRegr.score(X_std_test, y_test)
        logistic_acc.append(score)
        # MSE
        mse = mean_squared_error(y_test, y_pred)
        logistic_mse.append(mse)
    RS_col = 'Random State {}'.format(rs)
    df_l = pd.DataFrame({'Airline': air_lst, 'Accuracy': logistic_acc, 'MSE': logistic_mse})
    df_l = df_l.set_index('Airline')
    l_cols = [(RS_col, 'Accuracy'), (RS_col, 'MSE')]
    df_l.columns = pd.MultiIndex.from_tuples(l_cols)
    df_logistic = pd.concat([df_logistic, df_l], axis=1)
df_logistic
Random State 42 | Random State 104 | Random State 35 | ||||
---|---|---|---|---|---|---|
Accuracy | MSE | Accuracy | MSE | Accuracy | MSE | |
Airline | ||||||
AA | 0.987500 | 0.012500 | 0.962500 | 0.037500 | 0.925000 | 0.075000 |
AS | 0.948718 | 0.051282 | 0.961538 | 0.038462 | 0.974359 | 0.025641 |
B6 | 0.971014 | 0.028986 | 0.942029 | 0.057971 | 0.985507 | 0.014493 |
DL | 0.925373 | 0.074627 | 0.910448 | 0.089552 | 0.970149 | 0.029851 |
G4 | 0.958333 | 0.041667 | 0.958333 | 0.041667 | 0.895833 | 0.104167 |
HA | 0.966667 | 0.033333 | 0.900000 | 0.100000 | 0.933333 | 0.066667 |
NK | 0.949153 | 0.050847 | 0.864407 | 0.135593 | 0.983051 | 0.016949 |
UA | 0.912500 | 0.087500 | 0.912500 | 0.087500 | 0.912500 | 0.087500 |
WN | 0.950000 | 0.050000 | 0.950000 | 0.050000 | 0.875000 | 0.125000 |
Above we appear to have high accuracy and low MSE, but this was due to data leakage: the model was predicting each day's outcome using that same day's stock information. Below we used the shifted dataframe and obtained much lower results.
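As a quick check of that leakage (a sketch, assuming Trend simply encodes the sign of the same day's Price Change):
# If the label is recoverable from a same-day feature, accuracy is inflated;
# a value near 1.0 here would confirm Trend leaks from the same day's prices.
print(((all_df['Price Change'] > 0).astype(int) == all_df['Trend']).mean())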
def train_test_split_trend(airline, rs):
    frame = data_groups.get_group(airline)
    frame = frame.dropna()
    X = frame.drop(['Trend', 'AIRLINE'], axis=1)
    y = frame['Trend']
    X_std = StandardScaler().fit_transform(X)
    # 75% train / 25% test split
    X_std_train, X_std_test, y_train, y_test = train_test_split(X_std, y, random_state=rs, test_size=0.25, shuffle=True)
    return X_std_train, X_std_test, y_train, y_test
df_logistic = pd.DataFrame()
for rs in [42, 104, 35]:
    air_lst = []
    logistic_acc = []
    logistic_mse = []
    for airline in data_groups.groups.keys():
        air_lst.append(airline)
        X_std_train, X_std_test, y_train, y_test = train_test_split_trend(airline, rs)
        # all parameters not specified are set to their defaults
        logisticRegr = LogisticRegression()
        # fit model on training data
        logisticRegr.fit(X_std_train, y_train)
        # predict the testing data
        y_pred = logisticRegr.predict(X_std_test)
        # accuracy
        score = logisticRegr.score(X_std_test, y_test)
        logistic_acc.append(score)
        # MSE
        mse = mean_squared_error(y_test, y_pred)
        logistic_mse.append(mse)
    RS_col = 'Random State {}'.format(rs)
    df_l = pd.DataFrame({'Airline': air_lst, 'Accuracy': logistic_acc, 'MSE': logistic_mse})
    df_l = df_l.set_index('Airline')
    l_cols = [(RS_col, 'Accuracy'), (RS_col, 'MSE')]
    df_l.columns = pd.MultiIndex.from_tuples(l_cols)
    df_logistic = pd.concat([df_logistic, df_l], axis=1)
df_logistic
Random State 42 | Random State 104 | Random State 35 | ||||
---|---|---|---|---|---|---|
Accuracy | MSE | Accuracy | MSE | Accuracy | MSE | |
Airline | ||||||
AA | 0.612500 | 0.387500 | 0.550000 | 0.450000 | 0.575000 | 0.425000 |
AS | 0.487179 | 0.512821 | 0.500000 | 0.500000 | 0.576923 | 0.423077 |
B6 | 0.449275 | 0.550725 | 0.507246 | 0.492754 | 0.434783 | 0.565217 |
DL | 0.424242 | 0.575758 | 0.409091 | 0.590909 | 0.424242 | 0.575758 |
G4 | 0.458333 | 0.541667 | 0.520833 | 0.479167 | 0.562500 | 0.437500 |
HA | 0.700000 | 0.300000 | 0.666667 | 0.333333 | 0.566667 | 0.433333 |
NK | 0.610169 | 0.389831 | 0.491525 | 0.508475 | 0.559322 | 0.440678 |
UA | 0.425000 | 0.575000 | 0.525000 | 0.475000 | 0.487500 | 0.512500 |
WN | 0.475000 | 0.525000 | 0.500000 | 0.500000 | 0.525000 | 0.475000 |
With the leakage removed, the accuracies above hover around 50%, which is no better than chance; these features alone do not give us any additional information for predicting whether the stock price will go up or down the next day.
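For context, a majority-class baseline (a quick sketch over the same groups) shows what always guessing the more common class would score:
# Accuracy of always predicting each airline's most frequent Trend class;
# the logistic accuracies above sit near this no-skill baseline.
for airline, grp in data_groups:
    print(airline, round(grp['Trend'].value_counts(normalize=True).max(), 3))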
Multivariate LSTM¶
Because the logistic regression did not work as well as we had hoped, we decided to switch to a time series model. Since the model uses past data, there was no need to use the shifted dataframe, and we were therefore able to use the all_df dataframe.
lstm_g = all_df.groupby('AIRLINE')
# prepare data for lstm
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
# convert series to supervised learning
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = pd.DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    # put it all together
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg
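To make the windowing concrete, here is a tiny illustrative run (toy numbers, not project data):
import numpy as np
# two features over three days; each output row pairs day t-1 with day t
demo = np.array([[1, 10],
                 [2, 20],
                 [3, 30]])
print(series_to_supervised(demo, n_in=1, n_out=1))
#    var1(t-1)  var2(t-1)  var1(t)  var2(t)
# 1        1.0       10.0        2       20
# 2        2.0       20.0        3       30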
from sklearn.preprocessing import MinMaxScaler
def preprocessing_lstm(airline):
    # load one airline's combined dataset
    dataset = lstm_g.get_group(airline)
    dataset = dataset.set_index('Date')
    dataset = dataset.drop('AIRLINE', axis=1)
    values = dataset.values
    # ensure all data is float
    values = values.astype('float32')
    # fit and apply the normalization
    scaler = MinMaxScaler()
    scaler = scaler.fit(values)
    normalized = scaler.transform(values)
    # frame as supervised learning: (t-1) features paired with (t) values
    reframed = series_to_supervised(normalized, 1, 1)
    # drop columns we don't want to predict
    reframed.drop(reframed.columns[[9, 10, 11, 12, 13, 14, 15]], axis=1, inplace=True)
    # split into train and test sets (the first 40% of days train the model)
    values = reframed.values
    n_train_days = int(values.shape[0] // 2.5)
    train = values[:n_train_days, :]
    test = values[n_train_days:, :]
    # split into inputs and output; the last column is the prediction target
    train_X, train_y = train[:, :-1], train[:, -1]
    test_X, test_y = test[:, :-1], test[:, -1]
    # reshape input to be 3D [samples, timesteps, features]
    train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
    test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
    return test_X, test_y, train_X, train_y
from matplotlib import pyplot
def lstm_plot(yhat, history, test_X, test_y, airline, loss_type):
    # plot the training history
    pyplot.plot(history.history['loss'], label='train')
    pyplot.plot(history.history['val_loss'], label='test')
    pltitle = '{}: Train and Test Loss from the Multivariate LSTM During Training'.format(airline)
    pyplot.title(pltitle)
    pyplot.xlabel('Epoch')
    ylab = 'Train and Test Loss ({})'.format(loss_type)
    pyplot.ylabel(ylab)
    pyplot.legend()
    pyplot.show()
    test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
    # predicted target: rejoin with the input features and keep the first column
    yhat = np.concatenate((yhat, test_X[:, 1:]), axis=1)
    yhat = yhat[:, 0]
    # actual target
    test_y = test_y.reshape((len(test_y), 1))
    y = np.concatenate((test_y, test_X[:, 1:]), axis=1)
    y = y[:, 0]
    # calculate RMSE (on the MinMax-normalized scale, since the scaling is never inverted)
    rmse = np.sqrt(mean_squared_error(y, yhat))
    print('Test RMSE: %.3f' % rmse)
    return rmse
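Since preprocessing_lstm scales every column with MinMaxScaler and that scaling is never inverted, the RMSE values reported below are in normalized units. A sketch of how one could map such an error back to original units, assuming access to the fitted scaler and the target column's index (hypothetical here, since preprocessing_lstm does not return the scaler):
def denormalize_rmse(rmse_norm, scaler, col):
    # MinMaxScaler maps x to (x - min) / (max - min), so an error of r in
    # normalized units corresponds to r * (max - min) in the original units
    return rmse_norm * (scaler.data_max_[col] - scaler.data_min_[col])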
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
def lstm_model(train_X, train_y, test_X, test_y, n_epochs, n_batch, loss_type):
    # design the network
    model = Sequential()
    # note: the hidden layer width is tied to n_epochs here (50 units when n_epochs is 50)
    model.add(LSTM(n_epochs, input_shape=(train_X.shape[1], train_X.shape[2])))
    model.add(Dense(1))
    model.compile(loss=loss_type, optimizer='adam')
    # fit the network
    history = model.fit(train_X, train_y, epochs=n_epochs, batch_size=n_batch,
                        validation_data=(test_X, test_y), verbose=2, shuffle=False)
    # make a prediction on the test set
    yhat = model.predict(test_X)
    return yhat, history
def lstm_run(airline, n_epochs, n_batch, loss_type):
    test_X, test_y, train_X, train_y = preprocessing_lstm(airline)
    yhat, history = lstm_model(train_X, train_y, test_X, test_y, n_epochs, n_batch, loss_type)
    rsme = lstm_plot(yhat, history, test_X, test_y, airline, loss_type)
    return rsme
lstm_rsme_dict50 = {}
for airline in lstm_g.groups.keys():
    print(airline)
    n_epochs = 50
    n_batch = 72
    loss_type = 'mean_squared_error'
    model_rsme = lstm_run(airline, n_epochs, n_batch, loss_type)
    feat_lst = [model_rsme, n_epochs, n_batch]
    lstm_rsme_dict50[airline] = feat_lst
(Per-epoch Keras training logs condensed; held-out Test RMSE by airline:)
AA: 0.441, AS: 0.389, B6: 0.305, DL: 0.215, G4: 0.556, HA: 0.698, NK: 0.395, UA: 0.237
WN Epoch 1/50 2/2 - 2s - loss: 0.5623 - val_loss: 0.1521 ... Epoch 45/50 2/2 - 0s -
loss: 0.0271 - val_loss: 0.0933 - 39ms/epoch - 20ms/step Epoch 46/50 2/2 - 0s - loss: 0.0251 - val_loss: 0.0924 - 39ms/epoch - 19ms/step Epoch 47/50 2/2 - 0s - loss: 0.0232 - val_loss: 0.0913 - 40ms/epoch - 20ms/step Epoch 48/50 2/2 - 0s - loss: 0.0214 - val_loss: 0.0905 - 44ms/epoch - 22ms/step Epoch 49/50 2/2 - 0s - loss: 0.0197 - val_loss: 0.0905 - 43ms/epoch - 21ms/step Epoch 50/50 2/2 - 0s - loss: 0.0182 - val_loss: 0.0912 - 39ms/epoch - 20ms/step 5/5 [==============================] - 0s 4ms/step
Test RMSE: 0.302
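For reference, the "Test RMSE" printed after each run is the root-mean-square error of the model's predictions on the held-out test set:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$

where $y_i$ is the observed value, $\hat{y}_i$ the model's prediction, and $n$ the number of test observations.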
# Re-train the per-airline LSTMs on a shorter schedule (20 epochs)
# and record each model's test RMSE alongside its hyperparameters.
lstm_rsme_dict20 = {}
for airline in lstm_g.groups.keys():
    print(airline)
    n_epochs = 20
    n_batch = 72
    loss_type = 'mean_squared_error'
    model_rsme = lstm_run(airline, n_epochs, n_batch, loss_type)
    feat_lst = [model_rsme, n_epochs, n_batch]
    lstm_rsme_dict20[airline] = feat_lst
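For context, here is a minimal sketch of the train-and-evaluate pattern that a helper like lstm_run wraps. The actual lstm_run is defined earlier in the notebook; the layer width, data shapes, and function signature below are illustrative assumptions, not the project's exact implementation.

import numpy as np
from sklearn.metrics import mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def lstm_run_sketch(X_train, y_train, X_test, y_test,
                    n_epochs=20, n_batch=72, loss_type='mean_squared_error'):
    # X_* are shaped (samples, timesteps, features); y_* are 1-D targets.
    model = Sequential([
        LSTM(50, input_shape=(X_train.shape[1], X_train.shape[2])),
        Dense(1),
    ])
    model.compile(loss=loss_type, optimizer='adam')
    # verbose=2 reproduces the one-line-per-epoch logs shown above.
    model.fit(X_train, y_train, epochs=n_epochs, batch_size=n_batch,
              validation_data=(X_test, y_test), verbose=2, shuffle=False)
    y_pred = model.predict(X_test)
    # Score on the held-out set, matching the "Test RMSE: ..." lines above.
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    print('Test RMSE: %.3f' % rmse)
    return rmse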
[Per-epoch Keras training output truncated for readability.] Test RMSEs for the 20-epoch runs: AA 0.281, AS 0.311, B6 0.296, DL 0.284, G4 0.587, HA 0.437, NK 0.722, UA 0.302, WN 0.362; the results are tabulated with full airline names below.
# Assemble the 50-epoch results into a DataFrame keyed by airline name.
lstm_results50 = pd.DataFrame.from_dict(lstm_rsme_dict50, orient='index',
                                        columns=['RMSE', 'Epochs', 'Batch Size'])
lstm_results50 = lstm_results50.reset_index()
# Map the two-letter marketing-carrier codes to full airline names.
lstm_results50['Airline'] = lstm_results50['index'].map(MKT_code)
lstm_results50.drop('index', axis=1, inplace=True)
lstm_results50 = lstm_results50.set_index('Airline')
lstm_results50
| Airline | RMSE | Epochs | Batch Size |
|---|---|---|---|
| American | 0.440847 | 50 | 72 |
| Alaskan | 0.388948 | 50 | 72 |
| JetBlue | 0.304574 | 50 | 72 |
| Delta | 0.214968 | 50 | 72 |
| Allegiant | 0.555687 | 50 | 72 |
| Hawaiian | 0.698020 | 50 | 72 |
| Spirit | 0.395131 | 50 | 72 |
| United | 0.236709 | 50 | 72 |
| Southwest | 0.302007 | 50 | 72 |
# Assemble the 20-epoch results the same way.
lstm_results20 = pd.DataFrame.from_dict(lstm_rsme_dict20, orient='index',
                                        columns=['RMSE', 'Epochs', 'Batch Size'])
lstm_results20 = lstm_results20.reset_index()
lstm_results20['Airline'] = lstm_results20['index'].map(MKT_code)
lstm_results20.drop('index', axis=1, inplace=True)
lstm_results20 = lstm_results20.set_index('Airline')
lstm_results20
| Airline | RMSE | Epochs | Batch Size |
|---|---|---|---|
| American | 0.280981 | 20 | 72 |
| Alaskan | 0.310535 | 20 | 72 |
| JetBlue | 0.296296 | 20 | 72 |
| Delta | 0.284168 | 20 | 72 |
| Allegiant | 0.587060 | 20 | 72 |
| Hawaiian | 0.436785 | 20 | 72 |
| Spirit | 0.721506 | 20 | 72 |
| United | 0.302426 | 20 | 72 |
| Southwest | 0.362422 | 20 | 72 |
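Since both runs share the same airlines and batch size, the two tables can be joined for a side-by-side view. A quick sketch using the DataFrames built above (the derived column labels here are our own):

# Join the 50- and 20-epoch test RMSEs on the shared 'Airline' index.
comparison = lstm_results50[['RMSE']].rename(columns={'RMSE': 'RMSE (50 epochs)'}) \
    .join(lstm_results20[['RMSE']].rename(columns={'RMSE': 'RMSE (20 epochs)'}))
# Record which schedule gave the lower test error for each airline.
comparison['Lower error'] = comparison.idxmin(axis=1)
comparison

By this view, the longer 50-epoch schedule wins for Delta, Allegiant, Spirit, United, and Southwest, while the shorter 20-epoch schedule does better for American, Alaskan, JetBlue, and Hawaiian.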
Conclusion
Over the course of this project, it became clear that stock prices are inherently difficult to model and predict. Combining company financials and operational metrics with cancellation and delay data produced a distinctive dataset for training a machine-learning model. After logistic regression fell short, the move to an LSTM lowered the RMSE for every airline; while the model is not perfect, it improved substantially on our prior results. Hawaiian and Allegiant showed higher RMSEs, which may reflect the smaller size of these airlines. Furthermore, Allegiant is an operating carrier and a subsidiary of Delta, which further complicates forecasting.