CMPS3160 Project: Delays & Declines

Happy Herold, Syed Mehran

Github Page

This project is a collaboration between Happy Herold and Syed Mehran for CMPS-3160: Intro to Data Science. The authors intend to investigate how airline operational performance can predict stock prices. To investigate, the authors have compiled data from the last 6 months on airline financials and performance.

Project Goals:

Authors hope to learn about how operational inefficiency contributes to stock prices.

Authors would also like to learn how soon the market responds to inefficiencies

Authors want to investigate how one could use this data to develop a model that can predict price shifts due to inefficiency

Milestone 1:

The authors intend to investigate how airline operational performance can predict stock prices. The data sets for this project are stock market data obtained from Yahoo Finance and domestic flight data from the Bureau of Transportation Statistics TranStats Marketing Carrier On-Time Performance table. To investigate, the authors have compiled data from the last 6 months of available data (01/01/2023 to 05/31/2023) on airline financials and performance. With the exception of the aviation index the 9 major airlines from the stock market data correspond to the market carriers in the flight data.

Since the aim of this project is to predict the stock prices based on airline performance, the stock market data provides actual values of the stock prices. One of the questions that the project aims to answer is whether or not the performance of a stock can be predicted based on the performance of an airline, and does the performance of multiple airlines impact the stock price of a single airline. As part of the ETL process the daily returns for each stock were calculated and used in creating a correlation matrix for daily returns. Time series data of the trading volume, open price, and closing price were also plotted.

To measure airline performance in this project flight data from the Marketing Carrier On-Time Performance table. The flight data contains flight characteristics including date, market carrier, operating carrier, tail number, origin and destination, scheduled departure time, actual departure time, scheduled arrival time, actual arrival time, canceled and diverted flights as well as other key features of the flight. The flight data does not specifically include airline performance metrics. One of the questions that the project aims to answer is if an airline’s flight data can be used to create a metric that can quantify the performance of an airline and can these metrics be used to compare different airlines and be able to predict the stock price. During the ETL process daily number of flights, daily number of cancellations, and proportion of flights that are operated by the market carrier have been calculated using the flight data and plotted.

****Data Set Websites:****

Marketing Carrier On-Time Performance Data

Yahoo Finance

Collaboration Plan:

A private project git repository has been created to store the data and the Colab Jupyter notebook. Authors intend to meet once a week to discuss weekly project goals, raise questions, and develop the final product. Meetings will take place by Zoom or in-person and the pair will communicate using text messaging. The weekly goals and project to-do list are shared in a OneNote document that is shared between the pair. Up until Milestone 2 the authors will continue to calculate various airline performance metrics from the flight data and begin predicting the stock prices based on these metrics with different scikit-learn regression models. After completing Milestone 2, the authors plan on gathering data visualizations and the model to develop a final website and presentation that clearly explains the aim of the project, the steps and tools utilized in the project, and the findings from the analysis and model created.

In [ ]:
!git clone https://github.com/syedmehran/cmps3160-project.git
%cd cmps3160-project/
Cloning into 'cmps3160-project'...
remote: Enumerating objects: 570, done.
remote: Counting objects: 100% (165/165), done.
remote: Compressing objects: 100% (93/93), done.
remote: Total 570 (delta 101), reused 131 (delta 72), pack-reused 405
Receiving objects: 100% (570/570), 688.84 MiB | 18.42 MiB/s, done.
Resolving deltas: 100% (331/331), done.
Updating files: 100% (353/353), done.
/content/cmps3160-project
In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import glob
import yfinance as yf
import datetime

Stock data ETL¶

In [ ]:
#Load stock data in

financ_UAL = yf.download('UAL', start = '2015-04-30') # united
financ_ALK = yf.download('ALK', start = '2015-04-30') # alaskan
financ_AAL = yf.download("AAL", start = '2015-04-30') # american
financ_ALGT = yf.download("ALGT", start = '2015-04-30') # allegiant
financ_HA = yf.download("HA", start = '2015-04-30') # hawaiian
financ_LUV = yf.download("LUV", start = '2015-04-30') # southwest
financ_SAVE = yf.download("SAVE", start = '2015-04-30') # spirit
financ_JBLU = yf.download("JBLU", start = '2015-04-30') # jetblue
financ_DAL = yf.download("DAL", start = '2015-04-30') # delta
financ_JETS = yf.download("JETS", start = '2015-04-30') # JETS etf index
financ_FRONTIER = yf.download("ULCC", start='2015-04-30')
financ_OIL = yf.download('CL=F', start = '2015-04-30')

financ_airlines = {
    'United': financ_UAL,
    'Alaskan': financ_ALK,
    'American': financ_AAL,
    'Allegiant': financ_ALGT,
    'JetBlue': financ_JBLU,
    'Hawaiian': financ_HA,
    'Southwest': financ_LUV,
    'Spirit': financ_SAVE,
    'Delta': financ_DAL,
    'Frontier': financ_FRONTIER,
    'JETS' : financ_JETS
}

airline_colors = {
    'United': '#1f77b4',  # muted blue
    'Alaskan': '#2ca02c',  # cooked asparagus green
    'American': '#d62728',  # brick red
    'Allegiant': '#ff7f0e',  # safety orange
    'JetBlue': '#9467bd',  # muted purple
    'Hawaiian': '#e377c2',  # raspberry yogurt pink
    'Southwest': '#7f7f7f',  # middle gray
    'Spirit': '#bcbd22',  # curry yellow-green
    'Delta': '#17becf',  # blue-teal
    'JETS': '#8c564b',  # chestnut brown
    'Frontier': '#c7c7c7'  # silver gray
}
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
In [ ]:
# Load in airline financial data
financials_2023 = pd.read_csv('../cmps3160-project/finance-data/T_F41SCHEDULE_P12_2023.csv')
financials_2022 = pd.read_csv('../cmps3160-project/finance-data/T_F41SCHEDULE_P12_2022.csv')
financials_2021 = pd.read_csv('../cmps3160-project/finance-data/T_F41SCHEDULE_P12_2021.csv')
financials_2020 = pd.read_csv('../cmps3160-project/finance-data/T_F41SCHEDULE_P12_2020.csv')

financials_2020.tail()
Out[ ]:
NET_INCOME OP_PROFIT_LOSS TRANS_REV_PAX MAIL TOTAL_PROPERTY PROP_FREIGHT PROP_BAG TOTAL_CHARTER CHARTER_PAX CHARTER_PROP ... UNIQUE_CARRIER UNIQUE_CARRIER_NAME CARRIER CARRIER_NAME UNIQUE_CARRIER_ENTITY REGION CARRIER_GROUP_NEW CARRIER_GROUP YEAR QUARTER
408 262685.0 -597883.0 NaN 726.0 NaN 444323.0 NaN NaN NaN 7892.0 ... FX Federal Express Corporation FX Federal Express Corporation 16196 A 3 3 2020 3
409 688988.0 1395899.0 NaN 0.0 NaN 2785917.0 NaN NaN NaN 7598.0 ... FX Federal Express Corporation FX Federal Express Corporation 06200 D 3 3 2020 3
410 908014.0 1130748.0 NaN 394.0 NaN 2566955.0 NaN NaN NaN 1195.0 ... FX Federal Express Corporation FX Federal Express Corporation 06200 D 3 3 2020 1
411 926627.0 1363349.0 NaN 302.0 NaN 2502021.0 NaN NaN NaN 33790.0 ... FX Federal Express Corporation FX Federal Express Corporation 06200 D 3 3 2020 2
412 1188167.0 1624892.0 NaN 45.0 NaN 2986833.0 NaN NaN NaN 6047.0 ... FX Federal Express Corporation FX Federal Express Corporation 06200 D 3 3 2020 4

5 rows × 52 columns

In [ ]:
financials = pd.concat([financials_2023, financials_2022, financials_2021, financials_2020], ignore_index=True)
In [ ]:
for airline, df in financ_airlines.items():
    print(f"{airline} DataFrame size: {df.shape}")
    print(f"Null values in {airline}:")
    null_counts = df.isnull().sum()  # counts of null values in each column
    print(null_counts[null_counts > 0])  # print columns with null values
    print("\n")

print(financ_OIL.isnull().values.any())
financ_OIL.shape
United DataFrame size: (2173, 6)
Null values in United:
Series([], dtype: int64)


Alaskan DataFrame size: (2173, 6)
Null values in Alaskan:
Series([], dtype: int64)


American DataFrame size: (2173, 6)
Null values in American:
Series([], dtype: int64)


Allegiant DataFrame size: (2173, 6)
Null values in Allegiant:
Series([], dtype: int64)


JetBlue DataFrame size: (2173, 6)
Null values in JetBlue:
Series([], dtype: int64)


Hawaiian DataFrame size: (2173, 6)
Null values in Hawaiian:
Series([], dtype: int64)


Southwest DataFrame size: (2173, 6)
Null values in Southwest:
Series([], dtype: int64)


Spirit DataFrame size: (2173, 6)
Null values in Spirit:
Series([], dtype: int64)


Delta DataFrame size: (2173, 6)
Null values in Delta:
Series([], dtype: int64)


Frontier DataFrame size: (682, 6)
Null values in Frontier:
Series([], dtype: int64)


JETS DataFrame size: (2173, 6)
Null values in JETS:
Series([], dtype: int64)


False
Out[ ]:
(2172, 6)

This indicates that the data downloaded from Yahoo Finance is clean and does not have missing values that need to be handled.

In [ ]:
financ_airline_codes = {'United': 'UAL', 'Alaskan': 'ALK', 'American': 'AAL',
                        'Allegiant': 'ALGT', 'JetBlue': 'JBLU', 'Hawaiian': 'HA',
                        'Southwest': 'LUV', 'Spirit': 'SAVE', 'Delta': 'DAL',
                        'JETS': 'JETS'}

flight_airline_codes = {'American':'AA', 'Delta':'DL', 'Southwest':'WN',
                        'United':'UA', 'Alaskan':'AS', 'Spirit': 'NK',
                        'Hawaiian':'HA', 'Allegiant':'G4', 'JetBlue':'B6',
                        'Frontier':'F9'}
In [ ]:
financ_JETS.head() # displaying JETS ETF to check data and columns
Out[ ]:
Open High Low Close Adj Close Volume
Date
2015-04-30 24.493999 24.579000 24.129999 24.188000 23.045496 7300
2015-05-01 24.700001 24.969000 24.379999 24.870001 23.695282 182800
2015-05-04 25.100000 25.100000 24.600000 24.660000 23.495201 209000
2015-05-05 24.480000 24.480000 24.000000 24.000000 22.866375 75400
2015-05-06 24.010000 24.110001 23.590000 23.980000 22.847322 36900
In [ ]:
plt.figure(figsize=(14,7))

for airline, df in financ_airlines.items():
    plt.plot(df['Close'], label=airline, color=airline_colors[airline])

plt.title('Time Series Graph of Closing Prices ($)')
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.legend()

plt.show()
No description has been provided for this image

A curiosity to note here would be the y-axis (-50 to 300). The graph is indicating there was a time when oil had gone negative. Upon further research, on 4/20/20, oil reached a low of -$37

Correlational Analysis¶


Now, we want to calculate the daily returns on each stock. We can use a for function here to loop through our airlines dictionary from earlier. Since the function produces an NA for the first date, we will fill that in with 0. We want to investigate daily returns as a percentage change of the previous day, so setting the first day equal to NA makes sense.

This will allow us to calculate how correlated each stock's movement is with one another.

In [ ]:
for airline, df in financ_airlines.items():
    df['Daily Return'] = df['Close'].pct_change().dropna()
financ_OIL['Daily Return'] = financ_OIL['Close'].pct_change().dropna()
In [ ]:
combined_returns = pd.DataFrame()

for airline, df in financ_airlines.items():
    combined_returns[airline] = df['Daily Return']

mean_returns = combined_returns.mean()
cov_returns = combined_returns.cov()
std_returns = combined_returns.std()
combined_returns.dropna(inplace=True)
combined_returns.head()
Out[ ]:
United Alaskan American Allegiant JetBlue Hawaiian Southwest Spirit Delta Frontier JETS
Date
2021-04-05 0.025424 0.023025 0.015507 0.021900 0.039273 0.005968 0.027243 0.003480 0.029080 0.014854 0.020702
2021-04-06 0.003205 0.037277 -0.007016 0.007277 0.003779 0.002596 0.017945 0.021339 0.027861 0.031887 0.008330
2021-04-07 0.001513 -0.018714 -0.005403 -0.005557 -0.009412 -0.026257 -0.009048 -0.019849 -0.028074 0.042553 -0.009698
2021-04-08 -0.002350 0.004561 -0.011701 0.001916 0.001900 -0.003038 -0.000945 0.003730 -0.007769 0.032556 -0.002902
2021-04-09 -0.016827 -0.000825 -0.004651 0.007211 -0.001423 -0.006476 -0.000630 -0.011680 -0.010841 0.005647 -0.006912

Creating a correlation matrix so we can examine relationships between shifts in daily returns between all companies.

In [ ]:
correlation_matrix = combined_returns.corr()
In [ ]:
import seaborn as sns

#create correlation matrix
cmap = sns.diverging_palette(10, 130, as_cmap=True)
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap = 'Blues', vmin=-1, vmax=1,
            xticklabels=correlation_matrix.columns,
            yticklabels=correlation_matrix.columns)
plt.title('Correlation Matrix of Daily Returns')
plt.show()
No description has been provided for this image

The matrix suggests that while the airline industry's stocks move together to a large extent. Analyzing these correlations over different time lags or during specific events, such as oil price shocks or global events affecting travel, to see if and how these relationships change.

In [ ]:
# combine oil returns with airline returns
financ_OIL['Daily Return'] = financ_OIL['Close'].pct_change()
combined_corr_data = financ_OIL[['Daily Return']].rename(columns={'Daily Return': 'Oil Return'})

for airline, df in financ_airlines.items():
    combined_corr_data[airline] = df['Close'].pct_change()
combined_corr_data.dropna(inplace = True)

combined_corr_data.head()
Out[ ]:
Oil Return United Alaskan American Allegiant JetBlue Hawaiian Southwest Spirit Delta Frontier JETS
Date
2021-04-05 -0.045565 0.025424 0.023025 0.015507 0.021900 0.039273 0.005968 0.027243 0.003480 0.029080 0.014854 0.020702
2021-04-06 0.011594 0.003205 0.037277 -0.007016 0.007277 0.003779 0.002596 0.017945 0.021339 0.027861 0.031887 0.008330
2021-04-07 0.007416 0.001513 -0.018714 -0.005403 -0.005557 -0.009412 -0.026257 -0.009048 -0.019849 -0.028074 0.042553 -0.009698
2021-04-08 -0.002844 -0.002350 0.004561 -0.011701 0.001916 0.001900 -0.003038 -0.000945 0.003730 -0.007769 0.032556 -0.002902
2021-04-09 -0.004698 -0.016827 -0.000825 -0.004651 0.007211 -0.001423 -0.006476 -0.000630 -0.011680 -0.010841 0.005647 -0.006912
In [ ]:
delays = [7, 14, 30,60,90,180,365]
airline_names = ['United', 'Alaskan', 'American', 'Allegiant', 'JetBlue', 'Hawaiian', 'Southwest', 'Spirit', 'Delta', 'JETS']

correlation_results = pd.DataFrame(index=airline_names, columns=delays)

for delay in delays:
    for airline in airline_names:
        shifted_returns = combined_corr_data[airline].shift(-delay)
        correlation = combined_corr_data['Oil Return'].corr(shifted_returns)
        correlation_results.at[airline, delay] = correlation

correlation_results.head(10)
Out[ ]:
7 14 30 60 90 180 365
United 0.008827 0.001552 -0.012239 0.025134 -0.063869 0.002089 -0.03858
Alaskan -0.023877 -0.000322 0.021835 -0.010173 -0.05357 0.036831 -0.015939
American -0.017728 0.001487 0.006128 0.044039 -0.058847 0.016144 -0.029939
Allegiant -0.010919 -0.022085 0.015095 0.049887 -0.068738 0.003521 -0.03009
JetBlue -0.023745 0.018836 0.042487 0.029154 -0.052595 0.075384 -0.042281
Hawaiian -0.026035 0.021067 0.001887 0.019371 0.009924 -0.023589 -0.073234
Southwest -0.031236 -0.004643 0.020085 0.010116 0.002631 0.000623 -0.040111
Spirit 0.027841 0.004721 0.003411 0.003366 -0.008387 0.056605 -0.043658
Delta 0.000773 -0.013916 0.00983 0.015437 -0.042934 0.012249 -0.018705
JETS -0.021892 0.001033 0.005418 0.02165 -0.051936 0.026966 -0.056063
In [ ]:
import scipy.stats as stats


significance_results = {}
n = 2160

correlations = {
    'United': [-0.049228, -0.030873, -0.025656, -0.079161, -0.012919, -0.016999, -0.004876],
    'Alaskan': [-0.041154, -0.019935, -0.021779, -0.066841, -0.004358, 0.001154, -0.000124],
    'American': [-0.056931, -0.016334, 0.00937, -0.070561, -0.003926, -0.005263, 0.007494],
    'Allegiant': [-0.030043, -0.073951, -0.046508, -0.029428, -0.025044, -0.018937, 0.008539],
    'JetBlue': [-0.061789, -0.036663, -0.01862, -0.05643, -0.004846, 0.003824, 0.003574],
    'Hawaiian': [-0.058751, -0.029893, -0.040023, -0.045276, 0.003318, -0.037189, 0.001015],
    'Southwest': [-0.042116, -0.020714, -0.035552, -0.045155, 0.006958, -0.007177, 0.000559],
    'Spirit': [-0.029643, 0.01245, -0.048053, -0.058677, -0.006752, 0.001502, 0.004846],
    'Delta': [-0.068438, -0.004271, -0.015327, -0.060281, -0.007192, -0.008411, -0.003624],
    'JETS': [-0.061341, -0.02731, -0.032695, -0.061719, -0.007318, -0.005128, -0.003519]
}


for airline, corr_values in correlations.items():
    significance_results[airline] = {}
    for delay_index, r in enumerate(corr_values):
        df = n - 2
        t_stat = r * (df ** 0.5) / ((1 - r ** 2) ** 0.5)
        p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df))
        significance_results[airline][delays[delay_index]] = p_value

# Display the p-values
for airline in significance_results:
    print(f"Airline: {airline}")
    for delay in significance_results[airline]:
        print(f"Delay {delay} days: p-value = {significance_results[airline][delay]:.4f}")
Airline: United
Delay 7 days: p-value = 0.0221
Delay 14 days: p-value = 0.1515
Delay 30 days: p-value = 0.2333
Delay 60 days: p-value = 0.0002
Delay 90 days: p-value = 0.5484
Delay 180 days: p-value = 0.4297
Delay 365 days: p-value = 0.8208
Airline: Alaskan
Delay 7 days: p-value = 0.0558
Delay 14 days: p-value = 0.3544
Delay 30 days: p-value = 0.3117
Delay 60 days: p-value = 0.0019
Delay 90 days: p-value = 0.8396
Delay 180 days: p-value = 0.9573
Delay 365 days: p-value = 0.9954
Airline: American
Delay 7 days: p-value = 0.0081
Delay 14 days: p-value = 0.4480
Delay 30 days: p-value = 0.6634
Delay 60 days: p-value = 0.0010
Delay 90 days: p-value = 0.8553
Delay 180 days: p-value = 0.8069
Delay 365 days: p-value = 0.7278
Airline: Allegiant
Delay 7 days: p-value = 0.1628
Delay 14 days: p-value = 0.0006
Delay 30 days: p-value = 0.0307
Delay 60 days: p-value = 0.1716
Delay 90 days: p-value = 0.2446
Delay 180 days: p-value = 0.3790
Delay 365 days: p-value = 0.6916
Airline: JetBlue
Delay 7 days: p-value = 0.0041
Delay 14 days: p-value = 0.0885
Delay 30 days: p-value = 0.3871
Delay 60 days: p-value = 0.0087
Delay 90 days: p-value = 0.8219
Delay 180 days: p-value = 0.8590
Delay 365 days: p-value = 0.8681
Airline: Hawaiian
Delay 7 days: p-value = 0.0063
Delay 14 days: p-value = 0.1649
Delay 30 days: p-value = 0.0629
Delay 60 days: p-value = 0.0354
Delay 90 days: p-value = 0.8775
Delay 180 days: p-value = 0.0840
Delay 365 days: p-value = 0.9624
Airline: Southwest
Delay 7 days: p-value = 0.0503
Delay 14 days: p-value = 0.3359
Delay 30 days: p-value = 0.0986
Delay 60 days: p-value = 0.0359
Delay 90 days: p-value = 0.7465
Delay 180 days: p-value = 0.7389
Delay 365 days: p-value = 0.9793
Airline: Spirit
Delay 7 days: p-value = 0.1685
Delay 14 days: p-value = 0.5631
Delay 30 days: p-value = 0.0255
Delay 60 days: p-value = 0.0064
Delay 90 days: p-value = 0.7538
Delay 180 days: p-value = 0.9444
Delay 365 days: p-value = 0.8219
Airline: Delta
Delay 7 days: p-value = 0.0015
Delay 14 days: p-value = 0.8427
Delay 30 days: p-value = 0.4765
Delay 60 days: p-value = 0.0051
Delay 90 days: p-value = 0.7383
Delay 180 days: p-value = 0.6960
Delay 365 days: p-value = 0.8663
Airline: JETS
Delay 7 days: p-value = 0.0043
Delay 14 days: p-value = 0.2045
Delay 30 days: p-value = 0.1287
Delay 60 days: p-value = 0.0041
Delay 90 days: p-value = 0.7339
Delay 180 days: p-value = 0.8117
Delay 365 days: p-value = 0.8702
In [ ]:
# create combined line chart for all airlines with significance stars for correlations
p_values_data = {
    'United': [0.0221, 0.1515, 0.2333, 0.0002, 0.5484, 0.4297, 0.8208],
    'Alaskan': [0.0558, 0.3544, 0.3117, 0.0019, 0.8396, 0.9573, 0.9954],
    'American': [0.0081, 0.4480, 0.6634, 0.0010, 0.8553, 0.8069, 0.7278],
    'Allegiant': [0.1628, 0.0006, 0.0307, 0.1716, 0.2446, 0.3790, 0.6916],
    'JetBlue': [0.0041, 0.0885, 0.3871, 0.0087, 0.8219, 0.8590, 0.8681],
    'Hawaiian': [0.0063, 0.1649, 0.0629, 0.0354, 0.8775, 0.0840, 0.9624],
    'Southwest': [0.0503, 0.3359, 0.0986, 0.0359, 0.7465, 0.7389, 0.9793],
    'Spirit': [0.1685, 0.5631, 0.0255, 0.0064, 0.7538, 0.9444, 0.8219],
    'Delta': [0.0015, 0.8427, 0.4765, 0.0051, 0.7383, 0.6960, 0.8663],
    'JETS': [0.0043, 0.2045, 0.1287, 0.0041, 0.7339, 0.8117, 0.8702]
}

# Initialize the plot
plt.figure(figsize=(14, 10))

# Plotting the correlation values for each airline
for airline, corr_values in correlations.items():
    p_values = p_values_data[airline]
    plt.plot(delays, corr_values, marker='o', label=airline)
    # Adding stars for significant correlations
    for delay, p_value, corr in zip(delays, p_values, corr_values):
        if p_value < 0.05:
            plt.plot(delay, corr, 'r*', markersize=12)

plt.title('Correlation of Oil Returns with Airline Returns Over Various Time Delays')
plt.xlabel('Time Delay (days)')
plt.ylabel('Correlation Coefficient')
plt.legend()
plt.grid(False)
plt.show()
No description has been provided for this image

The above plot shows the correlation coefficient between oil daily returns and airline returns dependent on time delays set to test the correlation. We see the highest number of significant correlations occur at 60 days. The daily return of airline stocks is correlated with oil returns from 60 days prior.

Load in Data about operational metrics¶

In [ ]:
#filter by carriers we are analyzing
financials = financials[financials['CARRIER'].isin(['AA', 'DL', 'WN', 'UA', 'AS', 'F9', 'HA', 'G4', 'B6', 'NK'])]
financials
Out[ ]:
NET_INCOME OP_PROFIT_LOSS TRANS_REV_PAX MAIL TOTAL_PROPERTY PROP_FREIGHT PROP_BAG TOTAL_CHARTER CHARTER_PAX CHARTER_PROP ... UNIQUE_CARRIER UNIQUE_CARRIER_NAME CARRIER CARRIER_NAME UNIQUE_CARRIER_ENTITY REGION CARRIER_GROUP_NEW CARRIER_GROUP YEAR QUARTER
2 -294047.62 -294047.62 895106.59 7515.70 NaN 124382.84 8503.14 NaN 0.00 0.0 ... AA American Airlines Inc. AA American Airlines Inc. 10049 A 3 3 2023 1
3 -258801.65 -234102.19 5015655.46 10224.02 NaN 68087.22 210466.52 NaN 17355.33 NaN ... UA United Air Lines Inc. UA United Air Lines Inc. 0A875 D 3 3 2023 1
4 -233930.65 -295070.82 1464226.29 0.00 NaN 101.75 104368.02 NaN 224.13 0.0 ... B6 JetBlue Airways B6 JetBlue Airways 06673 D 3 3 2023 1
5 -185853.00 437900.00 5914695.00 9697.00 NaN 60430.00 172908.00 NaN 77822.00 85.0 ... DL Delta Air Lines Inc. DL Delta Air Lines Inc. 01260 D 3 3 2023 1
6 -153850.00 -274205.00 4697111.00 587.00 NaN 38321.00 15128.00 NaN 5787.00 0.0 ... WN Southwest Airlines Co. WN Southwest Airlines Co. 06725 D 3 3 2023 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1336 6542.00 22237.00 637215.00 0.00 NaN 9743.00 26125.00 NaN 3724.00 0.0 ... DL Delta Air Lines Inc. DL Delta Air Lines Inc. 10260 L 3 3 2020 1
1350 9127.98 14530.18 30357.19 0.00 NaN 0.00 4681.78 NaN 0.00 0.0 ... F9 Frontier Airlines Inc. F9 Frontier Airlines Inc. 16461 L 3 3 2020 1
1356 11141.00 30232.00 744195.00 5898.00 NaN 58152.00 16292.00 NaN 8705.00 0.0 ... DL Delta Air Lines Inc. DL Delta Air Lines Inc. 10261 A 3 3 2020 1
1366 17296.17 -80572.10 130274.24 0.00 NaN 0.00 17308.07 NaN 0.00 0.0 ... F9 Frontier Airlines Inc. F9 Frontier Airlines Inc. 06460 D 3 3 2020 2
1384 29778.84 38659.28 299862.94 0.00 NaN 0.00 57336.92 NaN 8918.84 0.0 ... G4 Allegiant Air G4 Allegiant Air 06035 D 3 3 2020 1

358 rows × 52 columns

The above dataframe has data about carriers we are not analyzing, so we can remove them. We then want to group operational profits/losses by unique carrier.

In [ ]:
grouped_income = pd.DataFrame()
grouped_income = financials.groupby(['UNIQUE_CARRIER', 'YEAR', 'QUARTER'])['OP_PROFIT_LOSS'].sum()
grouped_income_df = grouped_income.reset_index()
In [ ]:
grouped_income_new = pd.DataFrame()
grouped_income_new = financials.groupby(['UNIQUE_CARRIER', 'YEAR', 'QUARTER'])['OP_PROFIT_LOSS'].sum()
grouped_income_df_new = grouped_income.reset_index()

# set back quarter by 1 since these operating losses aren't known until the future
# we want to use current operating losses (posted at end of quarter) to predict declines
grouped_income_df_new['QUARTER'] = grouped_income_df_new['QUARTER'] - 1

for i in grouped_income_df_new.index:
    if grouped_income_df_new.at[i, 'QUARTER'] == 0:
        grouped_income_df_new.at[i, 'QUARTER'] = 4
        grouped_income_df_new.at[i, 'YEAR'] -= 1


grouped_income_df_new
Out[ ]:
UNIQUE_CARRIER YEAR QUARTER OP_PROFIT_LOSS
0 AA 2019 4 -2535836.17
1 AA 2020 1 -4459137.82
2 AA 2020 2 -4945286.70
3 AA 2020 3 -2438663.13
4 AA 2020 4 -3439576.16
... ... ... ... ...
135 WN 2022 1 1158160.00
136 WN 2022 2 395148.00
137 WN 2022 3 -385541.00
138 WN 2022 4 -284229.00
139 WN 2023 1 794871.00

140 rows × 4 columns

In [ ]:
plt.figure(figsize=(10, 6))
for carrier in grouped_income_df['UNIQUE_CARRIER'].unique():
    subset = grouped_income_df[grouped_income_df['UNIQUE_CARRIER'] == carrier]
    plt.plot(subset['YEAR'].astype(str) + ' Q' + subset['QUARTER'].astype(str),
             subset['OP_PROFIT_LOSS'], label=carrier, marker='o')

plt.title('Operational Profit/Loss by Quarter for Market Carriers (2020-2023)')
plt.xlabel('Year and Quarter')
plt.ylabel('Operational Profit/Loss')
plt.xticks(rotation=45)
plt.legend()
plt.grid(False)
plt.tight_layout()
plt.show()
No description has been provided for this image

All carriers experienced fluctuations in their operational profits/losses across the quarters. Most carriers had a downward trend starting in Q1 2021, with losses peaking in Q2 or Q3 of the same year. A general recovery seems to occur from Q3 2021 onwards, with many lines showing an upward trend. However, in the first two quarters of 2023, there appears to be a significant split with some carriers showing sharp increases in profits while others show decreases or remain relatively stable.

Stock Price Trend Transition Probabilities¶


Process Overview¶

  1. For each stock (airline), calculate the monthly price change: $$ \text{Price Change} = \text{Close}_{t} - \text{Close}_{t-1} $$

  2. Label the trend based on the price change: $$ \text{Trend} = \begin{cases} \text{'up'} & \text{if Price Change} > 0 \\ \text{'down'} & \text{if Price Change} < 0 \\ \text{'no change'} & \text{if Price Change} = 0 \end{cases} $$

  3. Calculate the transition probabilities:

    • Define the previous month's trend: $$ \text{Prev Trend} = \text{Trend}_{t-1} $$
    • Count the transitions (up-up, down-down, up-down, down-up).
    • Calculate the probabilities of each transition: $$ P(\text{Transition}) = \frac{\text{Count of Transition}}{\text{Total Transitions}} $$

Transition Matrix¶

The transition matrix is formed as follows: $$ \begin{array}{|c|c|c|c|c|} \hline \text{Airline} & \text{up-up} & \text{up-down} & \text{down-up} & \text{down-down} \\ \hline \text{Airline 1} & P_{\text{up-up}} & P_{\text{up-down}} & P_{\text{down-up}} & P_{\text{down-down}} \\ \text{Airline 2} & \dots & \dots & \dots & \dots \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ \hline \end{array} $$

Here, $P_\text{transition type}$ represents the probability of a specific transition type for an airline.

In [ ]:
for airline, df in financ_airlines.items():
    # calculate difference in two day consecutive 'Close' price
    df['Price Change'] = df['Close'].diff()

    # label if price went up or down
    df['Trend'] = df['Price Change'].apply(lambda x: 'up' if x > 0 else ('down' if x < 0 else 'no change'))

def calculate_transition_probabilities(df):
    # shifted column to compare each day's trend with the previous day's
    df['Prev Trend'] = df['Trend'].shift(1)

    # transition state
    transitions = ['up-up', 'down-down', 'up-down', 'down-up']

    # initialize dictionary to hold the counts of each transition
    transition_counts = {transition: 0 for transition in transitions}

    # loop through and count the transitions
    for prev, current in zip(df['Prev Trend'][1:], df['Trend'][1:]):  # Skip the first entry (NaN)
        if prev == 'up' and current == 'up':
            transition_counts['up-up'] += 1
        elif prev == 'down' and current == 'down':
            transition_counts['down-down'] += 1
        elif prev == 'up' and current == 'down':
            transition_counts['up-down'] += 1
        elif prev == 'down' and current == 'up':
            transition_counts['down-up'] += 1

    # sum the total number of transitions and calculate probabilities
    total_transitions = sum(transition_counts.values())
    transition_probabilities = {transition: count / total_transitions for transition, count in transition_counts.items()}
    return transition_probabilities


transitions = ['up-up', 'up-down', 'down-up', 'down-down']

transition_matrix = pd.DataFrame(columns=transitions)

for airline, df in financ_airlines.items():
    transition_probs = calculate_transition_probabilities(df)
    transition_matrix.loc[airline] = transition_probs.values()

print(transition_matrix)
              up-up   up-down   down-up  down-down
United     0.266790  0.239463  0.247337   0.246410
Alaskan    0.250348  0.238737  0.255922   0.254993
American   0.250117  0.264610  0.242637   0.242637
Allegiant  0.236793  0.264133  0.249305   0.249768
JetBlue    0.225991  0.255614  0.258958   0.259436
Hawaiian   0.233882  0.252235  0.257412   0.256471
Southwest  0.252666  0.242930  0.252202   0.252202
Spirit     0.215377  0.263085  0.261232   0.260306
Delta      0.260159  0.231667  0.255021   0.253153
Frontier   0.203927  0.324773  0.234139   0.237160
JETS       0.264720  0.235045  0.249647   0.250589
In [ ]:
plt.figure(figsize=(10, 8))
plt.title('Transition Probability Heatmap')

sns.heatmap(transition_matrix, annot=True, cmap='Blues', fmt=".3f")

plt.title('Transition Probability Heatmap')
plt.ylabel('Airlines')
plt.xlabel('Transition Type')

plt.show()
No description has been provided for this image

The heatmap of trend transition probabilities suggests the following:

  • Stocks with higher 'up-up' and 'down-down' probabilities might show stronger trend persistence, meaning that once they start moving in a certain direction, they are more likely to continue in that direction for at least another day.
  • Conversely, stocks with higher 'up-down' and 'down-up' probabilities tend to have more frequent trend reversals, suggesting higher volatility and possibly less predictable movements.
  • The relative size of these probabilities can inform investors about the nature of the stock's price movements, which can be important for trading strategies that depend on momentum or mean reversion.

Marketing Carrier On-Time Performance Data¶

Marketing Carrier On-Time Performance Data ETL¶


This flight data was obtained from the Bureau of Transportation Statistics Airline On-Time Performance Data Base. The data table used for the following data is Marketing Carrier On-Time Performance (Beginning January 2018).

On-Time : Marketing Carrier On-Time Performance (Beginning January 2018)

In [ ]:
# Load in flight Data
flight_cols = ['MKT_UNIQUE_CARRIER', 'OP_UNIQUE_CARRIER','YEAR', 'QUARTER', 'MONTH', 'DAY_OF_MONTH',
       'DAY_OF_WEEK', 'FL_DATE', 'MKT_CARRIER','OP_CARRIER', 'TAIL_NUM',
        'ORIGIN', 'DEST','CRS_DEP_TIME', 'DEP_TIME', 'DEP_DELAY',
        'DEP_DELAY_NEW', 'DEP_DEL15','DEP_DELAY_GROUP',
        'TAXI_OUT', 'WHEELS_OFF','WHEELS_ON', 'TAXI_IN', 'CRS_ARR_TIME',
        'ARR_TIME', 'ARR_DELAY','ARR_DELAY_NEW', 'ARR_DEL15', 'ARR_DELAY_GROUP',
        'CANCELLED', 'CANCELLATION_CODE', 'DIVERTED']

airline_code = ['AA', 'DL', 'WN', 'UA', 'AS', 'F9', 'HA', 'G4', 'B6', 'NK']
file_dict = {}
for airline in airline_code:
  file_dict[airline] = glob.glob('../cmps3160-project/202*-*/{}*.csv'.format(airline))

Due to upload size restrictions on git the orginal files had to be broken into multiple smaller files. Below the function combines all files with the same market carrier into one DataFrame.

In [ ]:
dtype_cols = {'MKT_UNIQUE_CARRIER':'str', 'OP_UNIQUE_CARRIER':'str','FL_DATE':'str', 'MKT_CARRIER':'str','OP_CARRIER':'str', 'TAIL_NUM':'str',
        'ORIGIN':'str', 'DEST':'str','CANCELLED':'int', 'CANCELLATION_CODE':'str', 'DIVERTED':'int'}
In [ ]:
def df_airline(input):
  frame  = pd.read_csv(file_dict[input][0], usecols = flight_cols, dtype = dtype_cols)
  for file in file_dict[input][1:]:
    frame2 = pd.read_csv(file, usecols = flight_cols, dtype = dtype_cols)
    frame = pd.concat([frame, frame2], ignore_index=True)
  frame['FL_DATE'] = frame['FL_DATE'].apply(lambda x: x[:-12])
  frame['FL_DATE'] = pd.to_datetime(frame['FL_DATE'])
  frame = pd.get_dummies(frame, columns=['CANCELLATION_CODE'])
  can_cols_names = {'CANCELLATION_CODE_A':'CANCEL_CODE_CARRIER', 'CANCELLATION_CODE_B':'CANCEL_CODE_WEATHER', 'CANCELLATION_CODE_C':'CANCEL_CODE_NAS', 'CANCELLATION_CODE_D':'CANCEL_CODE_SECURITY'}
  frame.rename(columns=can_cols_names, inplace=True)
  return frame
In [ ]:
df_AA = df_airline('AA') # american
df_DL = df_airline('DL') # delta
df_WN = df_airline('WN') # southwest
df_UA = df_airline('UA') # united
df_AS = df_airline('AS') # alaskan
df_NK = df_airline('NK') # spirit
df_HA = df_airline('HA') # hawaiian
df_G4 = df_airline('G4') # allegiant
df_B6 = df_airline('B6') # jetblue
df_F9 = df_airline('F9') # frontier

flight_airlines = {'AA':df_AA, 'DL':df_DL, 'WN':df_WN,
                        'UA':df_UA, 'AS':df_AS, 'NK': df_NK,
                        'HA':df_HA, 'G4':df_G4, 'B6':df_B6,
                        'F9':df_F9}

The following is a DataFrame with all flights that took place from June 2022 to June 2023. Each row of the DataFrame corresponds to one flight.

In [ ]:
flight_data = pd.concat([df_AA, df_DL, df_WN, df_UA, df_AS, df_NK, df_HA, df_G4, df_B6, df_F9], axis=0)
In [ ]:
flight_data.head()
Out[ ]:
MKT_UNIQUE_CARRIER YEAR QUARTER MONTH DAY_OF_MONTH DAY_OF_WEEK FL_DATE MKT_CARRIER OP_UNIQUE_CARRIER OP_CARRIER ... ARR_DELAY ARR_DELAY_NEW ARR_DEL15 ARR_DELAY_GROUP CANCELLED DIVERTED CANCEL_CODE_CARRIER CANCEL_CODE_WEATHER CANCEL_CODE_NAS CANCEL_CODE_SECURITY
0 AA 2023 2 5 16 2 2023-05-16 AA AA AA ... -32.0 0.0 0.0 -2.0 0 0 0 0 0.0 0.0
1 AA 2023 2 5 16 2 2023-05-16 AA AA AA ... 4.0 4.0 0.0 0.0 0 0 0 0 0.0 0.0
2 AA 2023 2 5 16 2 2023-05-16 AA AA AA ... 12.0 12.0 0.0 0.0 0 0 0 0 0.0 0.0
3 AA 2023 2 5 16 2 2023-05-16 AA AA AA ... -9.0 0.0 0.0 -1.0 0 0 0 0 0.0 0.0
4 AA 2023 2 5 16 2 2023-05-16 AA AA AA ... -28.0 0.0 0.0 -2.0 0 0 0 0 0.0 0.0

5 rows × 35 columns

The function delay below calculate the number of delays that the market carrier airline has each day. This can be used to calculate the number of arrival delays or departure delays. The FAA considers a flight that is more that 15 minutes late delay. The columns DEP_DEL15 and ARR_DEL15 are binary with 0 representing not delayed and 1 representing delayed.

In [ ]:
# Calculate number of delays using the binary column DEP_DEL15 or ARR_DEL15
def delays(DataFrame, airline, column):
  df = DataFrame.groupby('FL_DATE')[column].sum()
  df.rename('{a}_NUM_{b}'.format(a=airline, b=column), inplace=True)
  return df

The following function counts the number of flights an airline as the market carrier flies and the number of cancelations that the airline has in a given day as the market carrier. Two series are created one that counts the number of flights each day and one that counts the number of cancelations each day.

In [ ]:
# Calculates the number of flight per day and number of cancelations per day
def num_flights(DataFrame, airline):
  cancelled = DataFrame.groupby('CANCELLED')
  df_FL = cancelled.get_group(0)['FL_DATE'].value_counts()
  df_FL.rename('{}_NUM_FL'.format(airline), inplace=True)

  df_CAN = cancelled.get_group(1)['FL_DATE'].value_counts()
  df_CAN.rename('{}_NUM_CAN'.format(airline), inplace=True)
  return df_FL, df_CAN
In [ ]:
# Calculates the different types of cancelations
def cancel_type(DataFrame, airline):
  cancelled = DataFrame.groupby('CANCELLED')
  df_CAN = cancelled.get_group(1)
  df_CAN = df_CAN.drop(['MKT_UNIQUE_CARRIER', 'YEAR', 'QUARTER', 'MONTH', 'DAY_OF_MONTH',
       'DAY_OF_WEEK', 'MKT_CARRIER', 'OP_UNIQUE_CARRIER',
       'OP_CARRIER', 'TAIL_NUM', 'ORIGIN', 'DEST', 'CRS_DEP_TIME', 'DEP_TIME',
       'DEP_DELAY', 'DEP_DELAY_NEW', 'DEP_DEL15', 'DEP_DELAY_GROUP',
       'TAXI_OUT', 'WHEELS_OFF', 'WHEELS_ON', 'TAXI_IN', 'CRS_ARR_TIME',
       'ARR_TIME', 'ARR_DELAY', 'ARR_DELAY_NEW', 'ARR_DEL15',
       'ARR_DELAY_GROUP', 'CANCELLED', 'DIVERTED'], axis=1)
  df_CAN = df_CAN.groupby('FL_DATE').sum()
  return df_CAN

Using the functions above a DataFrame is created with each market carrier airline series as a column. Each of the functions below has the flight date as the index and the values indicated in the comments for each airline are the columns of the DataFrame.

In [ ]:
# Number of Departure Delays
df_DEP_DEL = delays(df_AA, 'AA', 'DEP_DEL15')

# Number of Arrival Delays
df_ARR_DEL = delays(df_AA, 'AA', 'ARR_DEL15')

# DataFrames for Number of Flights and Number of Cancelations
df_NUM_FL, df_NUM_CAN = num_flights(df_AA, 'AA')

# Proportion of ALL flights that are canceled
df_PROP_CAN_FL = pd.DataFrame()
df_PROP_CAN_FL['AA_PROP_CAN_FL'] = df_NUM_CAN/df_NUM_FL


for airline in airline_code[1:]:
  df_DEP_DEL = pd.concat([df_DEP_DEL, delays(flight_airlines[airline], airline, 'DEP_DEL15')], axis=1)
  df_ARR_DEL = pd.concat([df_ARR_DEL, delays(flight_airlines[airline], airline,'ARR_DEL15')], axis=1)

  num_fl, num_can = num_flights(flight_airlines[airline], airline)
  prop = pd.DataFrame()
  prop['{}_PROP_CAN_FL'.format(airline)] = num_can / num_fl
  df_NUM_FL = pd.concat([df_NUM_FL, num_fl], axis=1)
  df_NUM_CAN = pd.concat([df_NUM_CAN, num_can], axis=1)
  df_PROP_CAN_FL = pd.concat([df_PROP_CAN_FL, prop], axis=1)

df_DEP_DEL = df_DEP_DEL.sort_index()
df_ARR_DEL = df_ARR_DEL.sort_index()
df_NUM_FL = df_NUM_FL.sort_index()
df_NUM_CAN = df_NUM_CAN.sort_index()
df_PROP_CAN_FL =  df_PROP_CAN_FL.sort_index()

Flight Data EDA¶

The bar plot below so the total daily departure delays for all market carrier airlines. The bar plot is a stacked bar plot with various colors showing that airlines number of delays. Since departure delays often result in arrival delays the plots very simillar.

In [ ]:
# Plotting Total Daily Depature Delays
fig, ax = plt.subplots(figsize=(14,7))

bottom = np.zeros(len(df_ARR_DEL))
for col in df_DEP_DEL.columns:
  p = ax.bar(df_DEP_DEL.index.strftime('%Y-%m-%d'), df_DEP_DEL[col], label = col[0:2], bottom = bottom)
  bottom += df_DEP_DEL[col]

plt.title('Total Daily Depature Delays')
plt.xlabel('Date')
plt.ylabel('Number of Departure Delays')
ax.set_xticks(ax.get_xticks()[::7])
plt.xticks(rotation=90)
ax.legend(title = 'Market Carrier', loc='upper right')
plt.show()
No description has been provided for this image
In [ ]:
# Plotting Total Daily Arrival Delays
fig, ax = plt.subplots(figsize=(14,7))

bottom = np.zeros(len(df_ARR_DEL))
for col in df_ARR_DEL.columns:
  p = ax.bar(df_ARR_DEL.index.strftime('%Y-%m-%d'), df_ARR_DEL[col], label = col[0:2], bottom = bottom)
  bottom += df_ARR_DEL[col]

plt.title('Total Arrival Delays')
plt.xlabel('Date')
plt.ylabel('Number of Arrival Delays')
ax.set_xticks(ax.get_xticks()[::7])
plt.xticks(rotation=90)
ax.legend(title = 'Market Carrier', loc='upper right')
plt.show()
No description has been provided for this image

Another way to plot departure delays is to look at the delays by airline. A line plot was used to show this below.

In [ ]:
# Plotting Market Carrier Daily Depature Delays
fig, ax = plt.subplots(figsize=(14,7))
legend_DEP_DEL = {}
for col in df_DEP_DEL.columns:
  plt.plot(df_DEP_DEL.index.strftime('%Y-%m-%d'), df_DEP_DEL[col])
  legend_DEP_DEL[col[0:2]] = col

plt.title('Market Carrier Daily Depature Delays')
plt.xlabel('Date')
plt.ylabel('Number of Depature Delays')
ax.set_xticks(ax.get_xticks()[::7])
plt.xticks(rotation=90)
ax.legend(legend_DEP_DEL, title = 'Market Carrier', loc='upper right')
plt.show()
No description has been provided for this image

The next two plots show the number of flights by airline and the number of cancelations by airline.

In [ ]:
#Plotting Market Carrier Daily Number of Flights

fig, ax = plt.subplots(figsize=(14,7))
legend_NUM_FL= {}
for col in df_NUM_FL.columns:
  plt.plot(df_NUM_FL.index.strftime('%Y-%m-%d'), df_NUM_FL[col])
  legend_NUM_FL[col[0:2]] = col

plt.title('Market Carrrier Daily Number of Flights')
plt.xlabel('Date')
plt.ylabel('Number of Flights')
ax.set_xticks(ax.get_xticks()[::7])
plt.xticks(rotation=90)
ax.legend(legend_NUM_FL, title = 'Market Carrier', loc='upper right')
plt.show()
No description has been provided for this image
In [ ]:
#Plotting Market Carrier Daily Number of Cancelations

fig, ax = plt.subplots(figsize=(14,7))
legend_NUM_CAN= {}
for col in df_NUM_CAN.columns:
  plt.plot(df_NUM_CAN.index.strftime('%Y-%m-%d'), df_NUM_CAN[col])
  legend_NUM_CAN[col[0:2]] = col

plt.title('Market Carrier Daily Number of Canelations')
plt.xlabel('Date')
plt.ylabel('Number of Canelations')
ax.set_xticks(ax.get_xticks()[::7])
plt.xticks(rotation=90)
ax.legend(legend_NUM_CAN, title = 'Market Carrier', loc='upper right')
plt.show()
No description has been provided for this image
In [ ]:
# Create a Correlation Martix for the Proportion of Cancelled Flights by Market Carrier
MKT_code = {'AA':'American', 'DL':'Delta', 'WN':'Southwest',
                        'UA':'United', 'AS': 'Alaskan', 'NK':'Spirit',
                        'HA':'Hawaiian', 'G4':'Allegiant', 'B6':'JetBlue',
                        'F9':'Frontier'}

prop_can_corr_matrix = df_PROP_CAN_FL.corr()

label_names = []

for item in list(prop_can_corr_matrix.columns):
  code = item[0:2]
  name = MKT_code[code]
  label_names.append(name)

cmap = sns.diverging_palette(10, 130, as_cmap=True)
plt.figure(figsize=(10, 8))
sns.heatmap(prop_can_corr_matrix, annot=True, cmap = 'Blues', vmin=0, vmax=1,
            xticklabels=label_names,
            yticklabels=label_names)
plt.title('Correlation Matrix of the Proportion of Cancelled Flights (by Market Carrier)')
plt.show()
No description has been provided for this image

Since some market carriers use operating carriers to fulfil some of their flights. Looking at the proportion of a market carriers flights that are not operated by the market carrier could help to indicate the market carriers current demands.

In [ ]:
def inner_op_mkt(mkt_carrier, airline):
  df = mkt_carrier[airline].to_frame().rename(columns={'FL_DATE':'NUM_SCH_FL'}).reset_index()
  df = df.set_index('FL_DATE')

  op_not_mkt = df[df['OP_UNIQUE_CARRIER']!= airline]
  op_not_mkt = op_not_mkt['NUM_SCH_FL'].rename('OP_NOT_MKT').to_frame().reset_index()
  op_not_mkt = op_not_mkt.groupby('FL_DATE')['OP_NOT_MKT'].sum()

  op_mkt = df[df['OP_UNIQUE_CARRIER'] == airline]
  op_mkt = op_mkt['NUM_SCH_FL'].rename('OP_MKT')

  df_mkt = op_mkt.to_frame().join(op_not_mkt, on='FL_DATE')
  df_mkt['PROP_OP_BY_MKT'] = df_mkt['OP_MKT']/(df_mkt['OP_MKT']+ df_mkt['OP_NOT_MKT'])

  df_mkt = df_mkt.drop(['OP_MKT', 'OP_NOT_MKT'], axis=1)
  return df_mkt
In [ ]:
# status {0, 1}
# Flights = 0
# Canceled = 1
def all_op_mkt(status):
  if status == 0 or  status == 1:
    flights_status = flight_data.groupby('CANCELLED')
    fl_status = flights_status.get_group(status)

    fl_type = {0:'FL', 1:'CAN'}
  else:
    return None
  airline_code = ['AA', 'DL', 'WN', 'UA', 'AS', 'F9', 'HA', 'G4', 'B6', 'NK']

  mkt_carrier = fl_status.groupby(['MKT_UNIQUE_CARRIER', 'OP_UNIQUE_CARRIER'])['FL_DATE'].value_counts()
  df_op_mkt = inner_op_mkt(mkt_carrier, airline_code[0]).add_suffix('_{}'.format(airline_code[0]))
  df_op_mkt = df_op_mkt.add_prefix('{}_'.format(fl_type[status]))

  for airline in airline_code[1:]:
    df_mkt = inner_op_mkt(mkt_carrier, airline)

    if  df_mkt['PROP_OP_BY_MKT'].isnull().all() == True:
      df_mkt['PROP_OP_BY_MKT'] = 0
      df_mkt = df_mkt.add_suffix('_{}'.format(airline))
    else:
      df_mkt = df_mkt.add_suffix('_{}'.format(airline))

    df_mkt = df_mkt.add_prefix('{}_'.format(fl_type[status]))


    df_op_mkt = df_op_mkt.join(df_mkt, on='FL_DATE')

  df_op_mkt.dropna(how='all', axis=1, inplace=True)
  df_op_mkt.fillna(0, inplace=True)
  df_op_mkt = df_op_mkt.sort_index()


  return df_op_mkt
In [ ]:
FL_op_mkt = all_op_mkt(0) # only flights
CAN_op_mkt = all_op_mkt(1) # only cancelled flight

The two plots below plot the proportion of flights, and cancellations that are operated by the market carrier.

In [ ]:
# Plotting Proportion of cancelations operated by the market carrier
fig, ax = plt.subplots(figsize=(14,7))

bottom = np.zeros(len( CAN_op_mkt))
for col in CAN_op_mkt.replace(0,np.nan).dropna(axis=1,how="all").columns:
  p = ax.bar(CAN_op_mkt.index.strftime('%Y-%m-%d'), CAN_op_mkt[col], label = col[-2:], bottom = bottom)
  bottom += CAN_op_mkt[col]

plt.title('Proportion of Cancellations Operated by Market Carrier')
plt.xlabel('Date')
plt.ylabel('Proportion of Cancellations')
ax.set_xticks(ax.get_xticks()[::7])
plt.xticks(rotation=90)
ax.legend(title = 'Market Carrier', loc='upper right')
plt.show()
No description has been provided for this image

Air Carrier Statistics (Form 41 Traffic)- U.S. Carriers¶

Data Base Profile

T-100 Domestic Segment (U.S. Carriers) Data ETL¶


This flight data was obtained from the Bureau of Transportation Statistics Air Carrier Statistics (Form 41 Traffic)- U.S. Carriers Data Base. The data table used for the following data is T-100 Domestic Segment (U.S. Carriers). This data table contains monthly domestic non-stop segment data reported by U.S. air carriers.

T-100 Domestic Segment (U.S. Carriers)

In [ ]:
# Load in T-100 Domestic Segment (U.S. Carriers)
T_100_lst = glob.glob('../cmps3160-project/T-100 Domestic Segment (U.S. Carriers)/*.csv')
In [ ]:
df_air_stats = pd.read_csv(T_100_lst[0])

for item in T_100_lst[1:]:
  frame =  pd.read_csv(item)
  df_air_stats = pd.concat([df_air_stats, frame], ignore_index=True)
In [ ]:
df_air_stats.head()
Out[ ]:
UNIQUE_CARRIER DEPARTURES_SCHEDULED DEPARTURES_PERFORMED PAYLOAD SEATS PASSENGERS FREIGHT MAIL DISTANCE RAMP_TO_RAMP ... DEST_STATE_NM DEST_WAC AIRCRAFT_GROUP AIRCRAFT_TYPE AIRCRAFT_CONFIG YEAR QUARTER MONTH DISTANCE_GROUP CLASS
0 YV 0.0 0.0 0.0 0.0 0.0 0.0 0.0 24.0 32.0 ... Virginia 38 6 673 1 2022 3 7 1 F
1 YV 0.0 0.0 0.0 0.0 0.0 0.0 0.0 45.0 35.0 ... Virginia 38 6 673 1 2022 3 7 1 F
2 YV 0.0 0.0 0.0 0.0 0.0 0.0 0.0 70.0 52.0 ... Pennsylvania 23 6 673 1 2022 1 3 1 F
3 YV 0.0 0.0 0.0 0.0 0.0 0.0 0.0 83.0 32.0 ... Florida 33 6 673 1 2022 2 5 1 F
4 YV 0.0 0.0 0.0 0.0 0.0 0.0 0.0 83.0 52.0 ... North Carolina 36 6 673 1 2022 2 6 1 F

5 rows × 45 columns

In [ ]:
df_stat_groups = df_air_stats.groupby('UNIQUE_CARRIER')
In [ ]:
full_seats={}
freight_weight={}

Calculate proportion of full seats

In [ ]:
# Proportion of Full Seats
for airline in airline_code:
  inner_dict = {}
  yr_month_group = df_air_stats.groupby(['YEAR', 'MONTH'])
  for i in yr_month_group.groups.keys():
    frame = yr_month_group.get_group(i)
    yr = frame['YEAR'].unique().tolist()[0]
    month = frame['MONTH'].unique().tolist()[0]
    month_prop_full = frame['PASSENGERS'].sum() / frame['SEATS'].sum()
    inner_dict[i] = month_prop_full
  full_seats[airline] = inner_dict

Calculate proportion of weight from freight and mail

In [ ]:
# Proportion of Weight from Freight and Mail
for airline in airline_code:
  inner_dict = {}
  yr_month_group = df_air_stats.groupby(['YEAR', 'MONTH'])
  for i in yr_month_group.groups.keys():
    frame = yr_month_group.get_group(i)
    yr = frame['YEAR'].unique().tolist()[0]
    month = frame['MONTH'].unique().tolist()[0]
    month_prop_freight = (frame['FREIGHT'].sum() + frame['MAIL'].sum())/ frame['PAYLOAD'].sum()
    inner_dict[i] = month_prop_freight
  freight_weight[airline] = inner_dict
In [ ]:
# DataFrames of proportion of full seats and proportion of weight from freight and mail
df_full_seats = pd.DataFrame.from_dict(full_seats).add_suffix('_PROP_FULL_SEATS')
df_freight_weight = pd.DataFrame.from_dict(freight_weight).add_suffix('_PROP_FREIGHT/MAIL')

df_full_seats.index.rename(['YEAR', 'MONTH'], inplace=True)
df_freight_weight.index.rename(['YEAR', 'MONTH'], inplace=True)

dataframe2  = pd.DataFrame()
date_range = pd.date_range(start=df_AA['FL_DATE'].min(),end=df_AA['FL_DATE'].max())
dataframe2['FL_DATE'] = date_range
dataframe2['YEAR'] = pd.DatetimeIndex(dataframe2['FL_DATE']).year
dataframe2['MONTH'] = pd.DatetimeIndex(dataframe2['FL_DATE']).month

df_prop_full_seats = pd.merge(dataframe2, df_full_seats, on=['YEAR', 'MONTH'], validate="many_to_one")
df_prop_full_seats.drop(['YEAR', 'MONTH'], axis=1, inplace=True)
df_prop_full_seats.set_index('FL_DATE', inplace=True)

df_prop_freight_weight = pd.merge(dataframe2, df_freight_weight, on=['YEAR', 'MONTH'], validate="many_to_one")
df_prop_freight_weight.drop(['YEAR', 'MONTH'], axis=1, inplace=True)
df_prop_freight_weight.set_index('FL_DATE', inplace=True)
In [ ]:
def T100_stat_by_airline(summ_stat_frame, type, startdate, enddate):
  dataframe2  = pd.DataFrame()
  date_range = pd.date_range(start=startdate,end=enddate)
  dataframe2['FL_DATE'] = date_range
  dataframe2['YEAR'] = pd.DatetimeIndex(dataframe2['FL_DATE']).year
  dataframe2['MONTH'] = pd.DatetimeIndex(dataframe2['FL_DATE']).month

  df_mean_stats = summ_stat_frame.loc[airline].add_prefix('{a}_{b}_'.format(a=airline, b=type))
  df_mean_stats = pd.merge(dataframe2, df_mean_stats, on=['YEAR', 'MONTH'], validate="many_to_one")
  df_mean_stats.drop(['YEAR', 'MONTH'], axis=1, inplace=True)
  df_mean_stats.set_index('FL_DATE', inplace=True)
  return df_mean_stats
In [ ]:
# Dictionary of all flight metrics DataFrames
flight_metics = {'Departure' : df_DEP_DEL, 'Arrival' : df_ARR_DEL,
                 'Fight Count' : df_NUM_FL, 'Cancel Count': df_NUM_CAN,
                 'Prop Flights Operated by Market' : FL_op_mkt,
                 'Prop Canceled Operated by Market' : CAN_op_mkt,
                 'Prop Full Seats' : df_full_seats, 'Prop Freight/Mail': df_freight_weight
                 }
In [ ]:
carrier_group_dict = {'1':'Regional Carriers (including Large, Medium, Commuter, Small Certified)',
                      '2':'National Carriers', '3':'Major Carriers'}

T100_carrier_groups = df_air_stats.groupby(['CARRIER_GROUP', 'YEAR', 'MONTH'])

T100_mean = T100_carrier_groups['DEPARTURES_SCHEDULED', 'DEPARTURES_PERFORMED', 'PAYLOAD', 'SEATS', 'PASSENGERS',
 'FREIGHT', 'MAIL', 'DISTANCE', 'RAMP_TO_RAMP', 'AIR_TIME', 'DISTANCE_GROUP'].mean()

Major_Mean = T100_mean.loc[3, :].add_prefix('Major_Carriers_Mean_')
National_Mean = T100_mean.loc[2, :].add_prefix('National_Carriers_Mean_')
Regional_Mean = T100_mean.loc[2, :].add_prefix('Regional_Carriers_Mean_')
<ipython-input-54-3fba41ab82e6>:6: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
  T100_mean = T100_carrier_groups['DEPARTURES_SCHEDULED', 'DEPARTURES_PERFORMED', 'PAYLOAD', 'SEATS', 'PASSENGERS',
In [ ]:
df_air_stats.head()
Out[ ]:
UNIQUE_CARRIER DEPARTURES_SCHEDULED DEPARTURES_PERFORMED PAYLOAD SEATS PASSENGERS FREIGHT MAIL DISTANCE RAMP_TO_RAMP ... DEST_STATE_NM DEST_WAC AIRCRAFT_GROUP AIRCRAFT_TYPE AIRCRAFT_CONFIG YEAR QUARTER MONTH DISTANCE_GROUP CLASS
0 YV 0.0 0.0 0.0 0.0 0.0 0.0 0.0 24.0 32.0 ... Virginia 38 6 673 1 2022 3 7 1 F
1 YV 0.0 0.0 0.0 0.0 0.0 0.0 0.0 45.0 35.0 ... Virginia 38 6 673 1 2022 3 7 1 F
2 YV 0.0 0.0 0.0 0.0 0.0 0.0 0.0 70.0 52.0 ... Pennsylvania 23 6 673 1 2022 1 3 1 F
3 YV 0.0 0.0 0.0 0.0 0.0 0.0 0.0 83.0 32.0 ... Florida 33 6 673 1 2022 2 5 1 F
4 YV 0.0 0.0 0.0 0.0 0.0 0.0 0.0 83.0 52.0 ... North Carolina 36 6 673 1 2022 2 6 1 F

5 rows × 45 columns

Combining All Data Sets by the Market Airline¶

Combining Airline Operations Performance Metrics and Stock data.

Airline Operations Performance Metrics¶

Airline Operations Performance Metrics were created using the Marketing Carrier On-Time Performance data and the T-100 Domestic Segment (U.S. Carriers) data.

In [ ]:
df_AA_op_stat = pd.DataFrame() # american
df_DL_op_stat = pd.DataFrame() # delta
df_WN_op_stat = pd.DataFrame() # southwest
df_UA_op_stat = pd.DataFrame() # united
df_AS_op_stat = pd.DataFrame() # alaskan
df_NK_op_stat = pd.DataFrame() # spirit
df_HA_op_stat = pd.DataFrame() # hawaiian
df_G4_op_stat = pd.DataFrame() # allegiant
df_B6_op_stat = pd.DataFrame() # jetblue
df_F9_op_stat = pd.DataFrame() # frontier

airlines_operations_stats = {'AA':df_AA_op_stat, 'DL':df_DL_op_stat, 'WN':df_WN_op_stat,
                        'UA':df_UA_op_stat, 'AS':df_AS_op_stat, 'NK': df_NK_op_stat,
                        'HA':df_HA_op_stat, 'G4':df_G4_op_stat, 'B6':df_B6_op_stat,
                        'F9':df_F9_op_stat}

startdate = df_AA['FL_DATE'].min()
enddate = df_AA['FL_DATE'].max()
In [ ]:
df_airline_stat = df_air_stats.groupby(['UNIQUE_CARRIER', 'YEAR', 'MONTH'])
df_mean_stats = df_airline_stat[['DEPARTURES_SCHEDULED', 'DEPARTURES_PERFORMED', 'PAYLOAD', 'SEATS', 'PASSENGERS','FREIGHT', 'MAIL', 'DISTANCE', 'RAMP_TO_RAMP', 'AIR_TIME']].mean()
df_std_stats = df_airline_stat[['DEPARTURES_SCHEDULED', 'DEPARTURES_PERFORMED', 'PAYLOAD', 'SEATS', 'PASSENGERS','FREIGHT', 'MAIL', 'DISTANCE', 'RAMP_TO_RAMP', 'AIR_TIME']].std()
df_median_stats = df_airline_stat[['DEPARTURES_SCHEDULED', 'DEPARTURES_PERFORMED', 'PAYLOAD', 'SEATS', 'PASSENGERS','FREIGHT', 'MAIL', 'DISTANCE', 'RAMP_TO_RAMP', 'AIR_TIME']].median()

from datetime import datetime
frames_lst = flight_metics.values()
for frame in frames_lst:
  cols = frame.columns
  #print(frame)
  if frame.index.dtype =='datetime64[ns]':
    frame.index.rename('FL_DATE', inplace=True)
    for airline in airline_code:
      air_cols = cols.str.contains(airline)
      frame2 = frame.loc[:, air_cols]
      airlines_operations_stats[airline] = pd.concat([airlines_operations_stats[airline], frame2], axis=1)
  else:
    dataframe  = pd.DataFrame()
    date_range = pd.date_range(start=startdate,end=enddate)
    dataframe['FL_DATE'] = date_range
    dataframe['YEAR'] = pd.DatetimeIndex(dataframe['FL_DATE']).year
    dataframe['MONTH'] = pd.DatetimeIndex(dataframe['FL_DATE']).month
    for airline in airline_code:
      air_cols = cols.str.contains(airline)
      frame2 = frame.loc[:, air_cols]
      frame2 = pd.merge(dataframe, frame2, on=['YEAR', 'MONTH'])
      frame2.drop(['YEAR', 'MONTH'], axis=1, inplace=True)
      frame2.set_index('FL_DATE', inplace=True)
      airlines_operations_stats[airline] = pd.concat([airlines_operations_stats[airline], frame2], axis=1)

for airline in airline_code:
    frame_can_type = cancel_type(flight_airlines[airline], airline)

    for item in ['CANCEL_CODE_CARRIER', 'CANCEL_CODE_WEATHER', 'CANCEL_CODE_NAS','CANCEL_CODE_SECURITY']:
      if item in frame_can_type.columns:
        continue
      else:
        frame_can_type[item] = 0

    frame3 = T100_stat_by_airline(df_mean_stats, 'MEAN', startdate, enddate)
    frame4 = T100_stat_by_airline(df_std_stats, 'STD', startdate, enddate)
    frame5 = T100_stat_by_airline(df_median_stats, 'MEDIAN', startdate, enddate)
    airlines_operations_stats[airline] = pd.concat([airlines_operations_stats[airline], frame_can_type,frame3, frame4, frame5], axis=1)
    airlines_operations_stats[airline]['AIRLINE'] = airline

Stock and Airline Operations Performance Metrics¶

Merge dataframes into one

In [ ]:
df_american = pd.DataFrame()
df_delta = pd.DataFrame()
df_southwest = pd.DataFrame()
df_united = pd.DataFrame()
df_alaskan = pd.DataFrame()
df_spirit = pd.DataFrame()
df_hawaiian = pd.DataFrame()
df_allegiant = pd.DataFrame()
df_jetblue = pd.DataFrame()
df_jets = pd.DataFrame()
df_crude_oil = pd.DataFrame()

delay_decline = {'United': df_united, 'Alaskan': df_alaskan, 'American': df_american,
                   'Allegiant': df_allegiant, 'JetBlue': df_jetblue, 'Hawaiian': df_hawaiian,
                   'Southwest': df_southwest, 'Spirit': df_spirit, 'Delta': df_delta,
                   'JETS' : df_jets}
In [ ]:
#crude oil data
crude_frame = financ_OIL.reset_index()
mask = (crude_frame['Date'] >= startdate) & (crude_frame['Date'] <= enddate)
crude_frame = crude_frame.loc[mask]
crude_frame.rename(columns={'Date':'DATE'}, inplace=True)
crude_frame = crude_frame.set_index('DATE')
crude_frame = crude_frame.add_prefix('Crude Oil ')

for airline in delay_decline.keys():
  #stock data
  fin_frame1 = financ_airlines[airline]
  fin_frame = fin_frame1.reset_index()
  mask = (fin_frame['Date'] >= startdate) & (fin_frame['Date'] <= enddate)
  fin_frame = fin_frame.loc[mask]
  fin_frame.rename(columns={'Date':'DATE'}, inplace=True)
  fin_frame = fin_frame.set_index('DATE')
  if airline == 'JETS':
    fin_frame = fin_frame.add_prefix('JETS ')
    all_airline = pd.DataFrame()
    for code in airline_code:
      air_frame1 = airlines_operations_stats[code]
      air_frame = air_frame1.reset_index()
      air_frame.rename(columns={'FL_DATE':'DATE'}, inplace=True)
      air_frame = air_frame.set_index('DATE')
      all_airline = pd.concat([all_airline, all_airline], axis=1)
  else:
    #airline data
    code = flight_airline_codes[airline]
    air_frame1 = airlines_operations_stats[code]
    air_frame = air_frame1.reset_index()
    air_frame.rename(columns={'FL_DATE':'DATE'}, inplace=True)
    air_frame = air_frame.set_index('DATE')
    delay_decline[airline] = pd.concat([air_frame, fin_frame, crude_frame], axis=1)
In [ ]:
delay_decline['Delta'].head()
Out[ ]:
DL_NUM_DEP_DEL15 DL_NUM_ARR_DEL15 DL_NUM_FL DL_NUM_CAN FL_PROP_OP_BY_MKT_DL CAN_PROP_OP_BY_MKT_DL DL_PROP_FULL_SEATS DL_PROP_FREIGHT/MAIL CANCEL_CODE_CARRIER CANCEL_CODE_WEATHER ... Price Change Trend Prev Trend Crude Oil Open Crude Oil High Crude Oil Low Crude Oil Close Crude Oil Adj Close Crude Oil Volume Crude Oil Daily Return
DATE
2021-11-01 417.0 398.0 4167 6.0 0.572354 0.0 0.800704 0.008555 6.0 0.0 ... 1.289997 up down 83.360001 84.879997 82.739998 84.050003 84.050003 412309.0 0.005744
2021-11-02 244.0 233.0 3812 NaN 0.577650 0.0 0.800704 0.008555 NaN NaN ... -0.029999 down up 83.870003 84.410004 82.919998 83.910004 83.910004 429608.0 -0.001666
2021-11-03 300.0 341.0 3837 NaN 0.577013 0.0 0.800704 0.008555 NaN NaN ... 0.850002 up down 83.070000 83.080002 79.690002 80.860001 80.860001 546577.0 -0.036349
2021-11-04 396.0 511.0 4247 NaN 0.565576 0.0 0.800704 0.008555 NaN NaN ... -0.250000 down up 80.180000 83.419998 78.250000 78.809998 78.809998 701119.0 -0.025352
2021-11-05 439.0 467.0 4273 NaN 0.564241 0.0 0.800704 0.008555 NaN NaN ... 3.299999 up down 79.370003 81.800003 78.959999 81.269997 81.269997 531054.0 0.031214

5 rows × 60 columns

Combing all Stock and Airline Operations Performance Metrics to one DataFrame¶

In [ ]:
financials_drop_cols = ['NET_INCOME', 'MAIL',
       'TOTAL_PROPERTY', 'PROP_FREIGHT', 'PROP_BAG', 'TOTAL_CHARTER',
       'CHARTER_PAX', 'CHARTER_PROP', 'TOTAL_MISC_REV', 'RES_CANCEL_FEES',
       'MISC_OP_REV', 'PUB_SVC_REVENUE', 'TRANS_REVENUE', 'OP_REVENUES',
       'FLYING_OPS', 'MAINTENANCE', 'PAX_SERVICE', 'AIRCFT_SERVICES',
       'PROMOTION_SALES', 'GENERAL_ADMIN', 'GENERAL_SERVICES', 'DEPREC_AMORT',
       'TRANS_EXPENSES', 'OP_EXPENSES', 'INTEREST_LONG_DEBT',
       'INTEREST_EXP_OTH', 'FOREIGN_EX_GAINS', 'CAP_GAINS_PROP',
       'CAP_GAINS_OTHER', 'OTHER_INCOME_NET', 'NON_OP_INCOME',
       'INCOME_PRE_TAX', 'INCOME_TAX', 'INCOME_BEFORE_OTH', 'DISCONT_OPS',
       'EXTRA_ITEMS', 'INCOME_TAX_EXTRA', 'INCOME_TAX_CREDITS',
       'ACCTG_CHANGES', 'AIRLINE_ID', 'UNIQUE_CARRIER_NAME',
       'CARRIER', 'CARRIER_NAME', 'UNIQUE_CARRIER_ENTITY', 'REGION',
       'CARRIER_GROUP_NEW', 'CARRIER_GROUP']
fin_rep = financials.drop(financials_drop_cols, axis=1)
fin_rep_groups = fin_rep.groupby('UNIQUE_CARRIER')

grouped_income_df_new_g = grouped_income_df_new.groupby('UNIQUE_CARRIER')

dataframe2  = pd.DataFrame()
date_range = pd.date_range(start=df_AA['FL_DATE'].min(),end=df_AA['FL_DATE'].max())
dataframe2['FL_DATE'] = date_range
dataframe2['YEAR'] = pd.DatetimeIndex(dataframe2['FL_DATE']).year
dataframe2['QUARTER'] = pd.DatetimeIndex(dataframe2['FL_DATE']).quarter

all_fin_rep  = pd.DataFrame()
for group in grouped_income_df_new_g.groups.keys():
  fin_rep1 = grouped_income_df_new_g.get_group(group)
  fin_rep1 = fin_rep1.rename(columns={'OP_PROFIT_LOSS':'Prev Quarter OP_PROFIT_LOSS'})
  df_fin_rep = pd.merge(dataframe2, fin_rep1, on=['YEAR', 'QUARTER'])
  df_fin_rep.drop(['YEAR', 'QUARTER'], axis=1, inplace=True)
  df_fin_rep.set_index('FL_DATE', inplace=True)
  all_fin_rep = pd.concat([all_fin_rep, df_fin_rep], axis=0)
In [ ]:
all_df = pd.DataFrame()
all_fin_rep_groups = all_fin_rep.groupby('UNIQUE_CARRIER')
for item in delay_decline.keys():
  if item != 'JETS':
    frame1 = all_fin_rep_groups.get_group(flight_airline_codes[item])
    frame = delay_decline[item]
    frame = frame.reset_index()
    frame.rename(columns={'DATE':'Date'}, inplace=True)
    frame = frame.set_index('Date')
    frame = frame.set_axis(['NUM_DEP_DEL15', 'NUM_ARR_DEL15', 'NUM_FL', 'NUM_CAN',
       'FL_PROP_OP_BY_MKT', 'CAN_PROP_OP_BY_MKT', 'PROP_FULL_SEATS',
       'PROP_FREIGHT/MAIL', 'CANCEL_CODE_CARRIER', 'CANCEL_CODE_WEATHER',
       'CANCEL_CODE_NAS', 'CANCEL_CODE_SECURITY', 'MEAN_DEPARTURES_SCHEDULED',
       'MEAN_DEPARTURES_PERFORMED', 'MEAN_PAYLOAD', 'MEAN_SEATS',
       'MEAN_PASSENGERS', 'MEAN_FREIGHT', 'MEAN_MAIL',
       'MEAN_DISTANCE', 'MEAN_RAMP_TO_RAMP', 'MEAN_AIR_TIME',
       'STD_DEPARTURES_SCHEDULED', 'STD_DEPARTURES_PERFORMED',
       'STD_PAYLOAD', 'STD_SEATS', 'STD_PASSENGERS', 'STD_FREIGHT',
       'STD_MAIL', 'STD_DISTANCE', 'STD_RAMP_TO_RAMP',
       'STD_AIR_TIME', 'MEDIAN_DEPARTURES_SCHEDULED',
       'MEDIAN_DEPARTURES_PERFORMED', 'MEDIAN_PAYLOAD',
       'MEDIAN_SEATS', 'MEDIAN_PASSENGERS', 'MEDIAN_FREIGHT',
       'MEDIAN_MAIL', 'MEDIAN_DISTANCE', 'MEDIAN_RAMP_TO_RAMP',
       'MEDIAN_AIR_TIME', 'AIRLINE', 'Open', 'High', 'Low', 'Close',
       'Adj Close', 'Volume', 'Daily Return',
        'Price Change', 'Trend','Prev Trend',
        'Crude Oil Open', 'Crude Oil High', 'Crude Oil Low',
       'Crude Oil Close', 'Crude Oil Adj Close', 'Crude Oil Volume',
       'Crude Oil Daily Return'], axis=1)
    frame_comb = pd.concat([frame, frame1], axis=1)
    frame_comb = frame_comb.reset_index()
    frame_comb = frame_comb.rename(columns={'index':'Date'})
    all_df = pd.concat([all_df, frame_comb],ignore_index=True, axis=0)
  else:
    continue
all_df = all_df.drop('UNIQUE_CARRIER', axis=1)
all_df['Trend'] =all_df['Trend'].map({'up':1, 'down':0})
all_df['Prev Trend'] =all_df['Prev Trend'].map({'up':1, 'down':0})
In [ ]:
all_df.head()
Out[ ]:
Date NUM_DEP_DEL15 NUM_ARR_DEL15 NUM_FL NUM_CAN FL_PROP_OP_BY_MKT CAN_PROP_OP_BY_MKT PROP_FULL_SEATS PROP_FREIGHT/MAIL CANCEL_CODE_CARRIER ... Trend Prev Trend Crude Oil Open Crude Oil High Crude Oil Low Crude Oil Close Crude Oil Adj Close Crude Oil Volume Crude Oil Daily Return Prev Quarter OP_PROFIT_LOSS
0 2021-11-01 845.0 791.0 3756 75.0 0.432109 0.146667 0.800704 0.008555 56.0 ... 1.0 1.0 83.360001 84.879997 82.739998 84.050003 84.050003 412309.0 0.005744 -1375235.78
1 2021-11-02 507.0 478.0 3688 44.0 0.428688 0.181818 0.800704 0.008555 37.0 ... 0.0 1.0 83.870003 84.410004 82.919998 83.910004 83.910004 429608.0 -0.001666 -1375235.78
2 2021-11-03 627.0 718.0 3726 30.0 0.424047 0.300000 0.800704 0.008555 25.0 ... 1.0 0.0 83.070000 83.080002 79.690002 80.860001 80.860001 546577.0 -0.036349 -1375235.78
3 2021-11-04 540.0 609.0 3797 11.0 0.429550 0.454545 0.800704 0.008555 7.0 ... 1.0 1.0 80.180000 83.419998 78.250000 78.809998 78.809998 701119.0 -0.025352 -1375235.78
4 2021-11-05 420.0 421.0 3750 15.0 0.426667 0.266667 0.800704 0.008555 11.0 ... 1.0 1.0 79.370003 81.800003 78.959999 81.269997 81.269997 531054.0 0.031214 -1375235.78

5 rows × 62 columns

Becasue we are predicting from t to t+1 we need to shift the columns back by 1 day from the orginal dataframe all_df.

In [ ]:
all_df2 = all_df
all_df2 = all_df2.dropna()

shift_cols = ['Open', 'High',
       'Low', 'Close', 'Adj Close', 'Volume', 'Daily Return', 'Price Change',
       'Crude Oil Open', 'Crude Oil High',
       'Crude Oil Low', 'Crude Oil Close', 'Crude Oil Adj Close',
       'Crude Oil Volume', 'Crude Oil Daily Return']
for item in shift_cols:
  prev = 'Prev {}'.format(item)
  all_df2[prev] = all_df2[item].shift(-1)

all_df2 = all_df2.drop(shift_cols, axis=1)
<ipython-input-64-4f8ca05e1355>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_df2[prev] = all_df2[item].shift(-1)
<ipython-input-64-4f8ca05e1355>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_df2[prev] = all_df2[item].shift(-1)
<ipython-input-64-4f8ca05e1355>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_df2[prev] = all_df2[item].shift(-1)
<ipython-input-64-4f8ca05e1355>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_df2[prev] = all_df2[item].shift(-1)
<ipython-input-64-4f8ca05e1355>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_df2[prev] = all_df2[item].shift(-1)
<ipython-input-64-4f8ca05e1355>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_df2[prev] = all_df2[item].shift(-1)
<ipython-input-64-4f8ca05e1355>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_df2[prev] = all_df2[item].shift(-1)
<ipython-input-64-4f8ca05e1355>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_df2[prev] = all_df2[item].shift(-1)
<ipython-input-64-4f8ca05e1355>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_df2[prev] = all_df2[item].shift(-1)
<ipython-input-64-4f8ca05e1355>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_df2[prev] = all_df2[item].shift(-1)
<ipython-input-64-4f8ca05e1355>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_df2[prev] = all_df2[item].shift(-1)
<ipython-input-64-4f8ca05e1355>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_df2[prev] = all_df2[item].shift(-1)
<ipython-input-64-4f8ca05e1355>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_df2[prev] = all_df2[item].shift(-1)
<ipython-input-64-4f8ca05e1355>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_df2[prev] = all_df2[item].shift(-1)
<ipython-input-64-4f8ca05e1355>:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_df2[prev] = all_df2[item].shift(-1)
In [ ]:
all_df2.columns

data = all_df2.drop('Date', axis=1)
data = data.dropna()
In [ ]:
data3 = data.drop(['MEAN_DEPARTURES_PERFORMED', 'MEAN_PAYLOAD', 'MEAN_SEATS',
       'MEAN_PASSENGERS', 'MEAN_FREIGHT', 'MEAN_MAIL', 'MEAN_DISTANCE',
       'MEAN_RAMP_TO_RAMP', 'MEAN_AIR_TIME', 'STD_DEPARTURES_SCHEDULED',
       'STD_DEPARTURES_PERFORMED', 'STD_PAYLOAD', 'STD_SEATS',
       'STD_PASSENGERS', 'STD_FREIGHT', 'STD_MAIL', 'STD_DISTANCE',
       'STD_RAMP_TO_RAMP', 'STD_AIR_TIME', 'MEDIAN_DEPARTURES_SCHEDULED',
       'MEDIAN_DEPARTURES_PERFORMED', 'MEDIAN_PAYLOAD', 'MEDIAN_SEATS',
       'MEDIAN_PASSENGERS', 'MEDIAN_FREIGHT', 'MEDIAN_MAIL', 'MEDIAN_DISTANCE',
       'MEDIAN_RAMP_TO_RAMP', 'MEDIAN_AIR_TIME'], axis=1)

data_groups = data3.groupby('AIRLINE')
X = data_groups.get_group('HA').drop(['AIRLINE', 'Trend'], axis=1)
y = data_groups.get_group('HA')['Trend']

Creating a TSNE¶

TSNE using only stock and finacial data¶

In [ ]:
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE


TSNE_fin_data = all_df.drop(['Date','NUM_DEP_DEL15', 'NUM_ARR_DEL15', 'NUM_FL', 'NUM_CAN',
       'FL_PROP_OP_BY_MKT', 'CAN_PROP_OP_BY_MKT', 'PROP_FULL_SEATS',
       'PROP_FREIGHT/MAIL', 'CANCEL_CODE_CARRIER', 'CANCEL_CODE_WEATHER',
       'CANCEL_CODE_NAS', 'CANCEL_CODE_SECURITY', 'MEAN_DEPARTURES_SCHEDULED',
       'MEAN_DEPARTURES_PERFORMED', 'MEAN_PAYLOAD', 'MEAN_SEATS',
       'MEAN_PASSENGERS', 'MEAN_FREIGHT', 'MEAN_MAIL',
       'MEAN_DISTANCE', 'MEAN_RAMP_TO_RAMP', 'MEAN_AIR_TIME',
       'STD_DEPARTURES_SCHEDULED', 'STD_DEPARTURES_PERFORMED',
       'STD_PAYLOAD', 'STD_SEATS', 'STD_PASSENGERS', 'STD_FREIGHT',
       'STD_MAIL', 'STD_DISTANCE', 'STD_RAMP_TO_RAMP',
       'STD_AIR_TIME', 'MEDIAN_DEPARTURES_SCHEDULED',
       'MEDIAN_DEPARTURES_PERFORMED', 'MEDIAN_PAYLOAD',
       'MEDIAN_SEATS', 'MEDIAN_PASSENGERS', 'MEDIAN_FREIGHT',
       'MEDIAN_MAIL', 'MEDIAN_DISTANCE', 'MEDIAN_RAMP_TO_RAMP',
       'MEDIAN_AIR_TIME'], axis=1)
TSNE_fin_data = TSNE_fin_data.dropna()

X_T = TSNE_fin_data.drop('AIRLINE', axis=1)
y_T = TSNE_fin_data['AIRLINE']
X_T_std = StandardScaler().fit_transform(X_T)
n_components = 2
tsne = TSNE(n_components)
X_tsne = tsne.fit_transform(X_T_std)
X_tsne_data = np.vstack((X_tsne.T, y_T)).T
df_tsne = pd.DataFrame(X_tsne_data, columns=['Dim1', 'Dim2', 'AIRLINE'])

plt.figure(figsize=(8, 8))
sns.scatterplot(data=df_tsne, hue='AIRLINE', x='Dim1', y='Dim2').set(title='TSNE Stock and Financial Data')
plt.show()
No description has been provided for this image

TNSE using all data¶

In [ ]:
TSNE_data = all_df.drop('Date', axis=1)
TSNE_data = TSNE_data.dropna()

X_T = TSNE_data.drop('AIRLINE', axis=1)
y_T = TSNE_data['AIRLINE']
X_T_std = StandardScaler().fit_transform(X_T)
n_components = 2
tsne = TSNE(n_components)
X_tsne = tsne.fit_transform(X_T_std)
X_tsne_data = np.vstack((X_tsne.T, y_T)).T
df_tsne = pd.DataFrame(X_tsne_data, columns=['Dim1', 'Dim2', 'AIRLINE'])

plt.figure(figsize=(8, 8))
sns.scatterplot(data=df_tsne, hue='AIRLINE', x='Dim1', y='Dim2').set(title='TSNE Flight, Stock, and Financial Data')
plt.show()
No description has been provided for this image

Models¶

In [ ]:
all_df_airline_groups = all_df.groupby('AIRLINE')

Logistic Regression¶

In [ ]:
from sklearn.model_selection import train_test_split

def train_test_split_airline(airline, rs):
  frame = all_df_airline_groups.get_group(airline)
  frame = frame.dropna()
  X = frame.drop(['Trend','Date', 'AIRLINE'], axis=1)
  y = frame['Trend']
  X_std = StandardScaler().fit_transform(X)

  # using the train test split function
  X_std_train, X_std_test, y_train, y_test = train_test_split(X_std,y , random_state=rs,test_size=0.25, shuffle=True)
  return X_std_train, X_std_test, y_train, y_test
In [ ]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_squared_error
In [ ]:
df_logistic = pd.DataFrame()

for rs in [42, 104, 35]:
  air_lst = []
  logistic_acc = []
  logistic_mse = []
  for airline in all_df_airline_groups.groups.keys():
    air_lst.append(airline)

    X_std_train, X_std_test, y_train, y_test = train_test_split_airline(airline, rs)

    # all parameters not specified are set to their defaults
    logisticRegr = LogisticRegression()

    # fit model on training data
    logisticRegr.fit(X_std_train, y_train)

    # predict the testing data
    y_pred = logisticRegr.predict(X_std_test)

    # accuracy
    score = logisticRegr.score(X_std_test, y_test)
    logistic_acc.append(score)

    # MSE
    mse = mean_squared_error(y_test, y_pred)
    logistic_mse.append(mse)

  RS_col = 'Random State {}'.format(rs)
  df_l = pd.DataFrame({'Airline':air_lst, 'Accuracy':logistic_acc, 'MSE':logistic_mse})
  df_l = df_l.set_index('Airline')
  l_cols=[(RS_col,'Accuracy'),(RS_col,'MSE')]
  df_l.columns=pd.MultiIndex.from_tuples(l_cols)

  df_logistic = pd.concat([df_logistic, df_l], axis=1)
df_logistic
Out[ ]:
Random State 42 Random State 104 Random State 35
Accuracy MSE Accuracy MSE Accuracy MSE
Airline
AA 0.987500 0.012500 0.962500 0.037500 0.925000 0.075000
AS 0.948718 0.051282 0.961538 0.038462 0.974359 0.025641
B6 0.971014 0.028986 0.942029 0.057971 0.985507 0.014493
DL 0.925373 0.074627 0.910448 0.089552 0.970149 0.029851
G4 0.958333 0.041667 0.958333 0.041667 0.895833 0.104167
HA 0.966667 0.033333 0.900000 0.100000 0.933333 0.066667
NK 0.949153 0.050847 0.864407 0.135593 0.983051 0.016949
UA 0.912500 0.087500 0.912500 0.087500 0.912500 0.087500
WN 0.950000 0.050000 0.950000 0.050000 0.875000 0.125000

Above we appeared to have high accuracy and MSE but this was due to data leakage where it was predicting the outcome of the day using that days information. Below we used the shifted dataframe and obtained much lower results.

In [ ]:
def train_test_split_trend(airline, rs):
  frame = data_groups.get_group(airline)
  frame = frame.dropna()
  X = frame.drop(['Trend', 'AIRLINE'], axis=1)
  y = frame['Trend']
  X_std = StandardScaler().fit_transform(X)

  # using the train test split function
  X_std_train, X_std_test, y_train, y_test = train_test_split(X_std,y , random_state=rs,test_size=0.25, shuffle=True)
  return X_std_train, X_std_test, y_train, y_test
In [ ]:
df_logistic = pd.DataFrame()

for rs in [42, 104, 35]:
  air_lst = []
  logistic_acc = []
  logistic_mse = []
  for airline in data_groups.groups.keys():
    air_lst.append(airline)

    X_std_train, X_std_test, y_train, y_test = train_test_split_trend(airline, rs)

    # all parameters not specified are set to their defaults
    logisticRegr = LogisticRegression()

    # fit model on training data
    logisticRegr.fit(X_std_train, y_train)

    # Predict the testing data
    y_pred = logisticRegr.predict(X_std_test)

    # Accuracy
    score = logisticRegr.score(X_std_test, y_test)
    logistic_acc.append(score)

    # MSE
    mse = mean_squared_error(y_test, y_pred)
    logistic_mse.append(mse)

  RS_col = 'Random State {}'.format(rs)
  df_l = pd.DataFrame({'Airline':air_lst, 'Accuracy':logistic_acc, 'MSE':logistic_mse})
  df_l = df_l.set_index('Airline')
  l_cols=[(RS_col,'Accuracy'),(RS_col,'MSE')]
  df_l.columns=pd.MultiIndex.from_tuples(l_cols)

  df_logistic = pd.concat([df_logistic, df_l], axis=1)
df_logistic
Out[ ]:
Random State 42 Random State 104 Random State 35
Accuracy MSE Accuracy MSE Accuracy MSE
Airline
AA 0.612500 0.387500 0.550000 0.450000 0.575000 0.425000
AS 0.487179 0.512821 0.500000 0.500000 0.576923 0.423077
B6 0.449275 0.550725 0.507246 0.492754 0.434783 0.565217
DL 0.424242 0.575758 0.409091 0.590909 0.424242 0.575758
G4 0.458333 0.541667 0.520833 0.479167 0.562500 0.437500
HA 0.700000 0.300000 0.666667 0.333333 0.566667 0.433333
NK 0.610169 0.389831 0.491525 0.508475 0.559322 0.440678
UA 0.425000 0.575000 0.525000 0.475000 0.487500 0.512500
WN 0.475000 0.525000 0.500000 0.500000 0.525000 0.475000

The results above do not provide us with any additional information to make predictions on whether or not the stock price will go up or down.

Multivariate LSTM¶

As a result of the logistic regression to not work as well aswe had hoped. We decided to change into a time series model. Since the model uses past data there was no need to use the shifted dataframe and therefore we were able to use the all_df dataframe.

In [ ]:
lstm_g = all_df.groupby('AIRLINE')
In [ ]:
# prepare data for lstm
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler

# convert series to supervised learning
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	n_vars = 1 if type(data) is list else data.shape[1]
	df = pd.DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = pd.concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg
In [ ]:
from sklearn.preprocessing import MinMaxScaler
def preprocessing_lstm(airline):
  # load dataset
  dataset = lstm_g.get_group(airline)
  dataset = dataset.set_index('Date')
  dataset = dataset.drop('AIRLINE', axis=1)
  values = dataset.values

  # ensure all data is float
  values = values.astype('float32')

  # train the normalization
  scaler = MinMaxScaler()
  scaler = scaler.fit(values)

  # normalize the dataset
  normalized = scaler.transform(values)

  # frame as supervised learning
  reframed = series_to_supervised(normalized, 1, 1)

  # drop columns we don't want to predict
  reframed.drop(reframed.columns[[9,10,11,12,13,14,15]], axis=1, inplace=True)

  # split into train and test sets
  values = reframed.values
  n_train_days = int(values.shape[0] // 2.5)
  train = values[:n_train_days, :]
  test = values[n_train_days:, :]

  # split into input and outputs
  train_X, train_y = train[:, :-1], train[:, -1]
  test_X, test_y = test[:, :-1], test[:, -1]

  # reshape input to be 3D [samples, timesteps, features]
  train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
  test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))

  return test_X, test_y, train_X, train_y
In [ ]:
from matplotlib import pyplot

def lstm_plot(yhat, history, test_X, test_y, airline, loss_type):
  # plot history
  pyplot.plot(history.history['loss'], label='train')
  pyplot.plot(history.history['val_loss'], label='test')
  pltitle = '{}: Train and Test Loss from the Multivariate LSTM During Training'.format(airline)
  pyplot.title(pltitle)
  pyplot.xlabel('Epoch')
  ylab = 'Train and Test Loss ({})'.format(loss_type)
  pyplot.ylabel(ylab)
  pyplot.legend()
  pyplot.show()

  test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))

  # yhat
  yhat = np.concatenate((yhat, test_X[:, 1:]), axis=1)
  yhat = yhat[:,0]

  # actual
  test_y = test_y.reshape((len(test_y), 1))
  y = np.concatenate((test_y, test_X[:, 1:]), axis=1)
  y = y[:,0]

  # calculate RMSE
  rmse = np.sqrt(mean_squared_error(y, yhat))
  print('Test RMSE: %.3f' % rmse)
  return rmse
In [ ]:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

def lstm_model(train_X, train_y, test_X, test_y, n_epochs, n_batch, loss_type):
  # design network
  model = Sequential()
  model.add(LSTM(n_epochs, input_shape=(train_X.shape[1], train_X.shape[2])))
  model.add(Dense(1))
  model.compile(loss=loss_type, optimizer='adam')
  # fit network
  history = model.fit(train_X, train_y, epochs=n_epochs, batch_size=n_batch, validation_data=(test_X, test_y), verbose=2, shuffle=False)
  # make a prediction
  yhat = model.predict(test_X)

  return yhat, history
In [ ]:
def lstm_run(airline, n_epochs, n_batch, loss_type):
  test_X, test_y, train_X, train_y = preprocessing_lstm(airline)
  yhat, history = lstm_model(train_X, train_y, test_X, test_y, n_epochs, n_batch, loss_type)
  rsme = lstm_plot(yhat, history, test_X, test_y, airline, loss_type)
  return rsme
In [ ]:
lstm_rsme_dict50 = {}

for airline in lstm_g.groups.keys():
  print(airline)
  n_epochs = 50
  n_batch = 72
  loss_type = 'mean_squared_error'
  model_rsme = lstm_run(airline, n_epochs, n_batch, loss_type)
  feat_lst = [model_rsme, n_epochs, n_batch]
  lstm_rsme_dict50[airline] = feat_lst
AA
Epoch 1/50
2/2 - 3s - loss: 0.2751 - val_loss: 0.3122 - 3s/epoch - 1s/step
Epoch 2/50
2/2 - 0s - loss: 0.1069 - val_loss: 0.1265 - 41ms/epoch - 20ms/step
Epoch 3/50
2/2 - 0s - loss: 0.0763 - val_loss: 0.0549 - 41ms/epoch - 20ms/step
Epoch 4/50
2/2 - 0s - loss: 0.1156 - val_loss: 0.0407 - 42ms/epoch - 21ms/step
Epoch 5/50
2/2 - 0s - loss: 0.1410 - val_loss: 0.0427 - 44ms/epoch - 22ms/step
Epoch 6/50
2/2 - 0s - loss: 0.1293 - val_loss: 0.0548 - 39ms/epoch - 19ms/step
Epoch 7/50
2/2 - 0s - loss: 0.1008 - val_loss: 0.0793 - 40ms/epoch - 20ms/step
Epoch 8/50
2/2 - 0s - loss: 0.0766 - val_loss: 0.1125 - 38ms/epoch - 19ms/step
Epoch 9/50
2/2 - 0s - loss: 0.0645 - val_loss: 0.1450 - 40ms/epoch - 20ms/step
Epoch 10/50
2/2 - 0s - loss: 0.0618 - val_loss: 0.1674 - 39ms/epoch - 19ms/step
Epoch 11/50
2/2 - 0s - loss: 0.0620 - val_loss: 0.1748 - 43ms/epoch - 22ms/step
Epoch 12/50
2/2 - 0s - loss: 0.0604 - val_loss: 0.1682 - 43ms/epoch - 22ms/step
Epoch 13/50
2/2 - 0s - loss: 0.0563 - val_loss: 0.1524 - 44ms/epoch - 22ms/step
Epoch 14/50
2/2 - 0s - loss: 0.0513 - val_loss: 0.1339 - 40ms/epoch - 20ms/step
Epoch 15/50
2/2 - 0s - loss: 0.0475 - val_loss: 0.1179 - 42ms/epoch - 21ms/step
Epoch 16/50
2/2 - 0s - loss: 0.0459 - val_loss: 0.1076 - 42ms/epoch - 21ms/step
Epoch 17/50
2/2 - 0s - loss: 0.0454 - val_loss: 0.1040 - 44ms/epoch - 22ms/step
Epoch 18/50
2/2 - 0s - loss: 0.0442 - val_loss: 0.1068 - 42ms/epoch - 21ms/step
Epoch 19/50
2/2 - 0s - loss: 0.0417 - val_loss: 0.1150 - 52ms/epoch - 26ms/step
Epoch 20/50
2/2 - 0s - loss: 0.0383 - val_loss: 0.1268 - 38ms/epoch - 19ms/step
Epoch 21/50
2/2 - 0s - loss: 0.0355 - val_loss: 0.1396 - 41ms/epoch - 21ms/step
Epoch 22/50
2/2 - 0s - loss: 0.0338 - val_loss: 0.1502 - 40ms/epoch - 20ms/step
Epoch 23/50
2/2 - 0s - loss: 0.0326 - val_loss: 0.1559 - 40ms/epoch - 20ms/step
Epoch 24/50
2/2 - 0s - loss: 0.0312 - val_loss: 0.1562 - 39ms/epoch - 20ms/step
Epoch 25/50
2/2 - 0s - loss: 0.0291 - val_loss: 0.1521 - 40ms/epoch - 20ms/step
Epoch 26/50
2/2 - 0s - loss: 0.0268 - val_loss: 0.1463 - 41ms/epoch - 20ms/step
Epoch 27/50
2/2 - 0s - loss: 0.0246 - val_loss: 0.1415 - 42ms/epoch - 21ms/step
Epoch 28/50
2/2 - 0s - loss: 0.0228 - val_loss: 0.1396 - 40ms/epoch - 20ms/step
Epoch 29/50
2/2 - 0s - loss: 0.0213 - val_loss: 0.1414 - 39ms/epoch - 19ms/step
Epoch 30/50
2/2 - 0s - loss: 0.0196 - val_loss: 0.1465 - 40ms/epoch - 20ms/step
Epoch 31/50
2/2 - 0s - loss: 0.0180 - val_loss: 0.1534 - 38ms/epoch - 19ms/step
Epoch 32/50
2/2 - 0s - loss: 0.0166 - val_loss: 0.1601 - 42ms/epoch - 21ms/step
Epoch 33/50
2/2 - 0s - loss: 0.0154 - val_loss: 0.1645 - 39ms/epoch - 20ms/step
Epoch 34/50
2/2 - 0s - loss: 0.0142 - val_loss: 0.1660 - 43ms/epoch - 22ms/step
Epoch 35/50
2/2 - 0s - loss: 0.0130 - val_loss: 0.1650 - 39ms/epoch - 20ms/step
Epoch 36/50
2/2 - 0s - loss: 0.0118 - val_loss: 0.1635 - 39ms/epoch - 20ms/step
Epoch 37/50
2/2 - 0s - loss: 0.0108 - val_loss: 0.1631 - 42ms/epoch - 21ms/step
Epoch 38/50
2/2 - 0s - loss: 0.0099 - val_loss: 0.1651 - 39ms/epoch - 19ms/step
Epoch 39/50
2/2 - 0s - loss: 0.0090 - val_loss: 0.1693 - 40ms/epoch - 20ms/step
Epoch 40/50
2/2 - 0s - loss: 0.0082 - val_loss: 0.1745 - 40ms/epoch - 20ms/step
Epoch 41/50
2/2 - 0s - loss: 0.0075 - val_loss: 0.1791 - 39ms/epoch - 20ms/step
Epoch 42/50
2/2 - 0s - loss: 0.0070 - val_loss: 0.1817 - 42ms/epoch - 21ms/step
Epoch 43/50
2/2 - 0s - loss: 0.0064 - val_loss: 0.1825 - 40ms/epoch - 20ms/step
Epoch 44/50
2/2 - 0s - loss: 0.0059 - val_loss: 0.1825 - 41ms/epoch - 20ms/step
Epoch 45/50
2/2 - 0s - loss: 0.0055 - val_loss: 0.1832 - 41ms/epoch - 20ms/step
Epoch 46/50
2/2 - 0s - loss: 0.0051 - val_loss: 0.1853 - 41ms/epoch - 20ms/step
Epoch 47/50
2/2 - 0s - loss: 0.0048 - val_loss: 0.1885 - 39ms/epoch - 20ms/step
Epoch 48/50
2/2 - 0s - loss: 0.0045 - val_loss: 0.1917 - 41ms/epoch - 21ms/step
Epoch 49/50
2/2 - 0s - loss: 0.0042 - val_loss: 0.1937 - 41ms/epoch - 20ms/step
Epoch 50/50
2/2 - 0s - loss: 0.0040 - val_loss: 0.1943 - 39ms/epoch - 19ms/step
5/5 [==============================] - 0s 3ms/step
No description has been provided for this image
Test RMSE: 0.441
AS
Epoch 1/50
2/2 - 3s - loss: 0.2292 - val_loss: 0.2681 - 3s/epoch - 1s/step
Epoch 2/50
2/2 - 0s - loss: 0.1045 - val_loss: 0.1762 - 38ms/epoch - 19ms/step
Epoch 3/50
2/2 - 0s - loss: 0.1041 - val_loss: 0.1437 - 39ms/epoch - 20ms/step
Epoch 4/50
2/2 - 0s - loss: 0.1507 - val_loss: 0.1359 - 40ms/epoch - 20ms/step
Epoch 5/50
2/2 - 0s - loss: 0.1747 - val_loss: 0.1336 - 38ms/epoch - 19ms/step
Epoch 6/50
2/2 - 0s - loss: 0.1632 - val_loss: 0.1348 - 38ms/epoch - 19ms/step
Epoch 7/50
2/2 - 0s - loss: 0.1343 - val_loss: 0.1413 - 40ms/epoch - 20ms/step
Epoch 8/50
2/2 - 0s - loss: 0.1060 - val_loss: 0.1521 - 40ms/epoch - 20ms/step
Epoch 9/50
2/2 - 0s - loss: 0.0866 - val_loss: 0.1633 - 40ms/epoch - 20ms/step
Epoch 10/50
2/2 - 0s - loss: 0.0760 - val_loss: 0.1709 - 39ms/epoch - 20ms/step
Epoch 11/50
2/2 - 0s - loss: 0.0703 - val_loss: 0.1723 - 39ms/epoch - 20ms/step
Epoch 12/50
2/2 - 0s - loss: 0.0661 - val_loss: 0.1675 - 36ms/epoch - 18ms/step
Epoch 13/50
2/2 - 0s - loss: 0.0617 - val_loss: 0.1586 - 40ms/epoch - 20ms/step
Epoch 14/50
2/2 - 0s - loss: 0.0576 - val_loss: 0.1483 - 42ms/epoch - 21ms/step
Epoch 15/50
2/2 - 0s - loss: 0.0548 - val_loss: 0.1392 - 41ms/epoch - 20ms/step
Epoch 16/50
2/2 - 0s - loss: 0.0535 - val_loss: 0.1327 - 39ms/epoch - 19ms/step
Epoch 17/50
2/2 - 0s - loss: 0.0528 - val_loss: 0.1289 - 38ms/epoch - 19ms/step
Epoch 18/50
2/2 - 0s - loss: 0.0509 - val_loss: 0.1278 - 40ms/epoch - 20ms/step
Epoch 19/50
2/2 - 0s - loss: 0.0470 - val_loss: 0.1288 - 39ms/epoch - 20ms/step
Epoch 20/50
2/2 - 0s - loss: 0.0417 - val_loss: 0.1315 - 38ms/epoch - 19ms/step
Epoch 21/50
2/2 - 0s - loss: 0.0364 - val_loss: 0.1351 - 36ms/epoch - 18ms/step
Epoch 22/50
2/2 - 0s - loss: 0.0324 - val_loss: 0.1382 - 39ms/epoch - 19ms/step
Epoch 23/50
2/2 - 0s - loss: 0.0298 - val_loss: 0.1393 - 40ms/epoch - 20ms/step
Epoch 24/50
2/2 - 0s - loss: 0.0277 - val_loss: 0.1381 - 39ms/epoch - 19ms/step
Epoch 25/50
2/2 - 0s - loss: 0.0255 - val_loss: 0.1353 - 40ms/epoch - 20ms/step
Epoch 26/50
2/2 - 0s - loss: 0.0234 - val_loss: 0.1323 - 40ms/epoch - 20ms/step
Epoch 27/50
2/2 - 0s - loss: 0.0216 - val_loss: 0.1305 - 39ms/epoch - 20ms/step
Epoch 28/50
2/2 - 0s - loss: 0.0201 - val_loss: 0.1305 - 41ms/epoch - 20ms/step
Epoch 29/50
2/2 - 0s - loss: 0.0187 - val_loss: 0.1322 - 39ms/epoch - 19ms/step
Epoch 30/50
2/2 - 0s - loss: 0.0173 - val_loss: 0.1351 - 39ms/epoch - 20ms/step
Epoch 31/50
2/2 - 0s - loss: 0.0163 - val_loss: 0.1379 - 39ms/epoch - 19ms/step
Epoch 32/50
2/2 - 0s - loss: 0.0157 - val_loss: 0.1393 - 39ms/epoch - 19ms/step
Epoch 33/50
2/2 - 0s - loss: 0.0150 - val_loss: 0.1391 - 43ms/epoch - 21ms/step
Epoch 34/50
2/2 - 0s - loss: 0.0139 - val_loss: 0.1381 - 36ms/epoch - 18ms/step
Epoch 35/50
2/2 - 0s - loss: 0.0127 - val_loss: 0.1373 - 40ms/epoch - 20ms/step
Epoch 36/50
2/2 - 0s - loss: 0.0117 - val_loss: 0.1377 - 38ms/epoch - 19ms/step
Epoch 37/50
2/2 - 0s - loss: 0.0109 - val_loss: 0.1393 - 40ms/epoch - 20ms/step
Epoch 38/50
2/2 - 0s - loss: 0.0102 - val_loss: 0.1417 - 38ms/epoch - 19ms/step
Epoch 39/50
2/2 - 0s - loss: 0.0096 - val_loss: 0.1436 - 40ms/epoch - 20ms/step
Epoch 40/50
2/2 - 0s - loss: 0.0091 - val_loss: 0.1445 - 41ms/epoch - 20ms/step
Epoch 41/50
2/2 - 0s - loss: 0.0085 - val_loss: 0.1445 - 39ms/epoch - 20ms/step
Epoch 42/50
2/2 - 0s - loss: 0.0078 - val_loss: 0.1442 - 41ms/epoch - 20ms/step
Epoch 43/50
2/2 - 0s - loss: 0.0072 - val_loss: 0.1446 - 43ms/epoch - 21ms/step
Epoch 44/50
2/2 - 0s - loss: 0.0067 - val_loss: 0.1458 - 41ms/epoch - 20ms/step
Epoch 45/50
2/2 - 0s - loss: 0.0063 - val_loss: 0.1476 - 42ms/epoch - 21ms/step
Epoch 46/50
2/2 - 0s - loss: 0.0059 - val_loss: 0.1492 - 40ms/epoch - 20ms/step
Epoch 47/50
2/2 - 0s - loss: 0.0056 - val_loss: 0.1501 - 40ms/epoch - 20ms/step
Epoch 48/50
2/2 - 0s - loss: 0.0053 - val_loss: 0.1504 - 40ms/epoch - 20ms/step
Epoch 49/50
2/2 - 0s - loss: 0.0050 - val_loss: 0.1506 - 42ms/epoch - 21ms/step
Epoch 50/50
2/2 - 0s - loss: 0.0047 - val_loss: 0.1513 - 41ms/epoch - 21ms/step
5/5 [==============================] - 0s 3ms/step
No description has been provided for this image
Test RMSE: 0.389
B6
Epoch 1/50
2/2 - 3s - loss: 0.3249 - val_loss: 0.1976 - 3s/epoch - 1s/step
Epoch 2/50
2/2 - 0s - loss: 0.1353 - val_loss: 0.0787 - 40ms/epoch - 20ms/step
Epoch 3/50
2/2 - 0s - loss: 0.0551 - val_loss: 0.0593 - 44ms/epoch - 22ms/step
Epoch 4/50
2/2 - 0s - loss: 0.0619 - val_loss: 0.0906 - 38ms/epoch - 19ms/step
Epoch 5/50
2/2 - 0s - loss: 0.1049 - val_loss: 0.1106 - 37ms/epoch - 18ms/step
Epoch 6/50
2/2 - 0s - loss: 0.1269 - val_loss: 0.1013 - 40ms/epoch - 20ms/step
Epoch 7/50
2/2 - 0s - loss: 0.1154 - val_loss: 0.0795 - 39ms/epoch - 20ms/step
Epoch 8/50
2/2 - 0s - loss: 0.0880 - val_loss: 0.0639 - 41ms/epoch - 21ms/step
Epoch 9/50
2/2 - 0s - loss: 0.0630 - val_loss: 0.0612 - 40ms/epoch - 20ms/step
Epoch 10/50
2/2 - 0s - loss: 0.0487 - val_loss: 0.0679 - 40ms/epoch - 20ms/step
Epoch 11/50
2/2 - 0s - loss: 0.0442 - val_loss: 0.0764 - 41ms/epoch - 20ms/step
Epoch 12/50
2/2 - 0s - loss: 0.0444 - val_loss: 0.0811 - 40ms/epoch - 20ms/step
Epoch 13/50
2/2 - 0s - loss: 0.0446 - val_loss: 0.0801 - 40ms/epoch - 20ms/step
Epoch 14/50
2/2 - 0s - loss: 0.0432 - val_loss: 0.0753 - 42ms/epoch - 21ms/step
Epoch 15/50
2/2 - 0s - loss: 0.0408 - val_loss: 0.0699 - 40ms/epoch - 20ms/step
Epoch 16/50
2/2 - 0s - loss: 0.0392 - val_loss: 0.0663 - 40ms/epoch - 20ms/step
Epoch 17/50
2/2 - 0s - loss: 0.0394 - val_loss: 0.0652 - 41ms/epoch - 20ms/step
Epoch 18/50
2/2 - 0s - loss: 0.0410 - val_loss: 0.0659 - 41ms/epoch - 20ms/step
Epoch 19/50
2/2 - 0s - loss: 0.0428 - val_loss: 0.0667 - 42ms/epoch - 21ms/step
Epoch 20/50
2/2 - 0s - loss: 0.0431 - val_loss: 0.0670 - 41ms/epoch - 21ms/step
Epoch 21/50
2/2 - 0s - loss: 0.0414 - val_loss: 0.0671 - 41ms/epoch - 21ms/step
Epoch 22/50
2/2 - 0s - loss: 0.0386 - val_loss: 0.0678 - 40ms/epoch - 20ms/step
Epoch 23/50
2/2 - 0s - loss: 0.0357 - val_loss: 0.0692 - 40ms/epoch - 20ms/step
Epoch 24/50
2/2 - 0s - loss: 0.0335 - val_loss: 0.0711 - 41ms/epoch - 21ms/step
Epoch 25/50
2/2 - 0s - loss: 0.0321 - val_loss: 0.0728 - 43ms/epoch - 22ms/step
Epoch 26/50
2/2 - 0s - loss: 0.0313 - val_loss: 0.0737 - 41ms/epoch - 21ms/step
Epoch 27/50
2/2 - 0s - loss: 0.0305 - val_loss: 0.0738 - 40ms/epoch - 20ms/step
Epoch 28/50
2/2 - 0s - loss: 0.0297 - val_loss: 0.0733 - 40ms/epoch - 20ms/step
Epoch 29/50
2/2 - 0s - loss: 0.0289 - val_loss: 0.0728 - 45ms/epoch - 23ms/step
Epoch 30/50
2/2 - 0s - loss: 0.0283 - val_loss: 0.0726 - 40ms/epoch - 20ms/step
Epoch 31/50
2/2 - 0s - loss: 0.0279 - val_loss: 0.0728 - 41ms/epoch - 20ms/step
Epoch 32/50
2/2 - 0s - loss: 0.0273 - val_loss: 0.0733 - 39ms/epoch - 20ms/step
Epoch 33/50
2/2 - 0s - loss: 0.0266 - val_loss: 0.0742 - 39ms/epoch - 19ms/step
Epoch 34/50
2/2 - 0s - loss: 0.0257 - val_loss: 0.0755 - 42ms/epoch - 21ms/step
Epoch 35/50
2/2 - 0s - loss: 0.0247 - val_loss: 0.0770 - 40ms/epoch - 20ms/step
Epoch 36/50
2/2 - 0s - loss: 0.0238 - val_loss: 0.0786 - 38ms/epoch - 19ms/step
Epoch 37/50
2/2 - 0s - loss: 0.0230 - val_loss: 0.0799 - 37ms/epoch - 18ms/step
Epoch 38/50
2/2 - 0s - loss: 0.0223 - val_loss: 0.0808 - 38ms/epoch - 19ms/step
Epoch 39/50
2/2 - 0s - loss: 0.0216 - val_loss: 0.0813 - 41ms/epoch - 20ms/step
Epoch 40/50
2/2 - 0s - loss: 0.0210 - val_loss: 0.0817 - 38ms/epoch - 19ms/step
Epoch 41/50
2/2 - 0s - loss: 0.0203 - val_loss: 0.0821 - 38ms/epoch - 19ms/step
Epoch 42/50
2/2 - 0s - loss: 0.0197 - val_loss: 0.0827 - 40ms/epoch - 20ms/step
Epoch 43/50
2/2 - 0s - loss: 0.0192 - val_loss: 0.0838 - 41ms/epoch - 20ms/step
Epoch 44/50
2/2 - 0s - loss: 0.0186 - val_loss: 0.0851 - 42ms/epoch - 21ms/step
Epoch 45/50
2/2 - 0s - loss: 0.0180 - val_loss: 0.0867 - 42ms/epoch - 21ms/step
Epoch 46/50
2/2 - 0s - loss: 0.0174 - val_loss: 0.0883 - 40ms/epoch - 20ms/step
Epoch 47/50
2/2 - 0s - loss: 0.0169 - val_loss: 0.0898 - 41ms/epoch - 21ms/step
Epoch 48/50
2/2 - 0s - loss: 0.0164 - val_loss: 0.0910 - 40ms/epoch - 20ms/step
Epoch 49/50
2/2 - 0s - loss: 0.0159 - val_loss: 0.0919 - 54ms/epoch - 27ms/step
Epoch 50/50
2/2 - 0s - loss: 0.0154 - val_loss: 0.0928 - 39ms/epoch - 19ms/step
4/4 [==============================] - 0s 4ms/step
No description has been provided for this image
Test RMSE: 0.305
DL
Epoch 1/50
1/1 - 2s - loss: 0.7080 - val_loss: 0.5901 - 2s/epoch - 2s/step
Epoch 2/50
1/1 - 0s - loss: 0.5182 - val_loss: 0.4108 - 37ms/epoch - 37ms/step
Epoch 3/50
1/1 - 0s - loss: 0.3629 - val_loss: 0.2714 - 36ms/epoch - 36ms/step
Epoch 4/50
1/1 - 0s - loss: 0.2427 - val_loss: 0.1711 - 36ms/epoch - 36ms/step
Epoch 5/50
1/1 - 0s - loss: 0.1571 - val_loss: 0.1077 - 37ms/epoch - 37ms/step
Epoch 6/50
1/1 - 0s - loss: 0.1044 - val_loss: 0.0770 - 38ms/epoch - 38ms/step
Epoch 7/50
1/1 - 0s - loss: 0.0812 - val_loss: 0.0725 - 37ms/epoch - 37ms/step
Epoch 8/50
1/1 - 0s - loss: 0.0817 - val_loss: 0.0845 - 36ms/epoch - 36ms/step
Epoch 9/50
1/1 - 0s - loss: 0.0971 - val_loss: 0.1023 - 37ms/epoch - 37ms/step
Epoch 10/50
1/1 - 0s - loss: 0.1175 - val_loss: 0.1172 - 37ms/epoch - 37ms/step
Epoch 11/50
1/1 - 0s - loss: 0.1346 - val_loss: 0.1246 - 37ms/epoch - 37ms/step
Epoch 12/50
1/1 - 0s - loss: 0.1439 - val_loss: 0.1235 - 38ms/epoch - 38ms/step
Epoch 13/50
1/1 - 0s - loss: 0.1445 - val_loss: 0.1156 - 37ms/epoch - 37ms/step
Epoch 14/50
1/1 - 0s - loss: 0.1376 - val_loss: 0.1034 - 37ms/epoch - 37ms/step
Epoch 15/50
1/1 - 0s - loss: 0.1257 - val_loss: 0.0895 - 35ms/epoch - 35ms/step
Epoch 16/50
1/1 - 0s - loss: 0.1114 - val_loss: 0.0764 - 36ms/epoch - 36ms/step
Epoch 17/50
1/1 - 0s - loss: 0.0969 - val_loss: 0.0656 - 37ms/epoch - 37ms/step
Epoch 18/50
1/1 - 0s - loss: 0.0839 - val_loss: 0.0582 - 36ms/epoch - 36ms/step
Epoch 19/50
1/1 - 0s - loss: 0.0736 - val_loss: 0.0545 - 38ms/epoch - 38ms/step
Epoch 20/50
1/1 - 0s - loss: 0.0665 - val_loss: 0.0543 - 37ms/epoch - 37ms/step
Epoch 21/50
1/1 - 0s - loss: 0.0624 - val_loss: 0.0570 - 37ms/epoch - 37ms/step
Epoch 22/50
1/1 - 0s - loss: 0.0610 - val_loss: 0.0615 - 36ms/epoch - 36ms/step
Epoch 23/50
1/1 - 0s - loss: 0.0614 - val_loss: 0.0670 - 39ms/epoch - 39ms/step
Epoch 24/50
1/1 - 0s - loss: 0.0631 - val_loss: 0.0725 - 34ms/epoch - 34ms/step
Epoch 25/50
1/1 - 0s - loss: 0.0652 - val_loss: 0.0774 - 37ms/epoch - 37ms/step
Epoch 26/50
1/1 - 0s - loss: 0.0671 - val_loss: 0.0809 - 37ms/epoch - 37ms/step
Epoch 27/50
1/1 - 0s - loss: 0.0684 - val_loss: 0.0830 - 36ms/epoch - 36ms/step
Epoch 28/50
1/1 - 0s - loss: 0.0688 - val_loss: 0.0833 - 36ms/epoch - 36ms/step
Epoch 29/50
1/1 - 0s - loss: 0.0683 - val_loss: 0.0821 - 38ms/epoch - 38ms/step
Epoch 30/50
1/1 - 0s - loss: 0.0669 - val_loss: 0.0795 - 38ms/epoch - 38ms/step
Epoch 31/50
1/1 - 0s - loss: 0.0647 - val_loss: 0.0759 - 37ms/epoch - 37ms/step
Epoch 32/50
1/1 - 0s - loss: 0.0622 - val_loss: 0.0716 - 39ms/epoch - 39ms/step
Epoch 33/50
1/1 - 0s - loss: 0.0594 - val_loss: 0.0669 - 37ms/epoch - 37ms/step
Epoch 34/50
1/1 - 0s - loss: 0.0567 - val_loss: 0.0623 - 35ms/epoch - 35ms/step
Epoch 35/50
1/1 - 0s - loss: 0.0543 - val_loss: 0.0579 - 36ms/epoch - 36ms/step
Epoch 36/50
1/1 - 0s - loss: 0.0524 - val_loss: 0.0539 - 35ms/epoch - 35ms/step
Epoch 37/50
1/1 - 0s - loss: 0.0510 - val_loss: 0.0505 - 36ms/epoch - 36ms/step
Epoch 38/50
1/1 - 0s - loss: 0.0501 - val_loss: 0.0478 - 36ms/epoch - 36ms/step
Epoch 39/50
1/1 - 0s - loss: 0.0497 - val_loss: 0.0456 - 36ms/epoch - 36ms/step
Epoch 40/50
1/1 - 0s - loss: 0.0496 - val_loss: 0.0439 - 34ms/epoch - 34ms/step
Epoch 41/50
1/1 - 0s - loss: 0.0497 - val_loss: 0.0427 - 38ms/epoch - 38ms/step
Epoch 42/50
1/1 - 0s - loss: 0.0498 - val_loss: 0.0419 - 36ms/epoch - 36ms/step
Epoch 43/50
1/1 - 0s - loss: 0.0498 - val_loss: 0.0414 - 37ms/epoch - 37ms/step
Epoch 44/50
1/1 - 0s - loss: 0.0496 - val_loss: 0.0412 - 39ms/epoch - 39ms/step
Epoch 45/50
1/1 - 0s - loss: 0.0492 - val_loss: 0.0413 - 38ms/epoch - 38ms/step
Epoch 46/50
1/1 - 0s - loss: 0.0485 - val_loss: 0.0417 - 37ms/epoch - 37ms/step
Epoch 47/50
1/1 - 0s - loss: 0.0477 - val_loss: 0.0423 - 38ms/epoch - 38ms/step
Epoch 48/50
1/1 - 0s - loss: 0.0468 - val_loss: 0.0433 - 37ms/epoch - 37ms/step
Epoch 49/50
1/1 - 0s - loss: 0.0458 - val_loss: 0.0446 - 35ms/epoch - 35ms/step
Epoch 50/50
1/1 - 0s - loss: 0.0449 - val_loss: 0.0462 - 37ms/epoch - 37ms/step
4/4 [==============================] - 0s 3ms/step
No description has been provided for this image
Test RMSE: 0.215
G4
Epoch 1/50
1/1 - 6s - loss: 0.0633 - val_loss: 0.3544 - 6s/epoch - 6s/step
Epoch 2/50
1/1 - 0s - loss: 0.0267 - val_loss: 0.2877 - 38ms/epoch - 38ms/step
Epoch 3/50
1/1 - 0s - loss: 0.0121 - val_loss: 0.2459 - 38ms/epoch - 38ms/step
Epoch 4/50
1/1 - 0s - loss: 0.0137 - val_loss: 0.2254 - 36ms/epoch - 36ms/step
Epoch 5/50
1/1 - 0s - loss: 0.0206 - val_loss: 0.2189 - 36ms/epoch - 36ms/step
Epoch 6/50
1/1 - 0s - loss: 0.0241 - val_loss: 0.2210 - 36ms/epoch - 36ms/step
Epoch 7/50
1/1 - 0s - loss: 0.0225 - val_loss: 0.2294 - 35ms/epoch - 35ms/step
Epoch 8/50
1/1 - 0s - loss: 0.0178 - val_loss: 0.2429 - 37ms/epoch - 37ms/step
Epoch 9/50
1/1 - 0s - loss: 0.0126 - val_loss: 0.2605 - 37ms/epoch - 37ms/step
Epoch 10/50
1/1 - 0s - loss: 0.0088 - val_loss: 0.2804 - 38ms/epoch - 38ms/step
Epoch 11/50
1/1 - 0s - loss: 0.0071 - val_loss: 0.3001 - 38ms/epoch - 38ms/step
Epoch 12/50
1/1 - 0s - loss: 0.0073 - val_loss: 0.3172 - 40ms/epoch - 40ms/step
Epoch 13/50
1/1 - 0s - loss: 0.0085 - val_loss: 0.3295 - 38ms/epoch - 38ms/step
Epoch 14/50
1/1 - 0s - loss: 0.0097 - val_loss: 0.3357 - 36ms/epoch - 36ms/step
Epoch 15/50
1/1 - 0s - loss: 0.0102 - val_loss: 0.3359 - 37ms/epoch - 37ms/step
Epoch 16/50
1/1 - 0s - loss: 0.0096 - val_loss: 0.3308 - 38ms/epoch - 38ms/step
Epoch 17/50
1/1 - 0s - loss: 0.0083 - val_loss: 0.3221 - 38ms/epoch - 38ms/step
Epoch 18/50
1/1 - 0s - loss: 0.0067 - val_loss: 0.3112 - 36ms/epoch - 36ms/step
Epoch 19/50
1/1 - 0s - loss: 0.0052 - val_loss: 0.2997 - 38ms/epoch - 38ms/step
Epoch 20/50
1/1 - 0s - loss: 0.0043 - val_loss: 0.2891 - 36ms/epoch - 36ms/step
Epoch 21/50
1/1 - 0s - loss: 0.0040 - val_loss: 0.2802 - 39ms/epoch - 39ms/step
Epoch 22/50
1/1 - 0s - loss: 0.0041 - val_loss: 0.2738 - 36ms/epoch - 36ms/step
Epoch 23/50
1/1 - 0s - loss: 0.0045 - val_loss: 0.2701 - 36ms/epoch - 36ms/step
Epoch 24/50
1/1 - 0s - loss: 0.0048 - val_loss: 0.2691 - 38ms/epoch - 38ms/step
Epoch 25/50
1/1 - 0s - loss: 0.0048 - val_loss: 0.2709 - 36ms/epoch - 36ms/step
Epoch 26/50
1/1 - 0s - loss: 0.0045 - val_loss: 0.2750 - 38ms/epoch - 38ms/step
Epoch 27/50
1/1 - 0s - loss: 0.0039 - val_loss: 0.2811 - 38ms/epoch - 38ms/step
Epoch 28/50
1/1 - 0s - loss: 0.0032 - val_loss: 0.2885 - 37ms/epoch - 37ms/step
Epoch 29/50
1/1 - 0s - loss: 0.0027 - val_loss: 0.2964 - 36ms/epoch - 36ms/step
Epoch 30/50
1/1 - 0s - loss: 0.0024 - val_loss: 0.3040 - 36ms/epoch - 36ms/step
Epoch 31/50
1/1 - 0s - loss: 0.0023 - val_loss: 0.3105 - 36ms/epoch - 36ms/step
Epoch 32/50
1/1 - 0s - loss: 0.0024 - val_loss: 0.3153 - 34ms/epoch - 34ms/step
Epoch 33/50
1/1 - 0s - loss: 0.0025 - val_loss: 0.3178 - 37ms/epoch - 37ms/step
Epoch 34/50
1/1 - 0s - loss: 0.0025 - val_loss: 0.3179 - 36ms/epoch - 36ms/step
Epoch 35/50
1/1 - 0s - loss: 0.0024 - val_loss: 0.3158 - 34ms/epoch - 34ms/step
Epoch 36/50
1/1 - 0s - loss: 0.0022 - val_loss: 0.3119 - 37ms/epoch - 37ms/step
Epoch 37/50
1/1 - 0s - loss: 0.0019 - val_loss: 0.3070 - 35ms/epoch - 35ms/step
Epoch 38/50
1/1 - 0s - loss: 0.0016 - val_loss: 0.3017 - 36ms/epoch - 36ms/step
Epoch 39/50
1/1 - 0s - loss: 0.0014 - val_loss: 0.2966 - 37ms/epoch - 37ms/step
Epoch 40/50
1/1 - 0s - loss: 0.0014 - val_loss: 0.2925 - 37ms/epoch - 37ms/step
Epoch 41/50
1/1 - 0s - loss: 0.0014 - val_loss: 0.2897 - 42ms/epoch - 42ms/step
Epoch 42/50
1/1 - 0s - loss: 0.0014 - val_loss: 0.2885 - 37ms/epoch - 37ms/step
Epoch 43/50
1/1 - 0s - loss: 0.0014 - val_loss: 0.2888 - 36ms/epoch - 36ms/step
Epoch 44/50
1/1 - 0s - loss: 0.0014 - val_loss: 0.2905 - 36ms/epoch - 36ms/step
Epoch 45/50
1/1 - 0s - loss: 0.0012 - val_loss: 0.2934 - 37ms/epoch - 37ms/step
Epoch 46/50
1/1 - 0s - loss: 0.0011 - val_loss: 0.2971 - 36ms/epoch - 36ms/step
Epoch 47/50
1/1 - 0s - loss: 9.9558e-04 - val_loss: 0.3009 - 34ms/epoch - 34ms/step
Epoch 48/50
1/1 - 0s - loss: 9.3669e-04 - val_loss: 0.3045 - 34ms/epoch - 34ms/step
Epoch 49/50
1/1 - 0s - loss: 9.2858e-04 - val_loss: 0.3072 - 37ms/epoch - 37ms/step
Epoch 50/50
1/1 - 0s - loss: 9.4507e-04 - val_loss: 0.3088 - 34ms/epoch - 34ms/step
2/2 [==============================] - 0s 5ms/step
No description has been provided for this image
Test RMSE: 0.556
HA
Epoch 1/50
1/1 - 3s - loss: 0.3960 - val_loss: 0.6918 - 3s/epoch - 3s/step
Epoch 2/50
1/1 - 0s - loss: 0.2906 - val_loss: 0.5265 - 36ms/epoch - 36ms/step
Epoch 3/50
1/1 - 0s - loss: 0.2107 - val_loss: 0.3928 - 34ms/epoch - 34ms/step
Epoch 4/50
1/1 - 0s - loss: 0.1557 - val_loss: 0.2902 - 37ms/epoch - 37ms/step
Epoch 5/50
1/1 - 0s - loss: 0.1241 - val_loss: 0.2173 - 33ms/epoch - 33ms/step
Epoch 6/50
1/1 - 0s - loss: 0.1126 - val_loss: 0.1706 - 35ms/epoch - 35ms/step
Epoch 7/50
1/1 - 0s - loss: 0.1155 - val_loss: 0.1443 - 36ms/epoch - 36ms/step
Epoch 8/50
1/1 - 0s - loss: 0.1252 - val_loss: 0.1310 - 34ms/epoch - 34ms/step
Epoch 9/50
1/1 - 0s - loss: 0.1345 - val_loss: 0.1251 - 37ms/epoch - 37ms/step
Epoch 10/50
1/1 - 0s - loss: 0.1388 - val_loss: 0.1235 - 35ms/epoch - 35ms/step
Epoch 11/50
1/1 - 0s - loss: 0.1368 - val_loss: 0.1252 - 34ms/epoch - 34ms/step
Epoch 12/50
1/1 - 0s - loss: 0.1295 - val_loss: 0.1305 - 35ms/epoch - 35ms/step
Epoch 13/50
1/1 - 0s - loss: 0.1187 - val_loss: 0.1401 - 38ms/epoch - 38ms/step
Epoch 14/50
1/1 - 0s - loss: 0.1068 - val_loss: 0.1543 - 34ms/epoch - 34ms/step
Epoch 15/50
1/1 - 0s - loss: 0.0955 - val_loss: 0.1728 - 35ms/epoch - 35ms/step
Epoch 16/50
1/1 - 0s - loss: 0.0860 - val_loss: 0.1951 - 36ms/epoch - 36ms/step
Epoch 17/50
1/1 - 0s - loss: 0.0791 - val_loss: 0.2197 - 36ms/epoch - 36ms/step
Epoch 18/50
1/1 - 0s - loss: 0.0746 - val_loss: 0.2451 - 35ms/epoch - 35ms/step
Epoch 19/50
1/1 - 0s - loss: 0.0722 - val_loss: 0.2698 - 34ms/epoch - 34ms/step
Epoch 20/50
1/1 - 0s - loss: 0.0712 - val_loss: 0.2922 - 38ms/epoch - 38ms/step
Epoch 21/50
1/1 - 0s - loss: 0.0707 - val_loss: 0.3110 - 35ms/epoch - 35ms/step
Epoch 22/50
1/1 - 0s - loss: 0.0702 - val_loss: 0.3252 - 35ms/epoch - 35ms/step
Epoch 23/50
1/1 - 0s - loss: 0.0690 - val_loss: 0.3344 - 35ms/epoch - 35ms/step
Epoch 24/50
1/1 - 0s - loss: 0.0668 - val_loss: 0.3384 - 35ms/epoch - 35ms/step
Epoch 25/50
1/1 - 0s - loss: 0.0636 - val_loss: 0.3376 - 35ms/epoch - 35ms/step
Epoch 26/50
1/1 - 0s - loss: 0.0596 - val_loss: 0.3326 - 37ms/epoch - 37ms/step
Epoch 27/50
1/1 - 0s - loss: 0.0551 - val_loss: 0.3244 - 36ms/epoch - 36ms/step
Epoch 28/50
1/1 - 0s - loss: 0.0505 - val_loss: 0.3141 - 36ms/epoch - 36ms/step
Epoch 29/50
1/1 - 0s - loss: 0.0463 - val_loss: 0.3030 - 35ms/epoch - 35ms/step
Epoch 30/50
1/1 - 0s - loss: 0.0427 - val_loss: 0.2923 - 37ms/epoch - 37ms/step
Epoch 31/50
1/1 - 0s - loss: 0.0400 - val_loss: 0.2831 - 36ms/epoch - 36ms/step
Epoch 32/50
1/1 - 0s - loss: 0.0381 - val_loss: 0.2765 - 37ms/epoch - 37ms/step
Epoch 33/50
1/1 - 0s - loss: 0.0367 - val_loss: 0.2733 - 38ms/epoch - 38ms/step
Epoch 34/50
1/1 - 0s - loss: 0.0357 - val_loss: 0.2742 - 37ms/epoch - 37ms/step
Epoch 35/50
1/1 - 0s - loss: 0.0346 - val_loss: 0.2796 - 40ms/epoch - 40ms/step
Epoch 36/50
1/1 - 0s - loss: 0.0331 - val_loss: 0.2897 - 36ms/epoch - 36ms/step
Epoch 37/50
1/1 - 0s - loss: 0.0313 - val_loss: 0.3043 - 37ms/epoch - 37ms/step
Epoch 38/50
1/1 - 0s - loss: 0.0291 - val_loss: 0.3233 - 37ms/epoch - 37ms/step
Epoch 39/50
1/1 - 0s - loss: 0.0268 - val_loss: 0.3458 - 36ms/epoch - 36ms/step
Epoch 40/50
1/1 - 0s - loss: 0.0246 - val_loss: 0.3707 - 34ms/epoch - 34ms/step
Epoch 41/50
1/1 - 0s - loss: 0.0228 - val_loss: 0.3968 - 33ms/epoch - 33ms/step
Epoch 42/50
1/1 - 0s - loss: 0.0215 - val_loss: 0.4223 - 36ms/epoch - 36ms/step
Epoch 43/50
1/1 - 0s - loss: 0.0207 - val_loss: 0.4455 - 36ms/epoch - 36ms/step
Epoch 44/50
1/1 - 0s - loss: 0.0201 - val_loss: 0.4651 - 35ms/epoch - 35ms/step
Epoch 45/50
1/1 - 0s - loss: 0.0196 - val_loss: 0.4799 - 36ms/epoch - 36ms/step
Epoch 46/50
1/1 - 0s - loss: 0.0190 - val_loss: 0.4894 - 39ms/epoch - 39ms/step
Epoch 47/50
1/1 - 0s - loss: 0.0181 - val_loss: 0.4939 - 39ms/epoch - 39ms/step
Epoch 48/50
1/1 - 0s - loss: 0.0171 - val_loss: 0.4941 - 36ms/epoch - 36ms/step
Epoch 49/50
1/1 - 0s - loss: 0.0161 - val_loss: 0.4913 - 39ms/epoch - 39ms/step
Epoch 50/50
1/1 - 0s - loss: 0.0151 - val_loss: 0.4872 - 36ms/epoch - 36ms/step
1/1 [==============================] - 0s 423ms/step
No description has been provided for this image
Test RMSE: 0.698
NK
Epoch 1/50
1/1 - 3s - loss: 0.5388 - val_loss: 0.3626 - 3s/epoch - 3s/step
Epoch 2/50
1/1 - 0s - loss: 0.4263 - val_loss: 0.2820 - 37ms/epoch - 37ms/step
Epoch 3/50
1/1 - 0s - loss: 0.3296 - val_loss: 0.2246 - 37ms/epoch - 37ms/step
Epoch 4/50
1/1 - 0s - loss: 0.2484 - val_loss: 0.1904 - 41ms/epoch - 41ms/step
Epoch 5/50
1/1 - 0s - loss: 0.1828 - val_loss: 0.1785 - 37ms/epoch - 37ms/step
Epoch 6/50
1/1 - 0s - loss: 0.1323 - val_loss: 0.1868 - 37ms/epoch - 37ms/step
Epoch 7/50
1/1 - 0s - loss: 0.0965 - val_loss: 0.2116 - 35ms/epoch - 35ms/step
Epoch 8/50
1/1 - 0s - loss: 0.0741 - val_loss: 0.2472 - 37ms/epoch - 37ms/step
Epoch 9/50
1/1 - 0s - loss: 0.0634 - val_loss: 0.2868 - 36ms/epoch - 36ms/step
Epoch 10/50
1/1 - 0s - loss: 0.0618 - val_loss: 0.3231 - 35ms/epoch - 35ms/step
Epoch 11/50
1/1 - 0s - loss: 0.0662 - val_loss: 0.3503 - 38ms/epoch - 38ms/step
Epoch 12/50
1/1 - 0s - loss: 0.0731 - val_loss: 0.3650 - 35ms/epoch - 35ms/step
Epoch 13/50
1/1 - 0s - loss: 0.0796 - val_loss: 0.3663 - 35ms/epoch - 35ms/step
Epoch 14/50
1/1 - 0s - loss: 0.0837 - val_loss: 0.3554 - 38ms/epoch - 38ms/step
Epoch 15/50
1/1 - 0s - loss: 0.0844 - val_loss: 0.3350 - 43ms/epoch - 43ms/step
Epoch 16/50
1/1 - 0s - loss: 0.0817 - val_loss: 0.3083 - 42ms/epoch - 42ms/step
Epoch 17/50
1/1 - 0s - loss: 0.0761 - val_loss: 0.2785 - 40ms/epoch - 40ms/step
Epoch 18/50
1/1 - 0s - loss: 0.0685 - val_loss: 0.2485 - 40ms/epoch - 40ms/step
Epoch 19/50
1/1 - 0s - loss: 0.0600 - val_loss: 0.2204 - 38ms/epoch - 38ms/step
Epoch 20/50
1/1 - 0s - loss: 0.0514 - val_loss: 0.1960 - 38ms/epoch - 38ms/step
Epoch 21/50
1/1 - 0s - loss: 0.0436 - val_loss: 0.1761 - 37ms/epoch - 37ms/step
Epoch 22/50
1/1 - 0s - loss: 0.0370 - val_loss: 0.1611 - 38ms/epoch - 38ms/step
Epoch 23/50
1/1 - 0s - loss: 0.0319 - val_loss: 0.1507 - 38ms/epoch - 38ms/step
Epoch 24/50
1/1 - 0s - loss: 0.0284 - val_loss: 0.1445 - 39ms/epoch - 39ms/step
Epoch 25/50
1/1 - 0s - loss: 0.0261 - val_loss: 0.1418 - 41ms/epoch - 41ms/step
Epoch 26/50
1/1 - 0s - loss: 0.0250 - val_loss: 0.1417 - 42ms/epoch - 42ms/step
Epoch 27/50
1/1 - 0s - loss: 0.0247 - val_loss: 0.1435 - 40ms/epoch - 40ms/step
Epoch 28/50
1/1 - 0s - loss: 0.0248 - val_loss: 0.1463 - 40ms/epoch - 40ms/step
Epoch 29/50
1/1 - 0s - loss: 0.0250 - val_loss: 0.1495 - 39ms/epoch - 39ms/step
Epoch 30/50
1/1 - 0s - loss: 0.0251 - val_loss: 0.1527 - 40ms/epoch - 40ms/step
Epoch 31/50
1/1 - 0s - loss: 0.0250 - val_loss: 0.1555 - 38ms/epoch - 38ms/step
Epoch 32/50
1/1 - 0s - loss: 0.0245 - val_loss: 0.1577 - 41ms/epoch - 41ms/step
Epoch 33/50
1/1 - 0s - loss: 0.0236 - val_loss: 0.1590 - 37ms/epoch - 37ms/step
Epoch 34/50
1/1 - 0s - loss: 0.0224 - val_loss: 0.1596 - 38ms/epoch - 38ms/step
Epoch 35/50
1/1 - 0s - loss: 0.0209 - val_loss: 0.1595 - 39ms/epoch - 39ms/step
Epoch 36/50
1/1 - 0s - loss: 0.0192 - val_loss: 0.1587 - 36ms/epoch - 36ms/step
Epoch 37/50
1/1 - 0s - loss: 0.0175 - val_loss: 0.1574 - 37ms/epoch - 37ms/step
Epoch 38/50
1/1 - 0s - loss: 0.0159 - val_loss: 0.1559 - 37ms/epoch - 37ms/step
Epoch 39/50
1/1 - 0s - loss: 0.0145 - val_loss: 0.1541 - 36ms/epoch - 36ms/step
Epoch 40/50
1/1 - 0s - loss: 0.0134 - val_loss: 0.1524 - 37ms/epoch - 37ms/step
Epoch 41/50
1/1 - 0s - loss: 0.0126 - val_loss: 0.1508 - 38ms/epoch - 38ms/step
Epoch 42/50
1/1 - 0s - loss: 0.0120 - val_loss: 0.1495 - 37ms/epoch - 37ms/step
Epoch 43/50
1/1 - 0s - loss: 0.0117 - val_loss: 0.1486 - 37ms/epoch - 37ms/step
Epoch 44/50
1/1 - 0s - loss: 0.0116 - val_loss: 0.1480 - 39ms/epoch - 39ms/step
Epoch 45/50
1/1 - 0s - loss: 0.0116 - val_loss: 0.1480 - 38ms/epoch - 38ms/step
Epoch 46/50
1/1 - 0s - loss: 0.0116 - val_loss: 0.1484 - 39ms/epoch - 39ms/step
Epoch 47/50
1/1 - 0s - loss: 0.0116 - val_loss: 0.1495 - 40ms/epoch - 40ms/step
Epoch 48/50
1/1 - 0s - loss: 0.0114 - val_loss: 0.1511 - 37ms/epoch - 37ms/step
Epoch 49/50
1/1 - 0s - loss: 0.0112 - val_loss: 0.1533 - 36ms/epoch - 36ms/step
Epoch 50/50
1/1 - 0s - loss: 0.0109 - val_loss: 0.1561 - 35ms/epoch - 35ms/step
3/3 [==============================] - 0s 4ms/step
No description has been provided for this image
Test RMSE: 0.395
UA
Epoch 1/50
2/2 - 3s - loss: 0.3827 - val_loss: 0.4592 - 3s/epoch - 1s/step
Epoch 2/50
2/2 - 0s - loss: 0.2386 - val_loss: 0.2808 - 41ms/epoch - 21ms/step
Epoch 3/50
2/2 - 0s - loss: 0.1957 - val_loss: 0.1700 - 39ms/epoch - 20ms/step
Epoch 4/50
2/2 - 0s - loss: 0.2138 - val_loss: 0.1119 - 42ms/epoch - 21ms/step
Epoch 5/50
2/2 - 0s - loss: 0.2531 - val_loss: 0.0866 - 40ms/epoch - 20ms/step
Epoch 6/50
2/2 - 0s - loss: 0.2792 - val_loss: 0.0776 - 42ms/epoch - 21ms/step
Epoch 7/50
2/2 - 0s - loss: 0.2791 - val_loss: 0.0770 - 41ms/epoch - 21ms/step
Epoch 8/50
2/2 - 0s - loss: 0.2582 - val_loss: 0.0819 - 41ms/epoch - 20ms/step
Epoch 9/50
2/2 - 0s - loss: 0.2283 - val_loss: 0.0911 - 44ms/epoch - 22ms/step
Epoch 10/50
2/2 - 0s - loss: 0.1990 - val_loss: 0.1024 - 44ms/epoch - 22ms/step
Epoch 11/50
2/2 - 0s - loss: 0.1749 - val_loss: 0.1132 - 42ms/epoch - 21ms/step
Epoch 12/50
2/2 - 0s - loss: 0.1569 - val_loss: 0.1209 - 43ms/epoch - 22ms/step
Epoch 13/50
2/2 - 0s - loss: 0.1439 - val_loss: 0.1237 - 41ms/epoch - 21ms/step
Epoch 14/50
2/2 - 0s - loss: 0.1340 - val_loss: 0.1211 - 43ms/epoch - 22ms/step
Epoch 15/50
2/2 - 0s - loss: 0.1260 - val_loss: 0.1138 - 42ms/epoch - 21ms/step
Epoch 16/50
2/2 - 0s - loss: 0.1192 - val_loss: 0.1035 - 44ms/epoch - 22ms/step
Epoch 17/50
2/2 - 0s - loss: 0.1138 - val_loss: 0.0920 - 39ms/epoch - 20ms/step
Epoch 18/50
2/2 - 0s - loss: 0.1096 - val_loss: 0.0812 - 49ms/epoch - 24ms/step
Epoch 19/50
2/2 - 0s - loss: 0.1067 - val_loss: 0.0724 - 40ms/epoch - 20ms/step
Epoch 20/50
2/2 - 0s - loss: 0.1042 - val_loss: 0.0661 - 43ms/epoch - 21ms/step
Epoch 21/50
2/2 - 0s - loss: 0.1011 - val_loss: 0.0622 - 43ms/epoch - 22ms/step
Epoch 22/50
2/2 - 0s - loss: 0.0964 - val_loss: 0.0603 - 43ms/epoch - 22ms/step
Epoch 23/50
2/2 - 0s - loss: 0.0898 - val_loss: 0.0600 - 56ms/epoch - 28ms/step
Epoch 24/50
2/2 - 0s - loss: 0.0817 - val_loss: 0.0606 - 41ms/epoch - 20ms/step
Epoch 25/50
2/2 - 0s - loss: 0.0732 - val_loss: 0.0617 - 41ms/epoch - 21ms/step
Epoch 26/50
2/2 - 0s - loss: 0.0652 - val_loss: 0.0624 - 45ms/epoch - 23ms/step
Epoch 27/50
2/2 - 0s - loss: 0.0584 - val_loss: 0.0623 - 42ms/epoch - 21ms/step
Epoch 28/50
2/2 - 0s - loss: 0.0529 - val_loss: 0.0611 - 42ms/epoch - 21ms/step
Epoch 29/50
2/2 - 0s - loss: 0.0484 - val_loss: 0.0593 - 55ms/epoch - 27ms/step
Epoch 30/50
2/2 - 0s - loss: 0.0448 - val_loss: 0.0574 - 42ms/epoch - 21ms/step
Epoch 31/50
2/2 - 0s - loss: 0.0421 - val_loss: 0.0560 - 42ms/epoch - 21ms/step
Epoch 32/50
2/2 - 0s - loss: 0.0399 - val_loss: 0.0554 - 40ms/epoch - 20ms/step
Epoch 33/50
2/2 - 0s - loss: 0.0376 - val_loss: 0.0554 - 41ms/epoch - 20ms/step
Epoch 34/50
2/2 - 0s - loss: 0.0351 - val_loss: 0.0559 - 41ms/epoch - 20ms/step
Epoch 35/50
2/2 - 0s - loss: 0.0324 - val_loss: 0.0566 - 42ms/epoch - 21ms/step
Epoch 36/50
2/2 - 0s - loss: 0.0298 - val_loss: 0.0573 - 42ms/epoch - 21ms/step
Epoch 37/50
2/2 - 0s - loss: 0.0276 - val_loss: 0.0577 - 41ms/epoch - 20ms/step
Epoch 38/50
2/2 - 0s - loss: 0.0259 - val_loss: 0.0577 - 39ms/epoch - 19ms/step
Epoch 39/50
2/2 - 0s - loss: 0.0247 - val_loss: 0.0574 - 48ms/epoch - 24ms/step
Epoch 40/50
2/2 - 0s - loss: 0.0238 - val_loss: 0.0570 - 43ms/epoch - 21ms/step
Epoch 41/50
2/2 - 0s - loss: 0.0232 - val_loss: 0.0568 - 41ms/epoch - 20ms/step
Epoch 42/50
2/2 - 0s - loss: 0.0225 - val_loss: 0.0568 - 40ms/epoch - 20ms/step
Epoch 43/50
2/2 - 0s - loss: 0.0217 - val_loss: 0.0570 - 41ms/epoch - 21ms/step
Epoch 44/50
2/2 - 0s - loss: 0.0209 - val_loss: 0.0571 - 49ms/epoch - 24ms/step
Epoch 45/50
2/2 - 0s - loss: 0.0200 - val_loss: 0.0572 - 41ms/epoch - 20ms/step
Epoch 46/50
2/2 - 0s - loss: 0.0193 - val_loss: 0.0570 - 42ms/epoch - 21ms/step
Epoch 47/50
2/2 - 0s - loss: 0.0188 - val_loss: 0.0567 - 43ms/epoch - 22ms/step
Epoch 48/50
2/2 - 0s - loss: 0.0184 - val_loss: 0.0564 - 43ms/epoch - 21ms/step
Epoch 49/50
2/2 - 0s - loss: 0.0180 - val_loss: 0.0561 - 42ms/epoch - 21ms/step
Epoch 50/50
2/2 - 0s - loss: 0.0176 - val_loss: 0.0560 - 43ms/epoch - 22ms/step
5/5 [==============================] - 0s 3ms/step
No description has been provided for this image
Test RMSE: 0.237
WN
Epoch 1/50
2/2 - 2s - loss: 0.5623 - val_loss: 0.1521 - 2s/epoch - 1s/step
Epoch 2/50
2/2 - 0s - loss: 0.3468 - val_loss: 0.1256 - 43ms/epoch - 22ms/step
Epoch 3/50
2/2 - 0s - loss: 0.2179 - val_loss: 0.1608 - 41ms/epoch - 21ms/step
Epoch 4/50
2/2 - 0s - loss: 0.1609 - val_loss: 0.2223 - 43ms/epoch - 21ms/step
Epoch 5/50
2/2 - 0s - loss: 0.1494 - val_loss: 0.2649 - 45ms/epoch - 22ms/step
Epoch 6/50
2/2 - 0s - loss: 0.1509 - val_loss: 0.2706 - 42ms/epoch - 21ms/step
Epoch 7/50
2/2 - 0s - loss: 0.1479 - val_loss: 0.2484 - 41ms/epoch - 20ms/step
Epoch 8/50
2/2 - 0s - loss: 0.1389 - val_loss: 0.2140 - 42ms/epoch - 21ms/step
Epoch 9/50
2/2 - 0s - loss: 0.1291 - val_loss: 0.1796 - 41ms/epoch - 21ms/step
Epoch 10/50
2/2 - 0s - loss: 0.1227 - val_loss: 0.1515 - 43ms/epoch - 22ms/step
Epoch 11/50
2/2 - 0s - loss: 0.1208 - val_loss: 0.1314 - 43ms/epoch - 21ms/step
Epoch 12/50
2/2 - 0s - loss: 0.1220 - val_loss: 0.1186 - 43ms/epoch - 22ms/step
Epoch 13/50
2/2 - 0s - loss: 0.1240 - val_loss: 0.1110 - 43ms/epoch - 22ms/step
Epoch 14/50
2/2 - 0s - loss: 0.1247 - val_loss: 0.1069 - 43ms/epoch - 21ms/step
Epoch 15/50
2/2 - 0s - loss: 0.1228 - val_loss: 0.1055 - 45ms/epoch - 23ms/step
Epoch 16/50
2/2 - 0s - loss: 0.1186 - val_loss: 0.1063 - 43ms/epoch - 21ms/step
Epoch 17/50
2/2 - 0s - loss: 0.1128 - val_loss: 0.1091 - 43ms/epoch - 22ms/step
Epoch 18/50
2/2 - 0s - loss: 0.1066 - val_loss: 0.1134 - 43ms/epoch - 22ms/step
Epoch 19/50
2/2 - 0s - loss: 0.1012 - val_loss: 0.1185 - 40ms/epoch - 20ms/step
Epoch 20/50
2/2 - 0s - loss: 0.0969 - val_loss: 0.1230 - 42ms/epoch - 21ms/step
Epoch 21/50
2/2 - 0s - loss: 0.0935 - val_loss: 0.1258 - 39ms/epoch - 19ms/step
Epoch 22/50
2/2 - 0s - loss: 0.0906 - val_loss: 0.1260 - 41ms/epoch - 21ms/step
Epoch 23/50
2/2 - 0s - loss: 0.0877 - val_loss: 0.1235 - 40ms/epoch - 20ms/step
Epoch 24/50
2/2 - 0s - loss: 0.0846 - val_loss: 0.1190 - 40ms/epoch - 20ms/step
Epoch 25/50
2/2 - 0s - loss: 0.0814 - val_loss: 0.1135 - 44ms/epoch - 22ms/step
Epoch 26/50
2/2 - 0s - loss: 0.0784 - val_loss: 0.1081 - 44ms/epoch - 22ms/step
Epoch 27/50
2/2 - 0s - loss: 0.0757 - val_loss: 0.1035 - 40ms/epoch - 20ms/step
Epoch 28/50
2/2 - 0s - loss: 0.0731 - val_loss: 0.1003 - 39ms/epoch - 20ms/step
Epoch 29/50
2/2 - 0s - loss: 0.0705 - val_loss: 0.0986 - 40ms/epoch - 20ms/step
Epoch 30/50
2/2 - 0s - loss: 0.0676 - val_loss: 0.0982 - 43ms/epoch - 22ms/step
Epoch 31/50
2/2 - 0s - loss: 0.0645 - val_loss: 0.0989 - 42ms/epoch - 21ms/step
Epoch 32/50
2/2 - 0s - loss: 0.0612 - val_loss: 0.1003 - 41ms/epoch - 20ms/step
Epoch 33/50
2/2 - 0s - loss: 0.0580 - val_loss: 0.1017 - 41ms/epoch - 21ms/step
Epoch 34/50
2/2 - 0s - loss: 0.0550 - val_loss: 0.1024 - 44ms/epoch - 22ms/step
Epoch 35/50
2/2 - 0s - loss: 0.0521 - val_loss: 0.1020 - 43ms/epoch - 21ms/step
Epoch 36/50
2/2 - 0s - loss: 0.0493 - val_loss: 0.1004 - 42ms/epoch - 21ms/step
Epoch 37/50
2/2 - 0s - loss: 0.0465 - val_loss: 0.0980 - 44ms/epoch - 22ms/step
Epoch 38/50
2/2 - 0s - loss: 0.0438 - val_loss: 0.0955 - 42ms/epoch - 21ms/step
Epoch 39/50
2/2 - 0s - loss: 0.0412 - val_loss: 0.0934 - 43ms/epoch - 22ms/step
Epoch 40/50
2/2 - 0s - loss: 0.0387 - val_loss: 0.0922 - 41ms/epoch - 21ms/step
Epoch 41/50
2/2 - 0s - loss: 0.0363 - val_loss: 0.0920 - 41ms/epoch - 21ms/step
Epoch 42/50
2/2 - 0s - loss: 0.0338 - val_loss: 0.0925 - 42ms/epoch - 21ms/step
Epoch 43/50
2/2 - 0s - loss: 0.0314 - val_loss: 0.0932 - 40ms/epoch - 20ms/step
Epoch 44/50
2/2 - 0s - loss: 0.0292 - val_loss: 0.0936 - 40ms/epoch - 20ms/step
Epoch 45/50
2/2 - 0s - loss: 0.0271 - val_loss: 0.0933 - 39ms/epoch - 20ms/step
Epoch 46/50
2/2 - 0s - loss: 0.0251 - val_loss: 0.0924 - 39ms/epoch - 19ms/step
Epoch 47/50
2/2 - 0s - loss: 0.0232 - val_loss: 0.0913 - 40ms/epoch - 20ms/step
Epoch 48/50
2/2 - 0s - loss: 0.0214 - val_loss: 0.0905 - 44ms/epoch - 22ms/step
Epoch 49/50
2/2 - 0s - loss: 0.0197 - val_loss: 0.0905 - 43ms/epoch - 21ms/step
Epoch 50/50
2/2 - 0s - loss: 0.0182 - val_loss: 0.0912 - 39ms/epoch - 20ms/step
5/5 [==============================] - 0s 4ms/step
No description has been provided for this image
Test RMSE: 0.302
In [ ]:
lstm_rsme_dict20 = {}

for airline in lstm_g.groups.keys():
  print(airline)
  n_epochs = 20
  n_batch = 72
  loss_type = 'mean_squared_error'
  model_rsme = lstm_run(airline, n_epochs, n_batch, loss_type)
  feat_lst = [model_rsme, n_epochs, n_batch]
  lstm_rsme_dict20[airline] = feat_lst
AA
Epoch 1/20
2/2 - 3s - loss: 0.2505 - val_loss: 0.3238 - 3s/epoch - 1s/step
Epoch 2/20
2/2 - 0s - loss: 0.1569 - val_loss: 0.2063 - 41ms/epoch - 20ms/step
Epoch 3/20
2/2 - 0s - loss: 0.1033 - val_loss: 0.1209 - 39ms/epoch - 20ms/step
Epoch 4/20
2/2 - 0s - loss: 0.0830 - val_loss: 0.0675 - 39ms/epoch - 19ms/step
Epoch 5/20
2/2 - 0s - loss: 0.0881 - val_loss: 0.0414 - 39ms/epoch - 20ms/step
Epoch 6/20
2/2 - 0s - loss: 0.1037 - val_loss: 0.0323 - 39ms/epoch - 20ms/step
Epoch 7/20
2/2 - 0s - loss: 0.1142 - val_loss: 0.0313 - 40ms/epoch - 20ms/step
Epoch 8/20
2/2 - 0s - loss: 0.1135 - val_loss: 0.0346 - 39ms/epoch - 20ms/step
Epoch 9/20
2/2 - 0s - loss: 0.1042 - val_loss: 0.0418 - 38ms/epoch - 19ms/step
Epoch 10/20
2/2 - 0s - loss: 0.0919 - val_loss: 0.0526 - 40ms/epoch - 20ms/step
Epoch 11/20
2/2 - 0s - loss: 0.0808 - val_loss: 0.0658 - 41ms/epoch - 20ms/step
Epoch 12/20
2/2 - 0s - loss: 0.0730 - val_loss: 0.0792 - 39ms/epoch - 19ms/step
Epoch 13/20
2/2 - 0s - loss: 0.0682 - val_loss: 0.0905 - 38ms/epoch - 19ms/step
Epoch 14/20
2/2 - 0s - loss: 0.0656 - val_loss: 0.0982 - 38ms/epoch - 19ms/step
Epoch 15/20
2/2 - 0s - loss: 0.0638 - val_loss: 0.1016 - 39ms/epoch - 19ms/step
Epoch 16/20
2/2 - 0s - loss: 0.0620 - val_loss: 0.1009 - 41ms/epoch - 20ms/step
Epoch 17/20
2/2 - 0s - loss: 0.0599 - val_loss: 0.0970 - 44ms/epoch - 22ms/step
Epoch 18/20
2/2 - 0s - loss: 0.0576 - val_loss: 0.0912 - 44ms/epoch - 22ms/step
Epoch 19/20
2/2 - 0s - loss: 0.0553 - val_loss: 0.0848 - 39ms/epoch - 20ms/step
Epoch 20/20
2/2 - 0s - loss: 0.0534 - val_loss: 0.0790 - 39ms/epoch - 20ms/step
5/5 [==============================] - 0s 3ms/step
No description has been provided for this image
Test RMSE: 0.281
AS
Epoch 1/20
2/2 - 3s - loss: 0.2586 - val_loss: 0.2728 - 3s/epoch - 1s/step
Epoch 2/20
2/2 - 0s - loss: 0.1418 - val_loss: 0.1784 - 41ms/epoch - 21ms/step
Epoch 3/20
2/2 - 0s - loss: 0.1053 - val_loss: 0.1295 - 40ms/epoch - 20ms/step
Epoch 4/20
2/2 - 0s - loss: 0.1237 - val_loss: 0.1106 - 42ms/epoch - 21ms/step
Epoch 5/20
2/2 - 0s - loss: 0.1560 - val_loss: 0.1047 - 41ms/epoch - 21ms/step
Epoch 6/20
2/2 - 0s - loss: 0.1717 - val_loss: 0.1025 - 39ms/epoch - 20ms/step
Epoch 7/20
2/2 - 0s - loss: 0.1657 - val_loss: 0.1024 - 39ms/epoch - 20ms/step
Epoch 8/20
2/2 - 0s - loss: 0.1469 - val_loss: 0.1053 - 41ms/epoch - 20ms/step
Epoch 9/20
2/2 - 0s - loss: 0.1252 - val_loss: 0.1113 - 39ms/epoch - 19ms/step
Epoch 10/20
2/2 - 0s - loss: 0.1073 - val_loss: 0.1191 - 41ms/epoch - 20ms/step
Epoch 11/20
2/2 - 0s - loss: 0.0954 - val_loss: 0.1263 - 42ms/epoch - 21ms/step
Epoch 12/20
2/2 - 0s - loss: 0.0885 - val_loss: 0.1310 - 40ms/epoch - 20ms/step
Epoch 13/20
2/2 - 0s - loss: 0.0844 - val_loss: 0.1318 - 40ms/epoch - 20ms/step
Epoch 14/20
2/2 - 0s - loss: 0.0812 - val_loss: 0.1288 - 42ms/epoch - 21ms/step
Epoch 15/20
2/2 - 0s - loss: 0.0781 - val_loss: 0.1230 - 40ms/epoch - 20ms/step
Epoch 16/20
2/2 - 0s - loss: 0.0751 - val_loss: 0.1159 - 38ms/epoch - 19ms/step
Epoch 17/20
2/2 - 0s - loss: 0.0726 - val_loss: 0.1089 - 40ms/epoch - 20ms/step
Epoch 18/20
2/2 - 0s - loss: 0.0710 - val_loss: 0.1031 - 40ms/epoch - 20ms/step
Epoch 19/20
2/2 - 0s - loss: 0.0700 - val_loss: 0.0989 - 40ms/epoch - 20ms/step
Epoch 20/20
2/2 - 0s - loss: 0.0689 - val_loss: 0.0964 - 43ms/epoch - 21ms/step
5/5 [==============================] - 0s 3ms/step
No description has been provided for this image
Test RMSE: 0.311
B6
Epoch 1/20
2/2 - 2s - loss: 0.2732 - val_loss: 0.5680 - 2s/epoch - 1s/step
Epoch 2/20
2/2 - 0s - loss: 0.1925 - val_loss: 0.4114 - 37ms/epoch - 19ms/step
Epoch 3/20
2/2 - 0s - loss: 0.1403 - val_loss: 0.2876 - 37ms/epoch - 18ms/step
Epoch 4/20
2/2 - 0s - loss: 0.1115 - val_loss: 0.1963 - 38ms/epoch - 19ms/step
Epoch 5/20
2/2 - 0s - loss: 0.1023 - val_loss: 0.1341 - 39ms/epoch - 20ms/step
Epoch 6/20
2/2 - 0s - loss: 0.1074 - val_loss: 0.0959 - 40ms/epoch - 20ms/step
Epoch 7/20
2/2 - 0s - loss: 0.1204 - val_loss: 0.0754 - 52ms/epoch - 26ms/step
Epoch 8/20
2/2 - 0s - loss: 0.1348 - val_loss: 0.0663 - 38ms/epoch - 19ms/step
Epoch 9/20
2/2 - 0s - loss: 0.1461 - val_loss: 0.0633 - 38ms/epoch - 19ms/step
Epoch 10/20
2/2 - 0s - loss: 0.1515 - val_loss: 0.0629 - 38ms/epoch - 19ms/step
Epoch 11/20
2/2 - 0s - loss: 0.1505 - val_loss: 0.0635 - 38ms/epoch - 19ms/step
Epoch 12/20
2/2 - 0s - loss: 0.1441 - val_loss: 0.0644 - 39ms/epoch - 19ms/step
Epoch 13/20
2/2 - 0s - loss: 0.1339 - val_loss: 0.0658 - 39ms/epoch - 20ms/step
Epoch 14/20
2/2 - 0s - loss: 0.1218 - val_loss: 0.0679 - 40ms/epoch - 20ms/step
Epoch 15/20
2/2 - 0s - loss: 0.1095 - val_loss: 0.0707 - 38ms/epoch - 19ms/step
Epoch 16/20
2/2 - 0s - loss: 0.0981 - val_loss: 0.0743 - 40ms/epoch - 20ms/step
Epoch 17/20
2/2 - 0s - loss: 0.0882 - val_loss: 0.0782 - 38ms/epoch - 19ms/step
Epoch 18/20
2/2 - 0s - loss: 0.0802 - val_loss: 0.0820 - 39ms/epoch - 19ms/step
Epoch 19/20
2/2 - 0s - loss: 0.0739 - val_loss: 0.0853 - 39ms/epoch - 20ms/step
Epoch 20/20
2/2 - 0s - loss: 0.0691 - val_loss: 0.0878 - 41ms/epoch - 20ms/step
4/4 [==============================] - 0s 3ms/step
No description has been provided for this image
Test RMSE: 0.296
DL
Epoch 1/20
1/1 - 3s - loss: 0.5279 - val_loss: 0.5600 - 3s/epoch - 3s/step
Epoch 2/20
1/1 - 0s - loss: 0.4563 - val_loss: 0.4856 - 36ms/epoch - 36ms/step
Epoch 3/20
1/1 - 0s - loss: 0.3912 - val_loss: 0.4180 - 35ms/epoch - 35ms/step
Epoch 4/20
1/1 - 0s - loss: 0.3324 - val_loss: 0.3570 - 36ms/epoch - 36ms/step
Epoch 5/20
1/1 - 0s - loss: 0.2800 - val_loss: 0.3024 - 36ms/epoch - 36ms/step
Epoch 6/20
1/1 - 0s - loss: 0.2336 - val_loss: 0.2541 - 37ms/epoch - 37ms/step
Epoch 7/20
1/1 - 0s - loss: 0.1931 - val_loss: 0.2120 - 38ms/epoch - 38ms/step
Epoch 8/20
1/1 - 0s - loss: 0.1585 - val_loss: 0.1761 - 35ms/epoch - 35ms/step
Epoch 9/20
1/1 - 0s - loss: 0.1296 - val_loss: 0.1462 - 36ms/epoch - 36ms/step
Epoch 10/20
1/1 - 0s - loss: 0.1062 - val_loss: 0.1223 - 36ms/epoch - 36ms/step
Epoch 11/20
1/1 - 0s - loss: 0.0884 - val_loss: 0.1041 - 45ms/epoch - 45ms/step
Epoch 12/20
1/1 - 0s - loss: 0.0756 - val_loss: 0.0911 - 37ms/epoch - 37ms/step
Epoch 13/20
1/1 - 0s - loss: 0.0677 - val_loss: 0.0829 - 37ms/epoch - 37ms/step
Epoch 14/20
1/1 - 0s - loss: 0.0640 - val_loss: 0.0785 - 36ms/epoch - 36ms/step
Epoch 15/20
1/1 - 0s - loss: 0.0637 - val_loss: 0.0771 - 36ms/epoch - 36ms/step
Epoch 16/20
1/1 - 0s - loss: 0.0659 - val_loss: 0.0776 - 36ms/epoch - 36ms/step
Epoch 17/20
1/1 - 0s - loss: 0.0696 - val_loss: 0.0789 - 37ms/epoch - 37ms/step
Epoch 18/20
1/1 - 0s - loss: 0.0736 - val_loss: 0.0802 - 40ms/epoch - 40ms/step
Epoch 19/20
1/1 - 0s - loss: 0.0772 - val_loss: 0.0809 - 38ms/epoch - 38ms/step
Epoch 20/20
1/1 - 0s - loss: 0.0798 - val_loss: 0.0808 - 36ms/epoch - 36ms/step
4/4 [==============================] - 0s 3ms/step
No description has been provided for this image
Test RMSE: 0.284
G4
Epoch 1/20
1/1 - 2s - loss: 0.0279 - val_loss: 0.2764 - 2s/epoch - 2s/step
Epoch 2/20
1/1 - 0s - loss: 0.0202 - val_loss: 0.3053 - 36ms/epoch - 36ms/step
Epoch 3/20
1/1 - 0s - loss: 0.0194 - val_loss: 0.3192 - 35ms/epoch - 35ms/step
Epoch 4/20
1/1 - 0s - loss: 0.0182 - val_loss: 0.3206 - 35ms/epoch - 35ms/step
Epoch 5/20
1/1 - 0s - loss: 0.0156 - val_loss: 0.3151 - 35ms/epoch - 35ms/step
Epoch 6/20
1/1 - 0s - loss: 0.0128 - val_loss: 0.3072 - 32ms/epoch - 32ms/step
Epoch 7/20
1/1 - 0s - loss: 0.0108 - val_loss: 0.3007 - 34ms/epoch - 34ms/step
Epoch 8/20
1/1 - 0s - loss: 0.0097 - val_loss: 0.2978 - 32ms/epoch - 32ms/step
Epoch 9/20
1/1 - 0s - loss: 0.0093 - val_loss: 0.2995 - 32ms/epoch - 32ms/step
Epoch 10/20
1/1 - 0s - loss: 0.0087 - val_loss: 0.3055 - 34ms/epoch - 34ms/step
Epoch 11/20
1/1 - 0s - loss: 0.0078 - val_loss: 0.3150 - 32ms/epoch - 32ms/step
Epoch 12/20
1/1 - 0s - loss: 0.0068 - val_loss: 0.3268 - 33ms/epoch - 33ms/step
Epoch 13/20
1/1 - 0s - loss: 0.0059 - val_loss: 0.3389 - 33ms/epoch - 33ms/step
Epoch 14/20
1/1 - 0s - loss: 0.0054 - val_loss: 0.3494 - 33ms/epoch - 33ms/step
Epoch 15/20
1/1 - 0s - loss: 0.0052 - val_loss: 0.3565 - 36ms/epoch - 36ms/step
Epoch 16/20
1/1 - 0s - loss: 0.0050 - val_loss: 0.3596 - 31ms/epoch - 31ms/step
Epoch 17/20
1/1 - 0s - loss: 0.0047 - val_loss: 0.3587 - 37ms/epoch - 37ms/step
Epoch 18/20
1/1 - 0s - loss: 0.0042 - val_loss: 0.3549 - 35ms/epoch - 35ms/step
Epoch 19/20
1/1 - 0s - loss: 0.0037 - val_loss: 0.3497 - 34ms/epoch - 34ms/step
Epoch 20/20
1/1 - 0s - loss: 0.0032 - val_loss: 0.3446 - 33ms/epoch - 33ms/step
2/2 [==============================] - 0s 4ms/step
No description has been provided for this image
Test RMSE: 0.587
HA
Epoch 1/20
1/1 - 3s - loss: 0.3407 - val_loss: 0.4060 - 3s/epoch - 3s/step
Epoch 2/20
1/1 - 0s - loss: 0.2746 - val_loss: 0.3141 - 33ms/epoch - 33ms/step
Epoch 3/20
1/1 - 0s - loss: 0.2220 - val_loss: 0.2403 - 32ms/epoch - 32ms/step
Epoch 4/20
1/1 - 0s - loss: 0.1827 - val_loss: 0.1836 - 35ms/epoch - 35ms/step
Epoch 5/20
1/1 - 0s - loss: 0.1558 - val_loss: 0.1426 - 34ms/epoch - 34ms/step
Epoch 6/20
1/1 - 0s - loss: 0.1401 - val_loss: 0.1153 - 36ms/epoch - 36ms/step
Epoch 7/20
1/1 - 0s - loss: 0.1335 - val_loss: 0.0990 - 34ms/epoch - 34ms/step
Epoch 8/20
1/1 - 0s - loss: 0.1332 - val_loss: 0.0905 - 35ms/epoch - 35ms/step
Epoch 9/20
1/1 - 0s - loss: 0.1361 - val_loss: 0.0868 - 34ms/epoch - 34ms/step
Epoch 10/20
1/1 - 0s - loss: 0.1396 - val_loss: 0.0858 - 33ms/epoch - 33ms/step
Epoch 11/20
1/1 - 0s - loss: 0.1415 - val_loss: 0.0863 - 35ms/epoch - 35ms/step
Epoch 12/20
1/1 - 0s - loss: 0.1410 - val_loss: 0.0882 - 36ms/epoch - 36ms/step
Epoch 13/20
1/1 - 0s - loss: 0.1379 - val_loss: 0.0917 - 33ms/epoch - 33ms/step
Epoch 14/20
1/1 - 0s - loss: 0.1325 - val_loss: 0.0974 - 35ms/epoch - 35ms/step
Epoch 15/20
1/1 - 0s - loss: 0.1257 - val_loss: 0.1057 - 35ms/epoch - 35ms/step
Epoch 16/20
1/1 - 0s - loss: 0.1182 - val_loss: 0.1171 - 35ms/epoch - 35ms/step
Epoch 17/20
1/1 - 0s - loss: 0.1108 - val_loss: 0.1315 - 34ms/epoch - 34ms/step
Epoch 18/20
1/1 - 0s - loss: 0.1040 - val_loss: 0.1489 - 33ms/epoch - 33ms/step
Epoch 19/20
1/1 - 0s - loss: 0.0982 - val_loss: 0.1689 - 33ms/epoch - 33ms/step
Epoch 20/20
1/1 - 0s - loss: 0.0936 - val_loss: 0.1908 - 33ms/epoch - 33ms/step
1/1 [==============================] - 0s 407ms/step
No description has been provided for this image
Test RMSE: 0.437
NK
Epoch 1/20
1/1 - 3s - loss: 0.5218 - val_loss: 0.2882 - 3s/epoch - 3s/step
Epoch 2/20
1/1 - 0s - loss: 0.4567 - val_loss: 0.2644 - 36ms/epoch - 36ms/step
Epoch 3/20
1/1 - 0s - loss: 0.3965 - val_loss: 0.2497 - 41ms/epoch - 41ms/step
Epoch 4/20
1/1 - 0s - loss: 0.3414 - val_loss: 0.2446 - 40ms/epoch - 40ms/step
Epoch 5/20
1/1 - 0s - loss: 0.2916 - val_loss: 0.2491 - 38ms/epoch - 38ms/step
Epoch 6/20
1/1 - 0s - loss: 0.2472 - val_loss: 0.2632 - 39ms/epoch - 39ms/step
Epoch 7/20
1/1 - 0s - loss: 0.2084 - val_loss: 0.2863 - 40ms/epoch - 40ms/step
Epoch 8/20
1/1 - 0s - loss: 0.1754 - val_loss: 0.3172 - 40ms/epoch - 40ms/step
Epoch 9/20
1/1 - 0s - loss: 0.1481 - val_loss: 0.3543 - 35ms/epoch - 35ms/step
Epoch 10/20
1/1 - 0s - loss: 0.1264 - val_loss: 0.3950 - 37ms/epoch - 37ms/step
Epoch 11/20
1/1 - 0s - loss: 0.1100 - val_loss: 0.4367 - 35ms/epoch - 35ms/step
Epoch 12/20
1/1 - 0s - loss: 0.0985 - val_loss: 0.4763 - 36ms/epoch - 36ms/step
Epoch 13/20
1/1 - 0s - loss: 0.0911 - val_loss: 0.5111 - 38ms/epoch - 38ms/step
Epoch 14/20
1/1 - 0s - loss: 0.0869 - val_loss: 0.5389 - 35ms/epoch - 35ms/step
Epoch 15/20
1/1 - 0s - loss: 0.0850 - val_loss: 0.5579 - 36ms/epoch - 36ms/step
Epoch 16/20
1/1 - 0s - loss: 0.0845 - val_loss: 0.5675 - 36ms/epoch - 36ms/step
Epoch 17/20
1/1 - 0s - loss: 0.0844 - val_loss: 0.5676 - 36ms/epoch - 36ms/step
Epoch 18/20
1/1 - 0s - loss: 0.0841 - val_loss: 0.5591 - 35ms/epoch - 35ms/step
Epoch 19/20
1/1 - 0s - loss: 0.0831 - val_loss: 0.5429 - 35ms/epoch - 35ms/step
Epoch 20/20
1/1 - 0s - loss: 0.0812 - val_loss: 0.5206 - 35ms/epoch - 35ms/step
3/3 [==============================] - 0s 4ms/step
No description has been provided for this image
Test RMSE: 0.722
UA
Epoch 1/20
2/2 - 3s - loss: 0.3601 - val_loss: 0.4134 - 3s/epoch - 1s/step
Epoch 2/20
2/2 - 0s - loss: 0.2574 - val_loss: 0.2877 - 39ms/epoch - 19ms/step
Epoch 3/20
2/2 - 0s - loss: 0.2029 - val_loss: 0.1943 - 38ms/epoch - 19ms/step
Epoch 4/20
2/2 - 0s - loss: 0.1885 - val_loss: 0.1330 - 40ms/epoch - 20ms/step
Epoch 5/20
2/2 - 0s - loss: 0.2026 - val_loss: 0.0981 - 38ms/epoch - 19ms/step
Epoch 6/20
2/2 - 0s - loss: 0.2273 - val_loss: 0.0808 - 40ms/epoch - 20ms/step
Epoch 7/20
2/2 - 0s - loss: 0.2465 - val_loss: 0.0735 - 40ms/epoch - 20ms/step
Epoch 8/20
2/2 - 0s - loss: 0.2519 - val_loss: 0.0718 - 42ms/epoch - 21ms/step
Epoch 9/20
2/2 - 0s - loss: 0.2436 - val_loss: 0.0737 - 38ms/epoch - 19ms/step
Epoch 10/20
2/2 - 0s - loss: 0.2265 - val_loss: 0.0786 - 40ms/epoch - 20ms/step
Epoch 11/20
2/2 - 0s - loss: 0.2062 - val_loss: 0.0860 - 40ms/epoch - 20ms/step
Epoch 12/20
2/2 - 0s - loss: 0.1869 - val_loss: 0.0945 - 42ms/epoch - 21ms/step
Epoch 13/20
2/2 - 0s - loss: 0.1708 - val_loss: 0.1027 - 39ms/epoch - 20ms/step
Epoch 14/20
2/2 - 0s - loss: 0.1583 - val_loss: 0.1092 - 42ms/epoch - 21ms/step
Epoch 15/20
2/2 - 0s - loss: 0.1489 - val_loss: 0.1128 - 40ms/epoch - 20ms/step
Epoch 16/20
2/2 - 0s - loss: 0.1417 - val_loss: 0.1131 - 39ms/epoch - 20ms/step
Epoch 17/20
2/2 - 0s - loss: 0.1359 - val_loss: 0.1104 - 41ms/epoch - 20ms/step
Epoch 18/20
2/2 - 0s - loss: 0.1311 - val_loss: 0.1052 - 41ms/epoch - 20ms/step
Epoch 19/20
2/2 - 0s - loss: 0.1272 - val_loss: 0.0986 - 40ms/epoch - 20ms/step
Epoch 20/20
2/2 - 0s - loss: 0.1240 - val_loss: 0.0915 - 39ms/epoch - 20ms/step
5/5 [==============================] - 0s 3ms/step
No description has been provided for this image
Test RMSE: 0.302
WN
Epoch 1/20
2/2 - 2s - loss: 0.2861 - val_loss: 0.1369 - 2s/epoch - 1s/step
Epoch 2/20
2/2 - 0s - loss: 0.1983 - val_loss: 0.1659 - 42ms/epoch - 21ms/step
Epoch 3/20
2/2 - 0s - loss: 0.1582 - val_loss: 0.2075 - 47ms/epoch - 24ms/step
Epoch 4/20
2/2 - 0s - loss: 0.1468 - val_loss: 0.2316 - 39ms/epoch - 20ms/step
Epoch 5/20
2/2 - 0s - loss: 0.1440 - val_loss: 0.2308 - 43ms/epoch - 22ms/step
Epoch 6/20
2/2 - 0s - loss: 0.1398 - val_loss: 0.2135 - 40ms/epoch - 20ms/step
Epoch 7/20
2/2 - 0s - loss: 0.1335 - val_loss: 0.1895 - 41ms/epoch - 21ms/step
Epoch 8/20
2/2 - 0s - loss: 0.1277 - val_loss: 0.1660 - 40ms/epoch - 20ms/step
Epoch 9/20
2/2 - 0s - loss: 0.1241 - val_loss: 0.1470 - 41ms/epoch - 20ms/step
Epoch 10/20
2/2 - 0s - loss: 0.1227 - val_loss: 0.1336 - 38ms/epoch - 19ms/step
Epoch 11/20
2/2 - 0s - loss: 0.1222 - val_loss: 0.1252 - 40ms/epoch - 20ms/step
Epoch 12/20
2/2 - 0s - loss: 0.1211 - val_loss: 0.1210 - 39ms/epoch - 19ms/step
Epoch 13/20
2/2 - 0s - loss: 0.1183 - val_loss: 0.1200 - 53ms/epoch - 27ms/step
Epoch 14/20
2/2 - 0s - loss: 0.1138 - val_loss: 0.1218 - 50ms/epoch - 25ms/step
Epoch 15/20
2/2 - 0s - loss: 0.1087 - val_loss: 0.1255 - 40ms/epoch - 20ms/step
Epoch 16/20
2/2 - 0s - loss: 0.1037 - val_loss: 0.1301 - 39ms/epoch - 19ms/step
Epoch 17/20
2/2 - 0s - loss: 0.0994 - val_loss: 0.1340 - 39ms/epoch - 20ms/step
Epoch 18/20
2/2 - 0s - loss: 0.0958 - val_loss: 0.1359 - 39ms/epoch - 20ms/step
Epoch 19/20
2/2 - 0s - loss: 0.0926 - val_loss: 0.1350 - 39ms/epoch - 20ms/step
Epoch 20/20
2/2 - 0s - loss: 0.0892 - val_loss: 0.1313 - 45ms/epoch - 23ms/step
5/5 [==============================] - 0s 3ms/step
No description has been provided for this image
Test RMSE: 0.362
In [ ]:
lstm_results50 = pd.DataFrame.from_dict(lstm_rsme_dict50,  orient='index',
                       columns=['RMSE', 'Epochs', 'Batch Size'])
lstm_results50 = lstm_results50.reset_index()

lstm_results50['Airline'] = lstm_results50['index'].map(MKT_code)
lstm_results50.drop('index', axis=1, inplace=True)
lstm_results50 = lstm_results50.set_index('Airline')
lstm_results50
Out[ ]:
RMSE Epochs Batch Size
Airline
American 0.440847 50 72
Alaskan 0.388948 50 72
JetBlue 0.304574 50 72
Delta 0.214968 50 72
Allegiant 0.555687 50 72
Hawaiian 0.698020 50 72
Spirit 0.395131 50 72
United 0.236709 50 72
Southwest 0.302007 50 72
In [ ]:
lstm_results20 = pd.DataFrame.from_dict(lstm_rsme_dict20,  orient='index',
                       columns=['RMSE', 'Epochs', 'Batch Size'])
lstm_results20 = lstm_results20.reset_index()

lstm_results20['Airline'] = lstm_results20['index'].map(MKT_code)
lstm_results20.drop('index', axis=1, inplace=True)
lstm_results20 = lstm_results20.set_index('Airline')
lstm_results20
Out[ ]:
RMSE Epochs Batch Size
Airline
American 0.280981 20 72
Alaskan 0.310535 20 72
JetBlue 0.296296 20 72
Delta 0.284168 20 72
Allegiant 0.587060 20 72
Hawaiian 0.436785 20 72
Spirit 0.721506 20 72
United 0.302426 20 72
Southwest 0.362422 20 72

Conclusion¶

Over the course of this project, it became clear that stocks are inherently hard to predict and model. Analyzing operational metrics and company financials and combining these analytics with cancellations and delays offers a unique dataset to train an ML algorithm on. After logistic regression had failed, the choice to move to LSTM resulted in lower RMSEs across each airline. While this model was not perfect, it improved our prior findings significantly. We saw a higher RMSE for Hawaiian and Allegiant but this could be due to the small nature of these airlines. Furthermore, Allegiant is an operating carrier and a subsidy of Delta, complicating forecasting attempts.