September challenge
Welcome, Data Scientist!
You have recently been hired by the US Department of Transportation (DOT) to analyze data from multiple airline carriers in the United States. The DOT wants to help airline carriers reduce the number of flight cancellations and improve travelers’ experiences. Your job is to help the DOT predict whether or not a flight will be canceled based on the data provided.
The challenge is yours, if you wish to accept it!
Evaluation
Understanding the Dataset
Each column in the dataset is labeled and explained in more detail below.
YEAR: Year in which the flight was scheduled to take place
MONTH: Month in which the flight was scheduled to take place
DAY: Day of the month the flight was scheduled to take place
DAY_OF_WEEK: Day of the week the flight was scheduled to take place
AIRLINE: Initials of the airline that was scheduled to carry out the flight
FLIGHT_NUMBER: Flight number of the scheduled flight
TAIL_NUMBER: Tail number of the plane that was scheduled to carry out the flight
ORIGIN_AIRPORT: Location of the airport that the flight was scheduled to depart from
DESTINATION_AIRPORT: Location of the airport that the flight was scheduled to arrive at
SCHEDULED_DEPARTURE: Scheduled departure time of the flight
SCHEDULED_TIME: Amount of time the flight was scheduled to take
DISTANCE: Distance between ORIGIN_AIRPORT and DESTINATION_AIRPORT
SCHEDULED_ARRIVAL: Flight’s scheduled time of arrival
CANCELLED: Flight’s cancellation status
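Before any modelling it can help to confirm that a file actually contains the columns described above. The sketch below uses only the Python standard library. The function names `load_flights` and `cancellation_rate` are hypothetical helpers, not part of the challenge, and it assumes that CANCELLED is stored as 0/1, which the description above does not state.

```python
import csv
from typing import IO, Dict, List

# Column names exactly as listed in the dataset description.
FEATURE_COLUMNS = [
    "YEAR", "MONTH", "DAY", "DAY_OF_WEEK", "AIRLINE", "FLIGHT_NUMBER",
    "TAIL_NUMBER", "ORIGIN_AIRPORT", "DESTINATION_AIRPORT",
    "SCHEDULED_DEPARTURE", "SCHEDULED_TIME", "DISTANCE", "SCHEDULED_ARRIVAL",
]
LABEL_COLUMN = "CANCELLED"


def load_flights(handle: IO[str], require_label: bool = True) -> List[Dict[str, str]]:
    """Read a flights CSV (hypothetical helper) and fail early if a documented column is missing."""
    reader = csv.DictReader(handle)
    expected = FEATURE_COLUMNS + ([LABEL_COLUMN] if require_label else [])
    present = reader.fieldnames or []  # None for an empty file
    missing = [col for col in expected if col not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def cancellation_rate(rows: List[Dict[str, str]]) -> float:
    """Share of rows marked as cancelled.

    ASSUMPTION: CANCELLED is encoded as 0/1; adjust the parsing if the file differs.
    """
    if not rows:
        return 0.0
    cancelled = sum(int(float(row[LABEL_COLUMN])) for row in rows)
    return cancelled / len(rows)
```

A typical call would open `public_flights.csv` with `open(..., newline="")` and pass the handle to `load_flights`; `pred_flights.csv` would be loaded with `require_label=False`, assuming it ships without the CANCELLED column.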
Dataset Files
public_flights.csv – Dataset to train and analyze
pred_flights.csv – Dataset to predict flights’ cancellation status
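The two files suggest a fit-then-predict workflow: learn from `public_flights.csv`, then score every row of `pred_flights.csv`. As a minimal reference point only, the sketch below uses a naive per-airline frequency baseline, not a recommended model. The helper names, the `threshold` and `default_rate` parameters, and the 0/1 encoding of CANCELLED are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def fit_airline_rates(rows: List[Dict[str, str]]) -> Dict[str, float]:
    """Per-airline share of cancelled flights in the training rows.

    ASSUMPTION: CANCELLED is encoded as 0/1.
    """
    totals: Dict[str, int] = defaultdict(int)
    cancelled: Dict[str, int] = defaultdict(int)
    for row in rows:
        airline = row["AIRLINE"]
        totals[airline] += 1
        cancelled[airline] += int(float(row["CANCELLED"]))
    return {airline: cancelled[airline] / totals[airline] for airline in totals}


def predict_cancelled(
    rows: List[Dict[str, str]],
    rates: Dict[str, float],
    threshold: float = 0.5,     # hypothetical decision cut-off
    default_rate: float = 0.0,  # used for airlines unseen in training
) -> List[int]:
    """Predict 1 (cancelled) when the airline's training rate reaches the threshold."""
    return [1 if rates.get(row["AIRLINE"], default_rate) >= threshold else 0 for row in rows]
```

If cancellations are rare in `public_flights.csv`, a 0.5 threshold will predict "not cancelled" for almost every flight, so any real submission would need a better model and a threshold tuned on held-out data.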
Submission Format
All submissions should be sent by email to challenges@superdatascience.com. The file should contain predictions for the flights in pred_flights.csv and should have the following format:
Acknowledgments
The flight cancellation data was collected and published by the DOT’s Bureau of Transportation Statistics.