February Challenge

Problem Statement

Welcome, Data Scientist, to the 6th SDS Club Monthly Challenge! In this month’s challenge, you will help hospitals and medical centers determine whether a patient will show up for their appointment. Close to 20% of patients worldwide miss their appointments, which costs medical providers millions of dollars annually. Your mission is to predict whether a given patient will show up for their appointment.


accuracy = \frac{TP + TN}{TP + TN + FP + FN}
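
As a rough sketch of how the accuracy formula above can be computed, the snippet below counts the four confusion-matrix cells from two lists of booleans, treating a no-show (True) as the positive class. The function names `confusion_counts` and `accuracy` and the toy labels are illustrative assumptions, not part of the challenge materials.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), with True (no-show) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual and predicted:
            tp += 1  # predicted no-show, patient did not show up
        elif not actual and not predicted:
            tn += 1  # predicted show, patient showed up
        elif predicted:
            fp += 1  # predicted no-show, but patient showed up
        else:
            fn += 1  # predicted show, but patient did not show up
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Compute (TP + TN) / (TP + TN + FP + FN)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("cannot compute accuracy on empty input")
    return (tp + tn) / total


# Toy labels, made up for illustration: True means the patient did not show up.
y_true = [False, False, True, False, True]
y_pred = [False, True, True, False, False]
print(confusion_counts(y_true, y_pred))
print(accuracy(y_true, y_pred))
```

Because accuracy weights both classes by their frequency, a model that always predicts "showed up" can score well when no-shows are the minority, so it may be worth comparing any model against that trivial baseline.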

Understanding the Dataset

Each column in the dataset is labeled and explained in more detail below.

PatientId – the patient’s id
AppointmentId – the appointment’s id
Gender – patient’s gender
ScheduledDay – day the appointment was scheduled (should be before AppointmentDay)
AppointmentDay – day of the scheduled appointment
Age – patient’s age
Neighbourhood – neighbourhood where the appointment will take place
Scholarship – whether the patient is receiving welfare or not
Hypertension – whether the patient has hypertension
Diabetes – whether the patient has diabetes
Alcoholism – whether the patient suffers from alcoholism
Handicap – whether the patient is handicapped
SMS_received – whether the patient was sent a text message notifying them of their appointment
No-show – whether the patient was a no-show (True -> patient didn’t show up, False -> patient showed up)
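
The column notes above imply a couple of checks worth running before modelling: ScheduledDay should not fall after AppointmentDay, and No-show should be read with True meaning the patient missed the appointment. A minimal sketch using only the standard library follows. The two sample rows, the ISO-style date strings, the literal "True"/"False" text encoding, and the helper names are all assumptions made for illustration, so adjust them to whatever public_appointments.csv actually contains.

```python
import csv
import io
from datetime import date

# Two made-up rows that mimic a subset of the documented columns (not real data).
SAMPLE_CSV = """PatientId,AppointmentId,ScheduledDay,AppointmentDay,No-show
1,100,2016-04-27,2016-04-29,False
2,101,2016-05-03,2016-05-02,True
"""


def parse_day(value: str) -> date:
    """Parse the leading YYYY-MM-DD part of a date or timestamp string (assumed format)."""
    return date.fromisoformat(value.strip()[:10])


def parse_no_show(value: str) -> bool:
    """Map the assumed text encoding to a bool: True means the patient did not show up."""
    text = value.strip().lower()
    if text not in {"true", "false"}:
        raise ValueError(f"unexpected No-show value: {value!r}")
    return text == "true"


def check_rows(csv_file):
    """Return (waiting_days, no_show_flags, ids of rows scheduled after the appointment)."""
    waiting_days, no_shows, bad_ids = [], [], []
    for row in csv.DictReader(csv_file):
        wait = (parse_day(row["AppointmentDay"]) - parse_day(row["ScheduledDay"])).days
        waiting_days.append(wait)
        no_shows.append(parse_no_show(row["No-show"]))
        if wait < 0:  # violates "ScheduledDay should be before AppointmentDay"
            bad_ids.append(row["AppointmentId"])
    return waiting_days, no_shows, bad_ids


waiting_days, no_shows, bad_ids = check_rows(io.StringIO(SAMPLE_CSV))
print(waiting_days, no_shows, bad_ids)
```

To run the same check on the real file, pass an open file handle instead of the in-memory sample, for example `open("public_appointments.csv", newline="")`. The waiting time in days is also a plausible candidate feature, though whether it helps is something to verify on the data.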

Dataset Files

public_appointments.csv – Dataset to train and analyze
pred_appointments.csv – Dataset to predict whether the patient showed up for their appointment


All submissions should be sent by email to challenges@superdatascience.com. The submitted file should contain predictions made on the pred_appointments.csv file, and it should have the following format:



The data was collected by Joni Hoppen and Aquarela Advanced Analytics.