November Challenge

Sunday, 1 November 2020

Problem Statement

Welcome, Data Scientist, to the 3rd SDS Club Monthly Challenge! In this month's challenge you are helping a friend search for a job. Your friend has found thousands of job ads online and is trying to pick some to apply to, but has heard that many job ads are fraudulent and are actually scams. Your mission is to help your friend by predicting whether a job ad is fraudulent based on the data provided.

Evaluation

$$
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
$$
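
As a rough illustration of the metric, the sketch below counts true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) and plugs them into the formula. It assumes labels are encoded as 1 for fraudulent and 0 for legitimate, which is an assumption made for this example; the function names `confusion_counts` and `accuracy` are hypothetical helpers, not part of the challenge materials.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN, assuming 1 = fraudulent and 0 = legitimate."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1  # fraudulent ad correctly flagged
        elif pred == 0 and truth == 0:
            tn += 1  # legitimate ad correctly passed
        elif pred == 1 and truth == 0:
            fp += 1  # legitimate ad wrongly flagged
        else:
            fn += 1  # fraudulent ad missed
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that match the true label."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Made-up labels, for illustration only.
y_true = [1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts))
```

Because accuracy weighs every ad equally, a model that always predicts the more common class can still score well, so it may be worth comparing any model against that simple baseline.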

Understanding the Dataset

Each column in the dataset is labeled and explained in more detail below.

title – title of the job in ad
location – location of job ad
department – corporate department
salary_range – salary range of job
company_profile – description of company
description – description of position
requirements – description of job requirements
benefits – benefits offered by the employer
telecommuting – whether the position is a telecommuting position
has_company_logo – whether the company’s logo is present in the ad
has_questions – whether interview questions are present in the ad
employment_type – type of employment (full-time, part-time, contract, etc.)
required_experience – required experience for job (master’s degree, bachelor, doctorate, etc.)
industry – industry of company (Construction, Health Care, IT, etc.)
function – function of company within industry (consulting, sales, research, etc.)
fraudulent – whether job is fraudulent or not
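
A first look at the data might start with the class balance of the `fraudulent` column. The sketch below uses only the Python standard library and a tiny in-memory sample in place of the real file; the sample rows, the subset of columns shown, and the 0/1 encoding of the flags are all assumptions made for illustration, and `label_counts` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Invented rows standing in for the training file; only a few of the
# columns listed above are included, and 0/1 flag values are assumed.
SAMPLE = """title,location,telecommuting,has_company_logo,has_questions,fraudulent
Data Analyst,"US, NY",0,1,1,0
Home Typist,"US, TX",1,0,0,1
Site Engineer,"GB, LND",0,1,0,0
"""


def label_counts(fileobj, label_column="fraudulent"):
    """Count how often each value of the label column occurs."""
    reader = csv.DictReader(fileobj)
    return Counter(row[label_column] for row in reader)


counts = label_counts(io.StringIO(SAMPLE))
print(counts)
```

With the real data, the same function could be given a file opened with `open("public_jobs.csv", newline="", encoding="utf-8")`; the encoding is a guess and may need adjusting.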

Dataset Files

public_jobs.csv – Dataset to train and analyze
pred_jobs.csv – Dataset to predict whether or not a job posting is fraudulent

Submission

All submissions should be sent through email to challenges@superdatascience.com. When submitting, the file should contain predictions made on the pred_jobs.csv file, and it should have the following format:

Remember, for the submission to be valid, the predictions must be in .csv format.
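
One possible way to produce such a file with the Python standard library is sketched below. The single-column layout and the `fraudulent` header are assumptions made for this example, not the confirmed submission format, and `write_predictions` is a hypothetical helper; adjust both to match the format the organizers specify.

```python
import csv
import io


def write_predictions(predictions, fileobj, header="fraudulent"):
    """Write one prediction per row under a single header (assumed layout)."""
    writer = csv.writer(fileobj)
    writer.writerow([header])
    for label in predictions:
        writer.writerow([label])


# Demonstration with an in-memory buffer and made-up predictions.
buffer = io.StringIO()
write_predictions([0, 1, 0], buffer)
print(buffer.getvalue())
```

To write an actual file, pass a handle opened with `open("submission.csv", "w", newline="")` instead of the buffer; the file name here is only a placeholder.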

Acknowledgements

The data was collected and published by The University of the Aegean, Laboratory of Information & Communication Systems Security.