Welcome, Data Scientist, to the 3rd SDS Club Monthly Challenge! In this month’s challenge, you are helping your friend search for a job. Your friend has found thousands of job ads online and is trying to pick some to apply to. Your friend has heard that a lot of job ads are fraudulent and are actually scams. Your mission is to help your friend by predicting whether a job ad is fraudulent based on the data provided.
Understanding the Dataset
Each column in the dataset is labeled and explained in more detail below.
title – title of the job in ad
location – location of job ad
department – corporate department
salary_range – salary range of job
company_profile – description of company
description – description of position
requirements – description of job requirements
benefits – benefits offered by the employer
telecommuting – whether the position allows telecommuting
has_company_logo – whether the company’s logo is present in the ad
has_questions – whether interview questions are present in the ad
employment_type – type of employment (full-time, part-time, contract, etc.)
required_experience – required experience for job (master’s degree, bachelor, doctorate, etc.)
industry – industry of company (Construction, Health Care, IT, etc.)
function – function of company within industry (consulting, sales, research, etc.)
fraudulent – whether job is fraudulent or not
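As a starting point for working with these columns, the sketch below groups them into free-text fields and yes/no flags. It is only an illustration: the grouping, the helper names `combine_text` and `parse_flags`, and the assumption that the flag columns are stored as "0"/"1" are not part of the challenge description, and the example row is made up.

```python
# Column groupings are an assumption based on the descriptions above.
TEXT_COLUMNS = ["title", "company_profile", "description", "requirements", "benefits"]
FLAG_COLUMNS = ["telecommuting", "has_company_logo", "has_questions"]


def combine_text(row):
    """Join the free-text columns into one string, skipping empty or missing values."""
    parts = [(row.get(col) or "").strip() for col in TEXT_COLUMNS]
    return " ".join(part for part in parts if part)


def parse_flags(row):
    """Convert the flag columns to ints (assumes "0"/"1" strings; missing counts as 0)."""
    return {col: int(row.get(col) or 0) for col in FLAG_COLUMNS}


# Made-up row for illustration only; it is not taken from the dataset.
example_row = {
    "title": "Clerk",
    "description": " Data entry ",
    "benefits": "",
    "telecommuting": "1",
}
text = combine_text(example_row)
flags = parse_flags(example_row)
```

Merging the text fields into one string is a common first step before feeding job ads to a text model, while the flags can be used directly as numeric features.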
public_jobs.csv – Dataset to train and analyze
pred_jobs.csv – Dataset to predict whether or not a job posting is fraudulent
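Before training anything, it can help to check how many ads in the training file are labeled fraudulent. The following is a minimal sketch using only the standard library. The helper name `label_counts` is hypothetical, the in-memory rows are a made-up stand-in for public_jobs.csv, and the "0"/"1" encoding of the fraudulent column is an assumption.

```python
import csv
import io
from collections import Counter


def label_counts(csv_file, label_column="fraudulent"):
    """Count how often each label value occurs in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Made-up stand-in for the real file, used so the sketch runs on its own.
sample = io.StringIO(
    "title,fraudulent\n"
    "Data Entry Clerk,1\n"
    "Software Engineer,0\n"
    "Office Manager,0\n"
)
counts = label_counts(sample)

# With the real data, the call would look roughly like this:
# with open("public_jobs.csv", newline="", encoding="utf-8") as f:
#     counts = label_counts(f)
```

If one label turns out to be much rarer than the other, plain accuracy can be misleading, so it is worth looking at this count early.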
All submissions should be sent through email to firstname.lastname@example.org. When submitting, the file should contain predictions made on the pred_jobs.csv file, and it should have the following format:
Remember, for the submission to be valid, the predictions must be in .csv format.
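One way to produce a .csv file of predictions is sketched below with the standard library. The exact column layout required for submissions is not reproduced here, so the single `fraudulent` header, the 0/1 values, and the helper name `write_predictions` are assumptions to be adjusted to the required format.

```python
import csv
import io


def write_predictions(out_file, predictions, header="fraudulent"):
    """Write one prediction per row under a single header column (assumed layout)."""
    writer = csv.writer(out_file)
    writer.writerow([header])
    for prediction in predictions:
        writer.writerow([int(prediction)])


# Made-up predictions, written to an in-memory buffer so the sketch runs on its own.
buffer = io.StringIO()
write_predictions(buffer, [0, 1, 0])
lines = buffer.getvalue().splitlines()

# Writing to disk would look roughly like this (file name is hypothetical):
# with open("submission.csv", "w", newline="", encoding="utf-8") as f:
#     write_predictions(f, predictions)
```

The predictions should stay in the same order as the rows of pred_jobs.csv so that each value can be matched to its job ad.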
The data was collected and published by The University of the Aegean, Laboratory of Information & Communication Systems Security.