Hackathon alert! MachineHack launches ‘Data Engineering Championship’ as part of DES2022 Summit

Data Engineering Summit 2022, presented by Google Cloud and organised by Analytics India Magazine, is India’s first conference dedicated to the high-demand and impactful field of data engineering. This virtual conference, to be held on April 30, 2022, will focus on data engineering innovation and give attendees direct access to top engineers and innovators working in leading tech companies. 

This will be a golden opportunity for attendees to learn, from the very best in the field, about the software deployment architecture of machine learning systems and how to produce the latest data frameworks and solutions for business use cases.

Data Engineering Championship by MachineHack

MachineHack is organising a data engineering hackathon for data scientists & data engineers to participate and win a chance to present at DES 2022. 

Data engineering consists of collecting, provisioning and maintaining high-quality data from which insights can be drawn. To do that, a data engineer needs to design and develop a scalable data architecture, set up processes that pool data from multiple sources, check the data quality, and eliminate corrupt data. In addition, exploratory data analysis (EDA) and extract, transform, and load (ETL) techniques are required so that the data can be accessed and used downstream to solve business problems.

START DATE: 13th April 2022, 6:00 PM

END DATE: 30th May 2022, 6:00 PM

REGISTER NOW

All you need to know about the ‘Data Engineering Championship’

With the dataset provided, the participants need to analyse and create features of the following description.

  • ‘DATE’: create the date from year, month and day of the week 
  • ‘LOW’: Lower value of DEP_TIME_BLK
  • ‘HIGH’: Higher value of DEP_TIME_BLK
  • ‘TIMESTAMP’: create a timestamp with date and lower value of DEP_TIME_BLK
  • ‘WIND_CHILL’: the perceived temperature due to the cooling effect of wind
  • ‘PRCP_SNOW_RATIO’: ratio of precipitation (PRCP) to snowfall (SNOW)
  • ‘PLANE_AGE_AIRLINE_AIRPORT_FLIGHTS_MONTH_RATIO’: ratio of plane age (PLANE_AGE) to AIRLINE_AIRPORT_FLIGHTS_MONTH
  • ‘SEAT_DISTRIBUTION’: ratio of seats (NUMBER_OF_SEATS) to concurrent flights (CONCURRENT_FLIGHTS)
  • ‘SEAT_DISTRIBUTION_NORMALISED’: normalised values of the ratio of seats to concurrent flights
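
Several of the features above can be sketched with pandas as follows. This is only an illustration under stated assumptions, not the organisers' reference solution: it assumes DEP_TIME_BLK holds strings such as "0800-0859", that TMAX is in degrees Fahrenheit and AWND in miles per hour, that wind chill follows the US National Weather Service formula, and that "normalised" means min-max scaling. The helper name add_features and the demo values are hypothetical.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a few of the requested features added."""
    out = df.copy()

    # Assumption: DEP_TIME_BLK looks like "0800-0859" (lower-upper bound).
    bounds = out["DEP_TIME_BLK"].str.split("-", expand=True)
    out["LOW"] = bounds[0].astype(int)
    out["HIGH"] = bounds[1].astype(int)

    # Assumption: NWS wind chill formula, TMAX in Fahrenheit, AWND in mph.
    v16 = out["AWND"].clip(lower=0) ** 0.16
    out["WIND_CHILL"] = (
        35.74 + 0.6215 * out["TMAX"] - 35.75 * v16 + 0.4275 * out["TMAX"] * v16
    )

    # Ratios: a zero denominator is turned into NaN instead of infinity.
    out["PRCP_SNOW_RATIO"] = out["PRCP"] / out["SNOW"].replace(0, np.nan)
    out["PLANE_AGE_AIRLINE_AIRPORT_FLIGHTS_MONTH_RATIO"] = (
        out["PLANE_AGE"] / out["AIRLINE_AIRPORT_FLIGHTS_MONTH"].replace(0, np.nan)
    )
    seat = out["NUMBER_OF_SEATS"] / out["CONCURRENT_FLIGHTS"].replace(0, np.nan)
    out["SEAT_DISTRIBUTION"] = seat

    # Assumption: "normalised" is read as min-max scaling to [0, 1].
    span = seat.max() - seat.min()
    out["SEAT_DISTRIBUTION_NORMALISED"] = (seat - seat.min()) / span if span else 0.0
    return out


# Made-up rows, only to show the call.
demo = pd.DataFrame({
    "DEP_TIME_BLK": ["0800-0859", "1400-1459"],
    "TMAX": [30.0, 40.0],
    "AWND": [10.0, 5.0],
    "PRCP": [0.5, 0.0],
    "SNOW": [2.0, 0.0],
    "PLANE_AGE": [10, 4],
    "AIRLINE_AIRPORT_FLIGHTS_MONTH": [500, 200],
    "NUMBER_OF_SEATS": [150, 100],
    "CONCURRENT_FLIGHTS": [30, 10],
})
features = add_features(demo)
```

The DATE and TIMESTAMP features are left out of this sketch because they depend on year and day information whose column names are not given in the description.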

Evaluation 

To determine the winners of the hackathon, the submissions will be evaluated using the mean absolute error. One can use sklearn.metrics.mean_absolute_error(y_true, y_pred) to calculate it.
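
As a small illustration of the mean absolute error, the sketch below computes it on made-up values, once by hand with NumPy and once with scikit-learn. The arrays are hypothetical and have nothing to do with the hackathon data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical ground truth and predictions, for illustration only.
y_true = np.array([1.0, 0.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.1, 0.6])

# Mean absolute error: the average of |y_true - y_pred|.
mae_manual = np.mean(np.abs(y_true - y_pred))
mae_sklearn = mean_absolute_error(y_true, y_pred)

print(mae_manual, mae_sklearn)
```

Both values should agree up to floating-point rounding, and a lower score is better.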

This hackathon will support private and public leaderboards.

  • The public leaderboard is evaluated on 30% of the dataset
  • The private leaderboard will be made available at the end of the hackathon, which will be evaluated on 100% of the dataset
  • The final score represents the score achieved based on the Best Score on the public leaderboard

How to generate a valid submission file?

In order to submit your file, the following steps have to be kept in mind.

  • Sklearn models should support the predict() method to generate the predicted values.
  • The participant should submit a .csv file with exactly 2,00,000 (200,000) rows and 9 columns. The submission will return an Invalid Score if you have extra rows or columns.

Points to note:

  • One should not shuffle the sequence of the test series
  • If you are using pandas, use the following submission code:

submission_df.to_csv('my_submission_file.csv', index=False)
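
Before uploading, it may help to read the file back and confirm its shape. The sketch below is one possible check, not part of the official instructions: the helper has_expected_shape is hypothetical, the nine column names are assumed to be the nine requested features, and the five-row frame stands in for the real submission.

```python
import os
import tempfile

import pandas as pd


def has_expected_shape(path, expected_rows, expected_cols=9):
    """Return True if the CSV at path has exactly the expected shape."""
    df = pd.read_csv(path)
    return df.shape == (expected_rows, expected_cols)


# Assumption: the nine submission columns are the nine requested features.
columns = [
    "DATE", "LOW", "HIGH", "TIMESTAMP", "WIND_CHILL", "PRCP_SNOW_RATIO",
    "PLANE_AGE_AIRLINE_AIRPORT_FLIGHTS_MONTH_RATIO",
    "SEAT_DISTRIBUTION", "SEAT_DISTRIBUTION_NORMALISED",
]
# Tiny placeholder frame; a real submission would hold the full test set.
submission_df = pd.DataFrame(0, index=range(5), columns=columns)

with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "my_submission_file.csv")
    bad = os.path.join(tmp, "with_index.csv")
    submission_df.to_csv(good, index=False)
    submission_df.to_csv(bad)  # default index=True writes an extra column
    good_ok = has_expected_shape(good, expected_rows=5)
    bad_ok = has_expected_shape(bad, expected_rows=5)

print(good_ok, bad_ok)
```

The second file shows why index=False matters: without it, pandas writes the row index as an additional column, which would make the column count wrong.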

Dataset: 200000 rows x 26 columns

  • MONTH: Month
  • DAY_OF_WEEK: Day of Week
  • DEP_DEL15: TARGET Binary of a departure delay over 15 minutes (1 is yes)
  • DISTANCE_GROUP: Distance group to be flown by departing aircraft
  • DEP_BLOCK: Departure block
  • SEGMENT_NUMBER: The segment that this tail number is on for the day
  • CONCURRENT_FLIGHTS: Concurrent flights leaving from the airport in the same departure block
  • NUMBER_OF_SEATS: Number of seats on the aircraft
  • CARRIER_NAME: Carrier
  • AIRPORT_FLIGHTS_MONTH: Avg Airport Flights per Month
  • AIRLINE_FLIGHTS_MONTH: Avg Airline Flights per Month
  • AIRLINE_AIRPORT_FLIGHTS_MONTH: Avg Flights per month for Airline AND Airport
  • AVG_MONTHLY_PASS_AIRPORT: Avg Passengers for the departing airport for the month
  • AVG_MONTHLY_PASS_AIRLINE: Avg Passengers for the airline for the month
  • FLT_ATTENDANTS_PER_PASS: Flight attendants per passenger for airline
  • GROUND_SERV_PER_PASS: Ground service employees (service desk) per passenger for airline
  • PLANE_AGE: Age of departing aircraft
  • DEPARTING_AIRPORT: Departing Airport
  • LATITUDE: Latitude of departing airport
  • LONGITUDE: Longitude of departing airport
  • PREVIOUS_AIRPORT: Previous airport that aircraft departed from
  • PRCP: Inches of precipitation for the day
  • SNOW: Inches of snowfall for the day
  • SNWD: Inches of snow on the ground for the day
  • TMAX: Max temperature for the day
  • AWND: Max wind speed for the day

Prize

The three winners will get a chance to present their solution approaches at the Data Engineering Summit (DES 2022).

Submission deadline

If you want to be a part of this exciting hackathon, make sure to submit your entries by May 30, 2022, at 06:00 PM IST, as the private leaderboard will be frozen at that time.

Disqualification

  • If any of the details entered are found incorrect, Analytics India Magazine reserves the right to disqualify any participant.
  • Any external dataset usage is strictly prohibited. The participants will be disqualified if found using any external dataset.

So what are you waiting for? Register now to participate in this hackathon.

Sreejani Bhattacharyya

I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good. Reach me at sreejani.bhattacharyya@analyticsindiamag.com