
Big Mart Sales prediction

Lidia edited this page Sep 11, 2025 · 6 revisions

This page walks through a real-world example: a pipeline that tackles the Big Mart Sales Prediction Practice Problem hackathon. The dataset is described in the hackathon's Problem Statement tab. In short, some information about each item and outlet is given, and the goal is to predict the sales of a given item in a given outlet.

Dataset information

For this problem, the organizers provide two files: one used for training and the other for generating the predictions that are submitted. The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities. The training dataset has 8523 samples and the submission dataset has 5681.

The pipeline

Main editor

The pipeline is composed of 7 stages. The list below has six entries because Data Cleaning appears twice in the pipeline:

  • Data Collection: the data for training, testing, and preparing the hackathon submission is loaded.
  • Data Cleaning: null values in the affected columns are replaced following different strategies. This stage is performed both for the train/test data and for the validation/submission data. Note that there are two data cleaning stages; the one marked with * is reused, as the same cleaning must be applied to both datasets.
  • Feature Engineering: the data is transformed to simplify it and to ensure the model only receives numeric inputs, as it can't process categorical data.
  • Model Train: the data is split into train and test datasets and the model is trained.
  • Model Evaluation: the accuracy of the model is assessed using the test dataset.
  • Deployment: the model is used to make predictions on the submission dataset and the result is stored in a CSV file.
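The cleaning, feature engineering, and split steps can be illustrated with a minimal sketch in plain Python. It is not the generated pipeline code: the rows, the column names (`Item_Weight`, `Outlet_Size`, `Item_MRP`), the mean/mode fill strategies, the integer encoding, and the 25% test fraction are all assumptions chosen for illustration.

```python
import random
from statistics import mean, mode

# Hypothetical rows; column names and values are assumptions for illustration.
rows = [
    {"Item_Weight": 9.3, "Outlet_Size": "Medium", "Item_MRP": 249.8},
    {"Item_Weight": None, "Outlet_Size": "Small", "Item_MRP": 48.3},
    {"Item_Weight": 17.5, "Outlet_Size": None, "Item_MRP": 141.6},
    {"Item_Weight": 19.2, "Outlet_Size": "Medium", "Item_MRP": 182.1},
]


def fill_missing(rows, column, strategy):
    """Return copies of the rows with None in `column` replaced by the mean or mode."""
    present = [r[column] for r in rows if r[column] is not None]
    fill = mean(present) if strategy == "mean" else mode(present)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def label_encode(rows, column):
    """Replace each category in `column` with an integer code (sorted for determinism)."""
    codes = {value: i for i, value in enumerate(sorted({r[column] for r in rows}))}
    return [{**r, column: codes[r[column]]} for r in rows], codes


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and return (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Data Cleaning: numeric column filled with the mean, categorical with the mode.
cleaned = fill_missing(rows, "Item_Weight", "mean")
cleaned = fill_missing(cleaned, "Outlet_Size", "mode")

# Feature Engineering: turn the categorical column into numeric codes.
encoded, size_codes = label_encode(cleaned, "Outlet_Size")

# Model Train (first half): split into train and test sets.
train, test = train_test_split(encoded)
```

Keeping the cleaning logic in reusable functions mirrors the note above: the same cleaning step can be applied unchanged to both the train/test data and the submission data.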

All the information required to reproduce the Big Mart Sales prediction example and the generated code of the ML pipeline can be found on the mls_pipeline examples repository.

The results

During evaluation, the model reached 61% accuracy. We then submitted our results to the hackathon and got a score of 1152.2938, which placed our submission 1189th out of 52384 participants at the time of submission (February 18th, 2025).
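This page does not state which metric is behind the 61% figure or the hackathon score. As an assumption, the sketch below shows two measures commonly used for regression problems like this one: the coefficient of determination (R²), which is often reported as a percentage, and the root mean squared error (RMSE), which is expressed in the same units as the target. The values are made up for illustration and are not results from the pipeline.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: lower is better, same units as the target."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1.0 is a perfect fit, 0.0 matches predicting the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical sales values and predictions, for illustration only.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 320.0, 380.0]

print(f"RMSE: {rmse(y_true, y_pred):.4f}")
print(f"R2:   {r2_score(y_true, y_pred):.4f}")
```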

Submission results
