Predicting trip type of Walmart store visits.

The Problem

The challenge here was to classify customer trips to Walmart stores into 38 distinct trip types using only a transactional dataset of the items they’ve purchased. The objective behind this was to improve customers’ shopping experiences by segmenting their store visits into different trip types. The input features consisted of weekday of purchase, counts of items purchased, item codes, department description and finer product category.

Credit: Photo by Jomjakkapat Parrueng on Unsplash

Solution

I came across this competition after it had already ended, but I found it very interesting and decided to work on it.

The approach, which placed my model in the top 35% of submitted models, combined multiple weaker classifiers with feature engineering to boost predictive power. Data was pre-processed and new features were created to suit each model being applied. For each visit, I created counts of items by department and by finer category, time-based features, and features based on missing data and product returns. High-cardinality categorical variables such as UPC numbers and finer category numbers were also frequency encoded.
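The per-visit aggregation and frequency encoding described above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code: the row layout, the field names, the sample values, and the convention that a negative scan count marks a return are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical rows: (visit_id, weekday, upc, scan_count, department, fineline).
# Assumptions: a negative scan_count marks a returned item, None marks missing data.
rows = [
    (1, "Friday", "681131", 2, "DAIRY", 101),
    (1, "Friday", "605388", 1, "PRODUCE", 202),
    (1, "Friday", None, -1, None, None),
    (2, "Sunday", "681131", 1, "DAIRY", 101),
    (2, "Sunday", "749110", 3, "DAIRY", 303),
]


def frequency_encode(values):
    """Map each category to its relative frequency in the data."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def build_visit_features(rows):
    """Aggregate item-level rows into one feature record per visit."""
    upc_freq = frequency_encode([upc for _, _, upc, _, _, _ in rows])
    features = {}
    for visit_id, weekday, upc, scan_count, department, fineline in rows:
        f = features.setdefault(visit_id, {
            "weekday": weekday,
            "items_bought": 0,
            "items_returned": 0,
            "missing_rows": 0,
            "dept_counts": Counter(),
            "fineline_counts": Counter(),
            "mean_upc_freq": 0.0,
            "n_rows": 0,
        })
        f["n_rows"] += 1
        if scan_count < 0:
            f["items_returned"] += -scan_count
        else:
            f["items_bought"] += scan_count
        if upc is None or department is None:
            f["missing_rows"] += 1
        elif scan_count > 0:
            # Count purchased items by department and by finer category.
            f["dept_counts"][department] += scan_count
            f["fineline_counts"][fineline] += scan_count
        f["mean_upc_freq"] += upc_freq[upc]
    for f in features.values():
        f["mean_upc_freq"] /= f["n_rows"]  # average frequency-encoded UPC per visit
    return features


features = build_visit_features(rows)
```

Each visit ends up with one record, so the count dictionaries can later be expanded into one column per department or finer category before being passed to a model.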

Bootstrapping was applied to the Logistic Regression and KMeans classifiers to make them more robust. Finally, 2-stage 5-fold stacked generalization was applied to combine the predictive power of the various models.
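To make the two ideas above concrete, here is a small sketch of bootstrap aggregation and of stacking with out-of-fold predictions. It is an illustration under stated assumptions, not the project's pipeline: the toy nearest-centroid classifier stands in for the real base models, the class and function names are hypothetical, and the second stage consumes predicted labels for brevity, whereas a real stack would more likely feed it class probabilities.

```python
import random
from collections import Counter


class NearestCentroid:
    """Toy base classifier, used here only as a stand-in for real models."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict(self, X):
        def closest(row):
            return min(
                self.centroids,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, self.centroids[c])),
            )
        return [closest(row) for row in X]


class Bagged:
    """Fit copies of a model on bootstrap resamples and combine them by majority vote."""

    def __init__(self, make_model, n_estimators=10, seed=0):
        self.make_model = make_model
        self.n_estimators = n_estimators
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n = len(X)
        self.models = []
        for _ in range(self.n_estimators):
            idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
            model = self.make_model().fit([X[i] for i in idx], [y[i] for i in idx])
            self.models.append(model)
        return self

    def predict(self, X):
        votes = zip(*(m.predict(X) for m in self.models))
        return [Counter(v).most_common(1)[0][0] for v in votes]


def out_of_fold_predictions(make_model, X, y, n_folds=5):
    """Predict every row with a model that never saw that row during training."""
    n = len(X)
    oof = [None] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        for i, pred in zip(held_out, model.predict([X[i] for i in held_out])):
            oof[i] = pred
    return oof


def stack_predict(base_factories, make_meta, X, y, X_new, n_folds=5):
    # Stage 1: out-of-fold predictions become the training features for stage 2.
    meta_train = list(zip(*(out_of_fold_predictions(f, X, y, n_folds) for f in base_factories)))
    meta_model = make_meta().fit(meta_train, y)
    # Refit each base model on all rows before predicting unseen data.
    meta_new = list(zip(*(f().fit(X, y).predict(X_new) for f in base_factories)))
    return meta_model.predict(meta_new)


# Two well-separated toy clusters with numeric labels 0 and 1.
X = [[0.1 * i, 0.2 * i] for i in range(10)] + [[5 + 0.1 * i, 5 + 0.2 * i] for i in range(10)]
y = [0] * 10 + [1] * 10
base_factories = [NearestCentroid, lambda: Bagged(NearestCentroid)]
predictions = stack_predict(base_factories, NearestCentroid, X, y, [[0.2, 0.3], [5.5, 6.0]])
```

Training the second stage only on out-of-fold predictions keeps it from seeing base-model outputs on rows those models were fitted on, which is the usual motivation for the k-fold step in stacking.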

Figure: Data processing and modeling pipeline involving 2-stage stacking of different models with appropriately processed feature sets.

Project Website (coming soon)

For details and code, please see the project website on GitHub.

Acknowledgement and Disclaimer

Kaggle has kindly permitted the use of their logo on this website. However, neither I nor this website is officially connected to Kaggle.