Instacart Market Basket Analysis : Part 3 (Deployment)

6 min read · Apr 5, 2021

This is the third post in a 3-part series on an end-to-end case study based on a Kaggle problem.

In the last 2 posts, we discussed the business problem, EDA, F1 maximization and feature engineering, and trained a few models.

Please refer to Part 1 and Part 2 before moving forward.

Cold Start Problems
Solutions for Cold Start Problems
Build the pipeline
Build Web-API
Predictions
Future Work
End Notes
References

Cold Start Problems

Before we move on to the deployment part, we need to address a few cold start issues.

1. New user: What products can we recommend to a new user?

There can be many solutions for this. A few of them are:

recommend the most frequently purchased products
recommend the most frequently purchased products for the current hour and day of week (we will use this one)

2. days_since_prior_order: What do we do when we don’t have the user’s last order date?

As mentioned in the previous post, days_since_prior_order is an important feature. After deploying the model, however, we can calculate this value for a future order only if we have the user’s last order date.

To handle this, I have assumed that all users placed their last order on 21-3-2021 (taken as the time of deployment). For any future order, the difference in days from that date gives us the days_since_prior_order feature.

Note:

This assumption introduces new challenges. Some values of days_since_prior_order seen at prediction time (a gap of n days, n = 0, 1, …) may never have occurred for a given user or product in the training set. In that case the corresponding misc. features (specifically p_days_since_prior_order_reorder_rate, u_days_since_prior_order_reorder_rate and days_since_prior_reorder_rate; see the last post) are missing, because those combinations do not exist in the training data.

For such cases, the misc. features that depend on days_since_prior_order are set to 0.0.

Solutions for Cold Start Problems

Prediction for new users

We will now generate a pickle file with the top 10 products for each hour of each day of the week.

Top 10 products of 5th hour of 5th day of week

This file will be used to handle the new-user cold start problem.
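As a rough illustration of how such a lookup could be built, here is a minimal sketch using pandas. The toy data, the product_name column and the build_top_products helper are assumptions made for this example; the actual code in the repo may differ.

```python
import os
import pickle
import tempfile

import pandas as pd

# Toy stand-in for the merged orders/products data (hypothetical values).
orders = pd.DataFrame({
    "order_dow":         [5, 5, 5, 5, 5, 2],
    "order_hour_of_day": [5, 5, 5, 5, 5, 9],
    "product_name":      ["Banana", "Banana", "Milk", "Banana", "Milk", "Bread"],
})


def build_top_products(df, n=10):
    """Map (order_dow, order_hour_of_day) -> list of the n most purchased products."""
    # value_counts() sorts products by count, descending, within each group.
    counts = df.groupby(["order_dow", "order_hour_of_day"])["product_name"].value_counts()
    top = {}
    for (dow, hour, product), _count in counts.items():
        bucket = top.setdefault((int(dow), int(hour)), [])
        if len(bucket) < n:
            bucket.append(product)
    return top


top10_products = build_top_products(orders)

# Persist the lookup so the pipeline can load it at prediction time.
path = os.path.join(tempfile.gettempdir(), "top10_products.pkl")
with open(path, "wb") as f:
    pickle.dump(top10_products, f)
```

A plain dictionary keyed by (day of week, hour) keeps the new-user lookup to a single key access at prediction time.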

User last order date

Now that we have established the importance of the days_since_prior_order feature, we need to generate a file containing the last order date of every user, which in this case is 21-3-2021 for all of them.
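A minimal sketch of generating that file is shown below. The user_id and date column names match the pipeline snippet later in this post; the year-first date format and the toy user IDs are assumptions for this example.

```python
import pandas as pd

# Assumed deployment date, stored year-first so the day is the last '-' field.
ASSUMED_LAST_ORDER_DATE = "2021-03-21"

# Hypothetical IDs; in practice these would be the unique user IDs in the orders data.
user_ids = [1, 2, 3]

# Every existing user gets the same assumed last order date.
ulp = pd.DataFrame({"user_id": user_ids, "date": ASSUMED_LAST_ORDER_DATE})

# The frame could then be saved with ulp.to_pickle("user_last_purchase.pkl").
```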

Build the Pipeline

Get the user ID as input
Get the current time and day of week using Python’s datetime module.

Get current time and day of week

We will use Python’s datetime module to extract order_hour_of_day and order_dow from the current time.
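This step might look like the sketch below. The current_order_context helper is a hypothetical name, and using Python's weekday() numbering (Monday = 0) is an assumption; it may need remapping to match the dataset's order_dow encoding.

```python
from datetime import datetime


def current_order_context(now=None):
    """Return (order_dow, order_hour_of_day) for the given or current time."""
    now = now or datetime.now()
    order_hour_of_day = now.hour   # 0-23
    order_dow = now.weekday()      # Python convention: Monday=0 ... Sunday=6
    return order_dow, order_hour_of_day


# Example with a fixed timestamp: 21 March 2021 was a Sunday.
order_dow, order_hour_of_day = current_order_context(datetime(2021, 3, 21, 5, 30))
```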

Read user_last_purchase.pkl
If user is not in user_last_purchase.pkl, i.e. a new user, then read top10_products.pkl

New user predictions

If the user doesn’t exist in user_last_purchase.pkl, we will select 10 products from top10_products.pkl based on Order_hour_of_day and Order_dow.
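The new-user branch can be sketched as follows, assuming the top-products file is a dictionary keyed by (order_dow, order_hour_of_day). The recommend_for_new_user helper and the toy data are hypothetical.

```python
import pandas as pd


def recommend_for_new_user(user_id, ulp, top_products, order_dow, order_hour_of_day):
    """Return popular products for a new user, or None for an existing user."""
    if ulp["user_id"].eq(user_id).any():
        return None  # existing user: handled by the model pipeline instead
    # Fall back to an empty list if this hour/day combination was never seen.
    return top_products.get((order_dow, order_hour_of_day), [])


# Toy stand-ins for user_last_purchase.pkl and top10_products.pkl.
ulp = pd.DataFrame({"user_id": [1, 2], "date": ["2021-03-21", "2021-03-21"]})
top_products = {(6, 5): ["Banana", "Milk"]}

new_user_recs = recommend_for_new_user(99, ulp, top_products, 6, 5)
existing_user_recs = recommend_for_new_user(1, ulp, top_products, 6, 5)
```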

If it is an existing user, calculate days_since_prior_order

days_since_prior_order

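# Look up this user's stored last order date (a string whose last '-'-separated field is used as the day)
user_last_order_date = ulp[ulp['user_id']==user_id]['date'].values.tolist()[0]
# 'today' is assumed to be the current day of month, so this difference only holds within the same month
days_since_prior_order = today - int(user_last_order_date.split('-')[-1])

# Free the intermediate objects that are no longer needed
del ulp, now, today, dt_string, user_last_order_date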
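The snippet above subtracts days of the month. A variant that subtracts full dates, and so also works across month boundaries, could look like this sketch. It is an alternative rather than the code used in this project, and it assumes the stored date is a year-first string such as "2021-03-21".

```python
from datetime import date, datetime


def days_since_prior_order(last_order_date, today=None):
    """Number of days between the stored last order date and today."""
    today = today or date.today()
    last = datetime.strptime(last_order_date, "%Y-%m-%d").date()
    return (today - last).days


# Example: an order placed on 5 April 2021, with the assumed last order on 21 March 2021.
gap = days_since_prior_order("2021-03-21", today=date(2021, 4, 5))
```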

Read all files which were built while generating features

read all necessary files

Filter these files based on user ID, order_dow, order_hour_of_day and days_since_prior_order.

Note:

There are 206209 users in total, and the entire training set has shape (8474661, 32). So we saved intermediate files, which we filter by user ID and merge to generate the features at run time.
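The filter-then-merge idea can be illustrated with the sketch below. The two toy frames and the feature names u_reorder_rate and up_order_count are placeholders, not the actual intermediate files.

```python
import pandas as pd

# Toy stand-ins for two of the saved intermediate feature files.
user_features = pd.DataFrame({"user_id": [1, 2], "u_reorder_rate": [0.6, 0.3]})
user_product_features = pd.DataFrame({
    "user_id": [1, 1, 2],
    "product_id": [10, 11, 10],
    "up_order_count": [4, 1, 2],
})


def build_feature_rows(user_id):
    """Filter each file down to one user, then join the pieces on user_id."""
    u = user_features[user_features["user_id"] == user_id]
    up = user_product_features[user_product_features["user_id"] == user_id]
    return up.merge(u, on="user_id", how="inner")


rows = build_feature_rows(1)  # one row per candidate product for user 1
```

Filtering before merging keeps each join small, so the full training-sized frame never has to be held in memory at prediction time.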

Handle features based on days_since_prior_order feature

As discussed in the cold start section, there might be cases where we don’t have some misc. features (the features based on days_since_prior_order), because those combinations might not exist in our training set. For these cases we generate new rows and set those values to 0.0.
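One way to get this behaviour is a left join followed by fillna, as in the sketch below. The toy frames are hypothetical; the feature name p_days_since_prior_order_reorder_rate is taken from this series.

```python
import pandas as pd

# Candidate (user, product) rows for an order placed 15 days after the last one.
candidates = pd.DataFrame({
    "user_id": [1, 1],
    "product_id": [10, 11],
    "days_since_prior_order": [15, 15],
})

# Precomputed rates; product 11 was never seen with a 15-day gap in training.
p_rate = pd.DataFrame({
    "product_id": [10],
    "days_since_prior_order": [15],
    "p_days_since_prior_order_reorder_rate": [0.42],
})

# A left join keeps every candidate row; unseen combinations come back as NaN.
merged = candidates.merge(p_rate, on=["product_id", "days_since_prior_order"], how="left")

# Replace the missing rates with 0.0, as described above.
col = "p_days_since_prior_order_reorder_rate"
merged[col] = merged[col].fillna(0.0)
```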

Inner join based on user ID
Use these features to predict the reorder probability

First we load the model, then predict with it. Finally, we use F1 maximization to get the most probable products for this user.
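To give a feel for the final selection step, here is a simplified heuristic. It is not the exact F1 maximization procedure from Part 2. It ranks products by predicted probability and picks the cut-off k that maximizes the approximate expected F1, 2 * sum(top-k probabilities) / (k + sum(all probabilities)). The select_products name and the toy probabilities are assumptions for this example.

```python
def select_products(probabilities):
    """Pick the top-k products, with k chosen to maximize an approximate expected F1.

    probabilities: dict mapping product name -> predicted reorder probability.
    """
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(p for _, p in ranked)  # expected number of truly reordered products

    best_k, best_score, running = 0, 0.0, 0.0
    for k, (_, p) in enumerate(ranked, start=1):
        running += p                         # expected true positives among the top k
        score = 2 * running / (k + total)    # approximate expected F1 for this cut-off
        if score > best_score:
            best_k, best_score = k, score
    return [name for name, _ in ranked[:best_k]]


# Hypothetical model outputs for one user.
selected = select_products({"a": 0.9, "b": 0.8, "c": 0.1})
```

Unlike a fixed probability threshold, the cut-off here adapts to each user's probability distribution.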

This entire pipeline is wrapped inside a function get_recommendations()

Build Web-API

A web API allows for information to be manipulated by other programs over the network. Flask is a web framework for Python, which helps in building web applications.

We will now build a simple Web-API.

Code Walkthrough

Import all necessary libraries
Import get_recommendations from get_predictions.py → This will give recommendations based on user ID
Create a Flask object named app, passing __name__ so that Flask can locate the application’s resources; __name__ equals "__main__" when the script is run directly.
If the homepage is accessed, i.e. the URL '/', the decorator @app.route('/') maps the request to the home() function, and Flask renders index.html → the HTML file for the homepage. By default, a route only answers GET requests.
Once the user enters a user ID (new or existing), the predict function receives the submitted form data as a dictionary.
The predict function generates recommendations using the get_recommendations function explained above.
From this point, we render a different webpage depending on whether the user is new or existing.
Since predict is decorated with @app.route('/predict', methods=['POST']), it handles the form submission: it renders new_user_recommendation.html for a new user or predict.html for an existing user, passing the data as a dictionary.
These webpages display the recommendations.
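The walkthrough above can be sketched as the small Flask app below. To keep it self-contained, it uses inline templates via render_template_string instead of the HTML files, one result template instead of two, and a stub in place of the real get_recommendations. The stub's return shape and the product names are assumptions for this example.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)


def get_recommendations(user_id):
    """Hypothetical stub for the real pipeline; returns (is_new_user, products)."""
    known_users = {206209: ["Banana", "Organic Milk"]}
    if user_id in known_users:
        return False, known_users[user_id]
    return True, ["Bag of Organic Bananas"]


# Inline stand-ins for index.html and the result pages.
INDEX = (
    '<form action="/predict" method="post">'
    '<input name="user_id"><input type="submit" value="Recommend">'
    "</form>"
)
RESULT = "<h3>{{ title }}</h3><ul>{% for p in products %}<li>{{ p }}</li>{% endfor %}</ul>"


@app.route("/")
def home():
    # Landing page with the user ID form; routes answer GET by default.
    return render_template_string(INDEX)


@app.route("/predict", methods=["POST"])
def predict():
    # request.form behaves like a dictionary of the submitted form fields.
    user_id = int(request.form["user_id"])
    is_new_user, products = get_recommendations(user_id)
    title = "Popular right now" if is_new_user else "Recommended for you"
    return render_template_string(RESULT, title=title, products=products)


# To serve locally, one could call: app.run(host="0.0.0.0", port=8000)
```

Flask's built-in test client (app.test_client()) can exercise both routes without starting a server.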

I deployed this application locally at 0.0.0.0:8000.

Predictions

local deployment of the model

The landing page

For an existing user, say User_ID = 206209

The recommendations are based on the user’s previous purchases.

For a new user, say User_ID = 226172

These recommendations are based on the most frequently bought items at this hour, on this day of the week.

Future Work

Deploy this application on a remote server using AWS.
Display images of the products along with their names, instead of names alone.
Find an end-to-end Deep Learning solution for this problem.
Extend this solution to provide more recommendations: for each recommended product, suggest the item most frequently purchased with it. This can be done using the Apriori algorithm.

End Notes

This marks the end of the case study. From understanding the problem through to deployment, I tried to include as much information as possible, with crisp code snippets.

Feel free to reach out to discuss this further. I’d be happy to receive feedback.

If you want to check out the whole code, please refer to my GitHub repo below.

asagar60/Instacart-Market-Basket-Analysis

github.com

You can connect with me on LinkedIn

Arun Sagar - South West Delhi, Delhi, India | Professional Profile | LinkedIn

www.linkedin.com

References

Flask Tutorials
AppliedAI course
HTML and CSS tutorials