
Instacart Market Basket Analysis : Part 3 (Deployment)

Arun Sagar
6 min read · Apr 5, 2021

This is the third part of a 3-part series: an end-to-end case study based on a Kaggle problem.

In the last 2 posts, we discussed the business problem, EDA, F1 maximization and feature engineering, and trained a few models.

Please refer to Part 1 and Part 2 before moving forward.

Table of Contents

  1. Cold Start Problems
  2. Solutions for Cold Start Problems
  3. Build the pipeline
  4. Build Web-API
  5. Predictions
  6. Future Work
  7. End Notes
  8. References

Cold Start Problems

Before we move on to the deployment part, we need to address a few cold start issues.

  1. New user: What products can we recommend to a new user?

There are many possible solutions; two of them are:

  • recommend the most frequently purchased products
  • recommend the most frequently purchased products for the current hour and day of week (we will use this one)

  2. days_since_prior_order: What do we do when we don’t have the user’s last order date?

As mentioned in the previous post, days_since_prior_order is an important feature. After deploying the model, however, we can calculate this value for a future order only if we have the user’s last order date.

To handle this, I have assumed that all users placed their last order on 21-3-2021 (taken as the time of deployment). For any future order, I can then calculate the difference in days, which gives us the days_since_prior_order feature.

Note:

This change introduces new challenges: some users will never have made a purchase after a gap of ’n’ days, where n = {0, 1, …}, so some misc. features (specifically p_days_since_prior_order_reorder_rate, u_days_since_prior_order_reorder_rate and days_since_prior_reorder_rate; refer to the last post) are unavailable because those combinations might not exist in our training set.

For such cases, the misc. features that depend on days_since_prior_order are set to 0.0.

Solutions for Cold Start Problems

Prediction for new users

We will now generate a pickle file with the top 10 products for each hour of each day of the week.

Top 10 products of 5th hour of 5th day of week

This file is used to solve the new-user cold start problem.
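One way such a lookup could be built is sketched below. It assumes a pandas DataFrame of purchased items with the columns order_dow, order_hour_of_day and product_id; the function name build_top_products and the toy values are made up for illustration, and the real pickle may be structured differently.

```python
import pandas as pd

def build_top_products(order_products: pd.DataFrame, k: int = 10) -> pd.DataFrame:
    """Return the k most frequently purchased products per (order_dow, order_hour_of_day)."""
    counts = (
        order_products
        .groupby(["order_dow", "order_hour_of_day", "product_id"])
        .size()
        .reset_index(name="purchases")
    )
    # Most purchased products first within each slot;
    # product_id breaks ties so the result is deterministic.
    counts = counts.sort_values(
        ["order_dow", "order_hour_of_day", "purchases", "product_id"],
        ascending=[True, True, False, True],
    )
    top = counts.groupby(["order_dow", "order_hour_of_day"]).head(k)
    return top.reset_index(drop=True)

# Tiny illustrative frame (made-up values, not Instacart data)
toy = pd.DataFrame({
    "order_dow":         [5, 5, 5, 5, 1],
    "order_hour_of_day": [5, 5, 5, 5, 9],
    "product_id":        [10, 10, 20, 30, 10],
})
top_products = build_top_products(toy, k=2)
# top_products.to_pickle("top10_products.pkl")  # file name used later in the pipeline
```

Keeping one row per (day, hour, product) means the new-user lookup later is a simple filter on two columns.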

User last order date

Now that we have established the importance of the days_since_prior_order feature, we need to generate a file containing the last order date of every user, which in this case is 21-3-2021.

user_last_purchase
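A minimal sketch of generating this file, assuming it is a pandas DataFrame with the columns user_id and date (the column names used by the lookup in the pipeline below). The helper name build_user_last_purchase is hypothetical.

```python
import pandas as pd

# Assumed last-order date for every user (the deployment date from the article)
DEPLOYMENT_DATE = "21-3-2021"

def build_user_last_purchase(user_ids) -> pd.DataFrame:
    """One row per known user, all sharing the assumed last-order date."""
    unique_ids = sorted(set(user_ids))
    return pd.DataFrame({"user_id": unique_ids, "date": DEPLOYMENT_DATE})

# Illustrative user IDs only
ulp = build_user_last_purchase([3, 1, 2, 1])
# ulp.to_pickle("user_last_purchase.pkl")
```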

Build the Pipeline

  • Get the user ID as input
  • Get the current time and day of week using the Python datetime module.
Get current time and day of week

We will use the Python datetime module to extract order_hour_of_day and order_dow from the current time.
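This step could look roughly like the sketch below. The function name current_order_context is hypothetical, and it is an assumption that Python's weekday numbering (Monday = 0) lines up with the dataset's order_dow encoding; if it does not, a remapping step would be needed.

```python
from datetime import datetime

def current_order_context(now=None):
    """Return (order_dow, order_hour_of_day) for the given moment (defaults to now)."""
    now = now or datetime.now()
    # weekday(): Monday == 0 ... Sunday == 6 (assumed to match order_dow)
    return now.weekday(), now.hour

# Fixed timestamp so the example is reproducible: Friday 26 March 2021, 05:30
order_dow, order_hour_of_day = current_order_context(datetime(2021, 3, 26, 5, 30))
```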

  • Read user_last_purchase.pkl
  • If user is not in user_last_purchase.pkl, i.e. a new user, then read top10_products.pkl
New user predictions

If the user doesn’t exist in user_last_purchase.pkl, we will select 10 products from top10_products.pkl based on Order_hour_of_day and Order_dow.
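The new-user branch can be sketched as follows, assuming both files are pandas DataFrames: one with a user_id column, the other with order_dow, order_hour_of_day and product_name columns already sorted by popularity. The helper names and toy values are illustrative only.

```python
import pandas as pd

def is_new_user(ulp: pd.DataFrame, user_id: int) -> bool:
    """A user is new if their ID does not appear in the last-purchase table."""
    return not ulp["user_id"].eq(user_id).any()

def recommend_for_new_user(top_products, order_dow, order_hour_of_day, k=10):
    """Pick the k most popular products for the current day of week and hour."""
    slot = top_products[
        (top_products["order_dow"] == order_dow)
        & (top_products["order_hour_of_day"] == order_hour_of_day)
    ]
    return slot["product_name"].head(k).tolist()

# Toy stand-ins for user_last_purchase.pkl and top10_products.pkl
ulp = pd.DataFrame({"user_id": [206209], "date": ["21-3-2021"]})
top_products = pd.DataFrame({
    "order_dow":         [5, 5, 5, 1],
    "order_hour_of_day": [5, 5, 5, 9],
    "product_name":      ["product A", "product B", "product C", "product D"],
})

recs = recommend_for_new_user(top_products, 5, 5, k=2) if is_new_user(ulp, 226172) else []
```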

  • If it is an existing user, calculate days_since_prior_order
days_since_prior_order
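# Last-order date stored for this user
user_last_order_date = ulp[ulp['user_id'] == user_id]['date'].values.tolist()[0]
# Appears to subtract day-of-month values, so both dates must share a month
days_since_prior_order = today - int(user_last_order_date.split('-')[-1])

del ulp, now, today, dt_string, user_last_order_date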
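The snippet above works on a single component of the date string. A more general sketch, assuming dates are stored as day-month-year strings such as 21-3-2021, parses the stored date and subtracts it from today's date so that month and year boundaries are handled. The function name days_since is hypothetical.

```python
from datetime import date, datetime

def days_since(last_order_date: str, today: date, fmt: str = "%d-%m-%Y") -> int:
    """Number of whole days between the stored last-order date and today."""
    last = datetime.strptime(last_order_date, fmt).date()
    return (today - last).days

# Last order on 21 March 2021, new order on 5 April 2021
days_since_prior_order = days_since("21-3-2021", date(2021, 4, 5))
```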
  • Read all files which were built while generating features
read all necessary files

Filter these files based on user ID, order_dow, order_hour_of_day and days_since_prior_order.

Note:

There are 206209 users in total, and the entire training set has shape (8474661, 32). We therefore saved intermediate files, which we filter by user ID and merge to generate the features at run time.

  • Handle features based on days_since_prior_order feature

As discussed in the cold start section, there might be cases where we don’t have some misc. features (features based on days_since_prior_order) because those combinations might not exist in our training set. We generate new rows for them and set those values to 0.0.
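One possible way to do this is a left merge followed by filling the gaps, sketched below. For brevity the toy example keeps all three misc. features in a single table keyed by product_id and days_since_prior_order; that layout, the helper name add_misc_features and the values are assumptions, not the actual intermediate files.

```python
import pandas as pd

MISC_FEATURES = [
    "p_days_since_prior_order_reorder_rate",
    "u_days_since_prior_order_reorder_rate",
    "days_since_prior_reorder_rate",
]

def add_misc_features(candidates: pd.DataFrame, misc: pd.DataFrame, keys) -> pd.DataFrame:
    """Left-join the misc. features and set unseen combinations to 0.0."""
    merged = candidates.merge(misc, on=keys, how="left")
    merged[MISC_FEATURES] = merged[MISC_FEATURES].fillna(0.0)
    return merged

# Toy data: product 20 was never ordered after a 15-day gap in "training"
candidates = pd.DataFrame({
    "user_id": [1, 1],
    "product_id": [10, 20],
    "days_since_prior_order": [15, 15],
})
misc = pd.DataFrame({
    "product_id": [10],
    "days_since_prior_order": [15],
    "p_days_since_prior_order_reorder_rate": [0.4],
    "u_days_since_prior_order_reorder_rate": [0.5],
    "days_since_prior_reorder_rate": [0.6],
})
features = add_misc_features(candidates, misc, ["product_id", "days_since_prior_order"])
```

A left merge keeps every candidate product for the user, so a missing combination shows up as NaN and can be replaced by 0.0 instead of dropping the row.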

  • Inner join based on user ID
  • Use these features to predict the reorder probability

First we load the model, then predict with it. Finally, we use F1 maximization to get the most probable products for this user.

This entire pipeline is wrapped inside a function get_recommendations()

Build Web-API


A web API allows for information to be manipulated by other programs over the network. Flask is a web framework for Python, which helps in building web applications.

We now build a simple Web-API.

Code Walkthrough

  • Import all necessary libraries.
  • Import get_recommendations from get_predictions.py → this returns recommendations for a given user ID.
  • Create a Flask object named app, passing __name__ so that Flask knows where to find the application’s templates and other resources.
  • When the homepage is accessed, i.e. the URL ('/'), the decorator @app.route('/') runs the home() function and Flask renders index.html → the HTML file for the homepage. By default, a route only answers GET requests.
  • Once the user enters a User ID (new or existing), the predict function receives the submitted request data as a dictionary.
  • The predict function generates recommendations using the get_recommendations function explained above.
  • From this point, a different webpage is shown depending on whether the user is new or existing.
  • predict is decorated with @app.route('/predict', methods=['POST']), so it handles the submitted form: it sends the user to new_user_recommendation.html for a new user or to predict.html for an existing user, and passes the data to the page as a dictionary.
  • These webpages display the recommendations.
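The walkthrough above can be sketched as a small Flask app. This is not the original code: get_recommendations is replaced by a hypothetical stand-in with an assumed return value, and inline template strings stand in for index.html, predict.html and new_user_recommendation.html so that the example is self-contained.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline stand-ins for the HTML files described in the walkthrough
INDEX_HTML = """<form action="/predict" method="post">
  <input name="user_id" placeholder="User ID">
  <button type="submit">Recommend</button>
</form>"""
RESULT_HTML = (
    "<h3>{{ title }}</h3>"
    "<ul>{% for p in products %}<li>{{ p }}</li>{% endfor %}</ul>"
)

def get_recommendations(user_id):
    """Hypothetical stand-in for the pipeline; returns (is_new_user, products)."""
    known_users = {206209}
    if user_id in known_users:
        return False, ["previously bought product"]
    return True, ["popular product for this hour"]

@app.route("/")  # answers GET requests by default
def home():
    return render_template_string(INDEX_HTML)

@app.route("/predict", methods=["POST"])
def predict():
    # request.form behaves like a dictionary of the submitted form fields
    user_id = int(request.form["user_id"])
    is_new, products = get_recommendations(user_id)
    title = "Popular right now" if is_new else "Recommended for you"
    return render_template_string(RESULT_HTML, title=title, products=products)

# To serve it locally, one could call:
# app.run(host="0.0.0.0", port=8000)
```

With real template files, render_template("predict.html", ...) would replace the render_template_string calls.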

I deployed this application locally at 0.0.0.0:8000.

Predictions

local deployment of the model
  • The landing page
homepage
  • For an existing user, say User_ID = 206209
Recommendation for Existing User

The recommendations are based on the user’s previous purchases.

  • For a new user, say User_ID = 226172
Recommendation for New user

These recommendations are based on most frequently bought items at this hour, on this day of week.

Future Work

  • Deploy this application on a remote server using AWS.
  • Display images of the products along with their names, instead of names alone.
  • Find an end-to-end Deep Learning solution for this problem.
  • Extend this solution to provide even more recommendations: for example, for each recommended product, suggest the item most frequently purchased with it. This can be done using the Apriori algorithm.

End Notes

This marks the end of the case study. Right from understanding the problem to deployment, I tried to include as much information as possible, with crisp code snippets.

Feel free to reach out to discuss this further. I’d be happy to receive feedback.

If you want to check out the whole code, please refer to my GitHub repo below.

You can connect with me on LinkedIn.

References

Arun Sagar

Deep Learning Engineer with particular focus on applications of Computer Vision and Autonomous vehicles