TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images [Part-2]

Arun Sagar
6 min read · Jul 21, 2021


This is Part 2 of a two-part series on an end-to-end case study based on the TableNet research paper.

In the last post we discussed the PyTorch implementation of TableNet; refer to Part 1.

Table of Contents

  1. Post EDA of the solution
  2. Fixing Image Problems and Re-Training
  3. Improving model predictions using OpenCV2
  4. OCR predictions
  5. Deployment
  6. Future Work
  7. End Notes
  8. References

Post EDA of the solution

Looking at evaluation metrics alone to judge model behavior is not enough. We should be able to answer questions like:

→ Are we able to explain the outputs of the model w.r.t. the input? (Explainability is highly effective in classification and regression models.)

→ Can the model's performance be improved?

→ Can we improve the data to get a better-performing model?

→ Can we somehow know what kind of data the model scores higher on?

To answer these questions, we need to look at our training data and categorize the input data into Bad, Good, and Best data. In the real world we won't get test data with a distribution similar to the train data, so we do post-training EDA on the train data. To do this, we predict table and column masks from the model and rank / categorize the images based on their F1 scores.

For this purpose, we only use the Table F1 score as the benchmark.

Let's pick thresholds of 0.5 and 0.85 for categorizing the images. After plotting the scores, we see that there are images with an F1 score of 0.0.
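As a rough sketch, the per-image F1 score can be computed between the predicted and ground-truth table masks and bucketed with the thresholds above (the helper names and the smoothing constant here are assumptions, not necessarily the exact code used):

```python
import numpy as np

def mask_f1(pred_mask, true_mask, bin_thresh=0.5, eps=1e-8):
    """F1 score between a predicted (probability) mask and a binary ground-truth mask."""
    pred = (pred_mask > bin_thresh).astype(np.uint8).ravel()
    true = (true_mask > 0).astype(np.uint8).ravel()
    tp = np.sum(pred * true)
    precision = tp / (pred.sum() + eps)
    recall = tp / (true.sum() + eps)
    return 2 * precision * recall / (precision + recall + eps)

def categorize(f1_score):
    """Bucket an image by its table F1 score, using the thresholds above."""
    if f1_score <= 0.5:
        return "Bad"
    if f1_score <= 0.85:
        return "Good"
    return "Best"
```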

Bad Images [ Threshold : 0.0–0.5]

We see that there are 3 images in the Bad predictions / images category.

Bad Image 1
Bad Image 2
Bad Image 3

Good Images [ Threshold : 0.5–0.85]

Many images fall into this category; below are two of them.

Best Images [ Threshold : >0.85]

Many images fall into this category; below are two of them.

Observations

  • From the above images, we can see that the bad / worst predictions come from images with colored tables. The model didn't predict anything, and the F1 score is close to 0.0. There are very few images in the dataset with colored tables.
  • Good predictions come from images where the model predicted a good table mask but also predicted columns in the table where, in actuality, there were no columns.
  • Best predictions come from images which helped the model learn table and column boundaries even without line demarcations.

Fixing Image Problems

We have 2 options which might improve model performance:

  • Remove the colored images. [Problem: data reduction is an issue here, as we already have little data.]
  • Make the data uniform by converting all images to grayscale first, then increasing the number of channels back in preprocessing, and train the model again.
Code for approach 2
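A minimal sketch of approach 2, assuming a PIL / torchvision preprocessing pipeline (the transform composition, input size, and file name below are assumptions rather than the exact original code):

```python
from PIL import Image
import torchvision.transforms as T

# Approach 2: convert every image to grayscale first, then replicate the single
# channel back to 3 channels so the model's input shape stays the same.
preprocess = T.Compose([
    T.Grayscale(num_output_channels=1),    # uniform single-channel image
    T.Resize((1024, 1024)),                # fixed input size (assumed, as in Part 1)
    T.ToTensor(),                          # (1, H, W) tensor in [0, 1]
    T.Lambda(lambda x: x.repeat(3, 1, 1))  # back to 3 channels for the encoder
])

img = Image.open("sample_page.png").convert("RGB")  # hypothetical file name
x = preprocess(img)   # shape: (3, 1024, 1024)
```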

After following the second approach and fixing the dataset, the model was trained again.

Evaluation metrics

Unfortunately, model performance didn't increase. We see the same performance metrics as before fixing the dataset.

But let's take a look at the Bad images from the previous section and see if anything has improved.

Re-Evaluating Bad Images from previous section

Output of Bad Image 1
Output of Bad Image 2
Output of Bad Image 3

A significant increase in F1 scores can be seen here. From no predictions for table and column masks on colored-table images, we managed to raise the F1 score for both table and column to 0.92. These images can now be categorized under Best images.

Let's look at the Bad images according to our new model.

Only 1 Bad Image

We can see only 1 bad image below the threshold and 2 good predictions / images. The rest fall in the Best predictions category.

Bad Predictions / Images

The lowest F1 score is around 0.37, with only 1 bad image.

It is not a good idea to conclude a pattern for bad predictions when we have only 1 image. But it seems the input image has no proper line demarcations that would indicate table structure; that's why, wherever the model sees line demarcations, it assumes there is a table in that area.

We can now say that even though the new model didn't improve in terms of overall performance metrics, it improved the learning and the predictions.

Improving model predictions using OpenCV2

We can still see uneven boundaries in the predicted table and column masks. In some cases, the table mask predictions are not even filled inside. If we directly crop the mask portions of the image to get the table, we might lose some information. Not to mention, there are other areas with activations in the predicted table mask which are not tables.

To solve these issues, we will use contours from classical image processing techniques.

Basic Idea:

  • Get contours around the activations in the predicted table mask.
  • Remove contours which can't form a rectangle / are small patches of activation.
  • Get the bounding coordinates of the remaining contours.
  • Repeat the same process with the column masks.

The code below applies this process to both table and column masks and returns the table and column coordinates.

Fix Table and Column Masks
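A minimal reconstruction of that idea, assuming the predicted masks arrive as single-channel probability maps (the threshold and minimum-area values are assumptions, and the OpenCV 4.x findContours signature is used; only the fixMasks name matches the function referenced below):

```python
import cv2
import numpy as np

def get_boxes(mask, prob_thresh=0.5, min_area=3000):
    """Threshold a predicted mask, find contours, drop small blobs,
    and return bounding boxes as (x, y, w, h) tuples."""
    binary = (mask > prob_thresh).astype(np.uint8) * 255
    # OpenCV 4.x returns (contours, hierarchy)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for cnt in contours:
        if cv2.contourArea(cnt) < min_area:   # ignore small patches of activation
            continue
        boxes.append(cv2.boundingRect(cnt))   # (x, y, w, h)
    return boxes

def fixMasks(table_mask, column_mask):
    """Apply the same contour cleanup to table and column masks."""
    table_boxes = get_boxes(table_mask)
    column_boxes = get_boxes(column_mask, min_area=1000)
    return table_boxes, column_boxes
```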

Let's look at the outputs →

Step 1: We first get predictions from the model.

Mask predictions from the model.

Step 2: Then we pass the predicted masks to the fixMasks() function.

Left: Fixing Table Masks, Right: Fixing Column Masks
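Putting the two steps together might look roughly like this (the model's two-headed output interface is assumed from Part 1, and `x` refers to the preprocessed tensor from the earlier sketch):

```python
import torch

# Assumes `model` (TableNet from Part 1) and `x` (preprocessed input tensor)
# are already available; the output shape (B, 1, H, W) per head is an assumption.
with torch.no_grad():
    table_out, column_out = model(x.unsqueeze(0))              # step 1: raw mask logits
table_mask = torch.sigmoid(table_out)[0, 0].cpu().numpy()      # (H, W) probabilities
column_mask = torch.sigmoid(column_out)[0, 0].cpu().numpy()

# Step 2: clean up the masks and get bounding boxes
table_boxes, column_boxes = fixMasks(table_mask, column_mask)
```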

OCR predictions

After getting the table bounding boxes, Pytesseract OCR is applied to each table, and the output is saved to a DataFrame.

Pytesseract Prediction Code
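A minimal sketch of this step, assuming the document image is a NumPy array and the boxes come from fixMasks() above (the whitespace-based row splitting is a simplification; the original code may parse the columns differently):

```python
import pandas as pd
import pytesseract

def ocr_tables(image, table_boxes):
    """Run Tesseract on each detected table crop and collect the text
    line-by-line into one DataFrame per table."""
    tables = []
    for (x, y, w, h) in table_boxes:
        crop = image[y:y + h, x:x + w]            # image as a NumPy array (H, W, C)
        text = pytesseract.image_to_string(crop)
        rows = [line.split() for line in text.splitlines() if line.strip()]
        tables.append(pd.DataFrame(rows))
    return tables
```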

Here are the final outputs for each table detected in the previous section.

Deployment

We will deploy this new model locally using Streamlit, an open-source Python library for creating custom web apps for machine learning and deep learning projects.
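A minimal app sketch under that setup (the predict_and_extract helper and the UI layout are assumptions, not the original app code):

```python
import streamlit as st
from PIL import Image

st.title("TableNet: Table Detection and Tabular Data Extraction")

uploaded = st.file_uploader("Upload a scanned document image", type=["png", "jpg", "jpeg"])
if uploaded is not None:
    image = Image.open(uploaded).convert("RGB")
    st.image(image, caption="Input document", use_column_width=True)

    # predict_and_extract is a hypothetical wrapper around model inference,
    # the fixMasks() cleanup, and the pytesseract OCR step described above.
    dataframes = predict_and_extract(image)
    for i, df in enumerate(dataframes):
        st.subheader(f"Table {i + 1}")
        st.dataframe(df)
```

Running `streamlit run app.py` then serves the app locally in the browser.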

Future Work

  • Deploy this application on a remote server using AWS / Streamlit sharing / Heroku.
  • Model quantization for faster inference time.
  • Train for more epochs and compare the performance.
  • Increase the data size by adding data from the ICDAR 2013 table recognition dataset.

End Notes

This marks the end of this case study. I tried to pack in as much information as possible, with crisp code snippets for every stage.

Feel free to reach out to discuss this further. I'd be happy to receive feedback.

If you want to check out the whole code, please refer to my GitHub repo below.

You can connect with me on LinkedIn

References

  1. TableNet research paper
  2. Applied AI
  3. 7 Tips for Squeezing Maximum Performance from PyTorch
  4. Streamlit.io
