FinMan - An Android OCR Application

We will update our documentation here.

Abstract:

FinMan is an Android application for managing your personal finances, similar to the one offered by mint.com. In addition to the basic features, we intended to provide the ability to automatically add bill items from a photo of the bill taken with an Android phone.

We used Google's Tesseract engine for OCR and OpenCV for image pre-processing.

Note: We will first describe the aim of our project and then what we could achieve in practice. We have also hosted our code on GitHub.

Report

We divided our project into three modules, which were developed independently and which we planned to integrate later.

1. Preprocessing the Image

Check the code to this part of our application here

Aim: To give Tesseract a binary image of the bill as input.
The main challenge is to find the four corners of the bill and apply a perspective transformation.
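
As a minimal sketch of the corner-finding step, assume each edge of the bill has already been fitted as a line in the (rho, theta) form that a Hough transform such as OpenCV's cv::HoughLines returns, i.e. x*cos(theta) + y*sin(theta) = rho. A corner is then the intersection of two neighbouring edge lines. The names HoughLine, Point2D and intersect are hypothetical and are not taken from the project code.

```cpp
#include <cmath>

// A line in Hough normal form: x*cos(theta) + y*sin(theta) = rho.
// (Hypothetical type, used only for this illustration.)
struct HoughLine {
    double rho;    // distance from the origin, in pixels
    double theta;  // angle of the line's normal, in radians
};

struct Point2D {
    double x;
    double y;
};

// Solves the 2x2 linear system formed by two lines using Cramer's rule.
// Returns false when the lines are (nearly) parallel and no corner exists.
bool intersect(const HoughLine& a, const HoughLine& b, Point2D& corner) {
    const double det = std::cos(a.theta) * std::sin(b.theta)
                     - std::sin(a.theta) * std::cos(b.theta);
    if (std::fabs(det) < 1e-9) {
        return false;
    }
    corner.x = (a.rho * std::sin(b.theta) - b.rho * std::sin(a.theta)) / det;
    corner.y = (b.rho * std::cos(a.theta) - a.rho * std::cos(b.theta)) / det;
    return true;
}
```

Calling intersect on each pair of adjacent edges gives the four corner points, which can then be passed to a perspective transform (in OpenCV, for example, cv::getPerspectiveTransform followed by cv::warpPerspective).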
Fig. 0. Input image
Fig. 1. The input image is converted to grayscale
Fig. 2. The background of the image is removed by adaptive thresholding
Fig. 3. A morphological close operation is performed to find the closed regions (one of these regions is the bill)
Fig. 4. The bill, i.e. the largest contour, is found using cvFindContours()
Fig. 5. A Hough transform is then run on the outline of the bill for line detection
Fig. 6. The best approximations for the four lines of the quadrilateral are found
Fig. 7. The four corners are found from the intersections of the four lines
Fig. 8. Perspective transformation is done using the four points
Fig. 9. Illumination correction - difference of Gaussians
Fig. 10. Illumination correction - normalization
Fig. 11. Illumination correction - inversion of the resultant image
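
The last two illumination-correction steps (Figs. 10 and 11) can be sketched as follows. The report does not state the exact operations used, so this assumes a plain min-max stretch to the range 0..255 followed by inversion (255 minus the value). The function name normalizeAndInvert is hypothetical; in OpenCV the same effect could likely be obtained with cv::normalize using cv::NORM_MINMAX and then cv::bitwise_not.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Takes a filter response (e.g. the output of a difference of Gaussians),
// stretches it linearly so the minimum maps to 0 and the maximum to 255,
// then inverts the result. Assumption: pixels are stored as a flat vector.
std::vector<std::uint8_t> normalizeAndInvert(const std::vector<float>& response) {
    std::vector<std::uint8_t> out(response.size(), 0);
    if (response.empty()) {
        return out;
    }
    const auto minmax = std::minmax_element(response.begin(), response.end());
    const float lo = *minmax.first;
    const float range = *minmax.second - lo;
    for (std::size_t i = 0; i < response.size(); ++i) {
        // A flat image (range == 0) is mapped to 0 before inversion.
        const float scaled = range > 0.0f ? (response[i] - lo) / range * 255.0f : 0.0f;
        const int rounded = static_cast<int>(scaled + 0.5f);
        out[i] = static_cast<std::uint8_t>(255 - rounded);
    }
    return out;
}
```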

2. OCR and Feature Extraction

The objective of main.cpp is to extract the bill total, the items and their prices from the processed binary image produced by imp.cpp.
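
To illustrate the kind of extraction involved, here is a minimal sketch that works on text already returned by the OCR engine. It is not the logic of main.cpp. It assumes that each useful line ends with a price as its last whitespace-separated token, and that the total is on a line whose label contains the word "total". The names BillItem, Bill, parsePrice and parseBill are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

struct BillItem {
    std::string name;
    double price;
};

struct Bill {
    std::vector<BillItem> items;
    double total = 0.0;
    bool hasTotal = false;
};

// Accepts tokens such as "2.50" or "12"; rejects words and mixed tokens.
bool parsePrice(const std::string& token, double& value) {
    if (token.empty() || !std::isdigit(static_cast<unsigned char>(token[0]))) {
        return false;
    }
    char* end = nullptr;
    value = std::strtod(token.c_str(), &end);
    return *end == '\0';
}

Bill parseBill(const std::string& ocrText) {
    Bill bill;
    std::istringstream lines(ocrText);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream words(line);
        std::vector<std::string> tokens;
        std::string token;
        while (words >> token) tokens.push_back(token);

        double price = 0.0;
        if (tokens.size() < 2 || !parsePrice(tokens.back(), price)) {
            continue;  // no "label price" pattern on this line
        }
        std::string name;
        for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
            if (!name.empty()) name += ' ';
            name += tokens[i];
        }
        std::string lower = name;
        std::transform(lower.begin(), lower.end(), lower.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (lower.find("total") != std::string::npos) {
            bill.total = price;  // a later "total" line overwrites an earlier one
            bill.hasTotal = true;
        } else {
            bill.items.push_back({name, price});
        }
    }
    return bill;
}
```

Real OCR output is noisier than this sketch allows for (misread digits, currency symbols, prices split across tokens), so any practical version would need more tolerant matching.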

Challenges

3. Android Application:

This module involves designing and developing the UI. Building an Android application involves a steep learning curve, since we first need to understand the Android API and its usage.

The next step is integrating the native C++ code into the Java-based Android application using JNI (Java Native Interface). Thankfully, Android simplifies this process by providing the Android NDK (Native Development Kit), which compiles the native code and links it with the main Android application.

Diagram describing basic workflow of our application:
Fig. 14. Android welcome screen
Fig. 16. Image captured from the camera
Fig. 17. Details from the C++ struct (hard-coded; ideally they would result from processing the image)
Fig. 18. List of past bills

Status of the project:

We completed the image processing code and could extract features from an image of the bill using the C++ code. We also built the Android app, which can capture an image, send it to a C++ program and receive a C++ struct with all the details of the bill filled in.
However, we couldn't integrate the former C++ code with Android using JNI, as it uses the library version of Tesseract, which we couldn't compile for the Android platform.

What went Wrong?

We initially worked with tess-two, a fork of Tesseract Tools for Android, and made a sample Android app following the guidelines from here

Later, when we divided the work, we used the native tesseract-ocr engine and planned to integrate it with Java using JNI. But after completing the individual modules, we found that this was not as simple as we had expected and could not be done within the time limit.

Suggestion for next teams: Use tess-two directly (instead of Tesseract), even though it is a little imperfect.

Team & Work Division:
1. Image pre-processing -
Sriram Bhargav & Rajesh Rao
2. OCR and feature extraction -
Varun Reddy & Vinod Reddy
3. Android application, UI and JNI integration -
Hasan Kumar & Tarun

Note: This is a project report for the CS663 Digital Image Processing course under Prof. Sharat Chandran.

Finally, we would like to conclude by saying that the past few weeks have been a wonderful experience and each of us has learnt a lot in the process.


Thanks,
Team FinMan

References:
  1. Overview of Tesseract OCR Engine http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseracticdar2007.pdf
  2. Android API Guide http://developer.android.com/guide/components/index.html
  3. Most of the image pre-processing techniques, i.e., illumination correction (difference of Gaussians, normalization), were worked out by experimenting with GIMP; we then looked up the corresponding OpenCV functions.
We also thank the various blogs on the internet that helped us build our Android app UI and quickly resolve the issues and bugs we faced.