FinMan - An Android OCR Application

FinMan is an Android application for managing your personal finances, similar to the one offered by mint.com. In addition to the basic features, we intended to provide the ability to automatically add bill items from a photo of the bill taken with an Android phone.

Note: We first describe the aim of our project and then what we could achieve in practice. We have also hosted our code on GitHub.

Report

We divided our project into three modules, which were developed independently and which we planned to integrate later.

1. Preprocessing the Image

The code for this part of our application is hosted on GitHub.

Aim: To give a binary image of the bill to Tesseract as input.
The main challenge is to find the four corners of the bill and apply the perspective transformation.

Fig. 2. Background in the image is removed by Adaptive Thresholding
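The idea behind adaptive thresholding can be sketched in plain C++ without OpenCV. This is a minimal illustration, not the project's code: the function name `adaptiveThresholdMean`, the `Gray` type, and the parameters `radius` and `c` are hypothetical, and the border handling (averaging only the pixels inside the image) is an assumption that may differ from what OpenCV does.

```cpp
#include <cstdint>
#include <vector>

// Row-major grayscale image (hypothetical helper type for this sketch).
using Gray = std::vector<std::vector<std::uint8_t>>;

// Mean-based adaptive threshold: a pixel becomes white (255) when it is
// brighter than the mean of its (2*radius+1)^2 neighbourhood minus c,
// and black (0) otherwise. Neighbours outside the image are skipped.
Gray adaptiveThresholdMean(const Gray& src, int radius, int c) {
    const int rows = static_cast<int>(src.size());
    const int cols = rows ? static_cast<int>(src[0].size()) : 0;
    Gray dst(rows, std::vector<std::uint8_t>(cols, 0));
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            long sum = 0;
            int count = 0;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    const int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= rows || xx < 0 || xx >= cols) continue;
                    sum += src[yy][xx];
                    ++count;
                }
            }
            // count is at least 1 because the pixel itself is always in range.
            const double localMean = static_cast<double>(sum) / count;
            dst[y][x] = static_cast<std::uint8_t>(src[y][x] > localMean - c ? 255 : 0);
        }
    }
    return dst;
}
```

Because the threshold follows the local brightness, a dark character on a bright but unevenly lit bill is still separated from its surroundings, which a single global threshold would not guarantee.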

Morphological Close - Find closed Regions — Fig. 3. A Morphological close operation is done to find the closed regions. (One of the regions is the bill)

Fig. 4. The bill, i.e. the largest contour, is found using cvFindContours()

Fig. 5. Hough Transform is then run on outline of bill for line detection

Fig. 6. The best approximation for the four lines of the quadrilateral is found.

Corner Detection — Fig. 7. Four corners are found by intersection of four lines
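Finding a corner as the intersection of two detected lines can be sketched as follows. This assumes the lines are given in the (rho, theta) form that a standard Hough transform produces; the `PolarLine` struct and the `intersect` function are hypothetical names introduced for illustration and are not taken from the project code.

```cpp
#include <cmath>

// A line in Hough normal form: x*cos(theta) + y*sin(theta) = rho.
struct PolarLine {
    double rho;
    double theta;  // radians
};

// Writes the intersection of lines a and b to (x, y) and returns true.
// Returns false when the lines are (nearly) parallel.
bool intersect(const PolarLine& a, const PolarLine& b, double& x, double& y) {
    const double a1 = std::cos(a.theta), b1 = std::sin(a.theta);
    const double a2 = std::cos(b.theta), b2 = std::sin(b.theta);
    const double det = a1 * b2 - a2 * b1;  // equals sin(b.theta - a.theta)
    if (std::fabs(det) < 1e-9) return false;
    // Cramer's rule on the 2x2 system formed by the two line equations.
    x = (a.rho * b2 - b.rho * b1) / det;
    y = (a1 * b.rho - a2 * a.rho) / det;
    return true;
}
```

Applying this to each pair of adjacent sides of the quadrilateral would give the four corners; opposite sides are close to parallel and either fail the determinant check or intersect far outside the image.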

Perspective Correction — Fig. 8. Perspective Transformation is done using four points
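The transformation itself is a 3x3 homography determined by the four corner correspondences. The sketch below solves for it directly with Gauss-Jordan elimination; it only computes the mapping and does not warp pixels. The names `Point`, `homographyFromCorners` and `applyHomography` are hypothetical. OpenCV provides `cv::getPerspectiveTransform` and `cv::warpPerspective` for this task, but the report does not state which calls the project used.

```cpp
#include <array>
#include <cmath>
#include <utility>

struct Point { double x, y; };

// Solves for the homography h (row-major 3x3, h[8] fixed to 1) that maps
// src[i] to dst[i]. Returns false if the system is singular, for example
// when three of the corners are collinear.
bool homographyFromCorners(const std::array<Point, 4>& src,
                           const std::array<Point, 4>& dst,
                           std::array<double, 9>& h) {
    double m[8][9] = {};
    for (int i = 0; i < 4; ++i) {
        const double x = src[i].x, y = src[i].y, u = dst[i].x, v = dst[i].y;
        const double r0[9] = {x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u, u};
        const double r1[9] = {0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v, v};
        for (int j = 0; j < 9; ++j) {
            m[2 * i][j] = r0[j];
            m[2 * i + 1][j] = r1[j];
        }
    }
    // Gauss-Jordan elimination with partial pivoting on the 8x9 system.
    for (int col = 0; col < 8; ++col) {
        int pivot = col;
        for (int r = col + 1; r < 8; ++r)
            if (std::fabs(m[r][col]) > std::fabs(m[pivot][col])) pivot = r;
        if (std::fabs(m[pivot][col]) < 1e-12) return false;
        for (int j = 0; j < 9; ++j) std::swap(m[col][j], m[pivot][j]);
        for (int r = 0; r < 8; ++r) {
            if (r == col) continue;
            const double f = m[r][col] / m[col][col];
            for (int j = col; j < 9; ++j) m[r][j] -= f * m[col][j];
        }
    }
    for (int i = 0; i < 8; ++i) h[i] = m[i][8] / m[i][i];
    h[8] = 1.0;
    return true;
}

// Maps a point through the homography, including the perspective divide.
Point applyHomography(const std::array<double, 9>& h, Point p) {
    const double w = h[6] * p.x + h[7] * p.y + h[8];
    return {(h[0] * p.x + h[1] * p.y + h[2]) / w,
            (h[3] * p.x + h[4] * p.y + h[5]) / w};
}
```

In use, `src` would hold the four detected corners of the bill and `dst` the corners of an upright rectangle of the desired output size.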

Fig. 9. Illumination correction - Difference of Gaussians

Normalization — Fig. 10. Illumination correction - normalization

Inversion of Image — Fig. 11. Illumination correction - Inversion of resultant image
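The last two illumination-correction steps are simple per-pixel operations. The sketch below shows a min-max normalization followed by an inversion on a flat pixel buffer; the difference-of-Gaussians step is left out. The function names `normalize` and `invert` are hypothetical, and the exact normalization the project applied is not stated in the report, so the linear stretch here is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stretches pixel values linearly so that the darkest pixel becomes 0 and
// the brightest becomes 255. A flat image is returned as all zeros.
std::vector<std::uint8_t> normalize(const std::vector<std::uint8_t>& px) {
    std::vector<std::uint8_t> out(px.size(), 0);
    if (px.empty()) return out;
    const auto mm = std::minmax_element(px.begin(), px.end());
    const int lo = *mm.first, hi = *mm.second;
    if (hi == lo) return out;
    for (std::size_t i = 0; i < px.size(); ++i)
        out[i] = static_cast<std::uint8_t>((px[i] - lo) * 255 / (hi - lo));
    return out;
}

// Inverts an 8-bit image: dark pixels become bright and vice versa.
std::vector<std::uint8_t> invert(const std::vector<std::uint8_t>& px) {
    std::vector<std::uint8_t> out(px.size());
    for (std::size_t i = 0; i < px.size(); ++i)
        out[i] = static_cast<std::uint8_t>(255 - px[i]);
    return out;
}
```

Normalization restores the full contrast range after the difference of Gaussians has compressed it, and the inversion turns the result back into dark text on a light background or the reverse, depending on the polarity the earlier steps produced.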

2. OCR and Feature Extraction

The objective of main.cpp is to extract the bill total, the items, and their prices from the processed binary image produced by imp.cpp.

Challenges

Tesseract's OCR of an image is not line-aligned, i.e. words separated by a large gap may be output on different lines. This is especially problematic for bills, as items and prices are separated by large spaces.
The way to tackle this is to send one text line at a time to Tesseract.
We have to distinguish the sections of the image that contain items and prices from those that do not.
We used the presence of a price (X*.XX, see the assumptions below) as a token to detect the items section.
Salt-and-pepper noise has to be removed from the image to detect text lines properly. We used a median filter for smoothing.
Vertical lines in the image have to be removed, as Tesseract may interpret them as "l" if we send that section to it.
Tesseract is not accurate in recognizing digits because of the varying intensities of characters.
Examples: "." is sometimes mistaken for ",", "-", "_" or "'", and 0 for D or O.
The way to tackle this is to add error-tolerant code for these cases.
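An error-tolerant cleanup of a recognized price can be sketched as a character substitution followed by a format check. This is a hypothetical illustration rather than the code in main.cpp: the function name `repairPrice` is invented, the substitution table covers only the confusions listed above plus stray spaces, and the check uses the X*.XX format from the assumptions.

```cpp
#include <regex>
#include <string>

// Tries to repair common OCR confusions in a price token.
// On success, writes the repaired text to `cleaned` and returns true.
// Returns false if the result still does not look like X*.XX.
bool repairPrice(const std::string& raw, std::string& cleaned) {
    std::string s;
    for (char ch : raw) {
        switch (ch) {
            case ' ':                       // drop stray spaces
                break;
            case 'D': case 'O':             // letters misread for zero
                s += '0';
                break;
            case ',': case '-': case '_': case '\'':  // misread decimal point
                s += '.';
                break;
            default:
                s += ch;
        }
    }
    // Zero or more digits, a decimal point, then exactly two digits.
    static const std::regex priceFormat("[0-9]*\\.[0-9]{2}");
    if (!std::regex_match(s, priceFormat)) return false;
    cleaned = s;
    return true;
}
```

Validating after the substitution matters: a token that still contains unexpected characters is rejected instead of being silently turned into a wrong amount.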

Line By Line Parsing — Fig. 12. Line by Line parsing
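One common way to split a binary image into text lines is a horizontal projection profile: count the ink pixels in every row and treat each run of non-empty rows as one line. The report does not say which method the project used, so this is a swapped-in technique shown only as a sketch; `Binary`, `findTextLines` and `minInk` are hypothetical names.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Row-major binary image in which non-zero pixels are ink (text).
using Binary = std::vector<std::vector<std::uint8_t>>;

// Returns the inclusive (firstRow, lastRow) range of every text line.
// A row belongs to a text line when it has at least minInk ink pixels.
std::vector<std::pair<int, int>> findTextLines(const Binary& img, int minInk) {
    std::vector<std::pair<int, int>> lines;
    const int rows = static_cast<int>(img.size());
    int start = -1;  // first row of the line being scanned, or -1 if none
    for (int y = 0; y < rows; ++y) {
        int ink = 0;
        for (std::uint8_t p : img[y])
            if (p != 0) ++ink;
        const bool isText = ink >= minInk;
        if (isText && start < 0) start = y;
        if (!isText && start >= 0) {
            lines.emplace_back(start, y - 1);
            start = -1;
        }
    }
    if (start >= 0) lines.emplace_back(start, rows - 1);  // line touches bottom edge
    return lines;
}
```

Each returned row range can then be cropped and passed to the OCR engine on its own. Raising `minInk` above 1 makes the split less sensitive to leftover noise pixels.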

<pre>
Output for above image (Fig.11) is

2 Carlsbem Bottle 16. 00
3 Heineken Draft Standard 24. 60
1 Heineken Draft Half Liter 15. 20
2 Carlsberg Bucket (5 bottles) 80.00
3 Sirloin Steak 96.00x
1 Coke 3.500
5 Ice Cream 18.00y

Total items 7
Total cost 376.40
</pre> — Fig. 13. Output for Fig.11

Finally, the program stores all of the above information in a struct.

Assumptions

In the items, quantity and price section, we assume that the price occupies the last column and that the quantity, if present, occupies the middle column.
We assume the price to be of the format X*.XX, where X is a digit.
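To show how these assumptions can drive parsing, the sketch below splits one recognized text line into a small struct. It is not the struct or parser from main.cpp: `BillItem` and `parseItemLine` are hypothetical names, and the layout it reads (optional leading quantity, item name, price last) follows the sample output in Fig. 13, which is an assumption about the input format.

```cpp
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one bill entry.
struct BillItem {
    int quantity = 1;
    std::string name;
    double price = 0.0;
};

// Parses "[quantity] name... price". Returns false when the last token
// is not a price of the form X*.XX, so non-item lines are skipped.
bool parseItemLine(const std::string& line, BillItem& item) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    for (std::string t; in >> t;) tokens.push_back(t);

    static const std::regex priceFormat("[0-9]*\\.[0-9]{2}");
    static const std::regex quantityFormat("[0-9]{1,4}");
    if (tokens.size() < 2 || !std::regex_match(tokens.back(), priceFormat))
        return false;

    BillItem parsed;
    parsed.price = std::stod(tokens.back());
    std::size_t first = 0;
    // A leading number is read as the quantity only if a name follows it.
    if (tokens.size() > 2 && std::regex_match(tokens[0], quantityFormat)) {
        parsed.quantity = std::stoi(tokens[0]);
        first = 1;
    }
    for (std::size_t i = first; i + 1 < tokens.size(); ++i) {
        if (!parsed.name.empty()) parsed.name += ' ';
        parsed.name += tokens[i];
    }
    item = parsed;
    return true;
}
```

Using the price format as the acceptance test mirrors the idea of treating the presence of a price as the token that marks an item line: a line such as "Total items 7" is rejected because its last token is not a price.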

3. Android Application:

This module involves designing and developing the UI. Building an Android application involves a steep learning curve, as we first need to understand the Android API and its usage.

The next step is integrating the native C++ code into the Java-based Android application using JNI (Java Native Interface). Thankfully, Android simplifies this process by providing the Android NDK (Native Development Kit), which compiles the native code and links it with the main Android application.

Diagram describing basic workflow of our application:

Android App Welcome Screen — Fig. 14. Android Welcome Screen

Camera View — Fig. 16. Image Captured from Camera

Add Item — Fig. 17. Details from C++ struct (hard-coded) (ideally would be resulting from processing image)

List of Items — Fig. 18. List of Past Bills

Status of the project:

We completed the image processing code and could extract features from the image of a bill using the C++ code. We also built the Android app, which can capture an image, send it to a C++ program, and receive a C++ struct with all the details of the bill filled in.
However, we could not integrate the former C++ code with Android using JNI, as it uses the library version of Tesseract, which could not be compiled on the Android platform.

What went Wrong?

We initially worked with tess-two, a fork of Tesseract Tools for Android, and made a sample Android app following online guidelines.

Later, when we divided the work, we used the native tesseract-ocr engine and planned to integrate it with Java using JNI. After completing the individual modules, however, we found that this was not as simple as we expected and could not be done within the time limit.

Suggestion for future teams: use tess-two directly (instead of Tesseract), even though it is a little imperfect.

Note: This is a project report for the CS663 Digital Image Processing course under Prof. Sharat Chandran.

Finally, we would like to conclude by saying that the past few weeks have been a wonderful experience and that each of us has learnt a lot in the process.

References:

Overview of Tesseract OCR Engine http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseracticdar2007.pdf
Android API Guide http://developer.android.com/guide/components/index.html
Most of the image pre-processing techniques, i.e. illumination correction (difference of Gaussians, normalization), were worked out while experimenting with GIMP; we then looked up the corresponding OpenCV functions.

We also thank the various blogs on the internet that helped us build our Android app UI and quickly resolve the issues and bugs we faced.
