FONT DETECTOR - INTUIT INC

Title:

FONT DETECTOR

Document Type and Number:

WIPO Patent Application WO/2018/056973

Kind Code:

Abstract:

The present disclosure includes techniques for selecting a candidate presentation style for individual documents for inclusion in an aggregate training data set for a document type that may be used to train an OCR processing engine prior to identifying text in an image of a document of the document type. In one embodiment, text input corresponding to a text sample in a document is received, and an image of the text sample in the document is received. For each of a plurality of candidate presentation styles, an OCR processing engine is trained using a training data set corresponding to the given candidate presentation style, and the OCR processing engine is used, as trained, to identify text in the received image. The OCR processing results for each candidate presentation style are compared to the received text input. A candidate presentation style for the document is selected based on the comparisons.

Inventors:

KRIVOPALTSEV EUGENE (US)
MADDIKA SREENEEL K (US)
YELLAPRAGADA VIJAY (US)

Application Number:

PCT/US2016/053007

Publication Date:

March 29, 2018

Filing Date:

September 22, 2016

Export Citation:

Click for automatic bibliography generation Help

Assignee:

INTUIT INC (US)

International Classes:

G06V30/224; G06V30/40; G06K7/10; G06K19/08; G06K21/04; G06V30/10

Foreign References:

US20100246963A1	2010-09-30
US20100023511A1	2010-01-28
US20100115097A1	2010-05-06
US20090263019A1	2009-10-22
US20120134589A1	2012-05-31

Attorney, Agent or Firm:

PATTERSON, B. Todd et al. (US)

Download PDF:

View/Download PDF PDF Help

Claims:

What is claimed is:

1 . A computer-implemented method for creating an aggregate training data set corresponding to a document type, the method comprising:

for each document in a set of documents sharing the document type:

receiving input corresponding to a text sample in the given document, receiving an image of the text sample in the given document, for each of a plurality of candidate presentation styles:

training an OCR processing engine using a training data set corresponding to the candidate presentation style,

identifying text in the received image and producing OCR processing results using the OCR processing engine as trained,

comparing the OCR processing results to the received input, and

calculating a score for the candidate presentation style based on the comparison,

ranking the candidate presentation styles based on the calculated scores, and

selecting a candidate presentation style for the document based on the ranking, and

creating an aggregate training data set for the document type that includes the training data sets corresponding to the selected candidate presentation styles.

2. The method of claim 1 , wherein the input is the actual text in the text sample in the given document.

3. The method of claim 1 , further comprising determining the time required to identify text in the received image and produce OCR processing results for each candidate presentation style.

4. The method of claim 3, wherein the candidate presentation styles are additionally ranked based on the determined times for the candidate presentation styles.

5. The method of claim 1 , wherein calculating a score for the given candidate presentation style based on the comparison comprises calculating the Levenshtein distance between the OCR processing results and the received input.

6. The method of claim 1 , further comprising, after creating an aggregate training data set for the document type:

receiving a new image of a new document,

receiving a new input classifying the new document as the document type, training the OCR processing engine using the created aggregate training data set, and

identifying text in the received new image and producing new OCR

processing results using the OCR processing engine as trained.

7. The method of claim 1 , wherein the text sample comprises one or more lines of text in the given document.

8. The method of claim 1 , wherein the candidate presentation styles each correspond to a text font and a text size.

9. A non-transitory computer-readable storage medium containing instructions that, when executed by one or more processors, perform an operation for creating an aggregate training data set corresponding to a document type, the operation comprising: