

Title:
METHOD AND APPARATUS FOR RECOGNIZING TEXT IN A DIGITAL IMAGE
Document Type and Number:
WIPO Patent Application WO/2008/065520
Kind Code:
A2
Abstract:
The present invention concerns a method and an apparatus for recognizing text in an image, wherein said image represents a subject (4), comprising the steps of: (a) processing a digital image (3) to obtain a first modified image (3A); (b) performing Optical Character Recognition (OCR) on said first modified image (3A) to determine at least one text-containing region (3B) within said modified image (3A) and associating a first reliability value (C1) to said first region (3B); (c) converting said recognized text into a first voice signal (S1); wherein said processing step (a) comprises the additional step of automatic correction of the graphic parameters of said image (3), said graphic parameters including brightness, contrast, saturation and/or equivalent color spaces.

Inventors:
GREGNANIN MARCO (IT)
Application Number:
PCT/IB2007/003687
Publication Date:
June 05, 2008
Filing Date:
November 29, 2007
Assignee:
ITEX DI MARCO GREGNANIN (IT)
GREGNANIN MARCO (IT)
International Classes:
G06V10/20; G06V30/142; G06V30/10
Domestic Patent References:
WO1997030415A1 (1997-08-21)
WO2002025575A2 (2002-03-28)
Foreign References:
US20050071167A1 (2005-03-31)
US6169882B1 (2001-01-02)
Attorney, Agent or Firm:
CICERI, Fabio et al. (Piazza San Babila 5, Milan, IT)
Claims:

CLAIMS

1. A method for recognizing text within an image (3), wherein said image (3) represents a subject (4), comprising the steps of:

(a) processing a digital image (3) to obtain a first modified image (3A);

(b) performing Optical Character Recognition (OCR) on said first modified image (3A) to determine at least one text-containing region (3B) within said modified image (3A) and associate a first reliability value (C1) to said first region (3B);

(c) converting said recognized text into a first voice signal (S1); wherein said processing step (a) comprises the additional step of automatic correction of the graphic parameters of said first modified image (3A), said graphic parameters including brightness, contrast, saturation and/or equivalent color spaces.

2. A method for recognizing text within an image as claimed in claim 1, comprising the additional steps of:

(d) comparing said first reliability value (C1) with a preset reliability value (C2);

(e) selecting at least one recognition strategy (X1 ... X7) from a plurality of recognition strategies, if said first reliability value (C1) is lower than said preset reliability value (C2), each of said plurality of recognition strategies (X1 ... X7) being adapted to adjust the graphic and geometrical/perspective parameters of said first modified image (3A) to generate a second modified image (3C) whereby said first value (C1) can be increased.

3. A method for recognizing text within an image as claimed in claim 2, wherein in said method the steps (e), (b) and (c) are repeated until said first reliability value (C1) is close to or higher than said preset reliability value (C2).

4. A method for recognizing text within an image as claimed in any one of claims 1 to 3, wherein a first recognition strategy (X1) includes the step of:

(f) selecting a class for said subject (4) from a predetermined list, said list including subjects that can be associated to any cylindrically, spherically or conically-shaped geometric primitive, or other subjects that can be associated to any single, double and/or composite perspective primitive having the shape of a parallelepiped.

5. A method for recognizing text within an image as claimed in claim 4 wherein, if said subject (4) belongs to the class of cylindrically, spherically and/or conically shaped subjects, said step (f) comprises the additional step (f.1) of applying an inverse cylindrical transformation function or a spherical and/or conical function (X1.1) to said text region (3B).

6. A method for recognizing text within an image as claimed in claim 4 or 5, wherein said step (f) comprises the additional step (f.2) in which a second voice signal (S2) is generated, prompting the user to impart a predetermined angular rotation about the vertical axis of said subject (4).

7. A method for recognizing text within an image as claimed in claim 4 wherein, if said subject (4) belongs to the class of objects that can be associated to single, double and/or composite perspective primitives having the shape of a parallelepiped, said step (f) comprises the additional step (f.3) of applying an inverse perspective function (X1.2) to said text region (3B).

8. A method for recognizing text within an image as claimed in any one of claims 4 to 7, wherein said step (f) comprises the additional step (f.4) in which an interpolation function (X1.3) is applied, whereby the native resolution of said first modified image (3A) is increased to a resolution that is higher than the native resolution of said first modified image (3A).

9. A method for recognizing text within an image as claimed in any one of claims 4 to 8, wherein said step (f) comprises the additional step (f.5) of applying a first vector or graphic filter (X1.4), of the edge-erosion type, to said text-containing region (3B).

10. A method for recognizing text within an image as claimed in any one of claims 4 to 9, wherein said step (f) comprises the additional step (f.6) of applying a second vector or graphic filter (X1.5), of the maximum contrast type, to said text-containing region (3B).

11. A method for recognizing text within an image as claimed in any one of claims 4 to 9, wherein said step (f) comprises the additional step (f.7) of applying a third vector or graphic filter (X1.6), of the minimum contrast type, to said text-containing region (3B).

12. A method for recognizing text within an image as claimed in any one of claims 4 to 11, wherein said step (f) comprises the additional step (f.8) of applying a fourth vector or graphic filter (X1.7), having a first threshold value to filter pixels from white into black in said text-containing region (3B).

13. A method for recognizing text within an image as claimed in any one of claims 4 to 11, wherein said step (f) comprises the additional step (f.9) of applying a fifth vector or graphic filter (X1.8), having a second threshold value to filter pixels from black into white in said text-containing region (3B).

14. A method for recognizing text within an image as claimed in claims 10 to 13, wherein said threshold value and/or said maximum and minimum contrast values can be determined as a function of the preset reliability value (C2).

15. A method for recognizing text within an image as claimed in any one of claims 4 to 14, wherein said step (f) comprises the additional step (f.10) in which an interpolation function is applied to decrease the native resolution of said first modified image (3A).

16. A method for recognizing text within an image as claimed in any one of claims 1 to 4, wherein a second recognition strategy (X2) comprises the step of interpolating the native resolution of said first modified image (3A) to generate said second modified image (3C) having a higher or lower resolution than the native resolution of said first modified image (3A).

17. A method for recognizing text within an image as claimed in any one of claims 1 to 4 or 16, wherein a third recognition strategy (X3) comprises the step of applying said first vector or graphic filter (X1.4) to said text-containing region (3B).

18. A method for recognizing text within an image as claimed in any one of claims 1 to 4 or 16 to 17, wherein a fourth recognition strategy (X4) comprises the step of applying said second vector or graphic filter (X1.5) of the maximum-contrast type to said text-containing region (3B).

19. A method for recognizing text within an image as claimed in any one of claims 1 to 4 or 16 to 18, wherein a fifth recognition strategy (X5) comprises the step of applying said third vector or graphic filter (X1.6) of the minimum-contrast type to said text-containing region (3B).

20. A method for recognizing text within an image as claimed in any one of claims 1 to 4 or 16 to 19, wherein a sixth recognition strategy (X6) comprises the step of applying said fourth vector or graphic filter (X1.7) having a first threshold value for filtering pixels from white into black in said text-containing region (3B).

21. A method for recognizing text within an image as claimed in any one of claims 1 to 4 or 16 to 20, wherein said sixth recognition strategy (X6) comprises the step of applying said fifth vector or graphic filter (X1.8) having a second threshold value for filtering pixels from black into white in said text-containing region (3B).

22. A method for recognizing text within an image as claimed in claims 18 and 21, wherein said threshold value and/or said maximum and minimum contrast values of said fourth vector or graphic filter (X1.7) and said fifth vector or graphic filter (X1.8) can be determined as a function of the preset reliability value (C2).

23. A method for recognizing text within an image as claimed in any one of claims 1 to 4 or 16 to 22, wherein a seventh recognition strategy (X7) comprises the step of obtaining a weighted average of the pixels in the neighborhood of a fixed- and/or variable-size area of said text-containing region (3B).

24. A method for recognizing text within an image as claimed in any one of claims 1 to 4 or 16 to 22, wherein a seventh recognition strategy (X7) comprises the step of sampling the image at fixed and/or variable intervals in the neighborhood of a fixed- and/or variable-size area of said text-containing region (3B).

25. A method for recognizing text within an image as claimed in claim 23 or 24 wherein, according to said seventh strategy (X7), a third voice signal (S3) is generated to suggest a suitable spatial displacement of said subject (4) to a user.

26. A method for recognizing text within an image as claimed in claim 4, wherein said subject (4) acquired in said image (3), that can be associated to said geometric primitives, is selected from the group comprising bottles, cans, pots.

27. A method for recognizing text within an image as claimed in claim 7, wherein said subject (4) acquired in said image (3), that can be associated to said perspective primitives having the shape of a parallelepiped, is selected from the group comprising single-sided books, double-sided books, brochures, newspapers, magazines, envelopes, signs, synoptic tables.

28. A method for recognizing text within an image as claimed in any one of the preceding claims 1 to 27, wherein all the above strategies (X1 to X7) can be combined by merging/combination of a multitude of spatially and temporally contiguous images (3A), generating, as a result, a single comprehensive image corresponding to the sum of all the previous images.

29. A method for recognizing text within an image as claimed in claim 1, wherein said processing step (a) comprises an additional step of correcting the perspective of said image (3B), if the shape of the subject (4) represented in the image (3A) is known beforehand.

30. An IT product designed to be directly loaded into the memory of a computer, comprising program code portions, and adapted to carry out the method as claimed in any one of claims 1 to 29, when it runs on said computer.

31. An apparatus for recognizing text within an image (3), wherein said image (3) represents a subject (4), said apparatus comprising:

- a processing device (1) having means for processing a digital image (3) and for obtaining a first modified image (3A);

- means (7, 7A) for acquisition and Optical Character Recognition (OCR) on said first modified image (3A) to determine at least one text-containing region (3B) within said modified image (3A), said means (7, 7A) being adapted to associate a first reliability value (C1) to said first region (3B);

- means (8) for converting said recognized text into a first audible voice signal (S1), said audible voice signal (S1) being designed to be emitted through an audio interface (9) that is operably connected to said processing device (1); characterized in that said means (7, 7A) for acquisition and Optical Character Recognition (OCR) on said first modified image (3A) include an IT product as claimed in claim 30.

32. An apparatus for recognizing text within an image as claimed in claim 31, wherein said means (7) for acquiring said digital image are operably connected with said processing device (1).

33. An apparatus for recognizing text within an image as claimed in any one of claims 31 to 32, wherein said processing means (8) are adapted to compare said first reliability value (C1) with a preset reliability value (C2) and, as a result of such comparison, said means (8) are adapted to select at least one recognition strategy (X1 ... X7) from a plurality of recognition strategies if said first reliability value (C1) is lower than said preset reliability value (C2), each of said plurality of recognition strategies (X1 ... X7) being adapted to adjust the graphic parameters of said first modified image (3A) to generate a second modified image (3C) until said first reliability value (C1) is close to or higher than said preset reliability value (C2).

34. An apparatus for recognizing text within an image as claimed in any one of claims 31 to 33, wherein said processing device (1) includes a desktop computer, a notebook, a handheld computer, a PDA or a telecommunication system such as a cell phone.

Description:

Title: "Method and apparatus for recognizing text in a digital image"

DESCRIPTION

The present invention relates to a method and an apparatus for recognizing text in a digital image, particularly a method and an apparatus as defined in the preamble of claims 1 and 31.

At the present time, most information is designed for visual acquisition. While current laws are increasing the amount of information available by touch (see elevator buttons, Braille writing on drug packages, tactile indications on airport or train station floors), there are still many fields in which a blind or visually impaired user finds it difficult to reach the available information. Consider, for instance, beverage cans or bottles, food cans or other food or drug containers. In all these cases, a blind or visually impaired user will most likely need information that is only available visually, i.e. information that such a user cannot easily decode. For instance, all cans have the same shape and about the same size, and only differ in their contents, which are extensively described all over the can surface; glass bottles may contain wine, water, oil, vinegar or spirits, and plastic bottles of equal shape may contain trichloroethylene or other liquids; cans of equal shape may contain beans, tomato puree or tuna; cardboard boxes may contain pasta types having very different cooking times; finally, drug packages only carry tactile information about the name of the medicine and the expiry date: when the blister is removed therefrom, a blind or visually impaired user cannot properly decode such information from the blister shape.

It can be clearly ascertained from these examples that any misinterpretation of such visual information can have adverse consequences, from slight, such as eating overcooked pasta with a tuna sauce instead of beans, to poisoning caused by ingestion of a wrong medicine or a corrosive liquid instead of simple water. To avoid these problems, blind or visually impaired users generally rely on the help of sighted people to read such important information for them, and give up external help for less relevant information, such as the trademark of a detergent or the cooking time of pasta. Nevertheless, it would be desirable to have a system that allows visually impaired and/or blind users, but also normally sighted users who cannot read (e.g. dyslexic people or people with learning disabilities), to be less dependent on the help of other people, and to also utilize less important information, which is anyway useful to improve life quality. There are known in the art digital image acquisition devices, OCR (Optical Character Recognition) programs that can convert the text contained in a digital image into a text code, and voice synthesizers that can read the OCR results.

Image acquisition devices, including scanners, (digital or analog) cameras, webcams, planetary scanners, digital video cameras or the like, suffer from poor accuracy in the setting of image acquisition parameters; this often results in an out-of-focus and/or blurred, over- or underexposed image, with wrong framing, undesired reflections, and color, contrast and/or depth-of-field calibration errors.

In addition to all these problems concerning the quality of the starting image, there also arises a problem with the shape of the subject to be shot which, considering for instance a can, might require even a normally sighted user to rotate the can about its axis to read the whole trademark name and get to know the can content. These images are currently unusable for an OCR system to obtain good results, as currently available OCR systems have been substantially developed to read documents containing black-and-white text, with regular and standard fonts (Arial, Times New Roman, Courier, etc.), of at least 10 to 12 points, and aligned along the vertical or horizontal direction. Currently obtainable results suffer from limits in terms of practical reading of drug leaflets and/or similar elements.

The above image framing, lighting and stability problems have been addressed by the owner of this invention and at least partly solved by the device disclosed in the patent entitled "DISPOSITIVO DI ASSISTENZA PER IPOVEDENTI PER SCATTARE FOTOGRAFIE" ("PICTURE-TAKING AID DEVICE FOR THE VISUALLY IMPAIRED"), filed for the Applicant hereof on 29 November 2006.

In view of the prior art as described above, the object of the present invention is to provide a method and an apparatus for recognizing text within a digital image that is free from the prior art drawbacks.

According to the present invention, this object is fulfilled by a method as defined in claim 1 and an apparatus as defined in claim 31. Such object is also fulfilled by an IT product to be loaded into the memory of a computer system and allowed to operate in such computer, for carrying out the method of the present invention.

The features and advantages of the invention will appear from the following detailed description of one practical embodiment, which is illustrated without limitation in the annexed drawings, in which:

- Figure 1 shows an apparatus according to one embodiment of the present invention;

- Figures 2A, 2B and 2C are graphic representations of the application of respective recognition strategies of the present invention;

- Figures 3A - 3E are graphic representations of the problems that can arise if the image that has been taken is not recognizable due to improper mutual positioning of the subject and the camera.

The following terms, as used herein, will have the following meanings.

The term "digital image" will be intended to mean an image in digital format, regardless of the way it was acquired the first time; while such acquisition advantageously occurs using a digital camera, other methods can be envisaged, as detailed in the introduction hereof. The

subject of the image will be clarified by the relevant context.

The term "subject" will be intended to mean any element that can be represented in a digital image, such as bottles, cans, pots but also single-sided books, double-sided books, brochures, newspapers, magazines, envelopes, signs, synoptic tables, TV screens, CRT / LCD / PLASMA monitors and displays, LED / OLED monitors and displays, etc.

The term "text within the image" is intended to indicate portions of the subject- representing digital image that contain graphic representations of text elements, i.e. either alphanumeric characters or other text characters in any language, including Asiatic, Arabian and Cyrillic languages, or other commercially known symbols, such as 0, ey, &% ( D, handwash, pure lambswool, etc.; as a rule, the "text within the image" may be deemed to include anything that can be associated to a character that a computer can read as "text" or to glyphs, including barcodes (either 2- or 3-dimensional barcodes).

The term "OCR means" is intended to indicate means that can analyze a digital image and recognize the presence and position of any text therein, to convert it into text. Bearing this in mind, with reference to the annexed figures, an apparatus 1 is show, for determining the text 2 contained in an image 3 that represents a subject 4. Such apparatus 1 comprises:

- a processing apparatus 5 having processing means 6;

- means for acquiring 7 a digital image 3, said processing means 6 being adapted to process the digital image 3 to produce a first modified image 3A;

- means for performing Optical Character Recognition (OCR) 7A on said first modified image 3A to determine at least one text-containing region 3B within said modified image 3A;

- means 8 for converting said text contained in the region 3B into a first voice signal S1, said voice signal S1 being designed to be emitted through an audio interface 9 that is operably connected to said processing device 1.

The means for performing Optical Character Recognition 7A are embodied by a scanner or, advantageously, by a single device for both acquisition and recognition.

In the view of Figure 1, the means 7, 7A for acquisition and optical recognition are represented as a camera, but they can also consist of a webcam, a planetary scanner or a digital camera.

It shall be noted that, in these means 7, 7A for acquisition and/or Optical Character Recognition, the first region 3B is assigned a first reliability value C1, i.e. a value to estimate the quality of the text contained in such first region 3B, as well as the position of such region 3B relative to the first modified image 3A.

It shall be noted that the means 7, 7A for acquisition and/or Optical Character Recognition (OCR) are operably connected with said processing device 1, e.g. by a Bluetooth communication protocol, USB, a Wi-Fi communication protocol, cable or wireless communication, etc.

Particularly, it shall be appreciated that, when a digital camera is used for acquisition of the digital image 3, in addition to transferring the information concerning such digital image 3, it can also transfer all information stored in a so-called "exif file" (or similar proprietary format), containing details about the camera, the shooting parameters, such as focusing distance, diaphragm, exposure time, focal length, etc. and storage configuration.
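By way of illustration only, the following Python sketch shows how such shooting parameters can be read from the "exif" metadata with the Pillow library; the file name is hypothetical and the listing is not part of the claimed method.

    # Illustrative sketch: reading shooting parameters from EXIF metadata.
    from PIL import Image
    from PIL.ExifTags import TAGS

    img = Image.open("photo.jpg")  # hypothetical file name
    exif = img.getexif()
    # Exposure time, aperture, focal length, etc. live in the dedicated
    # EXIF sub-IFD (tag 0x8769).
    for tag_id, value in exif.get_ifd(0x8769).items():
        print(TAGS.get(tag_id, hex(tag_id)), value)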

Particularly, the processing device 1 is a computer, such as a desktop PC, a notebook, a handheld computer, a PDA or a telecommunication device, such as a cell phone.

Advantageously, a program is installed in such processing device 1, which can recognize the text contained in the region 3B of the modified image.

To this end, said processing means 6 are able to carry out the steps of:

(a) processing said digital image 3 to obtain a first modified image 3A;

(b) performing Optical Character Recognition (OCR) on said first modified image 3A to determine at least one text-containing region 3B within said modified image 3A and associate a first reliability value C1 to said first region;

(c) converting said recognized text into a first voice signal S1; wherein said processing step (a) comprises the additional step of automatic correction of the graphic parameters of said image, said graphic parameters including brightness, contrast, saturation and/or equivalent color spaces.

The graphic parameters may also be changed using the information contained in the "exif" parameters of said image 3A.

It shall be noted that this automatic changing step may also be carried out by changing two or more of the above parameters, i.e. brightness, contrast and/or saturation or equivalent color spaces.
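A minimal Python sketch of this automatic correction of step (a) is given below, using the Pillow library; the enhancement factors are illustrative assumptions, not values prescribed by this disclosure.

    # Sketch of step (a): automatic correction of brightness, contrast
    # and saturation. The factors and file name are assumptions.
    from PIL import Image, ImageEnhance, ImageOps

    def auto_correct(image):
        corrected = ImageOps.autocontrast(image)                     # contrast
        corrected = ImageEnhance.Brightness(corrected).enhance(1.1)  # brightness
        corrected = ImageEnhance.Color(corrected).enhance(1.2)       # saturation
        return corrected

    first_modified = auto_correct(Image.open("subject.jpg"))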

It shall be further noted that the processing step (a) comprises an additional perspective correction step if, for instance, the shape of the subject 4 represented in the image 3 is known beforehand.

Also, the means 6 for processing the image 3 are able to carry out the steps of:

(d) comparing said first reliability value C1 with a preset reliability value C2;

(e) selecting at least one recognition strategy Xi from a plurality "n" of recognition strategies, if said first reliability value C1 is lower than said preset reliability value C2.

It shall be noted that each of said plurality of recognition strategies Xi can adjust the graphic and geometrical/perspective parameters of said first modified image 3A to generate a second modified image 3C whereby said first value C1 can be increased.

Advantageously, the above steps (e), (b) and (c) may be repeated until said first reliability value C1 is close to or higher than said preset reliability value C2.

It shall be noted that each recognition strategy Xi is able to adjust the graphic parameters of the first modified image 3A to generate the second modified image 3C, repeating one or more of said plurality of strategies Xi until the reliability value C2 is reached or approximated.

In other words, once one or more of such strategies Xi have been applied to the modified image 3A, a reliability value C2 can be reached that is sufficiently high for the text contained in the text region 3B of the modified image 3A to be read with sufficient accuracy as to generate a user-understandable voice signal S1.

It shall be noted that the preset reliability value C2 may be set by the user or at the factory.
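The compare-and-retry logic of steps (d) and (e) can be sketched in Python as follows; the per-word confidence reported by the pytesseract library stands in for the reliability value C1, and the strategy functions are hypothetical placeholders for X1 to X7, not the implementation of this disclosure.

    # Sketch of steps (d)-(e): compare C1 against the preset C2 and apply
    # further recognition strategies while C1 falls short.
    import pytesseract
    from pytesseract import Output

    def ocr_reliability(image):
        # Return (text, C1), with C1 the mean word confidence.
        data = pytesseract.image_to_data(image, output_type=Output.DICT)
        confs = [int(c) for c in data["conf"] if int(c) >= 0]
        text = " ".join(w for w in data["text"] if w.strip())
        return text, (sum(confs) / len(confs) if confs else 0.0)

    def recognize(image, strategies, c2=80.0):
        text, c1 = ocr_reliability(image)
        for strategy in strategies:      # X1 ... X7, tried in turn
            if c1 >= c2:
                break
            image = strategy(image)      # yields the second modified image 3C
            text, c1 = ocr_reliability(image)
        return text, c1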

The above strategies Xi for obtaining recognition of the text contained in the region 3B of the image 3A will now be described, still with reference to Figures 2A to 2C.

A first strategy X1 of the plurality of recognition strategies Xi comprises the steps of:

(f) selecting a class for said subject 4 from a predetermined list, said list including subjects that can be associated to any cylindrically, spherically or conically-shaped geometric primitive, or other subjects that can be associated to any single, double and/or composite perspective primitive having the shape of a parallelepiped. It shall be noted that such class can also be selected manually by a user, by setting the specific geometric primitive or perspective primitive with which the subject 4 acquired within the image 3 is associated.

Particularly, if the subject 4 belongs to the class of cylindrically, spherically and/or conically shaped subjects, according to the strategy X1 the step (f) comprises the additional step (f.1) of applying an inverse cylindrical, spherical and/or conical function X1.1 to the text region 3B.

For instance, the subject 4 can be associated to these geometric primitives and is selected from the group including bottles, cans, pots and the like.

Therefore, still referring to Figure 2A, thanks to such inverse cylindrical transformation function X1.1, the baselines of the text contained in such region 3B, which are curved due to the perspective, are turned back, through said inverse transformation X1.1, into straight lines 3D.
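A minimal sketch of such an inverse cylindrical transformation is given below; it assumes an orthographic view of a vertical cylinder whose visible label spans the full width of the region, so that a label point at unwrapped arc coordinate u appears in the image at x = R(1 - cos(u/R)). These are simplifying assumptions, not the exact function used by the invention.

    # Sketch of the inverse cylindrical transformation X1.1.
    import numpy as np
    import cv2

    def unwrap_cylinder(region):
        h, w = region.shape[:2]
        R = w / 2.0                        # apparent radius in pixels
        out_w = int(np.pi * R)             # arc length of the visible half
        u = np.arange(out_w, dtype=np.float32)
        x_src = R * (1.0 - np.cos(u / R))  # source column per output column
        map_x = np.tile(x_src, (h, 1))
        map_y = np.repeat(np.arange(h, dtype=np.float32)[:, None], out_w, axis=1)
        return cv2.remap(region, map_x, map_y, cv2.INTER_LINEAR)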

The step (f) comprises the additional step (f.2) of imparting a predetermined angular rotation about the vertical axis of said subject 4. Advantageously, a second voice signal S2 is generated to inform the user that the subject 4 has to be rotated.

If the subject 4 within the image 3 has two columns with a very low reliability (with a reliability value C1 much lower than the preset value C2), such subject 4 will most likely belong to a geometric primitive of the above list, wherefore such subject 4 has probably been acquired by the acquisition means 7 from the wrong side.

The audible signal S2 warns the user and proposes to impart the angular rotation about the vertical axis of the subject 4, such as a 90° rotation.

Furthermore, such rotation can also be applied to subjects 4 with more than two text areas 3B considering, for instance, only the two outermost text areas. Even when the subject 4 has one text area 3B only, with lower reliability on one of the two sides, the subject 4 shall likely be rotated by a predetermined angle, for instance 90 degrees.

However, if the subject 4 within the image 3 has two columns with a medium-to-high reliability (i.e. with a reliability value C1 close to the preset value C2), such subject 4 will most likely belong to a geometric primitive of the above list, wherefore such subject 4 has probably been acquired by the acquisition means 7 from the right side.

Here, the subject 4 is as wide as the combined widths of the columns.

It shall be noted that the width of the subject 4 is determined as the width of a single column or the combined width of both columns.

It will be appreciated that, advantageously, during the steps (f.1) and (f.2), the method sets the inverse cylindrical transformation function X1.1 in the middle of the width and height of the subject 4.

If the subject 4 belongs to the class of subjects that can be associated to single or double perspective primitives having the shape of a parallelepiped, like that of a single-sided (see Figure 2B) or double-sided book (see Figure 2C), the step (f) comprises an additional step (f.3), to be implemented after or separately from the above described steps, during which an inverse perspective function X1.2 is applied to said text region 3B.

Thanks to such inverse perspective function X1.2, the baselines of the text contained in the region 3B, curved by the structure of their respective geometric primitive, are turned back, through such inverse transformation X1.2, into straight lines, as shown in the region 3D.

The subject 4 that can be associated to said perspective primitives having the shape of a parallelepiped is selected from the group comprising single-sided books, double-sided books, brochures, newspapers, magazines, envelopes, signs, synoptic tables.
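An inverse perspective function of this kind can be sketched with OpenCV as below; the four corners of the text region are assumed to be known (e.g. from contour analysis), and the output size is an arbitrary assumption.

    # Sketch of the inverse perspective function X1.2.
    import numpy as np
    import cv2

    def unwarp_perspective(image, corners, width=800, height=600):
        # corners: top-left, top-right, bottom-right, bottom-left in image 3A
        src = np.float32(corners)
        dst = np.float32([[0, 0], [width, 0], [width, height], [0, height]])
        M = cv2.getPerspectiveTransform(src, dst)
        return cv2.warpPerspective(image, M, (width, height))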

The step (f) comprises an additional step (f.4), to be implemented after or separately from the above described steps, during which an interpolation function X1.3 is applied, whereby the native resolution of said first modified image 3A is increased to generate a second modified image 3C having a higher resolution. Advantageously, this step (f.4) allows determination of small point-size characters within said text region 3B.
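Such an interpolation can be sketched as a bicubic upscaling; the 2x factor is an assumption.

    # Sketch of step (f.4): upscale so small characters become resolvable.
    import cv2

    def upscale(region, factor=2.0):
        return cv2.resize(region, None, fx=factor, fy=factor,
                          interpolation=cv2.INTER_CUBIC)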

The step (f) comprises an additional step (f.5), to be implemented after or separately from the above described steps, during which a first vector or graphic filter X1.4 is applied, of the edge-erosion type, to the text-containing region 3B.

Advantageously, this step (f.5) allows determination of bold characters within said text region 3B.
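An edge-erosion filter of this kind can be sketched with a morphological operation; the listing assumes a binarized region with dark text on a light background, and the kernel size is an assumption.

    # Sketch of the edge-erosion filter X1.4: thinning bold glyphs.
    import numpy as np
    import cv2

    def thin_bold_text(binary_region):
        kernel = np.ones((2, 2), np.uint8)
        # Dilating a dark-on-light image erodes the dark strokes.
        return cv2.dilate(binary_region, kernel, iterations=1)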

The step (f) comprises an additional step (f.6), to be implemented after or separately from the above described steps, during which a second vector or graphic filter X1.5 is applied, of the maximum contrast type, to said text-containing region 3B.

Also, the step (f) comprises an additional step (f.7), to be implemented after or separately from the above described steps, during which a third vector or graphic filter X1.6 is applied, of the minimum contrast type, to said text-containing region 3B.

It shall be noted that said maximum or minimum contrast values of the steps (f.6) and/or (f.7) can be determined as a function of the preset reliability value C2.
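Maximum and minimum contrast filters of this kind can be sketched as range-stretching operations; the 64-191 band of the minimum-contrast variant is an assumption.

    # Sketch of the contrast filters X1.5 (maximum) and X1.6 (minimum).
    import cv2

    def max_contrast(region):
        # Stretch the intensity range to the full 0-255 band.
        return cv2.normalize(region, None, 0, 255, cv2.NORM_MINMAX)

    def min_contrast(region):
        # Compress the intensity range toward the middle of the band.
        return cv2.normalize(region, None, 64, 191, cv2.NORM_MINMAX)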

The step (f) comprises an additional step (f.8), to be implemented after or separately from the above described steps, during which a fourth vector or graphic filter X1.7 is applied, having a first threshold value to filter the pixels from white into black in said text-containing region 3B.

Furthermore, the step (f) comprises an additional step (f.9), to be implemented after or separately from the above described steps, during which a fifth vector or graphic filter X1.8 is applied, having a second threshold value to filter the pixels from black into white in said text-containing region 3B.
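Both threshold filters can be sketched with a single OpenCV call; the threshold values below are assumptions (this disclosure derives them from the preset reliability value C2).

    # Sketch of the threshold filters X1.7 and X1.8 on a grayscale region.
    import cv2

    def white_to_black(gray, threshold=200):
        # Pixels brighter than the threshold are driven to black.
        _, out = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY_INV)
        return out

    def black_to_white(gray, threshold=60):
        # Pixels darker than the threshold are driven to white.
        _, out = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY_INV)
        return out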

It shall be noted that said threshold values of the steps (f.8) and/or (f.9) can be determined as a function of the preset reliability value C2.

The step (f) comprises an additional step (f.10), to be implemented after or separately from the above described steps, during which an interpolation function X1.9 is applied, whereby the native resolution of said first modified image 3A is decreased to generate said second modified image 3C having a lower resolution than the native resolution of said first modified image 3A. Advantageously, this step (f.10) allows determination of very large point-size characters within said text region 3B.

A second recognition strategy X2, to be implemented after or separately from the above described strategy X1, comprises the step of applying the interpolation function X1.3 to interpolate the native resolution of said first modified image 3A to generate said second modified image 3C, the latter having a higher or lower resolution than the native resolution of said first modified image 3A.

Further, a third recognition strategy X3 may be provided, to be implemented after or separately from the above described strategies X1 and X2, which comprises the step of applying said first vector or graphic edge-erosion filter X1.4 to said text-containing region 3B, as described above.

A fourth recognition strategy X4, to be implemented after or separately from the above described strategies X1 to X3, comprises the step of applying said second vector or graphic maximum contrast filter X1.5 to said text-containing region 3B.

A fifth recognition strategy X5, to be implemented after or separately from the above described strategies X1 to X4, comprises the step of applying said third vector or graphic minimum contrast filter X1.6 to said text-containing region 3B.

It shall be noted that said maximum and/or minimum contrast values can be determined as a function of the preset reliability value C2.

A sixth recognition strategy X6, to be implemented after or separately from the above described strategies X1 to X5, comprises the step of applying said fourth vector or graphic filter X1.7, having a first threshold value to filter the pixels from white to black in said text-containing region 3B.

Alternatively, the sixth recognition strategy X6 comprises the step of applying said fifth vector or graphic filter X1.8, having a second threshold value to filter the pixels from black to white in said text-containing region 3B.

It shall be noted that said threshold value for filtering the pixels from black to white and/or from white to black can be determined as a function of the preset reliability value C2.

It shall be noted that any graphic and/or vector parameter of said modified image 3A can be determined as a function of the preset reliability value C2.

It shall be noted that said at least one text-containing region 3B within said modified image 3A coincides with said modified image 3A if said first reliability value C1 is close to zero.

A seventh recognition strategy X7, to be implemented after or separately from the above described strategies X1 to X6, may be advantageously used when the text 3B to be recognized is contained in an image 3A reproduced by TV screens, CRT / LCD / PLASMA monitors and displays, LED / OLED screens and displays, etc. It comprises the step of obtaining a weighted average of the pixels in the neighborhood of a fixed and/or variable size region.

Particularly, it shall be noted that the fixed size area may be equal to the distance between two pixels that form the text 3B, whereas the variable size area is connected to the reliability value C2 that can be deduced by implementing one of the above strategies X1 to X6.

In other words, by the seventh strategy X 7 , the gap between the pixels (or similarly, the gap between the pixels of different colors) is filled by the weighted average.
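Such a weighted average can be sketched as a Gaussian blur, which averages each pixel's neighborhood and thereby fills the dark gaps between the pixels of a photographed screen; the kernel size is an assumption.

    # Sketch of the seventh strategy X7: neighborhood-weighted averaging.
    import cv2

    def fill_pixel_gaps(region, kernel=(5, 5)):
        return cv2.GaussianBlur(region, kernel, 0)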

Alternatively, this strategy X7 can also be implemented by sampling the image 3B at fixed and/or variable intervals, or by lowering the resolution.

It should be noted that the user, due to his or her impaired sight and/or other disabilities, cannot ascertain whether the image 3A has been properly acquired, i.e. whether the subject 4 and the acquisition means 7 have such relative positions that the text region 3B to be recognized within the image 3A is actually contained in the image.

This problem may impair the results obtained with the above described process. To obviate this drawback, any image 3 acquired by the acquisition means 7 has a feedback associated thereto, in the form of an audible signal S3, to inform the user of whether or not the image 3A has been correctly acquired.

To this end, the following situations may be provided, still with reference to Figures 3A to 3E:

- in Figure 3A, if the text region 3B to be recognized is part of the image 3A and such region 3B is properly positioned relative to a grid 10, an audible signal S3 is emitted, to inform the user that the image 3A has been correctly acquired by the acquisition means 7;

- in Figure 3B, if the text region 3B to be recognized is close to the edge of such grid 10 beyond a predetermined threshold value δ for one of the sides of said grid 10, the audible signal S3 is emitted, to inform the user that the subject 4 has to be displaced away from the position of the acquisition means 7;

- in Figure 3C, if the text region 3B to be recognized, in addition to exceeding the predetermined threshold δ on one side, also extends beyond half and/or three quarters of the grid 10 on the opposite left side, an audible signal S3 is emitted, to inform the user that the subject 4 has to be moved to the right, away from the position of the acquisition means 7;

- in Figure 3D, if the text region 3B to be recognized is close to two opposite sides of the grid 10, an audible signal S3 is emitted, to inform the user that the subject 4 has to be moved away from the position of the acquisition means 7;

- in Figure 3E, if the text region 3B to be recognized is close to two contiguous sides of the grid 10, an audible signal S3 is emitted, to inform the user that the subject 4 has to be moved away from the position of the acquisition means 7.
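The feedback logic of Figures 3A to 3E (including the three- and four-side case discussed below) can be sketched as follows; the bounding box of the region 3B is compared against the grid margins, the margin delta and the messages being assumptions rather than part of this disclosure.

    # Sketch of the framing feedback driving the audible signal S3.
    def framing_hint(box, frame_w, frame_h, delta=20):
        # box = (x, y, w, h) of the text region 3B in the acquired image
        x, y, w, h = box
        near = [x < delta,                    # left side of the grid
                y < delta,                    # top
                x + w > frame_w - delta,      # right
                y + h > frame_h - delta]      # bottom
        sides = sum(near)
        if sides == 0:
            return "image correctly acquired"             # Figure 3A
        if sides == 1:
            return "move the subject toward the centre"   # Figures 3B/3C
        return "move the subject away from the camera"    # Figures 3D/3E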

Also, if the recognized text 3B is close to three or four sides of the grid 10, an audible signal S3 is emitted, to inform the user that the subject 4 has to be moved away from the position of the acquisition means 7.

Finally, it should be noted that all the strategies X1 to X7 as described above may be advantageously combined using known merging techniques, for merging/combining a multitude of spatially and temporally contiguous images, generating, as a result, one comprehensive image corresponding to the sum of all the previous images, so that the subject can be acquired as a whole.

As clearly shown in the above description, the method and apparatus of the invention fulfill the needs as set out in the introduction of this disclosure and obviate the drawbacks of prior art methods and apparatus.

Those skilled in the art will obviously appreciate that a number of changes and variants may be made to the arrangements as described hereinbefore to meet specific needs, without departure from the scope of the invention, as defined in the following claims.