

Title:
METHOD OF AND SYSTEM FOR ENHANCING AN IMAGE
Document Type and Number:
WIPO Patent Application WO/2019/058121
Kind Code:
A1
Abstract:
A method of enhancing an image uses a portable system (1) having an image capture device (6) and a display (8). The method includes capturing one or more frames of image data using the image capture device, and dividing a captured frame of image data into one or more blocks of image data, each block having plural data elements with attribute values associated with the data elements. For the blocks of image data, a threshold value representative of the attribute values is determined. When an attribute value for a data element is greater than the threshold, a first output attribute value for display is associated with the data element. When an attribute value for a data element is less than the threshold, a second output attribute value for display is associated with the data element. The output attribute values are displayed.

Inventors:
HICKS STEPHEN (GB)
RUSSELL NOAH (GB)
ROZETTI CONRAD (GB)
Application Number:
PCT/GB2018/052683
Publication Date:
March 28, 2019
Filing Date:
September 20, 2018
Assignee:
OXSIGHT LTD (GB)
International Classes:
G06V10/56; G06V30/40
Foreign References:
GB 2267793 A, 1993-12-15
Other References:
WANG CHUNYAN ET AL: "Pixel classification algorithms for noise removal and signal preservation in low-pass filtering for contrast enhancement", 2014 19TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING, IEEE, 20 August 2014 (2014-08-20), pages 480 - 485, XP032644133, DOI: 10.1109/ICDSP.2014.6900712
Attorney, Agent or Firm:
DEHNS (GB)
Claims:
Claims

1. A method of enhancing an image using a portable system comprising an image capture device and a display, the method comprising:

capturing one or more frames of image data using the image capture device;

dividing a captured frame of image data into one or more blocks of image data, each block comprising a plurality of data elements each having an attribute value associated with the respective data element;

for one or more of the blocks of image data:

determining a threshold value that is representative of the plurality of attribute values associated with the plurality of data elements in the block; and for each of the plurality of data elements in the block:

when the attribute value associated with the data element is greater than the threshold value for the block:

associating with the data element a first output attribute value for display; and

when the attribute value associated with the data element is less than the threshold value for the block:

associating with the data element a second output attribute value for display; and

displaying the output attribute values for the plurality of data elements of the one or more blocks in the frame of image data on the display of the portable system.

2. A method as claimed in claim 1, wherein the attribute value associated with each of the plurality of data elements comprises the intensity associated with each of the plurality of data elements.

3. A method as claimed in claim 1 or 2, wherein the number of data elements in the frame of image data that is to be processed in the manner of the present invention is equal to the number of data elements that are output for display.

4. A method as claimed in claim 1, 2 or 3, comprising, for the one or more of the blocks of image data: calculating a mean value of the plurality of attribute values associated with the plurality of data elements in the block; and determining a threshold value using the mean value of the plurality of attribute values in the block.

5. A method as claimed in claim 4, wherein the step of determining the threshold value for a block comprises, for a first set of the plurality of data elements in the block having associated with the first set of the plurality of data elements a plurality of attribute values that are each greater than the mean value calculated for the block:

calculating an upper mean value of the plurality of attribute values associated with the plurality of data elements in the first set;

for a second set of the plurality of data elements in the block having associated with the second set of the plurality of data elements a plurality of attribute values that are each lower than the mean value calculated for the block:

calculating a lower mean value of the plurality of attribute values associated with the plurality of data elements in the second set; and

determining the threshold value by calculating a central mean value of the upper mean value and the lower mean value.

6. A method as claimed in claim 5, comprising determining if the difference between the lower mean value and the upper mean value is less than a noise threshold, and when the difference between the lower mean value and the upper mean value is less than a noise threshold, associating a default output attribute value with each of the data elements in the block of the frame of image data.

7. A method as claimed in any one of the preceding claims, wherein the first and second output attribute values comprise contrasting colours.

8. A method as claimed in any one of the preceding claims, comprising, for the first and second sets of the plurality of data elements in a block that have associated with them attribute values that are greater than and less than the threshold value respectively: associating a foreground output attribute value for display with the less numerous set of data elements and associating a background output attribute value for display with the more numerous set of data elements.

9. A method as claimed in any one of the preceding claims, comprising determining when a block in the frame of image data contains text.

10. A method as claimed in any one of the preceding claims, wherein the one or more of the blocks in the frame of image data are displayed allocentrically, egocentrically or retinocentrically.

11. A method as claimed in any one of the preceding claims, comprising selecting and/or adjusting the output attribute values using a control, selecting and/or adjusting the size of the one or more blocks to be displayed using the control, and/or adjusting the threshold value using the control.

12. A portable system for enhancing an image comprising:

an image capture device for capturing one or more frames of image data; processing circuitry configured to, for a captured frame of image data comprising a plurality of data elements each having an attribute value associated with the data element:

divide the frame of image data into one or more blocks of image data, each block comprising a plurality of data elements; and

for one or more of the blocks of image data:

determine a threshold value that is representative of the plurality of attribute values associated with the plurality of data elements in the block; and

wherein the processing circuitry is configured to, for each of the plurality of data elements in the block:

associate with the data element a first output attribute value for display when the attribute value associated with the data element is greater than the threshold value for the block; and

associate with the data element a second output attribute value for display when the attribute value associated with the data element is less than the threshold value for the block; and

a display configured to display the output attribute values for the plurality of data elements of the one or more blocks in the frame of image data.

13. A portable system as claimed in claim 12, wherein the attribute value associated with each of the plurality of data elements comprises the intensity associated with each of the plurality of data elements.

14. A portable system as claimed in claim 12 or 13, wherein the number of data elements in the frame of image data that is to be processed in the manner of the present invention is equal to the number of data elements that are output for display.

15. A portable system as claimed in claim 12, 13 or 14, wherein the processing circuitry is configured to, for the one or more of the blocks of image data: calculate a mean value of the plurality of attribute values associated with the plurality of data elements in the block; and determine a threshold value using the mean value of the plurality of attribute values in the block.

16. A portable system as claimed in claim 15, wherein, to determine the threshold value for a block, the processing circuitry is configured to, for a first set of the plurality of data elements in the block having associated with the first set of the plurality of data elements a plurality of attribute values that are each greater than the mean value calculated for the block:

calculate an upper mean value of the plurality of attribute values associated with the plurality of data elements in the first set;

for a second set of the plurality of data elements in the block having associated with the second set of the plurality of data elements a plurality of attribute values that are each lower than the mean value calculated for the block:

calculate a lower mean value of the plurality of attribute values associated with the plurality of data elements in the second set; and

determine the threshold value by calculating a central mean value of the upper mean value and the lower mean value.

17. A portable system as claimed in claim 16, wherein the processing circuitry is configured to determine if the difference between the lower mean value and the upper mean value is less than a noise threshold, and the processing circuitry is configured to, when the difference between the lower mean value and the upper mean value is less than a noise threshold, associate a default output attribute value with each of the data elements in the block of the frame of image data.

18. A portable system as claimed in any one of claims 12 to 17, wherein the first and second output attribute values comprise contrasting colours.

19. A portable system as claimed in any one of claims 12 to 18, wherein the processing circuitry is configured to, for the first and second sets of the plurality of data elements in a block that have associated with them attribute values that are greater than and less than the threshold value respectively: associate a foreground output attribute value for display with the less numerous set of data elements and associate a background output attribute value for display with the more numerous set of data elements.

20. A portable system as claimed in any one of claims 12 to 19, wherein the processing circuitry is configured to determine when a block in the frame of image data contains text.

21. A portable system as claimed in any one of claims 12 to 20, wherein the processing circuitry and/or the display is configured to display the one or more of the blocks in the frame of image data allocentrically, egocentrically or retinocentrically.

22. A portable system as claimed in any one of claims 12 to 21, comprising a control for selecting and/or adjusting the output attribute values, selecting and/or adjusting the size of the one or more blocks to be displayed, and/or adjusting the threshold value.

23. A computer program comprising computer software code for performing the method of any one of claims 1 to 11 when the program is run on a data processor.

Description:
Method of and System for Enhancing an Image

This invention relates to a method of and a system for enhancing an image, in particular a method of and a system for enhancing text in an image of a scene captured by a head mounted device.

In order to assist visually impaired people, the majority of whom have at least some residual visual function, optical devices may be provided to assist their viewing of a scene, e.g. as disclosed in WO 2012/114123 A1. The device disclosed in WO 2012/114123 A1 operates to discretise the scene the wearer of the device is viewing and to display this to the user using an array of a finite number of discrete light sources, thus enhancing the image. The device disclosed in WO 2012/114123 A1 may help a visually impaired user to improve their comprehension of the visual environment, particularly their awareness of large objects. However, such devices struggle to discriminate between certain objects, e.g. to recognise and present text, in a way that can be recognised easily by visually impaired people, who may have a smaller or disjointed field of view and/or difficulty in resolving low-light or low-contrast images, for example.

Optical character recognition (OCR) may be used to convert images containing text into machine-encoded text, which may then be presented to a user in a variety of formats, e.g. as an audio or visual output. However, OCR works best when presented with clear printed text, and struggles to recognise text accurately in an open environment, where the text may be warped, in shadow or unevenly illuminated, for example. Furthermore, OCR is computationally intensive and thus, if provided in a wearable device, may not be performed quickly enough to enable a user to read or otherwise comprehend text in real time when viewing a scene.

The present invention seeks to provide an improved method for enhancing an image, e.g. for visually impaired people. When viewed from a first aspect the invention provides a method of enhancing an image using a portable system comprising an image capture device and a display, the method comprising:

capturing one or more frames of image data using the image capture device;

dividing a captured frame of image data into one or more blocks of image data, each block comprising a plurality of data elements each having an attribute value associated with the respective data element;

for one or more of the blocks of image data:

determining a threshold value that is representative of the plurality of attribute values associated with the plurality of data elements in the block; and for each of the plurality of data elements in the block:

when the attribute value associated with the data element is greater than the threshold value for the block:

associating with the data element a first output attribute value for display; and

when the attribute value associated with the data element is less than the threshold value for the block:

associating with the data element a second output attribute value for display; and

displaying the output attribute values for the plurality of data elements of the one or more blocks in the frame of image data on the display of the portable system.

When viewed from a second aspect the invention provides a portable system for enhancing an image comprising:

an image capture device for capturing one or more frames of image data; processing circuitry configured to, for a captured frame of image data comprising a plurality of data elements each having an attribute value associated with the data element:

divide the frame of image data into one or more blocks of image data, each block comprising a plurality of data elements; and

for one or more of the blocks of image data:

determine a threshold value that is representative of the plurality of attribute values associated with the plurality of data elements in the block; and wherein the processing circuitry is configured to, for each of the plurality of data elements in the block:

associate with the data element a first output attribute value for display when the attribute value associated with the data element is greater than the threshold value for the block; and

associate with the data element a second output attribute value for display when the attribute value associated with the data element is less than the threshold value for the block; and

a display configured to display the output attribute values for the plurality of data elements of the one or more blocks in the frame of image data.

The present invention provides a method of and a portable system for enhancing an image, i.e. a system that a user can carry around. In a preferred embodiment the portable system comprises a wearable apparatus (e.g. comprising the image capture device, the processing circuitry and the display). A frame of image data is captured (by an image capture device of the portable system) and processed, with the frame of image data comprising a plurality of data elements (e.g. pixels) that each have associated with them an attribute value (e.g. YUV colour data). The frame is divided into multiple blocks (each block including multiple data elements) and, for one or more of the blocks, a threshold value is determined that is representative of the plurality of attribute values that are associated with the plurality of data elements in the block.

The threshold value determined for a block is then used to separate the data elements in the block into two groups: a first group for which the attribute values associated with the data elements are each greater than the determined threshold value, and a second group for which the attribute values associated with the data elements are each lower than the threshold value. A particular (first) output attribute value (which may be used for the purposes of display) is associated with each of the data elements in the first group and a different (second) output attribute value (which again may be used for the purposes of display) is associated with each of the data elements in the second group. These output attribute values are then displayed for the plurality of data elements in the frame of image data, on a display of the portable system (and thus the portable system captures, processes and displays the image). It will be appreciated that processing a frame of image data in the manner of the present invention to convert an input image into a binary output (i.e. the data elements in the frame of image data each have associated with them either the first output attribute value or the second (different) output attribute value) helps to enhance (e.g. increase the contrast of) the image. This binary (e.g. contrasting) output is particularly helpful for assisting visually impaired people whose visual function may not be sufficient to distinguish between a whole spectrum of different attribute values (e.g. colours and/or intensities) that may be present in a scene, but who may be able to discern the difference between the two different output attribute values (e.g. contrasting colours) that are provided.
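The per-block binarisation described above can be sketched in a few lines of code. This is an illustrative sketch only: the use of a simple mean as the block threshold, the 0/255 output values and all names are assumptions made for the example, not the required implementation.

```python
# Illustrative sketch of per-block binarisation: each data element's
# attribute value (e.g. intensity) is compared against a block-local
# threshold and mapped to one of two output attribute values.
# The mean-based threshold and the 0/255 outputs are assumptions.

FIRST_OUTPUT = 255   # e.g. a light foreground colour
SECOND_OUTPUT = 0    # e.g. a dark background colour

def binarise_block(block):
    """Return a block in which every data element carries either the
    first or the second output attribute value."""
    values = [v for row in block for v in row]
    threshold = sum(values) / len(values)  # one simple representative value
    return [[FIRST_OUTPUT if v > threshold else SECOND_OUTPUT for v in row]
            for row in block]

# A small 3 x 3 block of intensity values: bright elements map to 255,
# dark elements map to 0, increasing the contrast between the two groups.
out = binarise_block([[200, 210, 40],
                      [190, 50, 45],
                      [205, 195, 35]])
```

Whatever the visually impaired viewer's residual function, the output contains only two attribute values, so only one visual distinction needs to be discernible.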

The method of the present invention is particularly helpful in enhancing text (or similar objects, e.g. glyphs, symbols, logos, signs, drawings, etc.) in an image, which may be difficult (e.g. for OCR) to identify in a scene owing to being, inter alia, warped, displayed in low-contrast colours, in low light or unevenly illuminated, for example. In addition, there may not be sufficient power available (particularly in a low-powered portable system) to decipher the image using conventional methods (e.g. OCR), which are computationally expensive and therefore power intensive. The method and system of the present invention help visually impaired people to comprehend, e.g. text in, a scene they are viewing by providing them with the additional contrast that they may require between different objects in the image, e.g. between text and the background on which the text may be presented. The Applicant has recognised that by determining and using a threshold value that is representative of the attribute values associated with a plurality of data elements in an image, objects (e.g. text) in an image can be separated from other objects of differing appearance (e.g. differently coloured or differently bright objects), e.g. to distinguish text from the background on which the text is presented. It will be appreciated that for areas of text (or similar objects, e.g. glyphs, symbols, logos, signs, drawings, etc.) in an image, the area of the background on which the text is presented is a different (but not necessarily contrasting) colour to the area of the text itself.

Thus, by using the threshold value to separate different parts of an image having differing appearances (i.e. differing attribute values) and then associating (e.g. only) two different attribute values with the two respective sets of data elements that have been categorised using the threshold value (thus binarising the image), the appearance of such images can be enhanced (e.g. the contrast between areas in the image increased) for display in an output image.

Once the enhanced image has been presented in this way, e.g. to a visually impaired person, they are more likely to be able to interpret (e.g. read) the enhanced image, e.g. as text (when text is present in an image being processed in the manner of the present invention). It is therefore unnecessary to use methods (e.g. OCR) that explicitly recognise and convert any text (or other objects) present in an image into a machine-recognisable (e.g. text) output, and no assumptions need be made about the particular font used for the text in an image. The method of the present invention may therefore be effective at enhancing other objects, such as handwriting, glyphs, symbols, logos, signs, drawings, etc., in an image.

It will thus be appreciated that the method of the present invention, i.e. separating parts of an image into two separate groups based on a threshold value that is representative of a set of attribute values, is simpler than the complicated processing, e.g. of OCR, that is needed to explicitly recognise and interpret objects (e.g. text) in an image. The system of the present invention may therefore be able to enhance (e.g. text in) an image much more quickly (e.g. in real time) and using less power than OCR methods of text recognition.

The image (and thus the frame of image data) to be enhanced may be any suitable image (and frame of image data) that the image capture device captures, e.g. of a scene that the user of the portable system (e.g. the wearer of the wearable apparatus) is looking at. In a preferred embodiment the method is performed on an image that may or does contain text. Thus preferably the method is a method of (and the system is a system for) enhancing text in an image. (However, because the method preferably does not perform character recognition, the method need not make any assumptions about what the image does (or does not) contain. Rather, the method of the present invention may simply be performed on a whole image (thus assuming that the whole image may contain text, or other similar objects that may require enhancement, e.g. handwriting, glyphs, symbols, logos, signs, drawings, etc.), such that anything visible in the image is enhanced.)

In a preferred embodiment the image (or sequence of images) is captured (e.g. in real time) and processed according to the method (and by the portable system) of the present invention. The image capture device, e.g. a (e.g. video) camera, is preferably connected to and arranged to provide the frame(s) of (e.g. video) image data to the processing circuitry for processing according to the method of the present invention. The image capture device and the processing circuitry may be separate components of the portable system or they may be part of the same (e.g. integrated) device (e.g. a wearable apparatus).

The frame of image data comprises one or more (e.g. a plurality of) blocks of data elements and is processed in a blockwise manner. Thus the threshold value is determined (separately) for the attribute values associated with the data elements in a (and each of the one or more) block(s).

Owing to the frame of image data being processed in a blockwise manner, the blocks of a frame of image data can be processed in parallel (e.g. on a graphics processing unit (GPU), which is particularly suited to such tasks and for use in a, e.g. low-powered, portable system (e.g. a wearable apparatus)). In addition, performing the method of the present invention for each of the one or more blocks of the frame separately (e.g. independently) helps to account for any variations in the attribute values (e.g. owing to uneven lighting across an image), for the plurality of data elements, that are used to determine the threshold value (without, for example, having to normalise the attribute values across the plurality of data elements in the frame of image data). This helps the enhancement of the (e.g. contrast of the) image to be performed effectively across the whole image. (It will be appreciated that this may be difficult otherwise, owing to the variations in the attribute values (e.g. contrast and/or intensity, e.g. owing to uneven lighting across an image), which may be large across a frame of image data and for which a normal human eye compensates.)

Furthermore, processing the frame of image data as separate blocks helps to reduce the power and time required to enhance the captured image using the method of the present invention. This is because the block-based approach may only require a single pass over the image data (e.g. on a GPU). This contrasts with non-block-based schemes, which generally require a statistical analysis or histogram generation as a first pass and a second pass of applying a threshold. Such an approach is generally computationally expensive.

The method of the present invention is repeated for each of the one or more of the blocks to be processed in a frame of image data, such that the frame may be processed. Similarly, preferably the method of the present invention is repeated for each frame in a sequence of frames of (e.g. video) image data.

The blocks of data elements that the frames are divided into for processing can be any suitable and desired blocks of the frames. They are preferably rectangular in shape, e.g. square. The rectangular blocks may take any suitable and desired size. Preferably the rectangular blocks each have a size between and including 8 x 8 data elements (e.g. pixels) and 64 x 64 data elements (e.g. 16 x 16 data elements), preferably with each edge of the rectangular blocks having a size of 2^n data elements, where n is an integer (e.g. between and including 3 and 6).
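As a concrete illustration of the block division just described, a frame can be tiled into square blocks whose edges are 2^n data elements long. The handling of partial edge blocks by simple truncation is an assumption made to keep the sketch short; a real system might pad or resize the frame instead.

```python
# Illustrative sketch: tile a frame of data elements into square blocks
# whose edges are 2**n data elements long (n an integer, e.g. 3 to 6).
# Partial edge blocks are simply truncated here, which is an assumption.

def tile_frame(frame, n):
    side = 2 ** n
    h = len(frame) // side * side     # usable height
    w = len(frame[0]) // side * side  # usable width
    blocks = []
    for by in range(0, h, side):
        for bx in range(0, w, side):
            blocks.append([row[bx:bx + side] for row in frame[by:by + side]])
    return blocks

# A 16 x 32 frame tiled with n = 4 (16 x 16 blocks) gives a 1 x 2 grid.
frame = [[x + 10 * y for x in range(32)] for y in range(16)]
blocks = tile_frame(frame, 4)
```

Because each tile is independent of its neighbours, the per-block thresholding can then run over all tiles in parallel, as the text notes for GPU execution.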

The size (and, e.g., number) of the blocks in a frame of image data may be chosen in any suitable way. For example, the size and/or number of the blocks may depend on the expected size of glyphs (e.g. text characters, symbols, etc.) that are expected to form part of the image (the Applicants have found, for example, that the method of the present invention works well when a large proportion of at least one glyph, but not too many glyphs, is present in a block). The size and/or number of the blocks may depend on the shared memory available and/or the maximum block size that is able to be processed by the processing circuitry (e.g. when the method is being performed on a GPU). The size and/or number of the blocks may depend on the memory bandwidth and/or processing speed of the processing circuitry. The size and/or number of the blocks may depend on the resolution of the input image and/or the output display. The size and/or number of the blocks may depend on the expected variation (e.g. gradient) of the attribute value being used to determine the threshold value, e.g. owing to uneven lighting, or the perspective or orientation of the scene in the input image.

The size and/or number of the blocks in a frame of image data may depend on whether the frame of image data is enlarged (e.g. if only a portion of the frame of image data is to be processed according to the method of the present invention), which may be performed before or after the frame of image data is divided into blocks. In another embodiment, the desired size and/or number of blocks may be achieved by downscaling the captured frame of image data.

In one embodiment, a given frame may comprise blocks of plural different sizes, e.g. that tessellate over the area of the frame, and/or the size and number of the blocks may change between different frames, e.g. depending on the image data of the frame.

Preferably there are between 500 and 5,000 blocks in a frame of image data, e.g. between 2,000 and 4,000 blocks, e.g. approximately 3,600 blocks. In a particularly preferred embodiment a frame of image data is divided into 45 blocks (e.g. vertically) by 80 blocks (e.g. horizontally), e.g. for a landscape display. Thus, in a particularly preferred embodiment, the frame of image data has 1280 x 720 data elements (e.g. for the output image).

The attribute value, associated with each of the plurality of data elements in the (e.g. block of the) frame of image data to be processed, may be any suitable and desired attribute value for image data, e.g. one or more of YUV or RGB colour data. In one embodiment the attribute value is the intensity (e.g. the luma or luminosity attribute value from YUV colour data) associated with each of the plurality of data elements. The intensity is a helpful attribute on which to base the discrimination of the two groups of data elements, particularly when the image contains text, in order to enhance the image.

For example, when the image contains text that is desired to be enhanced, generally data elements of the foreground text will have an intensity that is different (and therefore distinguishable, using the method of the present invention) from the intensity of data elements in the background of the image on which the foreground text is present.

Thus preferably an intensity value is associated with each of the data elements in the block of the frame of image data, and the method comprises (and the processing circuitry is configured to), for the one or more blocks of the frame of image data:

determining a threshold value that is representative of the plurality of intensity values associated with the plurality of data elements in the block; and for each of the plurality of data elements in the block:

when the intensity value associated with the data element is greater than the threshold value for the block:

associating with the data element a first output attribute value for display; and

when the intensity value associated with the data element is less than the threshold value for the block:

associating with the data element a second output attribute value for display.

In other embodiments the attribute value is a representative value of the data values (e.g. in a multi-channel encoding scheme) associated with each data element. The representative value may be determined in any suitable and desired way, e.g. by combining the plural data values for each data element (e.g. using a voting (or other suitable) scheme), by choosing an optimum (e.g. "best") value from the plural values for each data element, or by using a weighted combination of the plural data values for each data element.
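One possible weighted combination of the plural data values of a data element is to derive a single intensity from its RGB values using fixed luma weights. The ITU-R BT.601 weights below are one conventional choice, used here purely as an assumed illustration; the text does not mandate any particular weights.

```python
# Illustrative weighted combination: derive one intensity value from the
# three RGB data values of a data element using the ITU-R BT.601 luma
# weights. The specific weights are an assumption for this example.

def luma(r, g, b):
    return 0.299 * r + 0.587 * g + 0.114 * b

white = luma(255, 255, 255)  # maximum intensity
black = luma(0, 0, 0)        # minimum intensity
```

The resulting single value per data element can then be thresholded exactly as described for the intensity case above.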

While the image data may comprise colour data (e.g. owing to the image being captured and provided in colour), it will be appreciated that the input image to be processed may comprise (or be converted to) a grayscale image (and thus, for example, the only attribute value associated with the plurality of data elements may be the intensity (luminosity)).

In other embodiments, e.g. when the image data comprises different colours that are of the same or similar luminosity (e.g. red text on a blue-green background), a different type of attribute value (e.g. colour (hue)) may be used to determine the threshold to discriminate between the different areas of the captured frame of image data. The type of attribute value to use may be chosen for (and thus differ between) each captured frame of image data dependent on the characteristics of the image data, e.g. when multiple attribute values are associated with each data element. In one set of embodiments the type of attribute value used may vary between different blocks of the same captured frame of image data, e.g. dependent on the characteristics of the respective blocks of image data.

The threshold value, which is representative of the plurality of attribute values associated with the plurality of data elements, may be determined in any suitable and desired way. To determine the threshold value, all of the data elements in the originally captured image may be used. However, in one embodiment the originally captured image (which may, for example, contain 5 megapixels) is first down-sampled to convert it into a frame of image data containing fewer pixels (e.g. the plurality of data elements that are then processed in the manner of the present invention). The original image may be down-sampled in any suitable and desired way.

In one embodiment the number of data elements in the frame of image data that is to be processed in the manner of the present invention is equal to the number of data elements that are output for display (and thus, for example, equal to the resolution of the display). However, the relationship between the number of data elements in the frame of image data that is to be processed and the number of data elements that are output for display may vary depending, for example, on the size of the frame of image data (e.g. owing to the resolution of the image capture device), the level that the frame of image data is to be zoomed to, and/or the size of the display. Thus, in some embodiments, the number of data elements in the output (i.e. enhanced) frame of image data may be up-sampled (e.g. for the purposes of zooming into an area of the output frame of image data for display) or down-sampled.
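A minimal sketch of down-sampling by block averaging (illustrative only; the function name and the assumption that the frame dimensions divide evenly by the factor are not from the application):

```python
def downsample(frame, factor):
    """Down-sample a 2D intensity frame by averaging non-overlapping
    factor x factor blocks (assumes the dimensions divide evenly)."""
    height, width = len(frame), len(frame[0])
    out = []
    for by in range(0, height, factor):
        row = []
        for bx in range(0, width, factor):
            block = [frame[y][x]
                     for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```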

In one embodiment the threshold value is determined by calculating a characteristic (e.g. central) value of the plurality of attribute values associated with the plurality of data elements, and using the characteristic value to determine (e.g. set) the threshold value. For example, the characteristic value may comprise a descriptor (e.g. based on the hue values associated with the data elements) or may use Otsu's method. In one embodiment the characteristic value comprises the modal value or median value of the plurality of attribute values associated with the plurality of data elements. In a preferred embodiment the characteristic value is the mean value of the plurality of attribute values associated with the plurality of data elements. Thus preferably the method comprises (and the processing circuitry is configured to): calculating a mean value of the plurality of attribute values associated with the plurality of data elements; and determining a threshold value using the mean value of the plurality of attribute values. It will be appreciated that the calculation of the mean value of a plurality of attribute values is simple, computationally. Thus, while other (more complex) calculations may obtain a more accurate characteristic value for setting the threshold value, the Applicant has found that (the simple calculation of) the mean value may be sufficient for use in discriminating between and thus enhancing objects in a frame of image data, and is particularly suited for (e.g. parallel) processing of blocks of an image using a (e.g. low powered) GPU, e.g. owing to the efficiency of such a calculation.
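As a sketch of how computationally simple the preferred (mean-based) characteristic value is (illustrative Python, not part of the application):

```python
def mean_threshold(block_values):
    """Per-block threshold: the mean of the raw attribute (e.g.
    intensity) values.  No histogram or sorting is needed, which keeps
    the calculation cheap for parallel per-block evaluation on a GPU."""
    return sum(block_values) / len(block_values)

# e.g. a block whose intensities are 10, 20, 30, 40 gets threshold 25.0
```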

Preferably the mean of the attribute values is calculated (separately) for the attribute values associated with the data elements in each of the one or more blocks and the threshold value is determined (separately) for the attribute values associated with the data elements in each of the one or more blocks, i.e. using the respective mean values for each of the one or more blocks. It will be appreciated that the same type of attribute value (e.g. intensity) may be used to calculate the mean attribute value in each block of the frame of image data, and in one embodiment this is what is done. However, in some embodiments different types of attribute value are used to calculate the mean attribute value for different blocks of the frame, e.g. when multiple attribute values (luminosity, colour) are associated with each data element.

The mean value of the plurality of attribute (e.g. intensity) values associated with the plurality of data elements of the block of the frame of image data may be calculated in any suitable and desired way. The mean value may be calculated by arranging the plurality of attribute values in a histogram (which may thus involve some rounding or quantisation of the attribute values) and then calculating the mean using the histogrammed values of the attribute values of the plurality of data elements. However preferably the mean is calculated using the raw attribute values associated with the plurality of data elements. This helps to enable the mean to be computed quickly, e.g. on a GPU, as it may not be necessary to construct a histogram.

The threshold value may be determined in any suitable and desired way using the characteristic (e.g. mean) value of the plurality of attribute (e.g. intensity) values, e.g. the threshold value could simply be equal to the mean value. In one embodiment the characteristic value comprises the attribute value at which the second derivative in a smoothed attribute value histogram (of the attribute values associated with the plurality of data elements) is (e.g. near) zero. This characteristic value is then used to set (e.g. used as) the threshold value. In a preferred embodiment the step of determining the threshold value comprises (and the processing circuitry is configured to):

for a first set of the plurality of data elements having associated with the first set of the plurality of data elements a plurality of attribute values that are each greater than the mean value:

calculating an upper mean value of the plurality of attribute values associated with the plurality of data elements in the first set;

for a second set of the plurality of data elements having associated with the second set of the plurality of data elements a plurality of attribute values that are each lower than the mean value:

calculating a lower mean value of the plurality of attribute values associated with the plurality of data elements in the second set; and

determining the threshold value by calculating a central mean value of the upper mean value and the lower mean value. Thus preferably the threshold value is (equal to) the central mean value.

Determining the threshold value in this way helps to take account of (and therefore helps to make the threshold value independent of) the number of data elements that are to be found in the two groups of data elements (e.g. foreground and background). This may therefore help to distinguish more accurately between the two separate groups of data elements when creating the binary output. However, in one embodiment, before the central mean value is calculated to set the threshold value, the method comprises (and the processing circuitry is configured to) determining if the difference between the lower mean value and the upper mean value is less than a (e.g. low) noise threshold (e.g. the lower mean value is (e.g. approximately) equal to the upper mean value), and when the difference between the lower mean value and the upper mean value is less than a (e.g. low) noise threshold, associating a default output attribute value with each of the data elements in the block of the frame of image data. This helps to identify when the (input) attribute values associated with the plurality of data elements are all similar (e.g. approximately equal) (such that there is little or nothing to discriminate between in the image and so the output attribute values can be set without (unnecessarily) performing any further calculations). To account for noise in the input frame of image data (e.g. owing to JPEG noise), the noise threshold helps to set a margin (e.g. tolerance) when the lower mean value is compared with the upper mean value, e.g. when the lower mean value is equal to the upper mean value within the noise threshold that has been set then the default output attribute value is used for all of the data elements in the block of the frame of image data.
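The preferred upper-mean/lower-mean scheme just described, including the noise-threshold check, might be sketched as follows (illustrative Python; the noise threshold of 4 and the use of None to signal the default-output case are assumptions, not from the application):

```python
def block_threshold(values, noise_threshold=4):
    """Split the block's attribute values about their mean, average each
    side, and take the midpoint of the two side-means as the threshold.
    If the side-means differ by less than the noise threshold (or all
    values are identical), the block is treated as featureless and None
    is returned to signal that a default output value should be used."""
    mean = sum(values) / len(values)
    upper = [v for v in values if v > mean]
    lower = [v for v in values if v < mean]
    if not upper or not lower:            # all values identical
        return None
    upper_mean = sum(upper) / len(upper)
    lower_mean = sum(lower) / len(lower)
    if upper_mean - lower_mean < noise_threshold:
        return None                       # nothing to discriminate
    return (upper_mean + lower_mean) / 2  # the central mean value
```

Because the threshold is the midpoint of the two side-means rather than the overall mean, it is insensitive to how many data elements fall in each group (e.g. a small amount of text on a large background).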

In one embodiment, when the upper mean value is less than a particular (e.g. minimum) value, the method comprises associating the first output attribute value with each of the data elements in the block of the frame of image data. In one embodiment, when the lower mean value is greater than a particular (e.g. maximum) value, the method comprises associating the second output attribute value with each of the data elements in the block of the frame of image data. This may help to identify (e.g. non-text) regions (e.g. owing to the region having relatively average attribute values) that may not be desired to be enhanced (e.g. instead of using the mask described below).

In one embodiment it may (e.g. also) be determined if the input attribute values associated with the plurality of data elements (to be used for determining the threshold value) are (e.g. approximately) equal (e.g. within a noise threshold) to each other (e.g. before the mean values are calculated). Preferably it is determined if the input attribute values associated with the plurality of data elements are (e.g. approximately) equal (e.g. within a noise threshold) to a minimum (e.g. zero) or maximum (e.g. saturated) value for the attribute variable. In these embodiments preferably the method comprises associating a default output attribute value with each of the data elements in the block of the frame of image data, when the plurality of data elements have associated with them (e.g. approximately) equal (e.g. within a noise threshold) input attribute values.

When a default output attribute value is associated with each of the plurality of data elements in the block of the frame of image data, in one embodiment the default output attribute value is the minimum (e.g. zero) or maximum (e.g. saturated) value for the attribute variable. In another embodiment the default output attribute value comprises a value that is representative of the plurality of attribute values associated with the plurality of data elements in the block of the frame of image data.

The threshold value may be fine-tuned, e.g. by a user. This may help to increase the contrast in (or focus) the output, e.g. when the threshold value that has been set is not providing the user with a satisfactory output display. Thus preferably the portable system is configured (e.g. comprises a control) to allow the user to adjust (e.g. fine-tune) the threshold value. Preferably the control comprises a slider, rotary knob, thumb-wheel (or similar discrete or continuous input control), a voice control or a gyroscope based control (e.g. that may use rotation of the control) to allow the user to adjust the threshold value.

Once the threshold value has been determined for a block of a frame of image data, the plurality of data elements are then categorised depending on the attribute values associated with them. The data elements are sorted (i.e. split) into two groups: one having associated attribute values that are (each) greater than the threshold value and the other having associated attribute values that are (each) lower than the threshold value.

For the data elements that have attribute values which are each greater than the threshold value, a first value for an output attribute is associated with each of these data elements. For the data elements that have attribute values which are each less than the threshold value, a second value for an output attribute is associated with each of these data elements (the second value of the, e.g. same, output attribute being different to the first value). The output attribute may be any suitable and desired attribute for outputting for display. Preferably the first value and the second value of the output attribute are (e.g. maximally) contrasting values of the output attribute. This helps to enhance (e.g. increase the contrast of) the image for a visually impaired person. Thus, for example, the output attribute could be the intensity for the data element, such that two different (e.g. contrasting) intensities are output for the two groups of data elements. In one embodiment the two different intensities are maximally contrasting (i.e. black and white).
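The binarisation step just described might be sketched as follows (illustrative Python, not part of the application; maximally contrasting intensities, white and black, are used as the two output values):

```python
def binarise_block(values, threshold, first=255, second=0):
    """Associate the first output value (default white) with data
    elements whose attribute value exceeds the block threshold, and
    the second (default black) with the rest, producing a maximally
    contrasting binary output for display."""
    return [first if v > threshold else second for v in values]

binarise_block([10, 200, 90], threshold=100)  # → [0, 255, 0]
```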

However, in a preferred embodiment the output attribute comprises colour, and the first value comprises a first colour (value) and the second value comprises a second colour (value). Preferably the first and second colours are (e.g. maximally) contrasting. Thus preferably two different colours are associated with the two groups of data elements. The two (e.g. contrasting) colours may be any suitable and desired colours. Such colour values may be predetermined or may be able to be selected (e.g. from a set of predetermined options) by a user. Thus preferably the system is configured (e.g. comprises a control) to allow the user to select and/or adjust the output attribute value(s). It will be appreciated that some (e.g. contrasting) colours may be better to help a user distinguish between the two groups of data elements than others, and such (e.g. contrasting) colours may differ between different users, e.g. depending on their visual impairment. Thus, for example, some users may find that associating black with one group of data elements and white with the other group of data elements is better, while other users may find that blue and yellow are better.

Preferably the same particular attribute value (e.g. colour) is (e.g. always) assigned to the foreground (e.g. text) data elements and the same particular (but different) attribute value (e.g. colour) is (e.g. always) assigned to the background data elements. These attribute values may be assigned (e.g. associated with the respective data elements) depending on whether the attribute value associated with a data element is greater than or less than the threshold value, e.g. the first value may always be a particular colour value and the second value may always be a particular (different) colour value. Similarly, the output attribute values associated with the data elements may be correlated to the input attribute values for the data elements. For example, when the input attribute value associated with a data element is less than the threshold value, the lower of the two output attribute values is associated with the data element, and when the input attribute value associated with a data element is greater than the threshold value, the upper of the two output attribute values is associated with the data element. Thus, for example, brighter inputs may be associated with a bright output and darker inputs may be associated with a darker output, so as to more closely match the output image to the input image. However, this may result in, e.g. text, being displayed differently depending on how it is presented in the original image, e.g. owing to text being displayed in a range of different colours (such as black on white or vice versa), which may not be helpful to a visually impaired person. Preferably the same output attribute (e.g. colour) value is provided for the foreground (e.g. text) data elements each time an image is processed and the same output (e.g. colour) attribute value (having a different value from the foreground output attribute value) is provided for the background data elements each time an image is processed. This helps to provide consistency to the user, e.g. when the output attribute values are displayed. For example, the foreground output attribute value may always be black and the background output attribute value may always be white (e.g. black text on a white background), though again this can be selected as desired to suit a particular user.

Therefore, in a preferred embodiment the method comprises (and the processing circuitry is configured to), for the two sets of data elements that have associated with them attribute values that are greater than and less than the threshold value respectively: associating a foreground output attribute value for display with the less numerous set of data elements and associating a background output attribute value for display with the more numerous set of data elements. (Thus, in this embodiment, sometimes the first output attribute value may be the foreground output attribute value and the second output attribute value may be the background output attribute value.)

Thus preferably the method comprises (and the processing circuitry is configured to) determining the number of data elements for which the attribute value (associated with each of the data elements) is greater than the threshold value and determining the number of data elements for which the attribute value (associated with each of the data elements) is less than the threshold value (e.g. to assess which is the less numerous set of data elements and which the more numerous set of data elements).
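This polarity normalisation might be sketched as follows (illustrative Python, not part of the application; black foreground on white background is just one of the selectable pairings described above):

```python
def assign_foreground(values, threshold, fg=0, bg=255):
    """Assign the foreground output value (default black) to the LESS
    numerous side of the threshold and the background value (default
    white) to the more numerous side, so that e.g. sparse text comes
    out black on white regardless of the polarity of the input."""
    above = sum(1 for v in values if v > threshold)
    below = len(values) - above
    above_out, below_out = (fg, bg) if above < below else (bg, fg)
    return [above_out if v > threshold else below_out for v in values]
```

Note that dark-on-light and light-on-dark inputs produce the same output: `assign_foreground([200, 10, 10, 10], 100)` and `assign_foreground([10, 200, 200, 200], 100)` both yield a single black (foreground) element on a white background.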

Once the output attribute values have been associated with each of the data elements in the two sets of data elements, the output attribute values are displayed for the plurality of data elements. Preferably the method comprises (and the system comprises display processing circuitry (e.g. a display processor) configured to) generating an output surface (e.g. for each frame to be displayed) using the output attribute values for the plurality of data elements. Preferably the system comprises display processing circuitry (e.g. a display processor) configured to display the output surface on the display.

Thus, the data elements having output attribute values associated with them are output (e.g. by an output (e.g. display) processor) for display, e.g. as a block or as a frame (or as an output surface representing the block(s) or frame). In these embodiments, the data elements may correspond to pixels.

When the blocks of data elements of the (e.g. monochrome) output attribute values are output for display, preferably the blocks are compressed (e.g. using run length encoding (RLE)) before transmitting them to the display. This makes it possible to transmit the display output using a low-speed (e.g. Bluetooth) link. Thus, as will be appreciated, because the output blocks comprise (e.g. only) text-like foreground and/or background data elements, the method of the present invention may be able to prepare the output blocks for RLE compression without loss of information.
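Run-length encoding of one row of the binary output might be sketched as (illustrative Python, not part of the application):

```python
def rle_encode(row):
    """Run-length encode one row of binary output values as
    (value, run_length) pairs.  Long runs of identical foreground or
    background values compress very well, suiting a low-speed link."""
    runs = []
    for v in row:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [tuple(r) for r in runs]

rle_encode([0, 0, 0, 255, 255, 0])  # → [(0, 3), (255, 2), (0, 1)]
```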

The output attribute values of the data elements may be presented to a user in any suitable and desired way, e.g. on a display screen or using an array of discrete light sources (e.g. as disclosed in WO 2012/114123 A1). Thus the portable system comprises a display configured to display the output attribute values for the plurality of data elements. The display may be provided simply as a screen but preferably the portable system (e.g. the wearable apparatus) comprises a (e.g. augmented reality) head mounted display, e.g. smart glasses.

The display may have any suitable and desired properties. Preferably the display comprises a low resolution display, e.g. approximately 900 pixels across. The display may be opaque and only allow a user to view the display (e.g. output surface) that may be generated according to the present invention. Alternatively the display may be translucent or transparent (e.g. when the user is not being presented with a display) and the output to be displayed according to the present invention may be superimposed on top of this (e.g. in the manner of augmented reality). This may enable the user to view a scene normally (e.g. as they would do through spectacles) but to also be able to see, e.g., enhanced text when this has been, e.g., detected and processed according to the present invention.

In one embodiment the method of the present invention is performed for each of the one or more (e.g. plurality of) blocks into which the frame of image data has been divided. However, as the present invention is preferably applicable for enhancing text in an image, it will be appreciated that not all images, or blocks in an image, may contain text. The Applicant has therefore appreciated that the method of the present invention may not need to be performed, or the output displayed, for such images or blocks. There may, for example, be alternative ways of enhancing different types of blocks of images that do not contain text.

Therefore preferably the method comprises (and the processing circuitry is configured to) determining when a block in the frame of image data contains text; and when it has been determined that the block in the frame of image data contains text: performing the method of the present invention or displaying the output attribute values. This enables (e.g. the parts of) a frame that contain text to be identified, which can then be processed and/or displayed on this basis.

The determination of whether (or not) a block in a frame of image data contains text may be performed continuously (e.g. automatically), e.g. for each of a sequence of frames of (e.g. video) image data. In another embodiment the determination of whether (or not) a block in a frame of image data contains text may be performed (e.g. only) on request by a user, e.g. when they know that text may be present in a scene (e.g. in a newspaper they would like to read).

A block of a frame of image data that has been identified as not containing text may be, e.g. displayed, in any suitable and desired way. For example, only the blocks of a frame that contain text may be displayed, and the other blocks of the frame may be blank. In one embodiment the method comprises (and the processing and/or display processing circuitry is configured to) applying a mask to blocks of a frame that have been determined not to contain text or applying a mask to blocks of a frame that have been determined to contain text (i.e. the mask may be applied negatively or positively). The mask may be applied only for the purposes of display (e.g. such that only blocks of text or non-text are displayed, e.g. in an enhanced form) but it may be applied for the whole of the method of the present invention, e.g. indicating which blocks of a frame are or are not to be enhanced. The (e.g. masked) non-text blocks of a frame may be output (e.g. displayed) in any suitable and desired way. In one embodiment a mask output attribute value is associated with each of the data elements in the masked blocks of a frame. The mask output attribute value may, for example, be equal to the background attribute value. In a preferred embodiment the mask output attribute value comprises an intermediate attribute value (i.e. between the first and second output attribute values (e.g. between the background and foreground attribute values)). In one embodiment the intermediate attribute value comprises the threshold value. This may help the user to distinguish between areas of an image identified as containing text and areas of an image identified as not containing text.

In another embodiment the attribute (e.g. colour data) values associated with the data elements in the masked blocks of a frame are the original attribute (e.g. colour data) values associated with these data elements, e.g. they may simply be output without being processed. Thus these blocks of a frame appear as they do in the input image (e.g. shown as the basic raw image, such as a coloured or grayscale area of the image). This helps to address false negatives as it provides the user with the raw image (e.g. when text has not been detected) which may allow the user to recognise that there may be text in the image that has not been detected. In another embodiment, the output attribute values associated with the data elements in a masked block of a frame may be segmented or enhanced using an alternative method, e.g. using Otsu's method.

When a mask is applied, preferably the mask is applied liberally, such that false positives may be tolerated. This helps to detect as much text as possible.

It will be appreciated that in some of these embodiments the determination of whether a block of a frame of image data contains text may be performed (e.g. by a central processing unit (CPU)) at the same time as, e.g. in parallel with, the method of the present invention (e.g. the enhancement of the image) which may be performed by a different, e.g. graphics processing unit (GPU). Alternatively, the determination of whether a block of a frame of image data contains text may be performed initially on a block of a frame of image data and only (e.g. blocks of a) frame that has been determined to contain text is then processed according to the method of the present invention.

A block of a frame may be determined to contain text, e.g. text may be recognised, in any suitable and desired way. In a preferred embodiment the step of determining when a block in the frame of image data contains text comprises performing blob detection on the block in the frame of image data. Preferably the blob detection comprises using maximally stable extremal regions (MSER), although any suitable method may be used. The Applicant has appreciated that MSER is computationally expensive and so the blob detection may comprise using a simpler (but possibly less reliable) method, e.g. by looking for two distinct peaks in a histogram for a block.
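The simpler two-peak heuristic mentioned as an alternative to MSER might be sketched as follows (illustrative Python; the bin count and the separation criterion are assumptions, and this is deliberately cruder, and less reliable, than MSER-based blob detection):

```python
def looks_bimodal(values, bins=16, min_separation=4):
    """Cheap stand-in for MSER-style text detection: histogram the
    block's 0-255 attribute values and report whether the two most
    populated bins are well separated, which crudely suggests distinct
    foreground and background populations (e.g. text on a background)."""
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins / 256), bins - 1)] += 1
    # indices of the two most populated bins (stable sort keeps ties in order)
    order = sorted(range(bins), key=lambda i: hist[i], reverse=True)
    a, b = order[0], order[1]
    return hist[b] > 0 and abs(a - b) >= min_separation
```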

A user may be presented with (and thus an embodiment of the present invention may generate) an alert to indicate that text has been detected in a block of a frame and thus (potentially) contains text to be displayed. A user may then be able to request (or otherwise) to view the enhanced text. Thus the determination of whether (or not) a block in a frame of image data contains text may be performed (e.g. in the background) continuously (e.g. automatically) and when a user requests to view the text, the method of the present invention may then be performed to enhance the text for display. Alternatively the method of the present invention may be performed automatically, e.g. following it being determined that text is present in a frame or indeed all of the time (and, e.g., an enhanced image may, in some embodiments, only be displayed when text has been determined to be present in the frame).

When a block(s) of a frame of image data has been determined as containing text, the output attribute values for the block(s) of the frame of image data may be, e.g., displayed in any suitable and desired way. For example, the text may be displayed in the position in which it is presented in the (input) frame of image data. However, owing to the nature of a user's visual function and impairment, it may be advantageous to present the text in a particular (e.g. relative) position in the display, e.g. for a sequence of frames of image data, as will now be described.

In one embodiment the block(s) of the frame of image data are displayed allocentrically. Thus, for example, the block(s) of the frame of image data that contain text may retain their position relative to the other parts of an image, e.g. remaining fixed as the user looks at a scene.

In one embodiment the block(s) of the frame of image data are displayed egocentrically. Thus, for example, the block(s) of the frame of image data that contain text may retain their position in the displayed output, e.g. for as long as a user wishes to look at the text. In this embodiment the text may, for example, be presented at the centre of the display (e.g. at the centre of the user's field of view) or in any other suitable and desired location, which may be predetermined or chosen by a user, e.g. depending on the visual impairment of the user.

In one embodiment the block(s) of the frame of image data are displayed retinocentrically. Thus, for example, the block(s) of the frame of image data that contain text may be displayed in different positions depending on the eye movement of the user, e.g. to follow their gaze around. In this embodiment the text may, for example, be presented at the centre of the user's gaze or in any other suitable and desired position, which may be predetermined or chosen by a user, e.g. depending on the visual impairment of the user.

When blocks of, e.g. text in, a frame are displayed in one of the ways outlined above (e.g. allocentrically, egocentrically or retinocentrically), a (e.g. dynamic) mask may be used to mask out the, e.g., background block(s) in the frame so that the block(s) of text to be displayed may be displayed as desired. It will be appreciated that other, e.g. status, information may also be displayed in this manner.

In these embodiments the position in which block(s) of, e.g. text in, a frame are displayed (e.g. allocentrically, egocentrically or retinocentrically) preferably depends on received view orientation (e.g. head pose or tracking) data. Thus preferably the system comprises view orientation determining (e.g. head tracking) sensors. Preferably the view orientation sensors comprise an (e.g. nine degrees of freedom) inertial measurement unit (IMU). The IMU preferably comprises one or more (preferably all) of: an accelerometer, a gyroscope and a magnetometer. This helps to provide head position and tracking information in terms of compass heading as well as angle relative to gravity.

Preferably the view orientation sensors periodically generate view orientation (e.g. tracking) information based on the current and/or relative position of the system (e.g. of the image capture device), and are operable to provide that view orientation data periodically to the apparatus (to the processing circuitry and, when required, to the display processing circuitry of the system) for use when determining the position of block(s) of the frame for display.

When block(s) of, e.g. text in, a frame are to be displayed retinocentrically, preferably the position in which block(s) of a frame are displayed depends on received eye-gaze (e.g. tracking) information. Thus preferably the system comprises one or more eye-tracking sensors. Preferably the one or more eye-tracking sensors periodically generate eye-gaze (e.g. tracking) information based on the current and/or relative direction of the user's gaze, and are operable to provide that eye-gaze (e.g. tracking) data periodically to the system (to the processing circuitry and, when required, to the display processing circuitry of the system) for use when determining the position of block(s) of the frame for display.

The block(s) of, e.g. text in, a frame to be displayed may be displayed in any suitable and desired format. For example, the frame of image data (e.g. that has been processed according to the present invention) may be displayed in its entirety or only a portion (e.g. some block(s)) of the frame may be displayed. For example, only (e.g. block(s) of the frame containing) text that has been, e.g., identified and enhanced may be displayed.

The (e.g. output surface for) display may be enlarged or reduced (i.e. zoomed into or out of) as is suitable and desired. In one embodiment the (e.g. block(s) of) text may be enlarged or reduced in the display. Text to be displayed may be displayed at the size at which it is present in the frame of image data or the text may be displayed at a pre-set (e.g. predetermined or selectable) size, e.g. height as a fraction of the display. The text may be presented automatically at this (e.g. pre-set) size or a user may be able to control the size, e.g. by selecting and/or adjusting the size of the one or more blocks to be displayed.

When a sequence of frames is processed and displayed according to the present invention, the (e.g. output surface for) display may be paused (i.e. frozen) as is suitable and desired, e.g. by the user using a control. In one embodiment the (e.g. block(s) of) text may be paused in the display and, for example, the background may continue to be displayed in real time. Such, e.g. paused, text may also be moved (e.g. panned) around the display, or zoomed, to allow a user to read a block of text, e.g. by the user using a control. Preferably the text is also rotated, skewed and/or has perspective applied thereto, depending on how the text is moved around the display, e.g. to level up a rotated line of text. Preferably the text is moved depending on how a user moves their head, e.g. the IMU is used to provide an input to determine how to transform the text appropriately.

In one embodiment the (e.g. output surface(s) for) display may be recorded and then played back, e.g. if they were missed while the display was paused. Thus a series of output surfaces for display may be replayed. In one embodiment the (e.g. output surface(s) for) display may be recorded automatically, e.g. when it is detected that text is (or may be) present in an output surface.

One or more or all of these display attributes and/or variables outlined above may be controlled (e.g. selected and/or adjusted) by a user, e.g. using a control on the system, as is suitable and desired.

As well as the processing circuitry and the display processing circuitry discussed above, the system of the present invention can otherwise include any one or more or all of the processing stages and elements that such a system (e.g. a data processing system) may suitably comprise. In a preferred embodiment, the system further comprises a write-out stage operable to write a frame of image data (and, e.g., the output attribute values for the plurality of data elements) to external memory. 
This will allow the processing circuitry to write (e.g. the output attribute values for the plurality of data elements in) an input surface or surfaces to external memory (such as a frame buffer), e.g., and preferably, from where it can be read (e.g. selectively) by the display processing circuitry when generating an output surface (e.g. frame).

The various circuitry and stages of the apparatus or data processing system may be implemented as desired, e.g. in the form of one or more fixed-function units (hardware) (i.e. dedicated to one or more functions that cannot be changed), or as one or more programmable processing stages, e.g. by means of programmable circuitry that can be programmed to perform the desired operation. There may be both fixed-function and programmable stages. One or more of the various stages of the system may be provided as separate circuit elements to one another. Additionally or alternatively, some or all of the stages may be at least partially formed of shared circuitry.

The display that the system of the present invention is used with may be any suitable and desired display (display panel), such as for example, a screen. It may comprise the apparatus's (device's) local display (screen) and/or an external display. There may be more than one display output, if desired.

In a particularly preferred embodiment, the display that the system is used with comprises an augmented reality head-mounted display, e.g. smart glasses. That display accordingly preferably comprises a display panel for displaying the output surfaces generated in the manner of the present invention to the user, and a lens or lenses through which the user will view the displayed output frames. Correspondingly, the display preferably has associated view orientation determining (e.g. head tracking) sensors as outlined above.

The apparatus may and preferably does also comprise one or more of, and preferably all of: a central processing unit, a graphics processing unit, a video processor (codec), a display controller, a system bus, and a memory controller.

The system may be, and preferably is, configured to communicate with one or more of (and the present invention also extends to an arrangement comprising one or more of): an external memory (e.g. via the memory controller), one or more local displays, and/or one or more external displays. The external memory preferably comprises a main memory (e.g. that is shared with the central processing unit (CPU)) of the system.

Thus, in some embodiments, the system comprises, and/or is in communication with, one or more memories and/or memory devices that store the data described herein, and/or store software for performing the processes described herein. The system may also be in communication with and/or comprise a host microprocessor, and/or with and/or comprise a display for displaying images based on the data generated by the system.

In use of the processing of the present invention, in an embodiment, one or more input frames of image data (surfaces) will be generated (e.g. captured) by a (e.g. video) camera and stored in memory. Thus the apparatus or system may comprise, and/or may be in communication with, a (e.g. video) camera that generates the frames of video image data. Those input frames will then be processed by the processing circuitry to provide the output attribute values, e.g. for an output surface for display, e.g. on a display.

The processing circuitry and/or the display processing circuitry may be implemented in any suitable and desired component of the system. In one embodiment the system comprises a graphics processing unit (GPU) comprising the processing circuitry and/or display composition circuitry.

Preferably the GPU is configured to write out the output surface(s) to an output frame buffer for display. The system may therefore also comprise a display controller operable to provide the output surface to a display, e.g. by reading in the output surface from the output frame buffer and sending the output surface to the display.

Although the present invention has been described above with particular reference to the generation of a single output surface from an input frame, as will be appreciated by those skilled in the art, in preferred embodiments of the present invention at least, there will be plural input frames being provided, representing successive frames of a sequence of frames to be displayed to a user. The display processing circuitry of the system will accordingly preferably operate to provide a sequence of plural output surfaces for display. Thus, in a particularly preferred embodiment, the operation in the manner of the present invention is used to generate a sequence of plural output surfaces for display to a user.

Correspondingly, the operation in the manner of the present invention is preferably repeated for plural output frames to be displayed, e.g., and preferably, for a sequence of frames to be displayed.

The generation of output surfaces may also, accordingly, and correspondingly, comprise generating a sequence of "left" and "right" output surfaces to be displayed to the left and right eyes of the user, respectively. Each pair of "left" and "right" output surfaces may be generated from a common input surface, or from respective "left" and "right" input surfaces, as desired.

In the embodiments in which the system comprises eye tracking sensors, the eye tracking sensors may be arranged to collect eye vergence information. Such eye vergence information may then be used to adjust the lateral offset of the left and right output surfaces accordingly. In the embodiments in which the system comprises view orientation determining (e.g. head tracking) sensors, the head roll (and therefore an estimation of the eye roll) information may be determined. This may then be used to adjust the vertical shift of the left and right output surfaces accordingly. It will be appreciated that the left and right output surfaces produced may be the same. However, the left and right output surfaces may differ, e.g. to work around scotomata or other visual field asymmetries.

The present invention can be implemented in any suitable system, such as a suitably configured micro-processor based system. Preferably, the present invention is implemented in a computer and/or micro-processor based system. Preferably the portable system is battery-powered.

The different components of the portable system may be provided and coupled to each other in any suitable and desired manner. For example, the components may all be separate (e.g. distinct) from each other or one or more (e.g. all) of the components may be provided in the same (e.g. integrated) apparatus or device. When the components are provided in separate devices, preferably the components are arranged to communicate with each other wirelessly (e.g. using Bluetooth). This helps the components of the system to transfer the data used by the method between the components (e.g. the image capture device, the processing circuitry and the display). In one embodiment the image capture device and the processing circuitry are housed in a remote (e.g. hand-held) device that is wirelessly connected to the display.

The present invention is preferably implemented in an augmented reality display device such as, and preferably, an augmented reality headset, e.g. smart glasses. Thus, according to another aspect of the present invention, there is provided an augmented reality display device comprising the portable system of any one or more of the aspects and embodiments of the present invention. Correspondingly, according to another aspect of the present invention, there is provided a method of operating an augmented reality display device, comprising operating the augmented reality display device in the manner of any one or more of the aspects and embodiments of the present invention.

The various functions of the present invention can be carried out in any desired and suitable manner. For example, the functions of the present invention can be implemented in hardware or software, as desired. Thus, for example, unless otherwise indicated, the various functional elements, stages, and "means" of the present invention may comprise a suitable processor or processors, controller or controllers, functional units, circuitry, processing logic, microprocessor arrangements, etc., that are operable to perform the various functions, etc., such as appropriately dedicated hardware elements (processing circuitry), and/or programmable hardware elements (processing circuitry) that can be programmed to operate in the desired manner.

In one embodiment the processing circuitry comprises a (e.g. low-powered) microcontroller. This may be suitable when the images captured are small (e.g. 320 x 240 pixels or smaller). Preferably the processing circuitry comprises an accelerator (e.g. a DSP (digital signal processor)) for the microcontroller. Owing, in at least preferred embodiments, to the simplicity of the step of dividing the captured image into the one or more blocks, this step may be performed by such a (e.g. low-powered) microcontroller. The step of enhancing (e.g. binarising) each block may then be performed by a GPU, for example.

The methods in accordance with the present invention may be implemented at least partially using software, e.g. computer programs. It will thus be seen that, when viewed from further aspects, the present invention provides computer software specifically adapted to carry out the methods herein described when installed on a data processor, a computer program element comprising computer software code portions for performing the methods herein described when the program element is run on a data processor, and a computer program comprising software code adapted to perform all the steps of a method or of the methods herein described when the program is run on a data processing system. The data processor may be a microprocessor system, a programmable FPGA (field programmable gate array), a CPU, a GPU, a DSP, etc.

The present invention also extends to a computer software carrier comprising such software which, when used to operate a data processing system or microprocessor system comprising a data processor, causes, in conjunction with said data processor, said system to carry out the steps of the methods of the present invention. Such a computer software carrier could be a physical storage medium such as a ROM chip, CD ROM, RAM, flash memory, or disk, or could be a signal such as an electronic signal over wires, an optical signal or a radio signal such as to a satellite or the like. It will further be appreciated that not all steps of the methods of the present invention need be carried out by computer software, and thus from a further broad embodiment the present invention provides computer software, and such software installed on a computer software carrier, for carrying out at least one of the steps of the methods set out herein.

The present invention may accordingly suitably be embodied as a computer program product for use with a computer system. Such an implementation may comprise a series of computer readable instructions fixed on a tangible, non-transitory medium, such as a computer readable medium, for example, diskette, CD-ROM, ROM, RAM, flash memory, or hard disk. It could also comprise a series of computer readable instructions transmittable to a computer system, via a modem or other interface device, over either a tangible medium, including but not limited to optical or analogue communications lines, or intangibly using wireless techniques, including but not limited to microwave, infrared or other transmission techniques. The series of computer readable instructions embodies all or part of the functionality previously described herein.

Those skilled in the art will appreciate that such computer readable instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Further, such instructions may be stored using any memory technology, present or future, including but not limited to, semiconductor, magnetic, or optical, or transmitted using any communications technology, present or future, including but not limited to optical, infrared, or microwave. It is contemplated that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation, for example, shrink-wrapped software, pre-loaded with a computer system, for example, on a system ROM or fixed disk, or distributed from a server or electronic bulletin board over a network, for example, the Internet or World Wide Web.

A number of preferred embodiments of the present invention will now be described by way of example only and with reference to the accompanying drawings, in which:

Figure 1 shows a pair of smart spectacles according to an embodiment of the present invention being used by a user;

Figure 2 shows schematically the hardware components of the smart spectacles shown in Figure 1;

Figure 3 shows schematically the processing of the data by the components shown in Figure 2;

Figures 4 and 5 are flow charts showing methods of enhancing an image according to embodiments of the present invention;

Figure 6a shows an image to be processed using a method according to an embodiment of the present invention; and

Figure 6b shows an enhanced version of Figure 6a that has been generated using a method according to an embodiment of the present invention.

In order to assist visually impaired people, the majority of whom have at least some residual visual function, optical devices may be provided to assist their viewing of a scene. Figure 1 shows a pair of "smart" spectacles 1 according to an embodiment of the present invention being used by a user 2 to view a scene 4.

The "smart" spectacles 1 include a camera 6 that is operable to capture images of the scene 4 the user 2 is looking at (pointing their head towards when wearing the spectacles 1). The spectacles 1 also include a display 8 that is operable to display images that are representative of the images the camera 6 has captured, after they have been processed (enhanced) by the smart spectacles 1. The smart spectacles 1 further include a control 10 that is connected to the camera 6 and the display 8 via a wired connection 12. The control 10 may be used by the user 2 to control one or more functions of the smart spectacles 1.

Figure 2 shows schematically the hardware components of the smart spectacles 1 shown in Figure 1. The smart spectacles 1 include a camera 6 and a display 8 (as shown in Figure 1), and a processor 14 (e.g. a GPU) that receives an input from the camera 6 (e.g. one or more captured images) and the control 10, and provides an output to the display 8 (e.g. one or more enhanced images). As shown in Figure 2, the display 8 (as part of the smart spectacles 1) includes separate left and right displays 16, 18 for each eye.

Operation of the smart spectacles 1, according to one embodiment of the present invention, will now be described with reference to Figures 1 to 4. Figure 3 shows schematically the processing of the data by the components shown in Figure 2. Figure 4 is a flow chart showing a method of enhancing an image according to an embodiment of the present invention.

First, the camera 6 takes an image 20 (or a series of images) of a scene 4 that the user 2 is looking at. An example of such an image 20 is shown in Figure 3. This may be when a user 2 signals, using the control 10, that they wish for an image 20 to be enhanced (e.g. when the image contains text) or images 20 may be taken automatically by the camera 6.

The image 20 is divided into an array of blocks ("cells") 22 (step 101, Figure 4), as shown in Figure 3. Each block 22 is processed separately by the processor 14. Each block 22 (of which an example "sub-image" block 22 is shown in Figure 3) of the raw image 20 includes an array of pixels. Each pixel has associated with it data that represents the intensity (i.e. the brightness) of the grayscale image 20 in the block 22.
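The block-division step described above might be sketched as follows in Python. This is an illustrative sketch only: the function name and the 8 x 8 block size are assumptions for illustration, not values taken from the application, and the image is represented as a plain 2D list of intensities.

```python
def divide_into_blocks(image, block_h=8, block_w=8):
    """Divide a grayscale image (2D list of intensities) into an
    array of blocks ("cells"), each a 2D list of pixel intensities.

    Blocks at the right and bottom edges may be smaller when the
    image dimensions are not exact multiples of the block size.
    """
    rows = len(image)
    cols = len(image[0])
    blocks = []
    for r0 in range(0, rows, block_h):
        for c0 in range(0, cols, block_w):
            # Slice out one block; each block can then be processed
            # separately (e.g. in parallel by a GPU).
            block = [row[c0:c0 + block_w] for row in image[r0:r0 + block_h]]
            blocks.append(block)
    return blocks
```

For example, a 16 x 16 pixel image divided with the default settings yields four 8 x 8 blocks.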

Each block 22 is processed (step 102, Figure 4) by the processor 14 to calculate the mean of the intensity values of the pixels in the block 22 (step 103, Figure 4). This mean value sets a threshold for classifying the pixels in the block 22 into foreground pixels and background pixels for display as a binarised image 24 for the block 22. Using the determined mean, the number of pixels in the block 22 having an intensity that is less than the mean is counted (step 104, Figure 4). If the counted number of pixels is less than half the number of pixels in the block 22 (step 105, Figure 4), each of the pixels having an intensity that is less than the mean is assigned a "foreground" intensity for display and the remaining pixels are each assigned a "background" (i.e. contrasting) intensity for display (step 106, Figure 4). Alternatively, if the counted number of pixels is not less than half the number of pixels in the block 22 (step 105, Figure 4), each of the pixels having an intensity that is less than the mean is assigned a "background" intensity for display and the remaining pixels are each assigned a "foreground" (i.e. contrasting) intensity for display (step 107, Figure 4).
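The per-block binarisation just described (steps 102 to 107) might be sketched as follows. This is a minimal sketch under assumptions not stated in the application: intensities are taken to be normalised to [0, 1], and 1.0 (white) and 0.0 (black) are used as the illustrative "foreground" and "background" output intensities.

```python
def binarise_block(block, fg=1.0, bg=0.0):
    """Binarise one block using its mean intensity as the threshold
    (step 103), with the minority side of the threshold treated as
    "foreground" (steps 104-107)."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)                 # step 103
    below = sum(1 for p in pixels if p < mean)       # step 104
    if below < len(pixels) / 2:                      # step 105
        # Fewer than half the pixels are below the mean:
        # dark pixels become foreground, the rest background.
        lo, hi = fg, bg                              # step 106
    else:
        # At least half are below the mean: dark pixels become
        # background, the rest foreground.
        lo, hi = bg, fg                              # step 107
    return [[lo if p < mean else hi for p in row] for row in block]
```

Either way, the minority class ends up as the "foreground" colour, contrasting with the majority "background", which suits the assumption that e.g. text strokes occupy fewer pixels than the page behind them.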

The resultant binarised block 24 (with the majority of pixels being those of the "background" having an intensity less than the mean and coloured black, with the remaining "foreground" pixels being coloured white) is then output to the display 8, with the resultant left and right 16, 18 versions of the binarised block 24 produced from all of the blocks 22 that have been processed (e.g. in parallel) in this manner.

The user 2 may use the control 10 to perform a variety of tasks: e.g. to freeze the displayed image, to zoom into or out of the displayed image, to adjust the output colours in the display, to pan around the image, to adjust the threshold value, etc..

Operation of the smart spectacles 1, according to another embodiment of the present invention, will now be described with reference to Figures 1 to 3 and 5. Figure 5 is a flow chart showing a method of enhancing an image according to another embodiment of the present invention.

The process shown in Figure 5 is similar to that shown in Figure 4, except that the threshold (for determining whether a pixel in a block should be classified as being a "background" or a "foreground" pixel) is determined in a different manner.

Once an image 20 (or a series of images) of a scene 4 that the user 2 is looking at has been taken by the camera 6, the image 20 is divided into an array of blocks 22 (step 201, Figure 5), as shown in Figure 3. Each block 22 is processed (step 202, Figure 5) by the processor 14 to calculate the mean of the intensity values of the pixels in the block 22 (step 203, Figure 5). This mean value is then used to determine a threshold for classifying the pixels in the block 22 into foreground pixels and background pixels for display as a binarised image 24 for the block 22.

However, if the mean of the intensity values of the pixels in the block 22 is less than a tolerance margin (e.g. approximately zero), all the pixels in the output (binarised) image 24 are set to be black (step 204, Figure 5). (Similarly, if the mean of the intensity values of the pixels in the block 22 is approximately one (within a tolerance margin), all the pixels in the output (binarised) image 24 are set to be white.)

To calculate the threshold value for discriminating between the pixels in a block 22, two different mean values (a "lower mean" and an "upper mean") are determined for two different sets of pixels in the block 22. For the sub-set of pixels in the block 22 having an intensity value that is less than the mean value, the mean of the intensity values of this sub-set of pixels is calculated (step 205, Figure 5) to determine the lower mean value. For the sub-set of pixels in the block 22 having an intensity value that is greater than the mean value, the mean of the intensity values of this sub-set of pixels is calculated (step 206, Figure 5) to determine the upper mean value.

If the lower mean value and the upper mean value differ by no more than a tolerance margin (i.e. are approximately equal, e.g. owing to the JPEG noise of the raw intensity values) (step 207, Figure 5), all of the pixels in the block 22 are assigned the intensity value of 0 (black) or 1 (white), i.e. "saturated" (step 208, Figure 5).

If the lower mean value and the upper mean value are not approximately equal (i.e. outside of the tolerance margin), the mean value of the lower mean value and the upper mean value is calculated (step 209, Figure 5). This "central" mean value is then used as the threshold for discriminating between the pixels in the block 22, e.g. in the same manner in which the threshold is applied in Figure 4.

Thus, first the number of pixels in the block 22 having an intensity that is less than the central mean is counted. If the counted number of pixels is less than half the number of pixels in the block 22, each of the pixels having an intensity that is less than the central mean is assigned a "foreground" intensity for display and the remaining pixels are each assigned a "background" (i.e. contrasting) intensity for display. Alternatively, if the counted number of pixels is not less than half the number of pixels in the block 22, each of the pixels having an intensity that is less than the central mean is assigned a "background" intensity for display and the remaining pixels are each assigned a "foreground" (i.e. contrasting) intensity for display. (The roles of the foreground and the background may be reversed, e.g. when the zoom level is such that the assumption that there are a greater number of foreground pixels than background pixels may no longer hold. One way to achieve this may be to allow the neighbouring blocks (e.g. the surrounding eight blocks) to be used to determine (e.g. to "vote" on) the background colour to be used.)
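The two-mean threshold computation of Figure 5 (steps 203 to 209, including the saturation shortcuts of steps 204 and 207 to 208) might be sketched as follows. This is illustrative only: intensities are assumed normalised to [0, 1], and the tolerance margin `TOL` is an assumed value, not one taken from the application.

```python
TOL = 0.05  # assumed tolerance margin for illustration

def central_threshold(block):
    """Return the "central" mean threshold for a block, or None if
    the whole block should saturate to a single colour."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)              # step 203
    if mean < TOL or mean > 1.0 - TOL:            # step 204: nearly all
        return None                               # black (or white)
    lower = [p for p in pixels if p < mean]
    upper = [p for p in pixels if p > mean]
    if not lower or not upper:
        return None                               # all pixels at the mean
    lower_mean = sum(lower) / len(lower)          # step 205
    upper_mean = sum(upper) / len(upper)          # step 206
    if upper_mean - lower_mean <= TOL:            # step 207: e.g. only noise
        return None                               # step 208: saturate block
    return (lower_mean + upper_mean) / 2          # step 209: "central" mean
```

When a threshold is returned, it is then applied in the same manner as the Figure 4 threshold (counting the pixels below it and assigning the minority class the "foreground" intensity); when `None` is returned, the whole block is filled with a single (black or white) intensity.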

The resultant binarised block 24 (with the majority of pixels being those of the "background" having an intensity less than the mean and coloured black, with the remaining "foreground" pixels being coloured white) is then output to the display 8, with the resultant left and right 16, 18 versions of the binarised block 24 produced from all of the blocks 22 that have been processed (e.g. in parallel) in this manner.

Figures 6a and 6b show an example of an image that has been processed in the manner of the present invention (in particular the method shown in Figure 5). The grayscale image shown in Figure 6a (e.g. taken using the camera 6 on the smart spectacles 1) is provided as an input to be enhanced. The image of Figure 6a is then processed using the method of the present invention to output the binarised image shown in Figure 6b. As will be appreciated, the enhanced (higher contrast) image of Figure 6b (and particularly the text in the image) is easier for a visually impaired user 2 to comprehend, owing to the high contrast: the image does not contain areas of intermediate intensity that the user may not be able to resolve and that would thus be confusing.

It can be seen from the above that, at least in preferred embodiments, the present invention provides a method of and a portable system for enhancing (e.g. increasing the contrast of) an image that has been captured by the portable system and then displaying the enhanced, binarised image.

This binary (e.g. contrasting) output is particularly helpful in assisting visually impaired people, and is particularly effective in enhancing text (or similar objects, e.g. glyphs, symbols, logos, signs, drawings, etc.) in an image.

Using a threshold value to binarise an image is much simpler than the complicated processing, e.g. of OCR, that is needed to explicitly recognise and interpret objects (e.g. text) in an image. The method of the present invention is thus particularly suited to being performed using a (e.g. low-powered) portable system, e.g. using a GPU.

Processing the captured image in a blockwise manner is also advantageous as this allows blocks of an image to be processed in parallel, which again is particularly suited to being performed by a (e.g. low-powered) GPU.
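As a sketch of this blockwise parallelism, the independent per-block enhancement can be mapped over the blocks concurrently. The thread pool below stands in for the (e.g. low-powered) GPU discussed above, and the simple fixed-threshold `enhance_block` function is an illustrative stand-in for the per-block binarisation of Figures 4 and 5; both names and the 0.5 threshold are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def enhance_block(block, threshold=0.5):
    """Stand-in per-block enhancement: binarise against a fixed
    threshold (intensities assumed normalised to [0, 1])."""
    return [[1.0 if p >= threshold else 0.0 for p in row] for row in block]

def enhance_blocks_parallel(blocks):
    """Enhance all blocks concurrently; each block is independent,
    so no synchronisation between blocks is needed."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(enhance_block, blocks))
```

Because no block depends on any other, the same mapping could equally be dispatched as one GPU work-item (or microcontroller task) per block.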

The skilled person will appreciate that a number of variants to the preferred embodiments may also be provided, within the scope of the present invention. For example, the processor may operate to identify (e.g. when requested by the user using the control) regions (e.g. blocks) of an image that (e.g. are likely to) contain text, and then only perform the method of the present invention for such regions.