Title:
COMPUTER-IMPLEMENTED METHODS OF BLURRING A DIGITAL IMAGE; COMPUTER TERMINALS AND COMPUTER PROGRAM PRODUCTS
Document Type and Number:
WIPO Patent Application WO/2024/018166
Kind Code:
A1
Abstract:
There is disclosed a computer-implemented method of blurring a digital image, the digital image comprising pixels, the method including the steps of: (i) processing the digital image, using original pixel blocks (e.g. 8x8 pixel blocks) to determine in which original pixel blocks the image satisfies a criterion of being smoothly varying; (ii) for each original pixel block in which the image satisfies the criterion of being smoothly varying, producing a corresponding blurred pixel block, in which bilinear interpolation of pixels of a respective original pixel block is used to produce a respective corresponding blurred pixel block; (iii) for each original pixel block in which the image does not satisfy the criterion of being smoothly varying, producing a corresponding blurred pixel block, including using a two pass Gaussian blur for pixels of a respective original pixel block to produce a respective corresponding blurred pixel block; (iv) assembling a blurred digital image, using the corresponding blurred pixel blocks produced in steps (ii) and (iii). Related methods, computer terminals and computer program products are disclosed.

Inventors:
STREATER STEPHEN (GB)
Application Number:
PCT/GB2023/050454
Publication Date:
January 25, 2024
Filing Date:
February 28, 2023
Assignee:
BLACKBIRD PLC (GB)
International Classes:
H04N21/234; G06F16/957; G06Q30/0241; G06T3/40; G06T5/20; H04N19/30; H04N19/63; H04N21/231; H04N21/2347; H04N21/2665; H04N21/431; H04N21/4402; H04N21/462; H04N21/472; H04N21/81; H04N21/845
Domestic Patent References:
WO2018197911A1 (2018-11-01)
WO2018127695A2 (2018-07-12)
WO2005048607A1 (2005-05-26)
WO2007077447A2 (2007-07-12)
WO2005101408A1 (2005-10-27)
Foreign References:
US20180109804A1 (2018-04-19)
EP3477582A1 (2019-05-01)
US20160142747A1 (2016-05-19)
JP2002077726A (2002-03-15)
EP3103258B1 (2020-07-15)
US20140359656A1 (2014-12-04)
US5953503A (1999-09-14)
US20020118828A1 (2002-08-29)
US20140002598A1 (2014-01-02)
US20050185795A1 (2005-08-25)
US20210084304A1 (2021-03-18)
EP1738365B1 (2009-11-04)
JP2011129979A (2011-06-30)
JPH10224779A (1998-08-21)
EP3296952B1 (2020-11-04)
EP1494174B1 (2018-11-07)
US9179143B2 (2015-11-03)
US8711944B2 (2014-04-29)
US8660181B2 (2014-02-25)
US8255802B2 (2012-08-28)
Other References:
MILLER LISA: "Approximate Non-Stationary Convolution Using a Block Separable Matrix", RESEARCHGATE, 16 December 2014 (2014-12-16), pages 1 - 16, XP093042267, Retrieved from the Internet [retrieved on 20230426]
HURTIK PETR ET AL: "Bilinear Interpolation over fuzzified images: Enlargement", 2015 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), IEEE, 2 August 2015 (2015-08-02), pages 1 - 8, XP032819025, DOI: 10.1109/FUZZ-IEEE.2015.7338082
ANONYMOUS: "Efficiently compressing dynamically generated web content", 6 December 2012 (2012-12-06), pages 1 - 14, XP093057362, Retrieved from the Internet [retrieved on 20230623]
Attorney, Agent or Firm:
ORIGIN LIMITED (GB)
Claims:
CLAIMS

Image Blur or Video Blur

1. A computer-implemented method of blurring a digital image, the digital image comprising pixels, the method including the steps of:

(i) processing the digital image, using original pixel blocks (e.g. 8x8 pixel blocks) to determine in which original pixel blocks the image satisfies a criterion of being smoothly varying;

(ii) for each original pixel block in which the image satisfies the criterion of being smoothly varying, producing a corresponding blurred pixel block, in which bilinear interpolation of pixels of a respective original pixel block is used to produce a respective corresponding blurred pixel block;

(iii) for each original pixel block in which the image does not satisfy the criterion of being smoothly varying, producing a corresponding blurred pixel block, including using a two pass Gaussian blur for pixels of a respective original pixel block to produce a respective corresponding blurred pixel block;

(iv) assembling a blurred digital image, using the corresponding blurred pixel blocks produced in steps (ii) and (iii).
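Steps (i) to (iv) of Claim 1 can be read as a per-block dispatch loop. The following is a minimal illustrative sketch only, assuming a greyscale image stored as a flat array whose width and height are multiples of the block size; the helper names (isSmooth, bilinearBlurBlock, gaussianBlurBlock) are placeholders introduced here, not names from the application.

```javascript
// Hybrid blur dispatch: classify each original pixel block, blur it with the
// cheap bilinear path when smoothly varying, otherwise with the Gaussian path,
// and assemble the blurred blocks into the output image.
function blurImage(pixels, width, height, block,
                   isSmooth, bilinearBlurBlock, gaussianBlurBlock) {
  const out = new Uint8ClampedArray(pixels.length);
  for (let by = 0; by < height; by += block) {
    for (let bx = 0; bx < width; bx += block) {
      // Step (i): does this original pixel block satisfy the criterion?
      const blurBlock = isSmooth(pixels, width, bx, by, block)
        ? bilinearBlurBlock   // step (ii): smoothly varying block
        : gaussianBlurBlock;  // step (iii): not smoothly varying
      // Step (iv): write the corresponding blurred block into the output.
      blurBlock(pixels, out, width, bx, by, block);
    }
  }
  return out;
}
```

The point of the dispatch is that bilinear interpolation over a smooth block is far cheaper than a Gaussian blur, so the expensive path runs only where the image content requires it.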

2. The method of Claim 1, in which in step (iii), in a first pass, a one-dimensional kernel is used to Gaussian blur pixels for an original pixel block in a first direction, to produce an intermediate pixel block, and in a second pass, the same one-dimensional kernel is used to Gaussian blur pixels for the intermediate pixel block in a direction orthogonal to the first direction.
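The two-pass structure of Claim 2 exploits the separability of the Gaussian: one 1-D kernel applied horizontally to produce an intermediate block, then the same kernel applied vertically. A sketch, assuming a greyscale row-major image, an odd-length normalised kernel, and clamp-to-edge sampling (the edge handling is an assumption, not stated in the claim):

```javascript
// One pass of 1-D convolution, in either the horizontal or vertical direction.
function convolve1D(src, width, height, kernel, horizontal) {
  const half = (kernel.length - 1) / 2;
  const out = new Float64Array(width * height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let sum = 0;
      for (let k = -half; k <= half; k++) {
        // Clamp sample coordinates at the image edges.
        const sx = horizontal ? Math.min(width - 1, Math.max(0, x + k)) : x;
        const sy = horizontal ? y : Math.min(height - 1, Math.max(0, y + k));
        sum += kernel[k + half] * src[sy * width + sx];
      }
      out[y * width + x] = sum;
    }
  }
  return out;
}

// First pass blurs in x to an intermediate image; the second pass blurs the
// intermediate image in the orthogonal direction with the same kernel.
function gaussianBlurTwoPass(src, width, height, kernel) {
  const intermediate = convolve1D(src, width, height, kernel, true);
  return convolve1D(intermediate, width, height, kernel, false);
}
```

Two 1-D passes cost O(k) per pixel instead of O(k²) for a direct 2-D kernel of radius k, which is why the two-pass form matters for real-time use.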

3. The method of Claims 1 or 2, in which the Gaussian blur is produced for pixels within a pixel radius, e.g. a pixel radius of from 4 to 12 pixels.

4. The method of Claim 3, wherein the criterion of being smoothly varying includes that an area (e.g. a square area) is smoothly varying as far as the pixel radius, in which the area may extend outside a pixel block.

5. The method of any previous Claim, in which the method is used when executing in a browser.

6. The method of any previous Claim, in which the method is implemented in javascript.

7. The method of any previous Claim, in which the method is used when executing on a smart TV, on a desktop computer, on a laptop computer, on a tablet computer or on a smartphone computer.

8. The method of any previous Claim, in which the method is used for processing video.

9. The method of Claim 8, in which the method is used for processing video in real-time.

10. The method of any previous Claim, in which the pixels are represented by pixel values.

11. The method of any previous Claim, in which the criterion of being smoothly varying includes that the expression a+d-b-c is less than a predefined percentage of a, b, c or d, where a, b, c and d are pixel values at the corners of the pixel block, and where a and d are pixel values of opposite corners of the pixel block.

12. The method of Claim 11, in which the predefined percentage is 10%, or 5%, or 3%, or 2% or 1%.
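The corner criterion of Claims 11 and 12 is a single arithmetic test. A sketch, taking the magnitude of a+d-b-c and anchoring the percentage to the largest corner value; the claim leaves open which of a, b, c or d the percentage is taken against, so that choice is an assumption here:

```javascript
// a, b, c, d are corner pixel values, with a and d on opposite corners.
// The block counts as smoothly varying when |a+d-b-c| is below the given
// predefined percentage (e.g. 5%) of a corner value.
function satisfiesSmoothCriterion(a, b, c, d, percent) {
  return Math.abs(a + d - b - c) < (percent / 100) * Math.max(a, b, c, d);
}
```

Intuitively, a+d-b-c is zero for any exactly bilinear ramp, so a small value means bilinear interpolation of the corners will reproduce the block's content closely.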

13. The method of any previous Claim, in which the pixel blocks are 4x4 pixel blocks, or 8x8 pixel blocks, or at least 4x4 pixel blocks.

14. The method of Claim 13, in which 8x8 blocks are used for the bilinear interpolation blur, and a pixel radius of from 4 to 12 pixels is used for the Gaussian blur portions of the blur.

15. The method of any of Claims 1 to 12, in which the pixel blocks are 16x16 pixel blocks, or in which the pixel blocks are in the range of 4x4 to 16x16 pixel blocks.

16. The method of any previous Claim, in which the pixel values at the corners of the pixel block are obtained by averaging nearby pixel values, e.g. by averaging one pixel further (e.g. averaging the pixel and its eight nearest neighbours), or by averaging two pixels further (e.g. averaging the pixel and its 24 nearest neighbours).
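The corner averaging of Claim 16 (the "one pixel further" variant, i.e. the pixel and its eight nearest neighbours) can be sketched as a 3x3 mean; clamp-to-edge handling at image borders is an assumption:

```javascript
// Mean of the pixel at (x, y) and its eight nearest neighbours, making the
// corner values used by the smoothness criterion robust to single-pixel noise.
function cornerValue(pixels, width, height, x, y) {
  let sum = 0;
  for (let dy = -1; dy <= 1; dy++) {
    for (let dx = -1; dx <= 1; dx++) {
      const sx = Math.min(width - 1, Math.max(0, x + dx));
      const sy = Math.min(height - 1, Math.max(0, y + dy));
      sum += pixels[sy * width + sx];
    }
  }
  return sum / 9;
}
```

The "two pixels further" variant of the claim would widen the window to 5x5 (the pixel and its 24 nearest neighbours).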

17. The method of any previous Claim, in which the criterion of being smoothly varying includes that the magnitude of a gradient change across a boundary between a pixel block identified as potentially being smoothly varying and a pixel block that is identified as being not smoothly varying is less than a predefined amount.

18. The method of any previous Claim, in which the criterion of being smoothly varying includes that a pixel block identified as potentially being smoothly varying does not include a sharp edge.

19. The method of any previous Claim, in which block edges are obfuscated, if some jagged block edges are produced.

20. The method of any previous Claim, including the step of storing the assembled image.

21. The method of any previous Claim, including the step of displaying the assembled image on a display.

22. A computer terminal configured to blur a digital image, the digital image comprising pixels, the computer terminal configured to:

(i) process the digital image, using original pixel blocks (e.g. 8x8 pixel blocks) to determine in which original pixel blocks the image satisfies a criterion of being smoothly varying;

(ii) for each original pixel block in which the image satisfies the criterion of being smoothly varying, produce a corresponding blurred pixel block, in which bilinear interpolation of pixels of a respective original pixel block is used to produce a respective corresponding blurred pixel block;

(iii) for each original pixel block in which the image does not satisfy the criterion of being smoothly varying, produce a corresponding blurred pixel block, including using a two pass Gaussian blur for pixels of a respective original pixel block to produce a respective corresponding blurred pixel block;

(iv) assemble a blurred digital image, using the corresponding blurred pixel blocks produced in (ii) and (iii).

23. The computer terminal of Claim 22, configured to perform a method of any of Claims 1 to 21.

24. A computer program product, the computer program product executable on a computer terminal to blur a digital image, the digital image comprising pixels, the computer program product executable on the computer terminal to:

(i) process the digital image, using original pixel blocks (e.g. 8x8 pixel blocks) to determine in which original pixel blocks the image satisfies a criterion of being smoothly varying;

(ii) for each original pixel block in which the image satisfies the criterion of being smoothly varying, produce a corresponding blurred pixel block, in which bilinear interpolation of pixels of a respective original pixel block is used to produce a respective corresponding blurred pixel block;

(iii) for each original pixel block in which the image does not satisfy the criterion of being smoothly varying, produce a corresponding blurred pixel block, including using a two pass Gaussian blur for pixels of a respective original pixel block to produce a respective corresponding blurred pixel block;

(iv) assemble a blurred digital image, using the corresponding blurred pixel blocks produced in (ii) and (iii).

25. The computer program product of Claim 24, executable to perform a method of any of Claims 1 to 21.

Identifying portions of a digital image suitable for advertising

26. A computer-implemented method of identifying portions of a digital image suitable for presenting advertising, the digital image comprising pixels, the method including the steps of:

(i) processing the digital image, using original pixel blocks (e.g. 8x8 pixel blocks) to determine in which original pixel blocks the image satisfies a criterion of being smoothly varying;

(ii) receiving an advertisement and a size of the advertisement, to be inserted into the digital image;

(iii) identifying a portion of the digital image which contains an uninterrupted area of pixel blocks which satisfy the criterion of being smoothly varying, which is large enough to receive the size of the advertisement;

(iv) inserting the advertisement into the uninterrupted area of pixel blocks which satisfy the criterion of being smoothly varying, which is large enough to receive the minimum size of the advertisement.
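Step (iii) of Claim 26, finding an uninterrupted area of smooth blocks large enough for the advertisement, can be sketched as a scan over a boolean block grid. This brute-force search and its data shapes are illustrative assumptions; the application does not specify the search strategy:

```javascript
// smooth: boolean per block (true = satisfied the smoothness criterion),
// laid out row-major as cols x rows. Returns the top-left block position of
// the first wBlocks x hBlocks rectangle of all-smooth blocks, or null.
function findSmoothArea(smooth, cols, rows, wBlocks, hBlocks) {
  for (let r = 0; r + hBlocks <= rows; r++) {
    for (let c = 0; c + wBlocks <= cols; c++) {
      let ok = true;
      for (let y = r; ok && y < r + hBlocks; y++)
        for (let x = c; ok && x < c + wBlocks; x++)
          if (!smooth[y * cols + x]) ok = false; // area is interrupted
      if (ok) return { col: c, row: r }; // step (iv) would insert the ad here
    }
  }
  return null; // no uninterrupted area large enough for the advertisement
}
```

Claim 27's colour check would then compare the advertisement's colours against the pixels of the returned area before insertion.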

27. The method of Claim 26, including checking that the colours of the advertisement differ from the colours of the uninterrupted area of pixel blocks which satisfy the criterion of being smoothly varying, which is large enough to receive the size of the advertisement, before performing step (iv).

28. The method of Claims 26 or 27, wherein the method is used when executing in a browser.

29. The method of any of Claims 26 to 28, wherein the method is implemented in javascript.

30. The method of any of Claims 26 to 29, wherein the method is used when executing on a smart TV, on a desktop computer, on a laptop computer, on a tablet computer or on a smartphone computer.

31. The method of any of Claims 26 to 30, wherein the method is used for processing video, e.g. in real time.

32. The method of any of Claims 26 to 31, wherein the pixels are represented by pixel values.

33. The method of any of Claims 26 to 32, wherein the criterion of being smoothly varying includes that the expression a+d-b-c is less than a predefined percentage of a, b, c or d, where a, b, c and d are pixel values at the corners of the pixel block, and where a and d are pixel values of opposite corners of the pixel block.

34. The method of Claim 33, wherein the predefined percentage is 10%, or 5%, or 3%, or 2% or 1%.

35. The method of any of Claims 26 to 34, wherein the pixel blocks are 4x4 pixel blocks, or 8x8 pixel blocks, or at least 4x4 pixel blocks.

36. The method of any of Claims 26 to 34, wherein the pixel blocks are 16x16 pixel blocks, or wherein the pixel blocks are in the range of 4x4 to 16x16 pixel blocks.

37. The method of any of Claims 26 to 36, wherein the pixel values at the corners of the pixel block are obtained by averaging nearby pixel values, e.g. by averaging one pixel further (e.g. averaging the pixel and its eight nearest neighbours), or by averaging two pixels further (e.g. averaging the pixel and its 24 nearest neighbours).

38. The method of any of Claims 26 to 37, wherein the criterion of being smoothly varying includes that the magnitude of a gradient change across a boundary between a pixel block identified as potentially being smoothly varying and a pixel block that is identified as being not smoothly varying is less than a predefined amount.

39. The method of any of Claims 26 to 38, wherein the criterion of being smoothly varying includes that a pixel block identified as potentially being smoothly varying does not include a sharp edge.

40. A computer terminal configured to perform a method of any of Claims 26 to 39.

41. A computer program product executable on a computer to perform a method of any of Claims 26 to 39.

Blurred borders

42. A computer implemented method for blurring a left hand border and a right hand border next to a video displayed in a landscape orientation display, the video being a portrait orientation video, the method including the steps of:

(i) identifying a respective portion of a portrait orientation video frame to be enlarged and blurred, for each respective border, and displaying the portrait orientation video frame in the landscape orientation display;

(ii) performing a Gaussian blur of an n x n pixel block of an identified respective portion, to produce a Gaussian blurred n x n pixel block, where n>=2 (e.g. n=2);

(iii) performing bilinear interpolation of the Gaussian blurred n x n pixel block, to produce an m x m pixel block, where m >= 2n (e.g. m = 2n);

(iv) displaying the m x m pixel block in a border of the landscape orientation display, at a position corresponding to the position of the n x n pixel block in the identified respective portion of the portrait orientation video frame;

(v) repeating steps (ii) to (iv) for all n x n pixel blocks in the identified respective portions of the portrait orientation video frame;

(vi) repeating steps (i) to (v) for all frames in the portrait orientation video.
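Step (iii) of Claim 42, enlarging an already-blurred n x n block to an m x m block by bilinear interpolation, can be sketched for n=2, m=4. Treating the four blurred values as samples at the corners of the output block is an assumption about sample placement; the claim does not fix it:

```javascript
// b = [topLeft, topRight, bottomLeft, bottomRight]: the Gaussian blurred
// 2x2 block. Returns an m x m block interpolated between those four values,
// i.e. a 2x enlargement (for m = 4) suitable for filling a border region.
function bilinearUpscale2x2(b, m) {
  const out = new Float64Array(m * m);
  for (let y = 0; y < m; y++) {
    for (let x = 0; x < m; x++) {
      const u = x / (m - 1); // horizontal interpolation weight, 0..1
      const v = y / (m - 1); // vertical interpolation weight, 0..1
      const top = b[0] * (1 - u) + b[1] * u;
      const bottom = b[2] * (1 - u) + b[3] * u;
      out[y * m + x] = top * (1 - v) + bottom * v;
    }
  }
  return out;
}
```

Because the input block is blurred before enlargement, the upscaled borders stay soft rather than magnifying pixel detail, which is the desired look for the landscape-display side panels.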

43. The method of Claim 42, wherein step (v) includes storing a landscape orientation video frame including the portrait orientation video frame, a blurred left hand border and a blurred right hand border, and wherein step (vi) includes storing a video file comprising the stored landscape orientation video frames.

44. The method of Claims 42 or 43, wherein n=2.

45. The method of any of Claims 42 to 44, wherein m=2n.

46. The method of any of Claims 42 to 45, in which computer memory used to store the blurred borders is first used to store an identified respective portion of a portrait orientation video frame to be enlarged and blurred, and then the results of step (iii) are used to overwrite the memory used to store the blurred borders as the process progresses, so no additional workspace outside the memory used to store the blurred borders is required.

47. The method of any of Claims 42 to 46, in which the method is performed in real-time, on a client device, which is displaying the portrait orientation video, the blurred left hand border and the blurred right hand border on its landscape orientation display.

48. The method of any of Claims 42 to 47, wherein the method is used when executing on a smart TV, on a desktop computer, on a laptop computer, on a tablet computer or on a smartphone computer.

49. The method of any of Claims 42 to 48, wherein the method is executed in a browser environment.

50. The method of any of Claims 42 to 49, wherein the method is executed in javascript.

51. A computer terminal configured to perform a method of any of Claims 42 to 50.

52. A computer program product executable on a computer to blur a left hand border and a right hand border next to a video displayed in a landscape orientation display of the computer, the video being a portrait orientation video, the computer program product executable on the computer to:

(i) identify a respective portion of a portrait orientation video frame to be enlarged and blurred, for each respective border, and displaying the portrait orientation video frame in the landscape orientation display;

(ii) perform a Gaussian blur of an n x n pixel block of an identified respective portion, to produce a Gaussian blurred n x n pixel block, where n>=2 (e.g. n=2);

(iii) perform bilinear interpolation of the Gaussian blurred n x n pixel block, to produce an m x m pixel block, where m >= 2n (e.g. m = 2n);

(iv) display the m x m pixel block in a border of the landscape orientation display, at a position corresponding to the position of the n x n pixel block in the identified respective portion of the portrait orientation video frame;

(v) repeat (ii) to (iv) for all n x n pixel blocks in the identified respective portions of the portrait orientation video frame;

(vi) repeat (i) to (v) for all frames in the portrait orientation video.

53. The computer program product of Claim 52, executable on the computer to perform a method of any of Claims 42 to 50.

Blurring a region

54. A computer implemented method for blurring a region of a video displayed in a display, the method including the steps of:

(i) identifying a region of a video frame to be blurred, the region including a selected portion;

(ii) performing a Gaussian blur of an n x n pixel block of the selected portion, to produce a Gaussian blurred n x n pixel block, where n>=2 (e.g. n=2);

(iii) performing bilinear interpolation of the Gaussian blurred n x n pixel block, to produce an m x m pixel block, where m >= 2n (e.g. m = 2n);

(iv) storing the m x m pixel block in the region of the video frame to be blurred, at a position corresponding to the position of the n x n pixel block in the selected portion of the region of the video frame;

(v) repeating steps (ii) to (iv) for all n x n pixel blocks in the selected portion of the identified region of the video frame; and displaying the video frame including the blurred region;

(vi) repeating steps (i) to (v) for all frames in the video.

55. The method of Claim 54, wherein n=2.

56. The method of Claims 54 or 55, wherein m=2n.

57. The method of any of Claims 54 to 56, wherein the identified region of the video frame to be blurred is a rectangle, an ellipse, a square, a circle or a squircle.

58. The method of any of Claims 54 to 57, wherein the identified region of the video frame to be blurred is a vehicle number plate, or a person’s face.

59. The method of any of Claims 54 to 58, wherein the method includes tracking an object within a video (e.g. someone’s face), and blurring that object as it moves in the video.

60. The method of any of Claims 54 to 59, wherein the method is executed in realtime.

61. The method of any of Claims 54 to 60, wherein the method is executed in a browser environment.

62. The method of any of Claims 54 to 61, wherein the method is executed in javascript.

63. The method of any of Claims 54 to 62, wherein the method is used when executing on a smart TV, on a desktop computer, on a laptop computer, on a tablet computer or on a smartphone computer.

64. A computer terminal configured to perform a method of any of Claims 54 to 63.

65. A computer program product executable on a computer to blur a region of a video displayed in a display, the computer program product executable on the computer to:

(i) identify a region of a video frame to be blurred, the region including a selected portion;

(ii) perform a Gaussian blur of an n x n pixel block of the selected portion, to produce a Gaussian blurred n x n pixel block, where n>=2 (e.g. n=2);

(iii) perform bilinear interpolation of the Gaussian blurred n x n pixel block, to produce an m x m pixel block, where m >= 2n (e.g. m = 2n);

(iv) store the m x m pixel block in the region of the video frame to be blurred, at a position corresponding to the position of the n x n pixel block in the selected portion of the region of the video frame;

(v) repeat (ii) to (iv) for all n x n pixel blocks in the selected portion of the identified region of the video frame; and display the video frame including the blurred region;

(vi) repeat (i) to (v) for all frames in the video.

66. The computer program product of Claim 65, executable on the computer to perform a method of any of Claims 54 to 63.

Reducing bandwidth required for web page delivery

67. A computer implemented method of reducing bandwidth required for webpage delivery, the method including the steps of:

(i) analysing a first webpage for content which may be included in a different (e.g. future) webpage;

(ii) identifying content which may be included in the different (e.g. future) webpage;

(iii) storing at a server a unique identifier which identifies content, the identified content which may be included in the different web page, and the first web page;

(iv) the server serving a served first web page in response to a request for the first web page, the served first web page including the unique identifier and including the content corresponding to the unique identifier;

(v) analysing a second webpage, and identifying content in the second webpage that corresponds to the unique identifier, and storing the second web page and a relation between the unique identifier, the identified content and the second web page;

(vi) the server serving a served second web page in response to a request for the second web page, the served second web page including the unique identifier and not including the content corresponding to the unique identifier, wherein, subsequent to receiving the request for the second web page, the server only serves the content corresponding to the unique identifier upon receipt of a request for the content corresponding to the unique identifier.
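The server-side bookkeeping of Claim 67 can be sketched as a content store keyed by unique identifier: the first page that carries an item is served with the item inlined, later pages are served with only the identifier, and the item itself is served only on an explicit request. The Map-based store, the id scheme, and the page shape are illustrative assumptions:

```javascript
class DedupServer {
  constructor() {
    this.idByContent = new Map(); // content -> unique identifier
    this.contentById = new Map(); // unique identifier -> content
    this.pages = new Map();       // page name -> { template, ids, inline }
  }
  // Steps (i)-(iii)/(v): analyse a page and record the shared content it
  // contains; the first page registering an item is marked to inline it.
  addPage(name, template, sharedItems) {
    const ids = [];
    let inline = false;
    for (const item of sharedItems) {
      let id = this.idByContent.get(item);
      if (id === undefined) {
        id = "id" + this.idByContent.size; // illustrative identifier scheme
        this.idByContent.set(item, id);
        this.contentById.set(id, item);
        inline = true; // first page carrying this content inlines it
      }
      ids.push(id);
    }
    this.pages.set(name, { template, ids, inline });
  }
  // Steps (iv)/(vi): serve a page; the shared content is included only in
  // the page that first introduced it, otherwise only its identifier.
  servePage(name) {
    const p = this.pages.get(name);
    const body = p.ids
      .map(id => (p.inline ? this.contentById.get(id) : "[" + id + "]"))
      .join("");
    return p.template + body;
  }
  // The content itself is served only upon a request for its identifier.
  serveContent(id) {
    return this.contentById.get(id);
  }
}
```

A client that already cached the content under "id0" never issues the follow-up request, which is where the bandwidth saving comes from.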

68. The method of Claim 67, wherein the server is a news website server, or a social media server.

69. The method of Claims 67 or 68, wherein the identified content is video, graphics or text.

70. The method of any of Claims 67 to 69, wherein in the analysis of graphics in step (ii), the graphics is analyzed using a grid, and each portion of the grid is given a respective reference id, to generate a unique identifier for each portion of the grid.

71. The method of any of Claims 67 to 70, wherein the server sends a javascript player to a user terminal, together with the first web page, the javascript player executable to display the first web page and the second web page on the display of the user terminal.

72. A computer system including an analysis computer and a server, the analysis computer communicatively connected to the server wherein

(i) the analysis computer is configured to analyse a first webpage for content which may be included in a different (e.g. future) webpage;

(ii) the analysis computer is configured to identify content which may be included in the different (e.g. future) webpage;

(iii) the server is configured, in response to (ii), to store a unique identifier which identifies content, the identified content which may be included in the different web page, and the first web page;

(iv) the server is configured to serve a served first web page in response to a request for the first web page, the served first web page including the unique identifier and including the content corresponding to the unique identifier;

(v) the analysis computer is configured to analyse a second webpage, and to identify content in the second webpage that corresponds to the unique identifier, and to instruct the server to store the second web page and a relation between the unique identifier, the identified content and the second web page;

(vi) the server is configured to serve a served second web page in response to a request for the second web page, the served second web page including the unique identifier and not including the content corresponding to the unique identifier, wherein, subsequent to receiving the request for the second web page, the server is configured to only serve the content corresponding to the unique identifier upon receipt of a request for the content corresponding to the unique identifier.

73. The computer system of Claim 72, configured to perform a method of any of Claims 67 to 71.

74. A computer program product executable on a computer to:

(i) analyse a first webpage for content which may be included in a different (e.g. future) webpage;

(ii) identify content which may be included in the different (e.g. future) webpage;

(iii) store a unique identifier which identifies content, the identified content which may be included in the different web page, and the first web page;

(iv) serve a served first web page in response to a request for the first web page, the served first web page including the unique identifier and including the content corresponding to the unique identifier;

(v) analyse a second webpage, and identify content in the second webpage that corresponds to the unique identifier, and store the second web page and a relation between the unique identifier, the identified content, and the second web page;

(vi) serve a served second web page in response to a request for the second web page, the served second web page including the unique identifier and not including the content corresponding to the unique identifier, wherein, subsequent to receiving the request for the second web page, to only serve the content corresponding to the unique identifier upon receipt of a request for the content corresponding to the unique identifier.

75. The computer program product of Claim 74, configured to perform a method of any of Claims 67 to 71.

76. A computer implemented method of reducing bandwidth required for webpage delivery, the method including the steps of:

(i) a user terminal requesting a first webpage from a server;

(ii) the user terminal receiving from the server the first web page, the first web page including a content item and a unique identifier which identifies the content item;

(iii) the user terminal storing the first web page including the content item, and the unique identifier which identifies the content item in its cache;

(iv) the user terminal displaying the first webpage on a display of the user terminal;

(v) the user terminal requesting a second webpage from a server;

(vi) the user terminal receiving from the server a received second web page, the received second web page including the unique identifier which identifies the content item, the received second web page not including the content item;

(vii) the user terminal identifying the content item in its cache, using the unique identifier which identifies the content item, and the user terminal retrieving the content item from its cache;

(viii) the user terminal displaying the second webpage on a display of the user terminal, the displayed second webpage including the content item that was retrieved from its cache.
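On the user-terminal side, steps (iii) and (vii) of Claim 76 amount to a cache keyed by the unique identifier: items that arrive with a page are stored, and a later page carrying only an identifier is rendered from the cache instead of re-downloading. The Map cache and the `{ html, items }` page shape are assumptions for illustration:

```javascript
class ClientCache {
  constructor() {
    this.byId = new Map(); // unique identifier -> content item
  }
  // Step (iii): cache every content item that arrives with a page.
  storePage(page) {
    for (const { id, content } of page.items) {
      if (content !== undefined) this.byId.set(id, content);
    }
  }
  // Steps (vii)-(viii): render a page, resolving identifier-only items from
  // the cache; items delivered inline are used directly.
  renderPage(page) {
    return page.html + page.items
      .map(({ id, content }) =>
        content !== undefined ? content : this.byId.get(id))
      .join("");
  }
}
```

The cache-miss path of Claims 87 onwards is the complement: when `this.byId.get(id)` finds nothing, the terminal requests the item from the server by its identifier.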

77. The method of Claim 76, wherein the user terminal includes a browser operable to communicate with the server.

78. The method of Claim 77, the browser executing a javascript program, to communicate with the server.

79. The method of any of Claims 77 to 78, wherein the browser receives javascript (e.g. player) code from the server, together with the first web page.

80. The method of any of Claims 77 to 78, wherein the browser receives javascript (e.g. player) code from the server, and executes the javascript (e.g. player) code received from the server, to perform at least steps (iii), (vii) and (viii).

81. The method of Claim 76, wherein the user terminal includes an app operable to communicate with the server.

82. The method of any of Claims 76 to 81, wherein the content item is video, graphics or text.

83. The method of Claim 82, wherein the content item is video, which is a video clip from a larger video, e.g. in which the video is stored at the server in Blackbird format.

84. The method of any of Claims 76 to 83, wherein the user terminal is a smartphone, a tablet computer, a laptop, a desktop computer, or a smart TV.

85. A user terminal, configured to perform a method of any of Claims 76 to 84.

86. A computer program product, executable on the user terminal, to perform a method of any of Claims 76 to 84.

87. A computer implemented method of reducing bandwidth required for webpage delivery, the method including the steps of:

(i) a user terminal requesting a second webpage from a server;

(ii) the user terminal receiving from the server a received second web page, the received second web page including a unique identifier which identifies a content item, the received second web page not including the content item;

(iii) the user terminal searching for the content item in its cache, using the unique identifier;

(iv) the user terminal not identifying the content item in its cache, using the unique identifier, and in response, the user terminal requesting the content item from the server, using the unique identifier which identifies the content item;

(v) the user terminal receiving the content item from the server;

(vi) the user terminal displaying the second webpage on a display of the user terminal, the displayed second webpage including the content item that was received from the server.

88. The method of Claim 87, wherein the user terminal includes a browser operable to communicate with the server.

89. The method of Claim 88, the browser executing a javascript program, to communicate with the server.

90. The method of Claims 88 or 89, the browser executing javascript (e.g. player) code, to perform at least steps (iii), (iv) and (vi).

91. The method of Claim 87, wherein the user terminal includes an app operable to communicate with the server.

92. The method of any of Claims 87 to 91, wherein the content item is video, graphics or text.

93. The method of Claim 92, wherein the content item is video, which is a video clip from a larger video, e.g. in which the video is stored at the server in Blackbird format.

94. The method of any of Claims 87 to 93, wherein the user terminal is a smartphone, a tablet computer, a laptop, a desktop computer, or a smart TV.

95. A user terminal, configured to perform a method of any of Claims 87 to 94.

96. A computer program product, executable on the user terminal, to perform a method of any of Claims 87 to 94.

Lower Energy Distribution

97. A computer-implemented method of low energy file distribution, the method including encrypting a video file, the video file including a compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the video file, wherein frames in level zero of the video file have the lowest temporal resolution, wherein content of the frames in level zero of the video file is displayable when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the video file is displayable when decompressed only using content of at least one or more frames not in level x of the frames of the video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy, the method including the steps of:

(i) accessing the video file including the compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the video file, wherein frames in level zero of the video file have the lowest temporal resolution, wherein content of the frames in level zero of the video file is displayable when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the video file is displayable when decompressed only using content of at least one or more frames not in level x of the frames of the video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy;

(ii) encrypting the frames in level zero of the video file;

(iii) caching the parts of the video file that have not been encrypted at a proxy;

(iv) transmitting to a user device the parts of the video file that have been encrypted, and transmitting from the proxy to the user device the parts of the video file that have not been encrypted.
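For illustration only, steps (ii) to (iv) of the partial-encryption scheme can be sketched as follows. The function and variable names (`split_and_encrypt`, `frames_by_level`, `encrypt`) are hypothetical, and the XOR stand-in is a toy cipher used purely to keep the sketch self-contained; a real implementation would use a proper cipher (e.g. AES) and actual transmission to the proxy and user device:

```python
def split_and_encrypt(frames_by_level, encrypt):
    """Encrypt only the level-zero frames of a hierarchically
    compressed video; leave the higher levels in the clear so that
    a proxy can cache them (steps (ii) and (iii)).

    frames_by_level maps a level number to a list of compressed
    frame byte strings; encrypt is any bytes -> bytes cipher."""
    encrypted_level0 = [encrypt(f) for f in frames_by_level[0]]
    clear_levels = {lvl: frames
                    for lvl, frames in frames_by_level.items() if lvl != 0}
    return encrypted_level0, clear_levels

# Toy stand-in cipher (XOR with a fixed byte) for illustration only.
toy_cipher = lambda b: bytes(x ^ 0x5A for x in b)

demo = {0: [b"key-frame"], 1: [b"delta-1"], 2: [b"delta-2"]}
enc, clear = split_and_encrypt(demo, toy_cipher)
```

Since level zero typically carries a small fraction of the total compressed data (see the percentage claims below), only a small portion of the file needs encrypting, while the bulk remains cacheable at the proxy.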

98. The method of Claim 97, the method including the step of assembling a file comprising a partially encrypted version of the video file, the partially encrypted version of the video file including level zero frames including the encrypted frames in level zero of the video file from step (ii), and further including the levels of frames of the video file that do not include level zero of the frames of the video file.

99. The method of Claim 98, further including storing the assembled file.

100. The method of any of Claims 97 to 99, wherein the frames in the lowest level, level zero, of the hierarchy are key frames.

101. The method of Claim 100, wherein level one comprises delta frames, which are the deltas between the key frames.

102. The method of Claim 101, wherein level two comprises delta frames, which are the deltas between the level one frames.

103. The method of Claims 101 or 102, wherein the delta frames have a chain of dependency back to the key frames.

104. The method of any of Claims 97 to 103, wherein decoding each level of content relies on all lower levels having been decoded, with an adaptive code where codewords depend on previous data.

105. The method of any of Claims 97 to 104, wherein for the non-encrypted video file portions, compression uses transition tables for encoding and decoding, and performing decoding successfully for any given level requires that all the lower levels of lower temporal resolution have been decoded.

106. The method of any of Claims 97 to 105, wherein compressed level zero frames comprise 20% or less of the total compressed data of all levels, or 10% or less of the total compressed data of all levels, or 5% or less of the total compressed data of all levels.

107. The method of any of Claims 97 to 106, in which non-zero level frames are not encrypted.

108. The method of any of Claims 97 to 107, wherein compressed data in the non-zero level frames is 80% or more of the total compressed data of all levels, or 90% or more of the total compressed data of all levels, or 95% or more of the total compressed data of all levels.

109. The method of any of Claims 97 to 108, in which if a file size of an encrypted level zero frame is less than a predetermined size, then the corresponding level one frame is also encrypted.

110. The method of Claim 109, wherein the predetermined size is 10 kB, or less.

111. The method of any of Claims 97 to 110, in which the compressed format structure is an MPEG structure.

112. The method of any of Claims 97 to 111, in which the compressed format structure is a Blackbird codec structure.

113. The method of any of Claims 97 to 112, in which the encryption uses symmetric key cryptography.

114. The method of any of Claims 97 to 113, in which the encryption uses asymmetric key cryptography.

115. The method of any of Claims 97 to 114, in which the user device is a smartphone, a mobile phone, a tablet computer, a laptop, a desktop computer, a mobile device, or a smart TV.

116. The method of any of Claims 97 to 115, in which the non-encrypted data is sent together with some hashed data, where the hashed data is generated using a hash function of at least some of the non-encrypted data, so that the non-encrypted data may be authenticated using the hashed data.

117. The method of any of Claims 97 to 116, in which the frames in the (e.g. Blackbird) codec are provided for free for only some of the lower levels (e.g. level zero and level one), and payment is required for the frames in the higher levels (e.g. level two to level six).

118. The method of any of Claims 97 to 117, in which a live video broadcast (e.g. election results) is provided at a lower frame rate, by providing for free the frames in the (e.g. Blackbird) codec for only some of the lower levels (e.g. level zero and level one), and not sending the frames in the higher levels (e.g. level two to level six), to reduce transmission bandwidth, to reduce energy usage, and to reduce transmission costs.

119. The method of any of Claims 97 to 118, in which an option to interpolate between frames is provided, to make playback smoother.

120. A computer system configured to perform a method of any of Claims 97 to 119.

121. A computer program product executable on a computer to perform a method of any of Claims 97 to 119.

122. A computer-implemented method of low energy file distribution, the distributed file including a partially encrypted video file of a video file, the video file including a compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the video file, wherein frames in level zero of the video file have the lowest temporal resolution, wherein content of the frames in level zero of the video file is displayable when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the video file is displayable when decompressed only using content of at least one or more frames not in level x of the frames of the video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy, wherein the frames in level zero of the video file are encrypted and stored at a server, the method including the steps of:

(i) caching at a proxy the parts of the video file that have not been encrypted;

(ii) transmitting from the server to a user device the parts of the video file that have been encrypted, and transmitting from the proxy to the user device the parts of the video file that have not been encrypted.

123. The method of Claim 122, including a method of any of Claims 97 to 119.

124. A computer system configured to perform a method of any of Claims 122 to 123.

125. A computer program product executable on a computer to perform a method of any of Claims 122 to 123.

126. A computer-implemented method of low energy file distribution, the method including decrypting a partially encrypted video file to produce a decrypted video file, the decrypted video file including a compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the decrypted video file, wherein frames in level zero of the decrypted video file have the lowest temporal resolution, wherein content of the frames in level zero of the decrypted video file is displayable when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the decrypted video file is displayable when decompressed only using content of at least one or more frames not in level x of the frames of the decrypted video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy, the method including the steps of:

(i) a user device receiving the parts of the video file that have been encrypted, and the user device receiving from a proxy the parts of the video file that have not been encrypted, to receive the partially encrypted video file;

(ii) the user device processing the partially encrypted video file including the compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the partially encrypted video file, wherein frames in level zero of the partially encrypted video file have the lowest temporal resolution, wherein content of the frames in level zero of the partially encrypted video file is displayable after decryption when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the partially encrypted video file is displayable after decryption when decompressed only using content of at least one or more frames not in level x of the frames of the decrypted video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy;

(iii) the user device decrypting the frames in level zero of the partially encrypted video file;

(iv) the user device assembling a decrypted video file, the decrypted video file including level zero frames including the decrypted frames in level zero of the partially encrypted video file from step (iii), and further including the levels of frames of the partially encrypted video file that do not include level zero of the frames of the partially encrypted video file;

(v) the user device displaying the decrypted video file on a screen of the user device.
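The decryption-side reassembly of steps (iii) and (iv) can be sketched, for illustration, as below. The names (`reassemble`, `encrypted_level0`, `clear_levels`, `decrypt`) are hypothetical, and the XOR stand-in is a toy cipher, not the method's actual cryptography:

```python
def reassemble(encrypted_level0, clear_levels, decrypt):
    """Decrypt the level-zero frames and merge them back with the
    non-encrypted higher levels, returning the full level hierarchy
    in ascending level order (steps (iii) and (iv))."""
    levels = {0: [decrypt(f) for f in encrypted_level0]}
    levels.update(clear_levels)          # higher levels need no decryption
    return dict(sorted(levels.items()))

# Toy stand-in cipher (XOR is its own inverse) for illustration only.
toy_cipher = lambda b: bytes(x ^ 0x5A for x in b)

video = reassemble([toy_cipher(b"key-frame")],
                   {1: [b"delta-1"], 2: [b"delta-2"]},
                   toy_cipher)
```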

127. The method of Claim 126, further including storing the file assembled in step (iv).

128. The method of any of Claims 126 to 127, wherein for the decrypted video file, decoding each level of content relies on all lower levels having been decoded, with an adaptive code where codewords depend on previous data.

129. The method of any of Claims 126 to 128, wherein for the non-encrypted video file, compression uses transition tables for encoding and decoding, and performing decoding successfully for any given level requires that all the lower levels of lower temporal resolution have been decoded.

130. The method of any of Claims 126 to 129, wherein level zero frames comprise 20% or less of the total (e.g. compressed) data of all levels, or 10% or less of the total (e.g. compressed) data of all levels, or 5% or less of the total (e.g. compressed) data of all levels.

131. The method of any of Claims 126 to 130, wherein the data in the non-zero level frames is 80% or more of the total (e.g. compressed) data of all levels, or 90% or more of the total (e.g. compressed) data of all levels, or 95% or more of the total (e.g. compressed) data of all levels.

132. The method of any of Claims 126 to 131, in which if a level one frame of the partially encrypted video file is also encrypted, then it is also decrypted.

133. The method of any of Claims 126 to 131, in which the non-zero levels of the partially encrypted video file are not encrypted.

134. The method of any of Claims 126 to 133, in which the compressed format structure is an MPEG structure.

135. The method of any of Claims 126 to 133, in which the compressed format structure is a Blackbird codec structure.

136. The method of any of Claims 126 to 135, wherein the decryption uses symmetric key cryptography.

137. The method of any of Claims 126 to 136, wherein the decryption uses asymmetric key cryptography.

138. The method of any of Claims 126 to 137, wherein the user device is a smartphone, a mobile phone, a tablet computer, a laptop, a desktop computer, a mobile device or a smart TV.

139. The method of Claim 138, wherein the user device is a smart TV which includes a web browser, and the web browser is executable to play a video, which is received in the form of encrypted key frames, and non-encrypted non-key frames.

140. The method of Claim 138, wherein the user device is a mobile device which includes a web browser, and the web browser is executable to play a video, which is received in the form of encrypted key frames, and non-encrypted non-key frames.

141. The method of Claims 139 or 140, wherein the browser playing the video including encrypted key frames, and non-encrypted non-key frames is informed which frames are encrypted, and which frames are non-encrypted.

142. The method of any of Claims 126 to 141, wherein the user (e.g. mobile) device includes an application program, and the application program is executable to play a video, which is received in the form of encrypted key frames, and non-encrypted non-key frames.

143. The method of any of Claims 126 to 142, wherein the user (e.g. mobile) device plays back at a lower frame rate in (e.g. Blackbird) codecs to reduce CO2 emissions in power generation, as only displayed frames are downloaded and decompressed.

144. The method of any of Claims 126 to 142, wherein the user (e.g. mobile) device includes an option to interpolate between frames to make playback smoother.

145. The method of any of Claims 126 to 144, wherein the non-encrypted data is sent together with some hashed data, where the hashed data is generated using a hash function of at least some of the non-encrypted data, so that the non-encrypted data may be authenticated using the hashed data.

146. A computer system configured to perform a method of any of Claims 126 to 145.

147. A computer program product executable on a processor to perform a computer-implemented method of decrypting a partially encrypted video file of any of Claims 126 to 145.

148. A video file encryption apparatus including a processor configured to perform a computer-implemented method of encrypting a video file of any of Claims 97 to 119.

149. The video file encryption apparatus of Claim 148, wherein the video file encryption apparatus is a chip.

150. A video file decryption apparatus including a processor configured to perform a computer-implemented method of decrypting a partially encrypted video file of any of Claims 126 to 145.

151. The video file decryption apparatus of Claim 150, wherein the video file decryption apparatus is a chip.

152. A video file encryption and decryption apparatus, including a processor configured to perform a computer-implemented method of encrypting a video file of any of Claims 97 to 119, and wherein the processor is configured to perform a computer-implemented method of decrypting a partially encrypted video file of any of Claims 126 to 145.

153. The video file encryption and decryption apparatus of Claim 152, wherein the video file encryption and decryption apparatus is a chip.

Video Analysis e.g. including Artificial Intelligence (AI) Analysis

154. A computer-implemented method of identifying significant images in a video file, the video file including source images, the method including the steps of:

(i) generating a plurality of token images, each being a digitized representation of a scaled down version of a respective source image in the video file, by transforming said source images into token images;

(ii) creating an arrangement of said token images in a continuous band of token images arranged adjacently as a function of time in the video;

(iii) transforming the continuous band of token images, each token image having a multi-pixel width and a multi-pixel height into at least one new squashed band by squashing the token images in a continuous band of token images in a longitudinal direction only, by one or more factors using pixel averaging, to create said at least one new squashed band of squashed token images, wherein each individual squashed token image is reduceable to a maximum of a single pixel width and a multi-pixel height;

(iv) analysing the new squashed band of squashed token images, to identify a squashed token image which differs from an adjacent squashed token image so as to satisfy a threshold criterion, wherein the adjacent squashed token image precedes in time the identified squashed token image;

(v) providing a selectable option in a user interface, the selectable option selectable to play the video starting from the image in the video corresponding to the identified squashed token image, or starting from the image in the video corresponding to the adjacent squashed token image.
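Steps (iii) and (iv) above can be sketched, for illustration, as follows. The names (`squash_band`, `find_cuts`, `threshold`) and the use of mean absolute difference as the threshold criterion are assumptions for the sketch; the claims leave the precise criterion open (e.g. colour-based tests, as in the claims below):

```python
def squash_band(band):
    """Step (iii): squash each multi-pixel-wide token image to a
    single-pixel-wide column by pixel averaging in the longitudinal
    (width) direction only.  band is a list of token images, each a
    list of rows of grey values."""
    return [[sum(row) / len(row) for row in token] for token in band]

def find_cuts(squashed, threshold):
    """Step (iv): identify indices i where squashed token i differs
    from the preceding (earlier in time) squashed token by more than
    threshold, using mean absolute difference per pixel."""
    cuts = []
    for i in range(1, len(squashed)):
        diff = sum(abs(a - b) for a, b in zip(squashed[i], squashed[i - 1]))
        if diff / len(squashed[i]) > threshold:
            cuts.append(i)
    return cuts

# Two dark tokens followed by a bright one: the change is at index 2.
band = [[[0, 0], [0, 0]], [[0, 0], [0, 0]], [[200, 200], [200, 200]]]
cuts = find_cuts(squash_band(band), threshold=50)
```

Because each squashed token is at most one pixel wide, the whole band is very small, which is what makes client-side (e.g. in-browser) scanning of a long video practical.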

155. The method of Claim 154, wherein steps (iv) and (v) are executed in a browser on a client device (e.g. smartphone, tablet computer, laptop, desktop computer, smart TV), or in an app on a client device (e.g. smartphone, tablet computer, laptop, desktop computer, smart TV).

156. The method of Claim 155, wherein steps (iv) and (v) are executed in javascript in the browser on the client device.

157. The method of any of Claims 154 to 156, wherein tasks (e.g. AI tasks) (e.g. video cut detection, face recognition, player identification in sport, vehicle detection from a drone) are performed following this client-side processing.

158. The method of any of Claims 154 to 157, wherein step (iv) is performed without using analysis (e.g. AI analysis) to search an ingested video on a server.

159. The method of Claim 154, wherein steps (iv) and (v) are executed at a server.

160. The method of any of Claims 154 to 159, wherein the threshold criterion is that the identified squashed token image contains pixels of a selected colour (e.g. red) above a predetermined threshold, and the adjacent squashed token image does not contain pixels of the selected colour above the predetermined threshold.

161. The method of any of Claims 154 to 159, wherein the threshold criterion is that the identified squashed token image contains an increase in pixels of a selected colour (e.g. red) above a predetermined threshold, relative to the adjacent squashed token image.

162. The method of Claims 160 or 161, wherein the selected colour is black, white, red, blue or green.

163. The method of any of Claims 154 to 162, wherein the threshold criterion is that the identified squashed token image is black, and the adjacent squashed token image is not black.

164. The method of any of Claims 154 to 163, wherein the video is at least 2 minutes in duration.

165. The method of any of Claims 154 to 164, wherein the threshold criterion includes that the identified squashed token image’s content differs from the adjacent squashed token image’s content, by a threshold amount.

166. The method of any of Claims 154 to 165, wherein the threshold criterion includes that the identified squashed token image’s audio content differs from the adjacent squashed token image’s audio content, by a threshold amount (e.g. indicating a loud cheer).

167. The method of any of Claims 154 to 166, wherein the step (iv) includes using artificial intelligence analysis.

168. The method of any of Claims 154 to 167, wherein an analysis (e.g. AI analysis) is performed of content viewed by a viewer in the past, and the analysis (e.g. AI analysis) then searches for similar content within a library of ingested video content, and the content identified by the analysis (e.g. AI analysis) is offered to the viewer, so the viewer can select what content to view.

169. The method of any of Claims 154 to 168, wherein step (iv) is repeated until all instances in the video are identified in which a squashed token image differs from an adjacent squashed token image so as to satisfy a threshold criterion, wherein the adjacent squashed token image precedes in time the identified squashed token image.

170. The method of Claim 169, including providing selectable options in the user interface, the selectable options selectable to play the video starting from an image in the video corresponding to an identified squashed token image, or starting from the image in the video corresponding to an adjacent squashed token image, for all identified instances in the video.

171. The method of any of Claims 154 to 170, in which the search (e.g. AI search) of the navigation tool finds candidate frames as described, and then the candidate frames are investigated at higher resolution, e.g. at 64x36 pixels, and/or using original frames, to confirm that a further threshold criterion is met.

172. The method of any of Claims 154 to 171, in which the threshold criterion is user selectable.

173. The method of any of Claims 154 to 172, in which when the analysis (e.g. AI analysis) of the navigation tool is performed, the analysis (e.g. AI analysis) program is configured such that, if a notable item in the navigation tool is identified, the analysis (e.g. AI analysis) program sends an alert, such as to a user device, e.g. a mobile phone.

174. A system including a server and a client device, the system configured to identify significant images in a video file, the video file including source images, wherein:

(i) the server is configured to generate a plurality of token images, each being a digitized representation of a scaled down version of a respective source image in the video file, by transforming said source images into token images;

(ii) the server is configured to create an arrangement of said token images in a continuous band of token images arranged adjacently as a function of time in the video;

(iii) the server is configured to transform the continuous band of token images, each token image having a multi-pixel width and a multi-pixel height into at least one new squashed band by squashing the token images in a continuous band of token images in a longitudinal direction only, by one or more factors using pixel averaging, to create said at least one new squashed band of squashed token images, wherein each individual squashed token image is reduceable to a maximum of a single pixel width and a multi-pixel height;

(iv) the server is configured to send the new squashed band of squashed token images to the client device;

(v) the client device is configured to analyse the new squashed band of squashed token images, to identify a squashed token image which differs from an adjacent squashed token image so as to satisfy a threshold criterion, wherein the adjacent squashed token image precedes in time the identified squashed token image;

(vi) the client device is configured to provide a selectable option in a user interface of the client device, the selectable option selectable to play the video starting from the image in the video corresponding to the identified squashed token image, or starting from the image in the video corresponding to the adjacent squashed token image.

175. The system of Claim 174, configured to perform a method of any of Claims 154 to 173.

176. A computer program product executable on a client device, the client device forming part of a system including the client device and a server, the system configured to identify significant images in a video file, the video file including source images, wherein:

(i) the server is configured to generate a plurality of token images, each being a digitized representation of a scaled down version of a respective source image in the video file, by transforming said source images into token images;

(ii) the server is configured to create an arrangement of said token images in a continuous band of token images arranged adjacently as a function of time in the video;

(iii) the server is configured to transform the continuous band of token images, each token image having a multi-pixel width and a multi-pixel height into at least one new squashed band by squashing the token images in a continuous band of token images in a longitudinal direction only, by one or more factors using pixel averaging, to create said at least one new squashed band of squashed token images, wherein each individual squashed token image is reduceable to a maximum of a single pixel width and a multi-pixel height;

(iv) the server is configured to send the new squashed band of squashed token images to the client device; wherein the computer program product is executable on the client device:

(a) to analyse the new squashed band of squashed token images, to identify a squashed token image which differs from an adjacent squashed token image so as to satisfy a threshold criterion, wherein the adjacent squashed token image precedes in time the identified squashed token image; and

(b) to provide a selectable option in a user interface of the client device, the selectable option selectable to play the video starting from the image in the video corresponding to the identified squashed token image, or starting from the image in the video corresponding to the adjacent squashed token image.

177. The computer program product of Claim 176, configured to perform a method of any of Claims 154 to 173.

Description:
COMPUTER-IMPLEMENTED METHODS OF BLURRING A DIGITAL IMAGE; COMPUTER TERMINALS AND COMPUTER PROGRAM PRODUCTS

BACKGROUND OF THE INVENTION

1. Field of the Invention

The field of the invention relates to computer-implemented methods of blurring a digital image, including computer-implemented methods of blurring digital video images, and to related computer terminals and computer program products.

2. Technical Background

Blur is a digital image or a digital video effect, and can be used to obfuscate information in a video. It can also be used to focus attention on a non-blurred part of an image, because a viewer tends not to look at the blurred part of the image.

For blurring video, or for blurring parts of a video image, many types of blur are known. But most known types of blur produce artefacts of some sort, such as artefacts resembling lens flare. Blurring may also be computationally cumbersome.

3. Discussion of Related Art

EP3296952B1 discloses a method for blurring a virtual object in a video, said video being captured by a device comprising at least one motion sensor, said method being performed by said device, said method comprising:

- obtaining an initial image of said video captured by said device at an initial device pose;

- obtaining a current image of said video captured by said device at a current device pose;

- estimating an apparent motion vector of said virtual object between said initial image and said current image, based on a motion of said device;

- blurring said virtual object by filtering at least a part of the current image based on said apparent motion vector;

in which said motion of said device is obtained from an angular rate generated by at least one motion sensor of said device, moving from said initial device pose to said current device pose.

EP1494174B1 discloses a computerized method of generating blur in a 2D image representing a 3D scene, on the basis of its associated distance image assigning a depth to the pixels of the image, comprising the following steps:

- partitioning of the 2D image into 2D zones as a function of the depths assigned to the pixels, by grouping neighbouring pixels whose assigned depth belongs to a depth zone, the scene having been partitioned into predefined depth zones corresponding to depth ranges, a depth range being defined by a minimum and maximum depth;

- calculation of blur around a 2D zone boundary, by convolution for the pixels of the 2D zone on either side of the boundary with a neighbouring other zone, the size of the convolution kernel being dependent on the depth of the pixels of the 2D zone processed, and in that the 2D zones are processed sequentially from the furthest away to the closest, the calculation of blur of a 2D current zone, for the 2D zone boundaries, being carried out only on the boundaries of this 2D zone with the 2D zone previously processed, so as to provide an intermediate image, this intermediate image being that utilized for the convolution, during the processing of the next 2D zone.

SUMMARY OF THE INVENTION

According to a first aspect of the invention, there is provided a computer-implemented method of blurring a digital image, the digital image comprising pixels, the method including the steps of:

(i) processing the digital image, using original pixel blocks (e.g. 8x8 pixel blocks) to determine in which original pixel blocks the image satisfies a criterion of being smoothly varying;

(ii) for each original pixel block in which the image satisfies the criterion of being smoothly varying, producing a corresponding blurred pixel block, in which bilinear interpolation of pixels of a respective original pixel block is used to produce a respective corresponding blurred pixel block;

(iii) for each original pixel block in which the image does not satisfy the criterion of being smoothly varying, producing a corresponding blurred pixel block, including using a two pass Gaussian blur for pixels of a respective original pixel block to produce a respective corresponding blurred pixel block;

(iv) assembling a blurred digital image, using the corresponding blurred pixel blocks produced in steps (ii) and (iii).
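Steps (i) to (iv) can be sketched, for illustration, as follows. The names (`blur_image`, `bilinear_block`, `is_smooth`) are hypothetical, and the predicate and Gaussian blur are passed in as stand-ins for the criterion of step (i) and the blur of step (iii):

```python
def bilinear_block(a, b, c, d, n=8):
    """Step (ii) for one smoothly varying block: fill an n x n block
    by bilinear interpolation of four corner values a (top-left),
    b (top-right), c (bottom-left), d (bottom-right)."""
    out = []
    for y in range(n):
        v = y / (n - 1)
        left = a + (c - a) * v          # interpolate down the left edge
        right = b + (d - b) * v         # interpolate down the right edge
        out.append([left + (right - left) * (x / (n - 1)) for x in range(n)])
    return out

def blur_image(blocks, is_smooth, bilinear_blur, gaussian_blur):
    """Steps (i)-(iv): per block, dispatch to the cheap bilinear blur
    when the block is smoothly varying, otherwise to the two-pass
    Gaussian blur, and assemble the results in order."""
    return [bilinear_blur(b) if is_smooth(b) else gaussian_blur(b)
            for b in blocks]

blk = bilinear_block(0, 7, 0, 7)        # a left-to-right gradient block
out = blur_image([1, 2], lambda b: b == 1,
                 lambda b: "bilinear", lambda b: "gauss")
```

The energy saving comes from the dispatch: bilinear interpolation reads only four corner values per block, so the expensive Gaussian convolution runs only where the image is not smoothly varying.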

An advantage is a low energy and highly effective method of blurring a digital image.

The method may be one in which in step (iii), in a first pass, a one-dimensional kernel is used to Gaussian blur pixels for an original pixel block in a first direction, to produce an intermediate pixel block, and in a second pass, the same one-dimensional kernel is used to Gaussian blur pixels for the intermediate pixel block in a direction orthogonal to the first direction. An advantage is a low energy and highly effective method of blurring a digital image.
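The two-pass (separable) Gaussian blur described above can be sketched as follows; the names (`gaussian_kernel`, `blur_1d`, `gaussian_blur_2pass`) and the edge-clamping behaviour are assumptions for the sketch, not details fixed by the claims:

```python
import math

def gaussian_kernel(radius, sigma):
    """One-dimensional Gaussian kernel, normalised to sum to 1."""
    k = [math.exp(-(i * i) / (2 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def blur_1d(row, kernel):
    """Convolve one row with the kernel, clamping indices at the edges."""
    r = len(kernel) // 2
    n = len(row)
    return [sum(kernel[j + r] * row[min(max(i + j, 0), n - 1)]
                for j in range(-r, r + 1)) for i in range(n)]

def gaussian_blur_2pass(block, radius=4, sigma=2.0):
    """First pass: blur each row with the 1-D kernel.  Second pass:
    apply the same 1-D kernel to the columns of the intermediate
    result (the orthogonal direction)."""
    kernel = gaussian_kernel(radius, sigma)
    horiz = [blur_1d(row, kernel) for row in block]
    cols = [blur_1d(list(col), kernel) for col in zip(*horiz)]
    return [list(row) for row in zip(*cols)]

flat = gaussian_blur_2pass([[100.0] * 8 for _ in range(8)])
```

Separability is the source of the efficiency: two 1-D passes cost O(2r) multiplies per pixel instead of O(r²) for an equivalent 2-D kernel of radius r.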

The method may be one in which the Gaussian blur is produced for pixels within a pixel radius, e.g. a pixel radius of from 4 to 12 pixels. An advantage is a low energy and highly effective method of blurring a digital image.

The method may be one in which the criterion of being smoothly varying includes that an area (e.g. a square area) is smoothly varying as far as the pixel radius, in which the area may extend outside a pixel block. This has the advantage that distant objects don't impinge on the (e.g. 8x8 pixels) block when they are blurred.

The method may be one in which the method is used when executing in a browser. An advantage is a low energy and highly effective method of blurring a digital image.

The method may be one in which the method is implemented in javascript. An advantage is a low energy and highly effective method of blurring a digital image.

The method may be one in which the method is used when executing on a smart TV, on a desktop computer, on a laptop computer, on a tablet computer or on a smartphone computer. An advantage is a low energy and highly effective method of blurring a digital image.

The method may be one in which the method is used for processing video. An advantage is a low energy and highly effective method of blurring digital images in video.

The method may be one in which the method is used for processing video in real-time. An advantage is a low energy and highly effective method of blurring digital images in video.

The method may be one in which the pixels are represented by pixel values.

The method may be one in which the criterion of being smoothly varying includes that the magnitude of the expression a+d-b-c is less than a predefined percentage of a, b, c or d, where a, b, c and d are pixel values at the corners of the pixel block, and where a and d are pixel values of opposite corners of the pixel block.

The method may be one in which the predefined percentage is 10%, or 5%, or 3%, or 2% or 1%.

The method may be one in which the pixel blocks are 4x4 pixel blocks, or 8x8 pixel blocks, or at least 4x4 pixel blocks. An advantage is a low energy and highly effective method of blurring a digital image.
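The corner test can be sketched, for illustration, as below. The name `is_smoothly_varying` is hypothetical, and the sketch reads "a predefined percentage of a, b, c or d" as satisfied if the residual is below the percentage of any one corner value; a real implementation might compare against each corner instead:

```python
def is_smoothly_varying(a, b, c, d, percentage=5):
    """Corner test for one pixel block: a and d are opposite corners,
    b and c the other two.  For a perfectly planar (bilinear) block
    a + d - b - c is exactly zero, so a small residual indicates the
    block is close to smoothly varying."""
    residual = abs(a + d - b - c)
    return any(residual < v * percentage / 100 for v in (a, b, c, d))
```

For example, corner values 10, 20, 30, 40 lie on a plane (residual 0), so the block would take the cheap bilinear path; corner values 10, 20, 30, 100 give a residual of 60 and would fall through to the Gaussian blur.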

The method may be one in which 8x8 blocks are used for the bilinear interpolation blur, and a pixel radius of from 4 to 12 pixels is used for the Gaussian blur portions of the blur. An advantage is a low energy and highly effective method of blurring a digital image.

The method may be one in which the pixel blocks are 16x16 pixel blocks, or pixel blocks in the range from 4x4 to 16x16. An advantage is a low energy and highly effective method of blurring a digital image.

The method may be one in which the pixel values at the corners of the pixel block are obtained by averaging nearby pixel values, e.g. by averaging one pixel further (e.g. averaging the pixel and its eight nearest neighbours), or by averaging two pixels further (e.g. averaging the pixel and its 24 nearest neighbours).

The method may be one in which the criterion of being smoothly varying includes that the magnitude of a gradient change across a boundary between a pixel block identified as potentially being smoothly varying and a pixel block that is identified as being not smoothly varying is less than a predefined amount. An advantage is a low energy and highly effective method of blurring a digital image.

The method may be one in which the criterion of being smoothly varying includes that a pixel block identified as potentially being smoothly varying does not include a sharp edge.

The method may be one in which block edges are obfuscated, if some jagged block edges are produced.

The method may be one including the step of storing the assembled image.

The method may be one including the step of displaying the assembled image on a display.

According to a second aspect of the invention, there is provided a computer terminal configured to blur a digital image, the digital image comprising pixels, the computer terminal configured to:

(i) process the digital image, using original pixel blocks (e.g. 8x8 pixel blocks) to determine in which original pixel blocks the image satisfies a criterion of being smoothly varying;

(ii) for each original pixel block in which the image satisfies the criterion of being smoothly varying, produce a corresponding blurred pixel block, in which bilinear interpolation of pixels of a respective original pixel block is used to produce a respective corresponding blurred pixel block;

(iii) for each original pixel block in which the image does not satisfy the criterion of being smoothly varying, produce a corresponding blurred pixel block, including using a two pass Gaussian blur for pixels of a respective original pixel block to produce a respective corresponding blurred pixel block;

(iv) assemble a blurred digital image, using the corresponding blurred pixel blocks produced in (ii) and (iii). An advantage is a computer terminal configured to perform a low energy and highly effective method of blurring a digital image.
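For a block found to be smoothly varying, step (ii) replaces its pixels with a bilinear interpolation. The javascript sketch below is illustrative only; interpolating from the four corner values alone is an assumption (the corner values may in practice first be averaged with their neighbours, as described above):

```javascript
// Illustrative sketch of step (ii): fill an n x n block by bilinear
// interpolation of its four corner values. a = top-left, b = top-right,
// c = bottom-left, d = bottom-right. Not the claimed implementation.
function bilinearBlock(a, b, c, d, n) {
  const block = [];
  for (let y = 0; y < n; y++) {
    const v = n === 1 ? 0 : y / (n - 1); // vertical interpolation factor
    const row = [];
    for (let x = 0; x < n; x++) {
      const u = n === 1 ? 0 : x / (n - 1); // horizontal interpolation factor
      const top = a + (b - a) * u;
      const bottom = c + (d - c) * u;
      row.push(top + (bottom - top) * v);
    }
    block.push(row);
  }
  return block;
}
```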

The computer terminal may be configured to perform a method of any aspect of the first aspect of the invention.

According to a third aspect of the invention, there is provided a computer program product, the computer program product executable on a computer terminal to blur a digital image, the digital image comprising pixels, the computer program product executable on the computer terminal to:

(i) process the digital image, using original pixel blocks (e.g. 8x8 pixel blocks) to determine in which original pixel blocks the image satisfies a criterion of being smoothly varying;

(ii) for each original pixel block in which the image satisfies the criterion of being smoothly varying, produce a corresponding blurred pixel block, in which bilinear interpolation of pixels of a respective original pixel block is used to produce a respective corresponding blurred pixel block;

(iii) for each original pixel block in which the image does not satisfy the criterion of being smoothly varying, produce a corresponding blurred pixel block, including using a two pass Gaussian blur for pixels of a respective original pixel block to produce a respective corresponding blurred pixel block;

(iv) assemble a blurred digital image, using the corresponding blurred pixel blocks produced in (ii) and (iii). An advantage is a computer program product executable to perform a low energy and highly effective method of blurring a digital image.

The computer program product may be executable to perform a method of any aspect of the first aspect of the invention.

According to a fourth aspect of the invention, there is provided a computer-implemented method of identifying portions of a digital image suitable for presenting advertising, the digital image comprising pixels, the method including the steps of:

(i) processing the digital image, using original pixel blocks (e.g. 8x8 pixel blocks) to determine in which original pixel blocks the image satisfies a criterion of being smoothly varying;

(ii) receiving an advertisement and a size of the advertisement, to be inserted into the digital image;

(iii) identifying a portion of the digital image which contains an uninterrupted area of pixel blocks which satisfy the criterion of being smoothly varying, which is large enough to receive the size of the advertisement;

(iv) inserting the advertisement into the uninterrupted area of pixel blocks which satisfy the criterion of being smoothly varying, which is large enough to receive the size of the advertisement. An advantage is a low energy and highly effective method of identifying portions of a digital image which are smoothly varying. The field of this aspect of the invention is methods of identifying portions of a digital image which are smoothly varying.
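Step (iii) above amounts to searching a grid of smoothness flags for an uninterrupted rectangle of smooth blocks. A straightforward javascript scan is sketched below; the function name and the block-grid representation are assumptions made for illustration:

```javascript
// Illustrative sketch of step (iii): smoothGrid[y][x] is true when the pixel
// block at (x, y) satisfies the smoothness criterion. Returns the block
// coordinates of the top-left corner of the first uninterrupted w x h area of
// smooth blocks, or null if none is large enough for the advertisement.
function findSmoothArea(smoothGrid, w, h) {
  const rows = smoothGrid.length;
  const cols = smoothGrid[0].length;
  for (let y = 0; y + h <= rows; y++) {
    for (let x = 0; x + w <= cols; x++) {
      let allSmooth = true;
      for (let dy = 0; dy < h && allSmooth; dy++) {
        for (let dx = 0; dx < w && allSmooth; dx++) {
          if (!smoothGrid[y + dy][x + dx]) allSmooth = false;
        }
      }
      if (allSmooth) return { x, y }; // area found: suitable for insertion
    }
  }
  return null; // no uninterrupted smooth area of the requested size
}
```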

The method may include checking that the colours of the advertisement differ from the colours of the uninterrupted area of pixel blocks which satisfy the criterion of being smoothly varying, which is large enough to receive the size of the advertisement, before performing step (iv). An advantage is a low energy and highly effective method of identifying portions of a digital image which are smoothly varying.

The method may be used when executing in a browser.

The method may be implemented in javascript.

The method may be used when executing on a smart TV, on a desktop computer, on a laptop computer, on a tablet computer or on a smartphone computer.

The method may be used for processing video, e.g. in real time.

The method may be one wherein the pixels are represented by pixel values.

The method may be one wherein the criterion of being smoothly varying includes that the expression a+d-b-c is less than a predefined percentage of a, b, c or d, where a, b, c and d are pixel values at the corners of the pixel block, and where a and d are pixel values of opposite corners of the pixel block. An advantage is a low energy and highly effective method of identifying portions of a digital image which are smoothly varying.

The method may be one wherein the predefined percentage is 10%, or 5%, or 3%, or 2% or 1%.

The method may be one wherein the pixel blocks are 4x4 pixel blocks, or 8x8 pixel blocks, or at least 4x4 pixel blocks.

The method may be one wherein the pixel blocks are 16x16 pixel blocks, or in the range of 4x4 to 16x16 pixel blocks.

The method may be one wherein the pixel values at the corners of the pixel block are obtained by averaging nearby pixel values, e.g. by averaging one pixel further (e.g. averaging the pixel and its eight nearest neighbours), or by averaging two pixels further (e.g. averaging the pixel and its 24 nearest neighbours).

The method may be one wherein the criterion of being smoothly varying includes that the magnitude of a gradient change across a boundary between a pixel block identified as potentially being smoothly varying and a pixel block that is identified as being not smoothly varying is less than a predefined amount. An advantage is a low energy and highly effective method of identifying portions of a digital image which are smoothly varying.

The method may be one wherein the criterion of being smoothly varying includes that a pixel block identified as potentially being smoothly varying does not include a sharp edge.

According to a fifth aspect of the invention, there is provided a computer terminal configured to perform a method of any aspect of the fourth aspect of the invention.

According to a sixth aspect of the invention, there is provided a computer program product executable on a computer to perform a method of any aspect of the fourth aspect of the invention.

According to a seventh aspect of the invention, there is provided a computer implemented method for blurring a left hand border and a right hand border next to a video displayed in a landscape orientation display, the video being a portrait orientation video, the method including the steps of:

(i) identifying a respective portion of a portrait orientation video frame to be enlarged and blurred, for each respective border, and displaying the portrait orientation video frame in the landscape orientation display;

(ii) performing a Gaussian blur of a n x n pixel block of an identified respective portion, to produce a Gaussian blurred n x n pixel block, where n>=2 (e.g. n=2);

(iii) performing bilinear interpolation of the Gaussian blurred n x n pixel block, to produce a m x m pixel block, where m>= 2n (e.g. m= 2n);

(iv) displaying the m x m pixel block in a border of the landscape orientation display, at a position corresponding to the position of the n x n pixel block in the identified respective portion of the portrait orientation video frame;

(v) repeating steps (ii) to (iv) for all n x n pixel blocks in the identified respective portions of the portrait orientation video frame;

(vi) repeating steps (i) to (v) for all frames in the portrait orientation video. An advantage is a low energy and highly effective method of blurring a left hand border and a right hand border next to a video displayed in a landscape orientation display, the video being a portrait orientation video. The field of this aspect of the invention is methods for blurring borders next to a video displayed in a display.
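Steps (ii) and (iii) above can be sketched for one border block in javascript, assuming n = 2 and m = 2n = 4 as in the examples given. The equal-weight blur used for the 2 x 2 block is an assumption standing in for a full Gaussian blur, made only to keep the sketch short:

```javascript
// Illustrative sketch of steps (ii) and (iii) for a single 2 x 2 block:
// blur the block (equal-weight approximation of a Gaussian blur), then
// bilinearly interpolate the result up to 4 x 4 for display in the border.
function blurAndEnlarge2x2(block) {
  // Step (ii): blur the 2 x 2 block by pulling each pixel towards the mean.
  const mean = (block[0][0] + block[0][1] + block[1][0] + block[1][1]) / 4;
  const blurred = [
    [(block[0][0] + mean) / 2, (block[0][1] + mean) / 2],
    [(block[1][0] + mean) / 2, (block[1][1] + mean) / 2],
  ];
  // Step (iii): bilinearly interpolate the blurred 2 x 2 block up to 4 x 4.
  const out = [];
  for (let y = 0; y < 4; y++) {
    const v = y / 3;
    const row = [];
    for (let x = 0; x < 4; x++) {
      const u = x / 3;
      const top = blurred[0][0] + (blurred[0][1] - blurred[0][0]) * u;
      const bot = blurred[1][0] + (blurred[1][1] - blurred[1][0]) * u;
      row.push(top + (bot - top) * v);
    }
    out.push(row);
  }
  return out;
}
```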

The method may be one wherein step (v) includes storing a landscape orientation video frame including the portrait orientation video frame, a blurred left hand border and a blurred right hand border, and wherein step (vi) includes storing a video file comprising the stored landscape orientation video frames.

The method may be one wherein n=2.

The method may be one wherein m=2n.

The method may be one in which computer memory used to store the blurred borders is first used to store an identified respective portion of a portrait orientation video frame to be enlarged and blurred, and then the results of step (iii) are used to overwrite the memory used to store the blurred borders as the process progresses, so no additional workspace outside the memory used to store the blurred borders is required. An advantage is reduced memory usage.

The method may be one in which the method is performed in real-time, on a client device, which is displaying the portrait orientation video, the blurred left hand border and the blurred right hand border on its landscape orientation display.

The method may be one wherein the method is used when executing on a smart TV, on a desktop computer, on a laptop computer, on a tablet computer or on a smartphone computer.

The method may be one wherein the method is executed in a browser environment.

The method may be one wherein the method is executed in javascript.

The method has the advantages of being a low energy and highly effective method of blurring a left hand border and a right hand border next to a video displayed in a landscape orientation display, the video being a portrait orientation video.

According to an eighth aspect of the invention, there is provided a computer terminal configured to perform a method of any aspect of the seventh aspect of the invention.

According to a ninth aspect of the invention, there is provided a computer program product executable on a computer to blur a left hand border and a right hand border next to a video displayed in a landscape orientation display of the computer, the video being a portrait orientation video, the computer program product executable on the computer to:

(i) identify a respective portion of a portrait orientation video frame to be enlarged and blurred, for each respective border, and displaying the portrait orientation video frame in the landscape orientation display;

(ii) perform a Gaussian blur of a n x n pixel block of an identified respective portion, to produce a Gaussian blurred n x n pixel block, where n>=2 (e.g. n=2);

(iii) perform bilinear interpolation of the Gaussian blurred n x n pixel block, to produce a m x m pixel block, where m>= 2n (e.g. m= 2n);

(iv) display the m x m pixel block in a border of the landscape orientation display, at a position corresponding to the position of the n x n pixel block in the identified respective portion of the portrait orientation video frame;

(v) repeat (ii) to (iv) for all n x n pixel blocks in the identified respective portions of the portrait orientation video frame;

(vi) repeat (i) to (v) for all frames in the portrait orientation video. An advantage is a low energy and highly effective method of blurring a left hand border and a right hand border next to a video displayed in a landscape orientation display, the video being a portrait orientation video.

The computer program product may be executable on the computer to perform a method of any aspect of the seventh aspect of the invention.

According to a tenth aspect of the invention, there is provided a computer implemented method for blurring a region of a video displayed in a display, the method including the steps of:

(i) identifying a region of a video frame to be blurred, the region including a selected portion;

(ii) performing a Gaussian blur of a n x n pixel block of the selected portion, to produce a Gaussian blurred n x n pixel block, where n>=2 (e.g. n=2);

(iii) performing bilinear interpolation of the Gaussian blurred n x n pixel block, to produce a m x m pixel block, where m>= 2n (e.g. m= 2n);

(iv) storing the m x m pixel block in the region of the video frame to be blurred, at a position corresponding to the position of the n x n pixel block in the selected portion of the region of the video frame;

(v) repeating steps (ii) to (iv) for all n x n pixel blocks in the selected portion of the identified region of the video frame; and displaying the video frame including the blurred region;

(vi) repeating steps (i) to (v) for all frames in the video. An advantage is a low energy and highly effective method of blurring a region of a video. The field of this aspect of the invention is computer implemented methods for blurring a region of a video displayed in a display.

The method may be one wherein n=2.

The method may be one wherein m=2n.

The method may be one wherein the identified region of the video frame to be blurred is a rectangle, an ellipse, a square, a circle, or a squircle.

The method may be one wherein the identified region of the video frame to be blurred is a vehicle number plate, or a person’s face.

The method may be one wherein the method includes tracking an object within a video (e.g. someone’s face), and blurring that object as it moves in the video.

The method may be one wherein the method is executed in real-time.

The method may be one wherein the method is executed in a browser environment.

The method may be one wherein the method is executed in javascript.

The method may be one wherein the method is used when executing on a smart TV, on a desktop computer, on a laptop computer, on a tablet computer or on a smartphone computer.

The method has an advantage of being a low energy and highly effective method of blurring a region of a video.

According to an eleventh aspect of the invention, there is provided a computer terminal configured to perform a method of any aspect of the tenth aspect of the invention.

According to a twelfth aspect of the invention, there is provided a computer program product executable on a computer to blur a region of a video displayed in a display, the computer program product executable on the computer to:

(i) identify a region of a video frame to be blurred, the region including a selected portion;

(ii) perform a Gaussian blur of a n x n pixel block of the selected portion, to produce a Gaussian blurred n x n pixel block, where n>=2 (e.g. n=2);

(iii) perform bilinear interpolation of the Gaussian blurred n x n pixel block, to produce a m x m pixel block, where m>= 2n (e.g. m= 2n);

(iv) store the m x m pixel block in the region of the video frame to be blurred, at a position corresponding to the position of the n x n pixel block in the selected portion of the region of the video frame;

(v) repeat (ii) to (iv) for all n x n pixel blocks in the selected portion of the identified region of the video frame; and display the video frame including the blurred region;

(vi) repeat (i) to (v) for all frames in the video. An advantage is providing a low energy and highly effective method of blurring a region of a video.

The computer program product may be one executable on the computer to perform a method of any aspect of the tenth aspect of the invention.

According to a thirteenth aspect of the invention, there is provided a computer implemented method of reducing bandwidth required for webpage delivery, the method including the steps of:

(i) analysing a first webpage for content which may be included in a different (e.g. future) webpage;

(ii) identifying content which may be included in the different (e.g. future) webpage;

(iii) storing at a server a unique identifier which identifies content, the identified content which may be included in the different web page, and the first web page;

(iv) the server serving a served first web page in response to a request for the first web page, the served first web page including the unique identifier and including the content corresponding to the unique identifier;

(v) analysing a second webpage, and identifying content in the second webpage that corresponds to the unique identifier, and storing the second web page and a relation between the unique identifier, the identified content and the second web page;

(vi) the server serving a served second web page in response to a request for the second web page, the served second web page including the unique identifier and not including the content corresponding to the unique identifier, wherein, subsequent to receiving the request for the second web page, the server only serves the content corresponding to the unique identifier upon receipt of a request for the content corresponding to the unique identifier. An advantage is reducing bandwidth required for webpage delivery. An advantage is reducing energy required for webpage delivery. The field of this aspect of the invention is computer implemented methods of reducing bandwidth required for webpage delivery.
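The server-side flow above can be sketched in javascript: shared content is stored once under a unique identifier, the first page is served with the content inline, later pages carry only the identifier, and the content is served again only upon an explicit request. All names and the in-memory storage are hypothetical, for illustration only:

```javascript
// Illustrative sketch of the server-side flow described in steps (iii)-(vi).
class DedupServer {
  constructor() {
    this.contentById = new Map(); // unique identifier -> content item
    this.pages = new Map();       // page name -> { html, contentId, inline }
  }
  storeFirstPage(name, html, contentId, content) {
    // Step (iii): store the identifier, the identified content and the page.
    this.contentById.set(contentId, content);
    this.pages.set(name, { html, contentId, inline: true });
  }
  storeLaterPage(name, html, contentId) {
    // Step (v): the second page references the identifier, not the content.
    this.pages.set(name, { html, contentId, inline: false });
  }
  servePage(name) {
    // Steps (iv)/(vi): the content is included only for the first page.
    const page = this.pages.get(name);
    return {
      html: page.html,
      contentId: page.contentId,
      content: page.inline ? this.contentById.get(page.contentId) : null,
    };
  }
  serveContent(contentId) {
    // Served only upon an explicit request for the identified content.
    return this.contentById.get(contentId);
  }
}
```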

The method may be one wherein the server is a news website server, or a social media server.

The method may be one wherein the identified content is video, graphics or text.

The method may be one wherein in the analysis of graphics in step (ii), the graphics are analysed using a grid, and each portion of the grid is given a respective reference id, to generate a unique identifier for each portion of the grid.

The method may be one wherein the server sends a javascript player to a user terminal, together with the first web page, the javascript player executable to display the first web page and the second web page on the display of the user terminal.

The method has the advantages of reducing bandwidth required for webpage delivery, and of reducing energy required for webpage delivery.

According to a fourteenth aspect of the invention, there is provided a computer system including an analysis computer and a server, the analysis computer communicatively connected to the server wherein

(i) the analysis computer is configured to analyse a first webpage for content which may be included in a different (e.g. future) webpage;

(ii) the analysis computer is configured to identify content which may be included in the different (e.g. future) webpage;

(iii) the server is configured, in response to (ii), to store a unique identifier which identifies content, the identified content which may be included in the different web page, and the first web page;

(iv) the server is configured to serve a served first web page in response to a request for the first web page, the served first web page including the unique identifier and including the content corresponding to the unique identifier;

(v) the analysis computer is configured to analyse a second webpage, and to identify content in the second webpage that corresponds to the unique identifier, and to instruct the server to store the second web page and a relation between the unique identifier, the identified content and the second web page;

(vi) the server is configured to serve a served second web page in response to a request for the second web page, the served second web page including the unique identifier and not including the content corresponding to the unique identifier, wherein, subsequent to receiving the request for the second web page, the server is configured to only serve the content corresponding to the unique identifier upon receipt of a request for the content corresponding to the unique identifier. An advantage is reducing bandwidth required for webpage delivery. An advantage is reducing energy required for webpage delivery.

The computer system may be configured to perform a method of any aspect of the thirteenth aspect of the invention.

According to a fifteenth aspect of the invention, there is provided a computer program product executable on a computer to:

(i) analyse a first webpage for content which may be included in a different (e.g. future) webpage;

(ii) identify content which may be included in the different (e.g. future) webpage;

(iii) store a unique identifier which identifies content, the identified content which may be included in the different web page, and the first web page;

(iv) serve a served first web page in response to a request for the first web page, the served first web page including the unique identifier and including the content corresponding to the unique identifier;

(v) analyse a second webpage, and identify content in the second webpage that corresponds to the unique identifier, and store the second web page and a relation between the unique identifier, the identified content, and the second web page;

(vi) serve a served second web page in response to a request for the second web page, the served second web page including the unique identifier and not including the content corresponding to the unique identifier, wherein, subsequent to receiving the request for the second web page, to only serve the content corresponding to the unique identifier upon receipt of a request for the content corresponding to the unique identifier. An advantage is reducing bandwidth required for webpage delivery. An advantage is reducing energy required for webpage delivery.

The computer program product may be configured to perform a method of any aspect of the thirteenth aspect of the invention.

According to a sixteenth aspect of the invention, there is provided a computer implemented method of reducing bandwidth required for webpage delivery, the method including the steps of:

(i) a user terminal requesting a first webpage from a server;

(ii) the user terminal receiving from the server the first web page, the first web page including a content item and a unique identifier which identifies the content item;

(iii) the user terminal storing the first web page including the content item, and the unique identifier which identifies the content item in its cache;

(iv) the user terminal displaying the first webpage on a display of the user terminal;

(v) the user terminal requesting a second webpage from a server;

(vi) the user terminal receiving from the server a received second web page, the received second web page including the unique identifier which identifies the content item, the received second web page not including the content item;

(vii) the user terminal identifying the content item in its cache, using the unique identifier which identifies the content item, and the user terminal retrieving the content item from its cache;

(viii) the user terminal displaying the second webpage on a display of the user terminal, the displayed second webpage including the content item that was retrieved from its cache. An advantage is reducing bandwidth required for webpage delivery. An advantage is reducing energy required for webpage delivery. The field of this aspect of the invention is computer implemented methods of reducing bandwidth required for webpage delivery.
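The client-side steps above (caching the content item under its identifier, then reusing the cached copy when a later page carries only the identifier) can be sketched in javascript. The class name and the fetchFromServer fallback, used when the item is not cached, are hypothetical:

```javascript
// Illustrative sketch of the user-terminal side: step (iii) caches the
// content item under its unique identifier; step (vii) retrieves it from the
// cache; on a cache miss the item is requested from the server by its
// identifier (as in the related nineteenth aspect).
class TerminalCache {
  constructor(fetchFromServer) {
    this.cache = new Map(); // unique identifier -> content item
    this.fetchFromServer = fetchFromServer; // hypothetical fallback
  }
  receivePage(page) {
    if (page.content !== null) {
      this.cache.set(page.contentId, page.content); // step (iii): cache it
      return page.content;
    }
    if (this.cache.has(page.contentId)) {
      return this.cache.get(page.contentId); // step (vii): reuse cached copy
    }
    // Cache miss: request the item from the server using its identifier.
    const content = this.fetchFromServer(page.contentId);
    this.cache.set(page.contentId, content);
    return content;
  }
}
```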

The method may be one wherein the user terminal includes a browser operable to communicate with the server.

The method may be one in which the browser executes a javascript program, to communicate with the server.

The method may be one wherein the browser receives javascript (e.g. player) code from the server, together with the first web page.

The method may be one wherein the browser receives javascript (e.g. player) code from the server, and executes the javascript (e.g. player) code received from the server, to perform at least steps (iii), (vii) and (viii).

The method may be one wherein the user terminal includes an app operable to communicate with the server.

The method may be one wherein the content item is video, graphics or text.

The method may be one wherein the content item is video, which is a video clip from a larger video, e.g. in which the video is stored at the server in Blackbird format.

The method may be one wherein the user terminal is a smartphone, a tablet computer, a laptop, a desktop computer, or a smart TV.

The method has the advantages of reducing bandwidth required for webpage delivery, and reducing energy required for webpage delivery.

According to a seventeenth aspect of the invention, there is provided a user terminal, configured to perform a method of any aspect of the sixteenth aspect of the invention.

According to an eighteenth aspect of the invention, there is provided a computer program product, executable on the user terminal, to perform a method of any aspect of the sixteenth aspect of the invention.

According to a nineteenth aspect of the invention, there is provided a computer implemented method of reducing bandwidth required for webpage delivery, the method including the steps of:

(i) a user terminal requesting a second webpage from a server;

(ii) the user terminal receiving from the server a received second web page, the received second web page including a unique identifier which identifies a content item, the received second web page not including the content item;

(iii) the user terminal searching for the content item in its cache, using the unique identifier;

(iv) the user terminal not identifying the content item in its cache, using the unique identifier, and in response, the user terminal requesting the content item from the server, using the unique identifier which identifies the content item;

(v) the user terminal receiving the content item from the server;

(vi) the user terminal displaying the second webpage on a display of the user terminal, the displayed second webpage including the content item that was received from the server. The method has the advantages of assisting with reducing bandwidth required for webpage delivery, and assisting with reducing energy required for webpage delivery. The field of this aspect of the invention is computer implemented methods of reducing bandwidth required for webpage delivery.

The method may be one wherein the user terminal includes a browser operable to communicate with the server.

The method may be one in which the browser executes a javascript program, to communicate with the server.

The method may be one in which the browser executes javascript (e.g. player) code, to perform at least steps (iii), (iv) and (vi).

The method may be one wherein the user terminal includes an app operable to communicate with the server.

The method may be one wherein the content item is video, graphics or text.

The method may be one wherein the content item is video, which is a video clip from a larger video, e.g. in which the video is stored at the server in Blackbird format.

The method may be one wherein the user terminal is a smartphone, a tablet computer, a laptop, a desktop computer, or a smart TV.

According to a twentieth aspect of the invention, there is provided a user terminal, configured to perform a method of any aspect of the nineteenth aspect of the invention.

According to a 21st aspect of the invention, there is provided a computer program product, executable on the user terminal, to perform a method of any aspect of the nineteenth aspect of the invention.

According to a 22nd aspect of the invention there is provided a computer-implemented method of low energy file distribution, the method including encrypting a video file, the video file including a compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the video file, wherein frames in level zero of the video file have the lowest temporal resolution, wherein content of the frames in level zero of the video file is displayable when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the video file is displayable when decompressed only using content of at least one or more frames not in level x of the frames of the video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy, the method including the steps of:

(i) accessing the video file including the compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the video file, wherein frames in level zero of the video file have the lowest temporal resolution, wherein content of the frames in level zero of the video file is displayable when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the video file is displayable when decompressed only using content of at least one or more frames not in level x of the frames of the video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy;

(ii) encrypting the frames in level zero of the video file;

(iii) caching the parts of the video file that have not been encrypted at a proxy;

(iv) transmitting to a user device the parts of the video file that have been encrypted, and transmitting from the proxy to the user device the parts of the video file that have not been encrypted. An advantage is low energy file distribution. An advantage is secure file distribution. The field of this aspect of the invention is computer-implemented methods of low energy file distribution.
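The selective encryption of step (ii) can be sketched in javascript: only level-zero frames are encrypted, while frames at all higher levels stay unencrypted so that a proxy can cache them. The xorEncrypt helper is a toy stand-in for a real cipher (e.g. a symmetric key scheme), used purely for illustration:

```javascript
// Toy cipher stand-in: NOT a real encryption scheme, illustration only.
function xorEncrypt(bytes, key) {
  return bytes.map((b, i) => b ^ key[i % key.length]);
}

// Illustrative sketch of steps (ii) and (iii): encrypt only the level-zero
// frames; leave higher-level delta frames in the clear for proxy caching.
function partitionAndEncrypt(frames, key) {
  // frames: [{ level: 0, data: [...] }, { level: 1, data: [...] }, ...]
  const encrypted = [];
  const clearForProxy = [];
  for (const frame of frames) {
    if (frame.level === 0) {
      encrypted.push({ level: 0, data: xorEncrypt(frame.data, key) });
    } else {
      clearForProxy.push(frame); // cacheable at the proxy, step (iii)
    }
  }
  return { encrypted, clearForProxy };
}
```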

The method may be one including the step of assembling a file comprising a partially encrypted version of the video file, the partially encrypted version of the video file including level zero frames including the encrypted frames in level zero of the video file from step (ii), and further including the levels of frames of the video file that do not include level zero of the frames of the video file.

The method may be one further including storing the assembled file.

The method may be one wherein the frames in the lowest level, level zero, of the hierarchy are key frames.

The method may be one wherein level one comprises delta frames, which are the deltas between the key frames.

The method may be one wherein level two comprises delta frames, which are the deltas between the level one frames.

The method may be one wherein the delta frames have a chain of dependency back to the key frames.

The method may be one wherein decoding each level of content relies on all lower levels having been decoded, with an adaptive code where codewords depend on previous data.

The method may be one wherein for the non-encrypted video file portions, compression uses transition tables for encoding and decoding, and to perform decoding successfully for any given level, all the lower levels of lower temporal resolution must have been decoded.

The method may be one wherein compressed level zero frames comprise 20% or less of the total compressed data of all levels, or 10% or less of the total compressed data of all levels, or 5% or less of the total compressed data of all levels.

The method may be one in which non-zero level frames are not encrypted.

The method may be one wherein compressed data in the non-zero level frames is 80% or more of the total compressed data of all levels, or 90% or more of the total compressed data of all levels, or 95% or more of the total compressed data of all levels.

The method may be one in which if a file size of an encrypted level zero frame is less than a predetermined size, then the corresponding level one frame is also encrypted.

The method may be one wherein the predetermined size is 10 kB, or less.

The method may be one in which the compressed format structure is an MPEG structure.

The method may be one in which the compressed format structure is a Blackbird codec structure.

The method may be one in which the encryption uses a symmetric key cryptography.

The method may be one in which the encryption uses an asymmetric key cryptography.

The method may be one in which the user device is a smartphone, a mobile phone, a tablet computer, a laptop, a desktop computer, a mobile device, or a smart TV.

The method may be one in which the non-encrypted data is sent together with some hashed data, where the hashed data is generated using a hash function of at least some of the non-encrypted data, so that the non-encrypted data may be authenticated using the hashed data.
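As an illustration only, the hashed-data authentication might be sketched as follows; SHA-256 is an assumed choice, since the text requires only "a hash function":

```python
import hashlib

def digest_of(chunks):
    # Hash over the non-encrypted data, sent alongside it.
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.digest()

def authenticate(chunks, received_digest):
    # The receiver recomputes the hash and compares it with the
    # digest that accompanied the data.
    return digest_of(chunks) == received_digest
```

Note that a plain digest only authenticates the data if the digest itself arrives by a trusted path (for example, alongside the encrypted portion); otherwise a keyed or signed hash would be needed, which the text leaves open.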

The method may be one in which the frames in the (e.g. Blackbird) codec are provided for free for only some of the lower levels (e.g. level zero and level one), and payment is required for the frames in the higher levels (e.g. level two to level six).

The method may be one in which a live broadcast by video (e.g. election results) is provided at a lower frame rate, by providing the frames in the (e.g. Blackbird) codec, for only some of the lower levels (e.g. level zero and level one) for free, and not sending the frames in the higher levels (e.g. level two to level six), to reduce transmission bandwidth, to reduce energy usage, and to reduce transmission costs.

The method may be one in which an option to interpolate between frames is provided, to make playback smoother.

The method has the advantages of low energy file distribution, and of secure file distribution.

According to a 23rd aspect of the invention there is provided a computer system configured to perform a method of any aspect of the 22nd aspect of the invention.

According to a 24th aspect of the invention there is provided a computer program product executable on a computer to perform a method of any aspect of the 22nd aspect of the invention.

According to a 25th aspect of the invention there is provided a computer-implemented method of low energy file distribution, the distributed file including a partially encrypted video file of a video file, the video file including a compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the video file, wherein frames in level zero of the video file have the lowest temporal resolution, wherein content of the frames in level zero of the video file is displayable when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the video file is displayable when decompressed only using content of at least one or more frames not in level x of the frames of the video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy, wherein the frames in level zero of the video file are encrypted and stored at a server, the method including the steps of:

(i) caching at a proxy the parts of the video file that have not been encrypted;

(ii) transmitting from the server to a user device the parts of the video file that have been encrypted, and transmitting from the proxy to the user device the parts of the video file that have not been encrypted. An advantage is low energy file distribution. An advantage is secure file distribution. The field of this aspect of the invention is computer-implemented methods of low energy file distribution.

The method may be one including a method of any aspect of the 22nd aspect of the invention.

According to a 26th aspect of the invention there is provided a computer system configured to perform a method of any aspect of the 25th aspect of the invention.

According to a 27th aspect of the invention there is provided a computer program product executable on a computer to perform a method of any aspect of the 25th aspect of the invention.

According to a 28th aspect of the invention there is provided a computer-implemented method of low energy file distribution, the method including decrypting a partially encrypted video file to produce a decrypted video file, the decrypted video file including a compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the decrypted video file, wherein frames in level zero of the decrypted video file have the lowest temporal resolution, wherein content of the frames in level zero of the decrypted video file is displayable when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the decrypted video file is displayable when decompressed only using content of at least one or more frames not in level x of the frames of the decrypted video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy, the method including the steps of:

(i) a user device receiving the parts of the video file that have been encrypted, and the user device receiving from a proxy the parts of the video file that have not been encrypted, to receive the partially encrypted video file;

(ii) the user device processing the partially encrypted video file including the compressed format structure including a hierarchy of two or more levels of temporal resolution of frames of the partially encrypted video file, wherein frames in level zero of the partially encrypted video file have the lowest temporal resolution, wherein content of the frames in level zero of the partially encrypted video file is displayable after decryption when decompressed without depending on content of frames of any other level, and wherein content of frames in each level x not in level zero of the partially encrypted video file is displayable after decryption when decompressed only using content of at least one or more frames not in level x of the frames of the decrypted video file, and included in one or more lower levels of lower temporal resolution of frames of the hierarchy;

(iii) the user device decrypting the frames in level zero of the partially encrypted video file;

(iv) the user device assembling a decrypted video file, the decrypted video file including level zero frames including the decrypted frames in level zero of the partially encrypted video file from step (iii), and further including the levels of frames of the partially encrypted video file that do not include level zero of the frames of the partially encrypted video file;

(v) the user device displaying the decrypted video file on a screen of the user device. An advantage is low energy file distribution. An advantage is secure file distribution. The field of this aspect of the invention is computer-implemented methods of low energy file distribution.

The method may be one further including storing the file assembled in step (iv).

The method may be one wherein for the decrypted video file, decoding each level of content relies on all lower levels having been decoded, with an adaptive code where codewords depend on previous data.

The method may be one wherein for the non-encrypted video file, compression uses transition tables for encoding and decoding, and to perform decoding successfully for any given level, you need to have decoded all the lower levels of lower temporal resolution.

The method may be one wherein level zero frames comprise 20% or less of the total (e.g. compressed) data of all levels, or 10% or less of the total (e.g. compressed) data of all levels, or 5% or less of the total (e.g. compressed) data of all levels.

The method may be one wherein the data in the non-zero level frames is 80% or more of the total (e.g. compressed) data of all levels, or 90% or more of the total (e.g. compressed) data of all levels, or 95% or more of the total (e.g. compressed) data of all levels.

The method may be one in which if a level one frame of the partially encrypted video file is also encrypted, then it is also decrypted.

The method may be one in which the non-zero levels of the partially encrypted video file are not encrypted.

The method may be one in which the compressed format structure is an MPEG structure.

The method may be one in which the compressed format structure is a Blackbird codec structure.

The method may be one wherein the decryption uses a symmetric key cryptography.

The method may be one wherein the decryption uses an asymmetric key cryptography.

The method may be one wherein the user device is a smartphone, a mobile phone, a tablet computer, a laptop, a desktop computer, a mobile device or a smart TV.

The method may be one wherein the user device is a smart TV which includes a web browser, and the web browser is executable to play a video, which is received in the form of encrypted key frames, and non-encrypted non-key frames.

The method may be one wherein the user device is a mobile device which includes a web browser, and the web browser is executable to play a video, which is received in the form of encrypted key frames, and non-encrypted non-key frames.

The method may be one wherein the browser playing the video including encrypted key frames, and non-encrypted non-key frames is informed which frames are encrypted, and which frames are non-encrypted.

The method may be one wherein the user (e.g. mobile) device includes an application program, and the application program is executable to play a video, which is received in the form of encrypted key frames, and non-encrypted non-key frames.

The method may be one wherein the user (e.g. mobile) device plays back at a lower frame rate in (e.g. Blackbird) codecs to reduce CO2 emissions in power generation, as only displayed frames are downloaded and decompressed.

The method may be one wherein the user (e.g. mobile) device includes an option to interpolate between frames to make playback smoother.

The method may be one wherein the non-encrypted data is sent together with some hashed data, where the hashed data is generated using a hash function of at least some of the non-encrypted data, so that the non-encrypted data may be authenticated using the hashed data.

The method has the advantages of low energy file distribution, and of secure file distribution.

According to a 29th aspect of the invention there is provided a computer system configured to perform a method of any aspect of the 28th aspect of the invention.

According to a 30th aspect of the invention there is provided a computer program product executable on a processor to perform a computer-implemented method of decrypting a partially encrypted video file of any aspect of the 28th aspect of the invention.

According to a 31st aspect of the invention there is provided a video file encryption apparatus including a processor configured to perform a computer-implemented method of encrypting a video file of any aspect of the 22nd aspect of the invention.

The video file encryption apparatus may be one wherein the video file encryption apparatus is a chip.

According to a 32nd aspect of the invention there is provided a video file decryption apparatus including a processor configured to perform a computer-implemented method of decrypting a partially encrypted video file of any aspect of the 28th aspect of the invention.

The video file decryption apparatus may be one wherein the video file decryption apparatus is a chip.

According to a 33rd aspect of the invention there is provided a video file encryption and decryption apparatus, including a processor configured to perform a computer-implemented method of encrypting a video file of any aspect of the 22nd aspect of the invention, and wherein the processor is configured to perform a computer-implemented method of decrypting a partially encrypted video file of any aspect of the 28th aspect of the invention.

The video file encryption and decryption apparatus may be one wherein the video file encryption and decryption apparatus is a chip.

According to a 34th aspect of the invention there is provided a computer-implemented method of identifying significant images in a video file, the video file including source images, the method including the steps of:

(i) generating a plurality of token images, each being a digitized representation of a scaled down version of a respective source image in the video file, by transforming said source images into token images;

(ii) creating an arrangement of said token images in a continuous band of token images arranged adjacently as a function of time in the video;

(iii) transforming the continuous band of token images, each token image having a multi pixel width and a multi pixel height into at least one new squashed band by squashing the token images in a continuous band of token images in a longitudinal direction only, by one or more factors using pixel averaging, to create said at least one new squashed band of squashed token images, wherein each individual squashed token image is reduceable to a maximum of a single pixel width and a multi-pixel height;

(iv) analysing the new squashed band of squashed token images, to identify a squashed token image which differs from an adjacent squashed token image so as to satisfy a threshold criterion, wherein the adjacent squashed token image precedes in time the identified squashed token image;

(v) providing a selectable option in a user interface, the selectable option selectable to play the video starting from the image in the video corresponding to the identified squashed token image, or starting from the image in the video corresponding to the adjacent squashed token image. An advantage is a low energy method of identifying significant images in a video file, because the significant images are identified without analyzing all the pixel data in the entire video file. The field of this aspect of the invention is computer implemented methods of identifying significant images in a video file.
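Steps (iii) and (iv) above can be sketched in miniature as follows; the rows-of-luminance representation of a token image and the mean-absolute-difference criterion are illustrative assumptions, not the only threshold criterion the text contemplates:

```python
def squash(token):
    # Step (iii), in miniature: average each row of a token image
    # (given as rows of luminance values) down to a single-pixel-wide
    # column of the same multi-pixel height.
    return [sum(row) / len(row) for row in token]

def significant_indices(band, threshold):
    # Step (iv): flag each squashed token image whose column differs
    # from the preceding (earlier-in-time) one by more than `threshold`,
    # here measured as mean absolute difference per pixel.
    columns = [squash(token) for token in band]
    hits = []
    for i in range(1, len(columns)):
        diff = sum(abs(a - b) for a, b in zip(columns[i], columns[i - 1]))
        if diff / len(columns[i]) > threshold:
            hits.append(i)
    return hits
```

Because only the squashed columns are compared, the significant images are found without touching the full pixel data of the video, which is the low energy property claimed above.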

The method may be one wherein steps (iv) and (v) are executed in a browser on a client device (e.g. smartphone, tablet computer, laptop, desktop computer, smart TV), or in an app on a client device (e.g. smartphone, tablet computer, laptop, desktop computer, smart TV).

The method may be one wherein steps (iv) and (v) are executed in javascript in the browser on the client device.

The method may be one wherein tasks (e.g. AI tasks) (e.g. video cut detection, face recognition, player identification in sport, vehicle detection from a drone) are performed following this client-side processing.

The method may be one wherein step (iv) is performed without using analysis (e.g. AI analysis) to search an ingested video on a server.

The method may be one wherein steps (iv) and (v) are executed at a server.

The method may be one wherein the threshold criterion is that the identified squashed token image contains pixels of a selected colour (e.g. red) above a predetermined threshold, and the adjacent squashed token image does not contain pixels of the selected colour above the predetermined threshold.

The method may be one wherein the threshold criterion is that the identified squashed token image contains an increase in pixels of a selected colour (e.g. red) above a predetermined threshold, relative to the adjacent squashed token image.

The method may be one wherein the selected colour is black, white, red, blue or green.

The method may be one wherein the threshold criterion is that the identified squashed token image is black, and the adjacent squashed token image is not black.

The method may be one wherein the video is at least 2 minutes in duration.

The method may be one wherein the threshold criterion includes that the identified squashed token image’s content differs from the adjacent squashed token image’s content, by a threshold amount.

The method may be one wherein the threshold criterion includes that the identified squashed token image’s audio content differs from the adjacent squashed token image’s audio content, by a threshold amount (e.g. indicating a loud cheer).

The method may be one wherein the step (iv) includes using artificial intelligence analysis.

The method may be one wherein an analysis (e.g. AI analysis) is performed of content viewed by a viewer in the past, and the analysis (e.g. AI analysis) then searches for similar content within a library of ingested video content, and the content identified by the analysis (e.g. AI analysis) is offered to the viewer, so the viewer can select what content to view.

The method may be one wherein step (iv) is repeated until all instances in the video are identified in which a squashed token image differs from an adjacent squashed token image so as to satisfy a threshold criterion, wherein the adjacent squashed token image precedes in time the identified squashed token image.

The method may be one including providing selectable options in the user interface, the selectable options selectable to play the video starting from an image in the video corresponding to an identified squashed token image, or starting from the image in the video corresponding to an adjacent squashed token image, for all identified instances in the video.

The method may be one in which the search (e.g. AI search) of the navigation tool finds candidate frames as described, and then the candidate frames are investigated at higher resolution, e.g. at 64x36 pixels, and/or using original frames, to confirm that a further threshold criterion is met. An advantage is that more reliable identification of significant frames is obtained.

The method may be one in which the threshold criterion is user selectable.

The method may be one in which when the analysis (e.g. AI analysis) of the navigation tool is performed, the analysis (e.g. AI analysis) program is configured such that, if a notable item in the navigation tool is identified, the analysis (e.g. AI analysis) program sends an alert, such as to a user device such as to a mobile phone.

The method has the advantage of being a low energy method of identifying significant images in a video file.

According to a 35th aspect of the invention there is provided a system including a server and a client device, the system configured to identify significant images in a video file, the video file including source images, wherein:

(i) the server is configured to generate a plurality of token images, each being a digitized representation of a scaled down version of a respective source image in the video file, by transforming said source images into token images;

(ii) the server is configured to create an arrangement of said token images in a continuous band of token images arranged adjacently as a function of time in the video;

(iii) the server is configured to transform the continuous band of token images, each token image having a multi pixel width and a multi pixel height into at least one new squashed band by squashing the token images in a continuous band of token images in a longitudinal direction only, by one or more factors using pixel averaging, to create said at least one new squashed band of squashed token images, wherein each individual squashed token image is reduceable to a maximum of a single pixel width and a multi-pixel height;

(iv) the server is configured to send the new squashed band of squashed token images to the client device;

(v) the client device is configured to analyse the new squashed band of squashed token images, to identify a squashed token image which differs from an adjacent squashed token image so as to satisfy a threshold criterion, wherein the adjacent squashed token image precedes in time the identified squashed token image;

(vi) the client device is configured to provide a selectable option in a user interface of the client device, the selectable option selectable to play the video starting from the image in the video corresponding to the identified squashed token image, or starting from the image in the video corresponding to the adjacent squashed token image. An advantage is a low energy system for identifying significant images in a video file, because the significant images are identified without analyzing all the pixel data in the entire video file.

The system may be one configured to perform a method of any aspect of the 34th aspect of the invention.

According to a 36th aspect of the invention there is provided a computer program product executable on a client device, the client device forming part of a system including the client device and a server, the system configured to identify significant images in a video file, the video file including source images, wherein:

(i) the server is configured to generate a plurality of token images, each being a digitized representation of a scaled down version of a respective source image in the video file, by transforming said source images into token images;

(ii) the server is configured to create an arrangement of said token images in a continuous band of token images arranged adjacently as a function of time in the video;

(iii) the server is configured to transform the continuous band of token images, each token image having a multi pixel width and a multi pixel height into at least one new squashed band by squashing the token images in a continuous band of token images in a longitudinal direction only, by one or more factors using pixel averaging, to create said at least one new squashed band of squashed token images, wherein each individual squashed token image is reduceable to a maximum of a single pixel width and a multi-pixel height;

(iv) the server is configured to send the new squashed band of squashed token images to the client device; wherein the computer program product is executable on the client device:

(a) to analyse the new squashed band of squashed token images, to identify a squashed token image which differs from an adjacent squashed token image so as to satisfy a threshold criterion, wherein the adjacent squashed token image precedes in time the identified squashed token image; and

(b) to provide a selectable option in a user interface of the client device, the selectable option selectable to play the video starting from the image in the video corresponding to the identified squashed token image, or starting from the image in the video corresponding to the adjacent squashed token image. An advantage is a low energy usage computer program product for identifying significant images in a video file, because the significant images are identified without analyzing all the pixel data in the entire video file.

The computer program product may be one configured to perform a method of any aspect of the 34th aspect of the invention.

Aspects of the invention may be combined.

BRIEF DESCRIPTION OF THE FIGURES

Aspects of the invention will now be described, by way of example(s), with reference to the following Figures, in which:

Figure 1 shows a typical image of 376x280 pixels divided into 8x8 pixel superblocks.

Figure 2 shows a typical super-block of 8x8 pixels divided into 64 pixels.

Figure 3 shows a typical mini-block of 2x2 pixels divided into 4 pixels.

Figure 4 shows an example image containing two Noah regions and a Noah edge.

Figure 5 shows an example of global accessible context for Transition Tables.

Figure 6 shows an example of Transition Tables with local context (e.g. LC1, etc.) and corresponding resulting values which have been predicted so far.

Figure 7 shows an example of typical context information for cuts.

Figure 8 shows an example of typical context information for delta frames.

Figure 9 is a flowchart showing how variable length codewords may be generated from a list of codewords sorted by frequency.

Figure 10 is a schematic diagram of a sequence of video frames.

Figure 11 is a schematic diagram illustrating an example of a construction of a delta frame.

Figure 12 is a schematic diagram of an example of a media player.

Figure 13 is an example of a computer display providing a method of enabling efficient navigation of video.

Figure 14 is an example of a sequence of source image frames processed to provide a method of enabling efficient navigation of video.

Figure 15 is an example of additional horizontal reductions, in a method of enabling efficient navigation of video.

Figure 16 shows an example approach of speeding up blur calculations, in which in a first aspect, blurring is performed by varying x then y, using the separable property of the blurring function (e.g. Gaussian or bilinear interpolation), and in a second aspect bilinear interpolation is used for smooth image portions, to speed up the computations.

Figure 17 shows an example of bilinear interpolation over a square area with four corners, where initially the signal strength is known only at each of the four corners A, B, C, D.

Figures 18 to 22 provide examples of a sequence of source image frames processed to provide a method of enabling efficient navigation of video.

Figure 23 is an example of additional horizontal reductions, in a method of enabling efficient navigation of video.

Figure 24 is an example of a computer display providing a method of enabling efficient navigation of video.

Figure 25 shows an example in which bilinear interpolation is considered for use in a blur calculation.

Figure 26 shows an example in which it is possible for an abrupt transition to be produced, where a Gaussian blur and a bilinear interpolation blur join at a boundary.

Figure 27 shows an example in which a portion of an original sharp image is blurred and enlarged, and shown in a border of the screen, for each of the left hand side and the right hand side of a 16:9 landscape screen.

Figure 28 shows an example of a process for providing a fast, low energy, blur calculation for an image.

Figure 29A shows an example of text and graphics on a web page.

Figure 29B shows an example of text and graphics on a web page.

Figure 30 shows an example of web page content analysis, in which previously identified content is noted, together with its respective id number, and content that has not been previously identified is denoted as “not matched”.

Figure 31 shows an example in which an encryption key (e.g. SSL HTTPS RSA) is used only for the level zero key frames, and the data not in the key frames (i.e. data in levels one to six) does not need to be encrypted; this data can instead be sent as HTTP.

Figure 32 shows an example in which an encryption key is used only for the level zero key frames, and the data not in the key frames (i.e. data in levels one to six) is not encrypted.

Figure 33 shows an example in which a decryption key is used only for the encrypted level zero key frames, and the data not in the key frames (i.e. data in levels one to six) is not decrypted, because it was not previously encrypted.

DETAILED DESCRIPTION

IMAGE BLUR OR VIDEO BLUR

The best type of blur is Gaussian blur, which provides the smoothest blur. Gaussian blur uses the normal distribution, so for a pixel in a whole original image to be blurred, the signal intensity from this pixel is normally distributed over part of, or the whole of, the original image to be blurred, where this process is repeated for every pixel to be blurred in the whole original image. Because the Gaussian shape has the least structure of any shape, it produces a really smooth blur. Other blurs can produce artefacts.

How should one produce a Gaussian blur of an image, computationally? A slow way is to calculate the intensity of each pixel in a blurred image, by normally distributing the signal intensity from each pixel in the whole of the original image to be blurred. But for an image that is x by x pixels, the computation time will scale approximately as x^2 for the original image, times x^2 for the blurred image, which scales as x^4. This is very undesirable scaling for blurring video.

In practice, it is best to take advantage of the Gaussian blur’s separable property by dividing the process into two passes. In the first pass, a one-dimensional kernel is used to blur the image in only the horizontal or vertical direction. In the second pass, the same one-dimensional kernel is used to blur in the remaining (horizontal or vertical) direction. The resulting effect is the same as convolving with a two-dimensional kernel in a single pass, but requires fewer calculations. For an image that is x by x pixels, the computation time may scale approximately as x^2 * 2 log x, which for increasing x is much more favourable than the scaling of x^4 mentioned above. So for an image that is 10 by 10 pixels, using the Gaussian blur’s separable property by dividing the process into two passes, the computation can be performed in 200 time units, whereas the slow way takes 10^4 time units, so using the Gaussian blur’s separable property by dividing the process into two passes is about 50 times faster than the slow way, in this example. However, even faster blurring is desirable, for example to provide blurring when executing in javascript on a normal desktop or smartphone computer, for processing a video. If one tries to use faster blurring algorithms than the Gaussian algorithm, the faster blurring algorithms tend to produce artefacts, particularly in relation to edges in an original image, for example, producing ripples.
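A minimal sketch of the two-pass separable blur described above, assuming a clamped border policy and an illustrative kernel radius (neither is specified by the text):

```python
import math

def gaussian_kernel(sigma, radius):
    # Normalised one-dimensional Gaussian kernel shared by both passes.
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def blur_row(row, kernel, radius):
    # One 1-D pass; out-of-range pixels are clamped to the nearest edge.
    n = len(row)
    return [sum(kernel[j + radius] * row[min(max(i + j, 0), n - 1)]
                for j in range(-radius, radius + 1))
            for i in range(n)]

def gaussian_blur(image, sigma=1.0, radius=2):
    # First pass: blur each row horizontally. Second pass: blur vertically
    # by transposing, reusing the same 1-D routine, and transposing back.
    kernel = gaussian_kernel(sigma, radius)
    horizontal = [blur_row(row, kernel, radius) for row in image]
    vertical = [blur_row(list(col), kernel, radius)
                for col in zip(*horizontal)]
    return [list(row) for row in zip(*vertical)]
```

The two 1-D passes touch each pixel 2*(2*radius+1) times rather than (2*radius+1)^2 times, which is the saving the separable property provides.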

To speed up blur calculations, we use bilinear interpolation for the blurring of smooth parts of an original image. An example approach is shown in Figure 16, in which in a first aspect, blurring is performed by varying x then y, using the separable property of the blurring function (e.g. Gaussian or bilinear interpolation), and in a second aspect bilinear interpolation is used for smooth image portions, to speed up the computations.

In mathematics, bilinear interpolation is a method for interpolating functions of two variables (e.g., x and y) using repeated linear interpolation. Bilinear interpolation is performed using linear interpolation first in one direction x, and then again in the other direction y. For bilinear interpolation over a square area with four corners, where initially the signal strength is known only at each of the four corners A, B, C, D, the signal strength at a given point P is most influenced by the signal strength at the corner closest to the given point, and is second most influenced by the signal strength at the corner second closest to the given point, and is third most influenced by the signal strength at the corner third closest to the given point, and is least influenced by the signal strength at the corner furthest from the given point. An example is shown in Figure 17. An advantage of bilinear interpolation over Gaussian blurring is that bilinear interpolation is computationally much faster than Gaussian blurring. Bilinear interpolation is computationally fast, because it is a linear approach, and computers can perform linear calculations quickly. For example, in bilinear interpolation, the difference, per pixel, as one goes from left to right is a first constant value, and the difference, per pixel, as one goes from top to bottom is a second constant value. So one can perform the bilinear interpolation as one moves left to right by adding the first constant value. And one can perform the bilinear interpolation as one moves top to bottom by adding the second constant value. Computer processors perform addition operations very quickly. For example, some processor chips can perform four add operations in one clock cycle. In an example, a processor chip can perform eight add operations in one clock cycle.
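The constant-increment property described above can be sketched as follows; the corner labelling and block dimensions are illustrative:

```python
def bilerp_block(a, b, c, d, width, height):
    # Bilinearly interpolate a block from its four corner values
    # a (top-left), b (top-right), c (bottom-left), d (bottom-right),
    # using only running additions in the per-pixel inner loop.
    block = []
    left, right = float(a), float(b)
    row_step_left = (c - a) / (height - 1)    # constant vertical increment
    row_step_right = (d - b) / (height - 1)
    for _ in range(height):
        step = (right - left) / (width - 1)   # constant horizontal increment
        value = left
        row = []
        for _ in range(width):
            row.append(value)
            value += step                     # one add per pixel
        block.append(row)
        left += row_step_left
        right += row_step_right
    return block
```

The inner loop is a single addition per pixel, which is why this runs far faster than evaluating a Gaussian kernel at every pixel.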

Our approach to faster blur calculations than Gaussian blur is to replace Gaussian blur with bilinear interpolation, for portions of the original image that are sufficiently smooth.

Gaussian blur produces results that are nice to look at because it minimises assumptions about the original image content. But Gaussian blur is computationally complex.

The objective is to simulate a true Gaussian blur with a computationally simpler solution. Bilinear interpolation over a square area may work well and fast if a+d-b-c is small in magnitude, where a, b, c and d are the signal strengths at each corner of the square area, or are pixel values at each corner of the square area, e.g. luminance values or RGB values, etc. But for example if an edge in an original image crosses the square area, so that one corner has a very different pixel value to the other three corners, then a+d-b-c is not small in magnitude, and bilinear interpolation does not work very well for blurring the original image, so Gaussian blurring should be used instead. An example is shown in Figure 25.

So a+d-b-c being small in magnitude is a test one may use, to see if one should replace Gaussian blurring with bilinear interpolation, for blurring a square portion of an original image.
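The test can be expressed as a one-line predicate. This is an illustrative Python sketch; the threshold value is an assumption for the example, not a value specified here.

```python
def is_smooth(a, b, c, d, threshold=8):
    """Decide whether bilinear interpolation may replace Gaussian blur
    for a square area, given corner values a (top-left), b (top-right),
    c (bottom-left), d (bottom-right). Threshold is illustrative."""
    return abs(a + d - b - c) <= threshold
```

A flat or linearly varying area passes the test; an area with one outlying corner (e.g. an edge crossing the block) fails it.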

So for example take 4x4 or 8x8 pixel blocks, and use bilinear interpolation if a+d-b-c is small in magnitude, else use Gaussian blur, where the Gaussian blur operates over the x and y coordinates sequentially. We have found that 4x4 or 8x8 pixel blocks work well, in our tests. If the blocks are too small, there are many blocks; bilinear interpolation can still be used, but there are a lot of edges to the blocks, and a lot of time has to be taken to check whether the edges permit bilinear interpolation to be used, which slows the calculation.

If the blocks are too big, then bilinear interpolation may not be so accurate, in which case Gaussian blur has to be used more often, and less computational time is saved. This is why we think 4x4 or 8x8 pixel blocks work well, in our tests. At 360p, we found 8x8 pixel blocks worked best. For bigger pictures, e.g. 1080p, bigger blocks worked better, probably because bigger pictures tend to have larger smooth areas.
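Putting the block size and the corner test together, the per-block choice of blur method might look like the following Python sketch. The function name, threshold value and corner positions are illustrative assumptions, not details taken from the text.

```python
def choose_blur_methods(img, block=8, threshold=8):
    """For each block x block tile of a 2-D image (list of rows), choose
    'bilinear' if the corner test |a + d - b - c| <= threshold passes,
    else 'gaussian'. Returns a map from tile origin to chosen method."""
    h, w = len(img), len(img[0])
    methods = {}
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            a = img[y][x]                          # top-left corner
            b = img[y][x + block - 1]              # top-right corner
            c = img[y + block - 1][x]              # bottom-left corner
            d = img[y + block - 1][x + block - 1]  # bottom-right corner
            methods[(y, x)] = ("bilinear"
                               if abs(a + d - b - c) <= threshold
                               else "gaussian")
    return methods
```

On a smooth gradient every tile is classified as bilinear; a tile containing a sharp edge falls back to Gaussian blurring.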

The pixel values at the corners of the pixel block square area should be made by averaging nearby pixel values.

For the signal strength at each corner, one may use averaging, for example going one or two pixels further out, but not too far, in case there is an object outside the block which would provide a corrupting effect. The corners of a bilinear interpolation may be made from averages of nearby pixels, to check that these nearby pixel values are consistent, and to avoid a corrupting effect.
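Corner averaging of this kind might be sketched as follows in Python; the radius parameter and the clamping behaviour at the image edges are assumptions made for the example.

```python
def corner_value(img, y, x, r=1):
    """Average the pixels within r of (y, x), clamped to the image
    bounds, to obtain a corner value robust to single-pixel noise."""
    h, w = len(img), len(img[0])
    vals = [img[j][i]
            for j in range(max(0, y - r), min(h, y + r + 1))
            for i in range(max(0, x - r), min(w, x + r + 1))]
    return sum(vals) / len(vals)
```

With r = 1 or 2, the average reaches one or two pixels further, as described above, without straying far outside the block.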

Also, Gaussian blur uses pixels from within a radius, so one should check that a block is smooth out as far as that radius, to know that the transition between a Gaussian blur and a bilinear interpolation blur will be a smooth transition. An example is shown in Figure 25. Otherwise an abrupt transition is likely to be produced where a Gaussian blur and a bilinear interpolation blur join at a boundary; an example is shown in Figure 26.

The criterion of being smoothly varying may include that an area (e.g. a square area) is smoothly varying as far as the pixel radius, in which the area may extend outside a pixel block. This has the benefit that distant objects don't impinge on the 8x8 block when they are blurred.

Where a bilinear interpolation blur is used in the vicinity of a sharp edge, the sharp edge leads to a jagged edge, which is not acceptable for a blurred image result. In such a case, a Gaussian blur should be used.

Most image blur regions will be suitable for use of a bilinear interpolation blur in real time, and the result should be very close to the result from a Gaussian blur. It may be desirable to obfuscate block edges if some jagged edges are produced. But in our tests, we have not needed to do this.

The use of smaller block sizes means more bilinear interpolation blur can be used, as a percentage of the total blur calculations, but more calculation is needed, e.g. for checking edge regions of the block for suitability of using bilinear interpolation blur, and because smaller block sizes mean computation is needed for many more blocks. So it is desirable not to make the block sizes too small.

We find that for suitable block sizes, e.g. 4x4 or 8x8 pixel blocks, we can typically obtain a 20% to 80% reduction in computation time, with an average reduction of 50% in computation time, when Gaussian blur and bilinear interpolation blur are used together, compared to use of Gaussian blur alone. For smooth images, we can achieve the largest reductions in computation time.

Using a Gaussian blur and a bilinear interpolation blur together could be employed for any computer that is processing video data. However, if we consider a browser-based option executing on a computer, without access to a graphics processing unit (GPU), this needs to work well in real time for video applications.

On a 640x360p proxy, 8x8 blocks for the bilinear interpolation blur, with a 4-12 pixel radius for the Gaussian blur portions of the blur, worked well, in our tests.

Using the above methods for identifying where a bilinear interpolation blur may be used instead of a Gaussian blur, because the portion of the image is sufficiently smooth, one may also identify a portion of an image which is suitable for placing advertising in, because the portion of the image is sufficiently smooth. When such a portion of an image is identified, provided the portion is large enough, and its colours differ from the colouring of the advertising, then the advertising may be inserted over the identified portion of the video image, e.g. for a preselected duration within the video. For example, when analysing a video of an ice hockey game, a portion of the ice surface may be identified as being a smooth portion, and an advertisement of a different colour to the ice may be inserted on the identified portion of the ice surface, e.g. for a preselected duration within the video.

Blurred borders

These days, nearly all people tend to watch TV using a 16:9 screen aspect ratio. When people view 9:16 portrait aspect ratio images recorded using mobile devices on a 16:9 landscape screen, these 16:9 landscape screen aspect ratio images often have blurred borders added at the left hand side and at the right hand side. For example, a portion of an original sharp image can be blurred and enlarged, and shown in a border of the screen, for the left hand side and for the right hand side of the 16:9 landscape screen. An example is shown in Figure 27. If a Gaussian blur is used, this tends to be very slow computationally, as the majority of the screen area is blurred.

An objective is to make an efficient faster-than-real-time edge blur for such videos.

Gaussian blurring is quite computationally intensive, in general. A starting 2x2 pixel area is taken from an image in which a Gaussian blur has been performed, where the amount of blurring can be selected to provide a desired speed of computation. This may be expressed as "having as much blur as you want". This Gaussian blur is fast, because it only needs to be performed on the subset (for example ¼ or 1/16) of the original image pixels that are going to be used in the final, magnified, blurred image for a border of the 16:9 landscape screen aspect ratio images. In an example first step, for each pixel in a portion of the image, we use a quick 2x2 bilinear blur first to reduce the pixel count by 4x and to reduce the blur radius by 2x. From an original 2x2 pixel area, we create a 4x4 bilinear interpolation blur, which leads to four bilinear interpolated blurred image pixels per original image pixel, because 16 (i.e. 4x4) divided by 4 (i.e. 2x2) is four. This increases the image size by a factor of two in linear magnification, or a factor of four by area. An example is shown in Figure 28.
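The downscale-then-bilinear-upscale idea can be illustrated with the following Python sketch. It is illustrative only: the implementation details described below, such as in-place reuse of the border workspace, are not reproduced, and the function names are assumptions for the example.

```python
def downsample_2x2(img):
    """Halve an image (list of rows of numbers) in each dimension by
    averaging each 2x2 pixel group: 4x fewer pixels, 2x smaller radius."""
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1] +
              img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(len(img[0]) // 2)]
            for y in range(len(img) // 2)]

def upsample_2x(img):
    """Double an image in each dimension; each output pixel is bilinearly
    interpolated from the four nearest low-resolution pixels (edges
    clamped), magnifying by 2x linearly, 4x by area."""
    h, w = len(img), len(img[0])
    out = []
    for oy in range(2 * h):
        sy = oy / 2
        y0 = min(int(sy), h - 1)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for ox in range(2 * w):
            sx = ox / 2
            x0 = min(int(sx), w - 1)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out
```

Any remaining Gaussian blur then needs to run only on the reduced-size image, before the cheap bilinear upsample restores the border to full size.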

Then in an example second step we use the memory storing the blurred edges to store new blurred intermediate results during edge blur: this improves central processing unit (CPU) caching and reduces the memory footprint for the process and the memory allocations for the process. In an example, when the Gaussian blur is performed to provide a small image, the small image can be stored in the memory which will store the final blurred image, where the storing is in a corner of the border of the 16:9 landscape screen aspect ratio images. Then, as the 2x2 bilinear blur is performed, and as the image size is increased by a factor of two in linear magnification, or a factor of four by area, the results of the 2x2 bilinear blur overwrite the workspace of the border as the process progresses, so no additional workspace outside the border is required. This has the advantage of requiring no extra memory, or negligible extra memory, outside the memory storing the final blurred image, for each border. This is useful, because the available memory on the client device may be limited. The calculation can be performed in real-time, on a client device.

An advantage is that we can play back the 9:16 screen aspect ratio portrait images recorded using mobile devices, using a 16:9 landscape screen aspect ratio, on the same machine, with blurred borders in real time. Previously, we used black borders, because the blur calculations were not fast enough to implement blur in real time under these conditions. We can now perform blurring of a portion of a video, e.g. for a vehicle number plate, or for a person's face, in real time, in a browser environment, e.g. in 1080p video format.

There are no easy alternatives to performing a blur calculation. For example, if one places some graphics over the area one wants to blur, viewers tend to be distracted by what has been placed over the area: they tend to look at the graphics placed over it. Blurring is not distracting for a viewer, because it is like something seen out of focus by the human eyes, which tends to be of very low interest to the human visual processing system. Blurring is easily performed by the human visual system, which acts on an image in parallel, such as when something is out of focus for a human eye; but computationally a blur has to be calculated more-or-less pixel-by-pixel, which is much more challenging.

Within an image a selected portion of the image can be blurred. For example, the selected portion may be a rectangle, an ellipse, a square, a circle, or a squircle. A video editing program may be configured to track an object within a video (e.g. someone’s face), and to blur that object as it moves in the video frame. The above fast blur calculation method may be used, for example to execute the object-tracked blur in a computationally efficient way. The above fast blur calculation method may be used, for example to execute the object-tracked blur in real-time.

Reducing bandwidth required for web page delivery

The goal here is to reduce the bandwidth required for web page delivery. A user may use a browser on a computer, the browser including a javascript web page video player. Video is typically quite highly compressed, but web page graphics are typically not so strongly compressed. There is provided a computer program for providing a web page, e.g. using javascript. In an example, a web page is processed using javascript, to reduce the bandwidth required from the server to users in connection with the server. Applications include servers providing web pages (e.g. news sites, such as the BBC News website), or apps for social media applications, e.g. apps on smartphones. Such data is often sent uncompressed, or poorly compressed.

In our approach, we can compress parts of web pages. The computer program for providing a web page receives compressed web page graphics, decodes the web page graphics, and displays the web page graphics on a screen of a computer which is executing the computer program. The computer program for providing a web page receives compressed web page text, decodes the web page text, and displays the web page text on a screen of a computer which is executing the computer program.

In an example, a browser executing on a client device (e.g. smartphone, tablet computer, laptop, desktop computer, smart TV) is sent a javascript (JS) program which decodes the web pages or content, followed by the web pages or content. The javascript program can use (e.g. up-to-date, sophisticated) compression for text, graphics and/or video, and it can also cache content contained within the page, e.g. memes, which are smaller than a whole file. This approach may also be applied to one or more of: words, images, graphics and videos, and to common sentences in content, identical parts of memes which are otherwise largely the same, etc.

Consider the situation where there is a web page, and a web cache on a computer that stores the web page. On a server, we analyse web pages and recognize when a portion of a web page that has been sent at an earlier time is the same as a portion of a web page that is intended to be sent at a later time, e.g. which is about to be sent. Such a portion is saved on the server and given a unique identity. Consider that there is a first image with a speech bubble including first speech that was sent at the earlier time. An example is shown in Figure 29A. Consider that there is a second image with a speech bubble including second speech, different to the first speech, but which is otherwise identical to the first image. An example is shown in Figure 29B. The second image, not having been sent before, is not present in any cache. In an example, we analyse the first image and save some or all of the unique portions of the first image, each portion including a unique identifier. In an example, we analyse the second image, and recognize that portions of the second image, not including the speech bubble, are identical to some portions of the first image. So when sending the second image to the browser executing the javascript player, where the browser has already received the first image, we can send only the unique identifiers for the portions of the second image that are identical to portions of the first image, rather than those portions themselves. The browser executing the javascript player can then provide those portions on the screen, without receiving the second image in its entirety from the server. The javascript player needs only to receive the unique identifiers of the graphics or text it has received previously, together with the portions of the image or text that it has not received previously, to provide the second image on the screen.
So in effect we have a sophisticated web cache.
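The portion-identifier scheme can be sketched as follows. This is a hypothetical Python illustration: fixed-size portions and truncated SHA-256 digests stand in for the portion analysis and unique identifiers described above, and are not details taken from the text.

```python
import hashlib

def split_into_portions(data, size=64):
    """Split content into fixed-size portions (a simplified stand-in
    for analysing a page into reusable portions)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def encode_page(data, cache):
    """Replace portions already in the cache by short identifiers;
    send new portions in full and record them in the cache."""
    out = []
    for portion in split_into_portions(data):
        pid = hashlib.sha256(portion).hexdigest()[:8]
        if pid in cache:
            out.append(("id", pid))             # previously sent: id only
        else:
            cache[pid] = portion
            out.append(("data", pid, portion))  # new content: send in full
    return out

def decode_page(encoded, cache):
    """Reassemble the page from identifiers and newly received portions."""
    parts = []
    for item in encoded:
        if item[0] == "id":
            parts.append(cache[item[1]])        # fetch from local cache
        else:
            cache[item[1]] = item[2]            # store for future reuse
            parts.append(item[2])
    return b"".join(parts)
```

A second page that shares portions with an earlier page is then transmitted mostly as identifiers, with only its genuinely new portions sent in full.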

An example of web page content analysis, in which previously identified content is noted, together with its respective id number, and content that has not been previously identified is denoted as “not matched”, is shown in Figure 30.

We can also provide this for video in Blackbird format, e.g. for clips from a larger video.

We can notice partial matches of graphics, text or video. Then we can use the noticed matching, in particular, a short code word for each partial match, to send the code word rather than the whole item, to reduce bandwidth required for delivering web pages or content over the internet.

Consider the example of a news web site. The main news page includes text for stories, and graphics. If one returns to the web page some hours later, much of the content will be the same, but it may have moved position on the web page. The conventional approach is that, if the web page has changed, then the entire web page needs to be sent again. In our approach, previously sent portions of the web page are not sent again, if they are identical to portions of an updated web page; instead a unique identifying code for each such portion is sent, along with the new content of the updated web page. An example is the BBC News website. It is possible that use of this javascript player approach could reduce the amount of data required to be sent to a particular user using a web browser and its cache over a long period of time (e.g. one year) by a factor of two. This would provide significant energy savings, e.g. for the internet infrastructure providers, and for the website or data provider. So this provides an environmental advantage, because saving energy is good for the environment.

It is possible that this approach could be adopted only for more commonly occurring content in web pages, and not implemented for rarely occurring content. So for example, if some content has been requested from a web page, by the community of users, more than a predetermined number of times, e.g. a thousand times, then that content will be allocated its own unique id, and will be included in the above process for sending the unique id. An example of commonly occurring content in web pages is the graphics and text at the top of the home page of BBC News. But the approach might not be adopted for a rarely accessed web page on a web site, e.g. one which has had fewer than a hundred hits in the previous ten years.

For example, in Twitter, when people add comments, much of the original content is repeated. But the whole Twitter entry including an added comment is sent to the user, even if the user has already received a large portion of the content in what was previously sent.

In an example, the javascript player is sent to a client computer, together with the web page. The javascript player need not be sent again, if the javascript player is already cached on the client computer.

The application need not be implemented in javascript. For example, for Twitter, this can be provided by an app on a smartphone. The Twitter server can notice partial matches of graphics, text or video. Then the server can use the noticed matching, in particular, a short code word for each partial match, to send the code word rather than the whole item, to reduce bandwidth required for delivering Twitter content to the smartphone, e.g. over the internet.

In the analysis of text, it is probably not worth creating codewords for each word of language, because there would be too many. But one could create codes for sentences, for example, and use these to reduce bandwidth required to send content. By compressing text, one may be able to reduce the transmitted data amount by about a factor of five.

In the analysis of graphics, the graphics can be analyzed using a grid, and each portion of the grid can be given a respective reference id. Portions of a grid of a new item of graphics can then be analyzed (including being given respective reference ids) for matching with graphics that have been previously analyzed using a grid, such that grid portions of the new item of graphics that match portions of the previously analysed item of graphics can be identified, for use in reducing the amount of data transmitted from a website containing the matched portions. Using this process for graphics, one may be able to reduce the transmitted data amount by about a factor of five.

The above would provide significant energy savings, e.g. for the internet infrastructure providers, and for the website or data providers. So this provides an environmental advantage, because saving energy is good for the environment.

Lower Energy Distribution

Using http (not https), one can consider the case of a plurality of mobile phones, each connected to the same mobile operator proxy, in connection with a server, where the plurality of mobile phones are all watching the same live event. In that case, one could send just one feed of the live event to the mobile operator proxy, and distribute the live event view to the plurality of mobile phones from the mobile operator proxy. This reduces the number of necessary connections from the server to the mobile operator proxy, from the number of the plurality of mobile phones, to one. Historically, mobile phones were each equipped with a broadcast option, e.g. in a chip, that could receive broadcasts. This was so that each mobile phone of a plurality of mobile phones could watch a live feed with the bandwidth from the mobile operator proxy to the source server shared with the other mobile phones of the plurality of mobile phones. The goal was to have the relevant technology installed in about 95% or more of mobile phones, so that it was widely adopted and the benefits could be realized to a great extent.

A problem arose when https became widely adopted, e.g. by YouTube. With https, the feeds are individually encrypted, with a different key for each user. So then for a live event, a separate feed needs to be sent from the server to each of the plurality of mobile phones. This uses a lot of bandwidth, slows down communications networks, and uses energy inefficiently.

In an example, we use a codec including a compressed format structure, the compressed format structure including a hierarchy of levels of temporal resolution of frames, each respective level of the hierarchy including frames corresponding to a respective temporal resolution of the respective level of the hierarchy, but not including frames which are included in one or more lower levels of lower temporal resolution of frames of the hierarchy. For example, the frames of the lowest level (level zero) of the hierarchy are key frames. In the next level (level one), there are delta frames, which are the deltas between the key frames. In the next level (level two), there are delta frames, which are the deltas between the level one frames. In the next level (level three), there are delta frames, which are the deltas between the level two frames, etc. The compressed data comprises key frames and deltas, in which the deltas have a chain of dependency back to the key frames. The codewords for decoding are optimised e.g. thousands of times per second, and the codewords are not stored explicitly in the bitstream. The only simple way to deduce the codewords at any point is to decode the video from the key frames up to that point.

So for example, in a system with 63 frames between key frames, where 63 = 2^6 - 1, hence with levels from level zero to level six, if you want to play at eight times the normal speed, you don't use frames from levels six, five, or four, and you use only frames from levels three, two, one, and zero. This means you don't need to download frames from levels six, five, or four, which saves bandwidth and transmission energy, and you don't need to decode frames from levels six, five, or four, which reduces the processing energy needed.
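The relationship between a frame's position, its hierarchy level, and the levels needed for a given playback speed can be sketched in Python. The function names and the gap parameter are assumptions made for the example.

```python
def frame_level(n, gap=64):
    """Level of frame n in a hierarchy with key frames every `gap`
    frames (gap = 64 gives 63 delta frames between key frames, and
    levels zero to six)."""
    p = n % gap
    if p == 0:
        return 0                        # key frame
    level = gap.bit_length() - 1        # 6 for gap = 64
    while p % 2 == 0:                   # each factor of two halves the
        p //= 2                         # temporal resolution, i.e. one
        level -= 1                      # level lower in the hierarchy
    return level

def levels_needed(speed, gap=64):
    """Highest level required to play at `speed` times the normal rate
    (speed a power of two): only every `speed`-th frame is displayed."""
    return (gap.bit_length() - 1) - (speed.bit_length() - 1)
```

For gap 64, the single level-one frame sits at position 32, level two at 16 and 48, and so on down to the odd positions at level six; eight-times playback needs only levels zero to three, as stated above.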

The compression may use transition tables for encoding and decoding. But to perform decoding successfully for any given level, you need to have decoded all the lower levels of lower temporal resolution. So for example, the deltas for level one are no use if you don’t have the key frames (level zero). And for example, the deltas for level two are no use if you don’t have the deltas for level one, and the key frames (level zero).

So in an example low energy distribution method, you use an encryption key (e.g. SSL HTTPS RSA) only for the level zero key frames. The level zero frames might comprise only 5% of the total data. All the higher levels (e.g. level one to level six) typically are not encrypted, because these cannot be decoded without successfully decoding the level zero frames, so there is no need to encrypt the higher levels (e.g. level one to level six). The code words in the higher levels (e.g. level one to level six) are meaningless if one cannot decrypt the level zero key frames. Hence the data not in the key frames (level zero) does not need to be encrypted, e.g. using HTTPS. In a typical example, the data not in the key frames (i.e. data not in level zero) which does not need to be encrypted is about 95% of the total data; this data can instead be sent as HTTP, and it can be cached by any proxy, e.g. in the mobile operator proxy, or in a web browser. An example is shown in Figure 31. To get the data to your device (e.g. mobile phone) from the nearest proxy storing the data, you only have to go to that nearest proxy, which might be in the same building as you, or in your neighborhood, rather than having to receive the data from a server e.g. on the other side of the USA. It takes more energy to send data a longer distance, so this approach saves energy. The non-key frames can be cached on internet routers and/or proxy servers (including company ones) and/or internet playback devices, but the non-key frames cannot be decoded without the decrypted key frames.
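Selective encryption of only the level zero key frames might be sketched as follows. This is an illustrative Python fragment in which a toy XOR cipher stands in for real encryption such as TLS/AES; the function names and the gap parameter are assumptions for the example.

```python
def xor_cipher(data, key):
    """Toy stand-in for a real cipher (e.g. AES under TLS); XOR is
    used here only to keep the sketch self-contained."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def prepare_stream(frames, key, gap=64):
    """Encrypt only the key frames (level zero, every `gap`-th frame);
    send all delta frames in the clear, where any proxy can cache them."""
    out = []
    for n, frame in enumerate(frames):
        if n % gap == 0:
            out.append(("encrypted", xor_cipher(frame, key)))
        else:
            out.append(("plain", frame))
    return out
```

Only the small encrypted fraction must travel end-to-end over a secure connection; the plain delta frames are useless without the decrypted key frames, yet cacheable anywhere.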

In edge cases, or exceptional cases, the key frames may be too simple (e.g. totally black) to obfuscate the codewords enough, and so in this case more frames are to be sent encrypted before the unencrypted frames are sent. For example, some level one frames are sent encrypted. Typically, a minimum number of bytes of the files, in order of decoding, would be sent, to ensure the codewords used in the (e.g. Blackbird) codec were sufficiently unpredictable. 10 kB would likely be sufficient in most cases.

Using a (e.g. Blackbird) codec, where sections of a video are shared or viewed, these sections could cache easily even when the entire video was not watched.

An environmentally friendly option is to play back at a lower frame rate in (e.g. Blackbird) codecs to reduce CO2 emissions in power generation, as only displayed frames are downloaded and decompressed. In an example, we may include an option to interpolate between frames to make playback smoother - which would work well when the video shows a person talking (e.g. a presenter or newsreader presenting, e.g. “talking heads”).
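The frame interpolation option mentioned can be sketched as a simple per-pixel linear blend in Python. This is illustrative only; a production smoother would typically be more sophisticated (e.g. motion-compensated), and the function name is an assumption for the example.

```python
def interpolate_frames(f0, f1, t):
    """Linearly blend the pixel values of two frames (lists of rows),
    with 0 <= t <= 1; t = 0 gives f0, t = 1 gives f1. A cheap way to
    smooth playback when only a reduced frame rate is downloaded."""
    return [[(1 - t) * p0 + t * p1 for p0, p1 in zip(r0, r1)]
            for r0, r1 in zip(f0, f1)]
```

Intermediate frames are synthesised on the client from the frames already downloaded, so no extra data needs to be transmitted.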

This approach is harder to do in MPEG, because there is only a key frame (level zero) and a further level (level one), but it can be implemented in MPEG. In MPEG, it is less beneficial than in the Blackbird codec, but it is beneficial nonetheless.

In an example, one could provide the frames in the (e.g. Blackbird) codec, for only some of the lower levels (e.g. level zero and level one) for free, and require payment for the frames in the higher levels (e.g. level two to level six). In another example, one could provide a live broadcast by video at a lower frame rate, by providing the frames in the (e.g. Blackbird) codec, for only some of the lower levels (e.g. level zero and level one) for free, and not sending the frames in the higher levels (e.g. level two to level six), to reduce transmission bandwidth, to reduce energy usage, and to reduce transmission costs. But the accompanying audio could be sent as normal. This approach could be suitable for some broadcasts, e.g. election results, in which the screen content does not change so much from one second to the next.

In an example, non-encrypted data is sent together with some hashed data, where the hashed data is generated using a hash function of at least some of the non-encrypted data, so that the non-encrypted data may be authenticated using the hashed data.

In an example, the browser playing the video, which includes encrypted key frames and non-encrypted non-key frames, is informed which frames are encrypted, and which frames are non-encrypted. The browser can then play the video using less processing power, and save energy, because it does not need to decrypt the non-encrypted frames.

In an example, a smart TV includes a web browser, and the web browser is executable to play a video, which is received in the form of encrypted key frames, and non-encrypted non-key frames, which are stored as described above, to reduce transmission bandwidth, to reduce energy usage, and to reduce transmission costs. In an example, a mobile device includes a web browser, and the web browser is executable to play a video, which is received in the form of encrypted key frames, and non-encrypted non-key frames, which are stored as described above, to reduce transmission bandwidth, to reduce energy usage, and to reduce transmission costs. A mobile device may be a smartphone or a tablet computer, for example.

In an example, a device includes an application program, and the application program is executable to play a video, which is received in the form of encrypted key frames, and non-encrypted non-key frames, which are stored as described above, to reduce transmission bandwidth, to reduce energy usage, and to reduce transmission costs. In an example, a mobile device includes an application program, and the application program is executable to play a video, which is received in the form of encrypted key frames, and non-encrypted non-key frames, which are stored as described above, to reduce transmission bandwidth, to reduce energy usage, and to reduce transmission costs. A mobile device may be a smartphone or a tablet computer, for example. An advantage of the approach of providing a video in the form of encrypted key frames, and non-encrypted non-key frames, is that the encryption only needs to be performed for the key frames, which saves on processor time because the non-key frames do not have to be processed to provide encrypted non-key frames.

In an example, there is provided a system for efficiently encrypting data when it is compressed using an adaptive code in a hierarchical form, such as the Blackbird family of video codecs. Where decoding each level of content relies on all previous levels having been decoded, with an adaptive code where codewords depend on previous data, in most cases, only the first level of data needs to be encrypted, as without knowing this level, none of the subsequent levels can be decrypted. Efficiency of encryption or decryption lies for example in using less processor time, and less processor energy. Some edge (i.e. exceptional) cases may also be handled, in a related but slightly different way.

In an example, we use a codec including a compressed format structure, the compressed format structure including a hierarchy of levels of temporal resolution of frames, each respective level of the hierarchy including frames corresponding to a respective temporal resolution of the respective level of the hierarchy, but not including frames which are included in one or more lower levels of lower temporal resolution of frames of the hierarchy. For example, the frames of the lowest level (level zero) of the hierarchy are key frames. In the next level (level one), there are delta frames, which are the deltas between the key frames. In the next level (level two), there are delta frames, which are the deltas between the level one frames. In the next level (level three), there are delta frames, which are the deltas between the level two frames, etc. The compressed data comprises key frames and deltas, in which the deltas have a chain of dependency back to the key frames. During video playback, the codewords for decoding are optimised e.g. thousands of times per second, and the codewords are not stored explicitly in the bitstream. The only simple way to deduce the codewords at any point is to decode the video from the key frames up to that point.

So for example, in a system with 63 frames between key frames, where 63 = 2^6 - 1, hence with levels from level zero to level six, if you want to play at eight times the normal speed, you don't use frames from levels six, five, or four, and you use only frames from levels three, two, one, and zero.

The compression may use transition tables for encoding and decoding. But to perform decoding successfully for any given level, you need to have decoded all the lower levels of lower temporal resolution. So for example, the deltas for level one are no use if you don’t have the key frames (level zero). And for example, the deltas for level two are no use if you don’t have the deltas for level one, and the key frames (level zero).

The level zero frames might comprise only 5% of the total data. All the higher levels (e.g. level one to level six) typically are not encrypted, because these cannot be decoded without successfully decoding the level zero frames, so there is no need to encrypt the higher levels (e.g. level one to level six). The code words in the higher levels (e.g. level one to level six) are meaningless if one cannot decrypt the level zero key frames. Hence the data not in the key frames (level zero) does not need to be encrypted. In a typical example, the data not in the key frames (i.e. data not in level zero) which does not need to be encrypted is about 95% of the total data. See Figure 32, for example. For related decryption, see Figure 33, for example.
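The selective-encryption idea above can be sketched as follows. The helper name `selectively_encrypt` and the XOR stand-in cipher are hypothetical illustrations, not the actual codec or cipher:

```python
# Sketch: selective encryption of a hierarchical bitstream (hypothetical
# helper names; the real system would use a proper cipher such as AES).
from typing import Callable

def selectively_encrypt(levels: list[bytes],
                        encrypt: Callable[[bytes], bytes]) -> list[bytes]:
    """Encrypt only level zero (the key frames).

    Because the adaptive codewords used by levels 1..n can only be
    reconstructed by decoding level 0 first, the higher levels can be
    left in the clear.
    """
    out = list(levels)
    out[0] = encrypt(out[0])   # key frames: typically ~5% of total data
    return out                 # levels 1..n unchanged (~95% of the data)

# toy usage with an XOR "cipher" standing in for a real one
xor = lambda b: bytes(x ^ 0x5A for x in b)
protected = selectively_encrypt([b"keyframes", b"level1", b"level2"], xor)
```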

In edge cases, or exceptional cases, the key frames may be too simple (e.g. totally black) to obfuscate the codewords enough, and so in this case more frames are encrypted, for example some level one frames. Typically, a minimum number of bytes of the files, taken in decoding order, would be encrypted, to ensure the codewords used in the (e.g. Blackbird) codec were sufficiently unpredictable. An encrypted file size of 10 kB is expected to be a sufficiently large file size, in most cases.

This approach is harder to do in MPEG, because there is only a key frame (level zero) and a further level (level one), but it can be implemented in MPEG. In MPEG, it is less beneficial than in the Blackbird codec, but it is beneficial nonetheless.

Consider a situation where video, or other data, is compressed in layers, where each successive layer provides further details or resolution (for example temporal or spatial resolution) compared to antecedent layers.

Further consider a case where the compression uses an adaptive code, where the codewords are optimised as data is received and/or processed.

In this case, the codewords in later layers are dependent on the contents of earlier layers.

Encrypting the first layer makes the other layers hard to decrypt, because the adaptive codewords are unknown.

One example case is the Blackbird video codecs, including for example Blackbird 9. Elements of Blackbird 9 are disclosed in WO2018127695A2. Here the video is stored in multiple files, including one file for each time period.

In an example, video is split into a first key frame and then multiple chunks of, for example 64 frames each, each with its own key frame.

Each chunk of 64 frames is split into multiple files, including one file for each time period (e.g. one second, half a second, quarter of a second, eighth of a second, and so on), each of which cannot be decompressed without knowledge of the preceding files in the same chunk, and the last key frame of the previous chunk.

For example, File 0 from the previous chunk includes frame 0. File 0 includes frame 64 - a single key frame compressed through intra-frame compression. File 1 includes frame 32 (when decompressed with the knowledge of frames 0 and 64). File 2 includes frames 16 and 48 (when decompressed with the knowledge of Files 0 and 1). File 3 includes frames 8, 24, 40 and 56 (when decompressed with the knowledge of Files 0, 1 and 2). File 4 includes frames 4, 12, 20, 28, 36, 44, 52 and 60 (when decompressed with the knowledge of Files 0, 1, 2 and 3). File 5 includes frames 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58 and 62 (when decompressed with the knowledge of Files 0, 1, 2, 3 and 4). File 6 includes frames 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61 and 63 (when decompressed with the knowledge of Files 0, 1, 2, 3, 4 and 5).
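Given the power-of-two layout above, the file holding any given frame of a chunk can be computed from the trailing zero bits of the frame index. `file_for_frame` is a hypothetical helper, not part of the codec:

```python
def file_for_frame(f: int) -> int:
    """Which file within a 64-frame chunk holds frame f (1 <= f <= 63).

    File 0 holds the key frame; file k (k >= 1) holds the frames whose
    index is an odd multiple of 2**(6 - k).
    """
    assert 1 <= f <= 63
    tz = (f & -f).bit_length() - 1    # number of trailing zero bits of f
    return 6 - tz

# spot checks against the listing above
assert file_for_frame(32) == 1
assert [f for f in range(1, 64) if file_for_frame(f) == 2] == [16, 48]
assert [f for f in range(1, 64) if file_for_frame(f) == 3] == [8, 24, 40, 56]
```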

A codec, e.g. a Blackbird codec, may use “Transition tables” to efficiently adjust the codewords e.g. thousands of times per second of video.

By encrypting File 0 in each chunk, the codewords used to compress the remaining frames are almost impossible to guess, so encrypting these other frames is usually unnecessary.

In certain cases, the key frames are easy to guess. For example, an entirely black video frame is a possible occurrence, and using this “guess” would provide decoding of all the other frames in the chunk, including potentially secret information. To avoid this possibility, if File 0 is very short, File 1 should be encrypted too. If File 1 is very short, File 2 should be encrypted too. If File 2 is very short, File 3 should be encrypted too. And so on.
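The cascading rule above can be sketched as follows, assuming (as suggested elsewhere in this document) that roughly 10 kB of encrypted prefix is sufficient; the helper name and the exact threshold handling are illustrative:

```python
def files_to_encrypt(file_sizes: list[int], min_bytes: int = 10_000) -> list[int]:
    """Encrypt File 0, then keep encrypting subsequent files until the
    encrypted prefix (in decoding order) is large enough to make the
    adaptive codewords unpredictable.

    `file_sizes[i]` is the size in bytes of File i of a chunk.
    """
    chosen, total = [], 0
    for i, size in enumerate(file_sizes):
        chosen.append(i)
        total += size
        if total >= min_bytes:
            break
    return chosen
```

For example, a chunk whose File 0 is a near-black key frame of only 200 bytes would have File 1 (and, if still short, File 2, and so on) encrypted as well.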

Although this disclosure has been given with particular reference to video data, it will be appreciated that it could also be applied to other types of data such as audio data.

Encryption Examples

Encryption examples which may be used in examples of the invention include the following cryptographic schemes for encrypting/decrypting information content: symmetric key cryptography and asymmetric key cryptography. In symmetric key cryptography the key used to decrypt the information is the same as (or easily derivable from) the key used to encrypt the information. However, in asymmetric key cryptography the key used for decryption differs from that used for encryption and it should be computationally infeasible to deduce one key from the other. For asymmetric cryptography a public key / private key pair is generated, the public key (which need not be kept secret) being used to encrypt information and the private key (which must remain secret) being used to decrypt the information. An example of an asymmetric cryptography algorithm that may be used is the RSA (Rivest-Shamir-Adleman) algorithm. The RSA algorithm relies on a one-way function. The public key X is a product of two large prime numbers p and q, which together form the private key. The public key is inserted into the one-way function during the encryption process to obtain a specific one-way function tailored to the recipient's public key. The specific one-way function is used to encrypt a message. The recipient can reverse the specific one-way function only via knowledge of the private key (p and q). Note that X must be large enough so that it is infeasible to deduce p and q from a knowledge of X alone.
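A toy numerical illustration of the RSA scheme described above, using the small textbook primes p = 61 and q = 53 (real keys use primes far too large to factor):

```python
# Toy RSA (illustration only; real moduli are hundreds of digits long
# so that factoring X = p*q is infeasible).
p, q = 61, 53
X = p * q                    # public modulus, 3233
phi = (p - 1) * (q - 1)      # 3120, derivable only from p and q
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # private exponent (requires p and q)

m = 65                       # message, must be < X
c = pow(m, e, X)             # encrypt with the public key (e, X)
assert pow(c, d, X) == m     # decrypt with the private key
```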

An alternative asymmetric encryption scheme to RSA is elliptical curve cryptography (ECC). Rather than generating keys as the product of very large prime numbers as in the case of RSA, ECC generates keys through properties of an elliptic curve equation. ECC is likely to be faster at key generation than RSA and hence is a preferred asymmetric encryption technique for the present arrangement. The information content stored on a storage medium may be symmetrically encrypted using one or more content encryption keys (CEKs) and the CEKs are asymmetrically encrypted using a public key / private key pair. In alternative examples, data may be asymmetrically encrypted. The CEKs are generated according to a binary tree encryption scheme and may be updated on a regular basis, e.g. every 10 seconds, so that different portions of encrypted information content correspond to different subsets of the CEKs. This allows for selective accessibility of the decrypted information content. Symmetric encryption may be used for encryption of information content because it is less numerically intensive and hence quicker than asymmetric encryption.

Video Analysis e.g. including Artificial Intelligence (AI) Analysis

A video navigation tool (e.g. Blackbird Waveform) is provided. An example of such a video navigation tool is provided in the “A Method for Enabling Efficient Navigation of Video” section of this document.

On the whole, video content providers find it prohibitively demanding, e.g. in processing power, or in cost, to process their video content e.g. using AI approaches. This applies, for example, to processing their video content in the cloud, e.g. using AI approaches.

Processing video on the client, e.g. using AI approaches, is too computationally demanding, as the client device (e.g. smartphone, tablet computer, laptop, desktop computer, smart TV) processing power is normally occupied to a great extent in decompressing video, and showing the decompressed video on the screen.

As part of video content ingestion, in an example we prepare the navigation tool (e.g. as disclosed in the “A Method for Enabling Efficient Navigation of Video” section of this document) in association with the ingested video content.

It turns out that many of the things one would want analysis (e.g. AI analysis) to perform on a video can be performed by analysing (e.g. by AI) the navigation tool which is prepared in association with the ingested video content, instead of analysing the whole video file. For example, a black frame in a video, which possibly indicates the start of a clip, produces a vertical black line in the navigation tool for a video that is at least a few minutes long. So the analysis (e.g. AI analysis) could search the navigation tool for vertical black lines, to find possible candidate frames which are frames at the start of a clip. If the file size of the navigation tool content is 900 times smaller than the file size of the corresponding ingested video content, it is possible to search for a possible candidate frame which is a frame at the start of a clip 900 times faster (with corresponding reduction in energy usage) by searching the navigation tool rather than the corresponding ingested video content. For a search (e.g. AI search) of a video of a football match, for frames including football players wearing red, one can search the navigation tool for vertical lines which contain red pixels above a threshold amount, with a similar improvement in search speed (with corresponding reduction in energy usage) to that described for the black frames. For a search (e.g. AI search) of a video, for frames including a flash, one can search the navigation tool for vertical lines which contain an increase in the fraction of white pixels above a threshold amount, with a similar improvement in search speed (with corresponding reduction in energy usage) to that described for the black frames.
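The black-frame search described above can be sketched as follows; the stripe representation and helper name are hypothetical:

```python
# Sketch: find candidate cut frames by scanning the navigation tool
# (one vertical stripe of pixels per frame) for black columns, instead
# of decoding and scanning the full-resolution video.
def black_columns(stripes: list[list[int]], threshold: int = 16) -> list[int]:
    """Return indices of frames whose stripe is (almost) entirely black.

    `stripes[f]` holds the luma values of the vertical stripe for frame f;
    a column is 'black' when its brightest pixel is below the threshold.
    """
    return [f for f, col in enumerate(stripes) if max(col) < threshold]

tool = [[200, 180, 190], [3, 1, 2], [150, 160, 155]]
assert black_columns(tool) == [1]    # frame 1 is a candidate clip start
```

The same scan, with a different per-column predicate (e.g. fraction of red or white pixels above a threshold), covers the football-shirt and flash searches mentioned above.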

In an example, analysis (e.g. AI analysis) can be performed on the navigation tool which is prepared in association with the ingested video content, to analyse where there are significant changes in the content. So for example, if the content of the vertical stripe is unchanging, or changes less than a threshold value, this is taken to indicate that no significant change has occurred in the video. But if the content of the vertical stripe changes more than a threshold value, this is taken to indicate that a significant change has occurred in the video, and the video at this point may be presented to a viewer, for them to view. In an example, in a wildlife video, no significant change is detected over a 12 hour video, except at two points, which correspond to a bird respectively leaving its nest, and returning to its nest. After the significant changes are detected in the analysis (e.g. AI analysis), the two points in the video, which correspond to a bird respectively leaving its nest, and returning to its nest, are offered to a viewer, for viewing. For example, a loud cheer could also be detected by analysis (e.g. AI analysis) of the navigation tool, because the navigation tool may include a representation of audio data in the ingested video.

A significant benefit to the search (e.g. AI search) of the navigation tool being much faster (e.g. 900 times faster) (with corresponding reduction in energy usage) than a search of the ingested video content, is that the former can be performed on the client device (e.g. smartphone, tablet computer, laptop, desktop computer, smart TV), rather than being performed in the cloud. A further benefit is that less data needs to be sent to the client device to perform the analysis (e.g. AI analysis) using the navigation tool, rather than using the whole ingested video, because the file size of the navigation tool is much smaller than the whole ingested video, e.g. 900 times smaller. Also the file download time, file download cost, and energy to perform the file download, are all similarly reduced.

For longer duration ingested video content, for which the navigation tool is of a size to fit in a screen of a client device (e.g. smartphone, tablet computer, laptop, desktop computer, smart TV), correspondingly greater reductions in the processing time to perform the analysis (e.g. AI analysis), in file download time, file download cost, and energy to perform the file download, are all similarly obtained, because the navigation tool file size is an even smaller fraction of the longer duration ingested video content file size. Alternatively, the analysis (e.g. AI analysis) of the navigation tool can be performed at a server, e.g. at a cloud server, instead of performing an analysis (e.g. AI analysis) of the ingested video content at the server, and reductions in the processing time to perform the analysis (e.g. AI analysis), and in the energy to perform the analysis (e.g. AI analysis), are similarly obtained.

What we have described is an enabling technology for analysis (e.g. AI analysis), in that use of the navigation tool enables the use of analysis (e.g. AI analysis) on the client device, to analyse an entire video, to find a specific occurrence in an ingested video on a server, without using analysis (e.g. AI analysis) to search the ingested video on the server, but instead using analysis (e.g. AI analysis) to search the navigation tool on the client device.

In terms of using the analysis (e.g. AI analysis), this can be personalized, so for example a user can select the content they want to have presented to them by the analysis (e.g. AI analysis). For example, they may want to see the parts of the football match in which the team they support scores goals. Or, in using the analysis (e.g. AI analysis), this can be personalized, so for example an analysis (e.g. AI analysis) may be performed of content viewed by a viewer in the past, and the analysis (e.g. AI analysis) can then search for similar content within a library of ingested video content, and can offer the content identified by the analysis (e.g. AI analysis) to the viewer, so the viewer can select what content to view.

When the analysis (e.g. AI analysis) of the navigation tool is performed, if a notable item in the navigation tool is identified, the analysis (e.g. AI analysis) program can be configured to send an alert, such as to send an alert to a mobile phone.

AI has been a popular research topic for decades. But it has always needed a lot of CPU time, and in these days of the Cloud, this is typically provided by the cloud. This may be acceptable when there is one AI application and one or many people view the result, but in applications where there are numerous AI analyses, maybe even one per viewer for a large number of viewers, this gets both expensive and distinctly energy inefficient and environmentally undesirable or “ungreen”.

We have long endeavoured to provide efficient processing at the client device (e.g. smartphone, tablet computer, laptop, desktop computer, smart TV).

In an example, source video is ingested once, and a (e.g. Blackbird) proxy is generated, along with the navigation tool (also known as a Video Waveform) which is generated at the same time (i.e. it is generated once, independent of the number of videos made or watched).

The navigation tool is a precis of the video at multiple temporal resolutions and typically at a smaller frame size, e.g. 64x36, 32x36, 16x36, ... 1x36 pixels, or using multiple frames per pixel. The size can be changed easily to e.g. 128x72 and its derivatives, or arbitrary x by y images. The navigation tool pixels are generated by combining (e.g. averaging) the source image pixels which make up each navigation tool pixel.
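The averaging step can be sketched as follows, assuming for simplicity that the navigation-tool pixels tile the source frame evenly; the function name is illustrative:

```python
def navigation_pixels(frame: list[list[int]], out_w: int, out_h: int) -> list[list[int]]:
    """Average the source pixels that fall inside each navigation-tool
    pixel, e.g. reducing a 1920x1080 frame to 64x36 (a single luma
    plane is used here for brevity)."""
    src_h, src_w = len(frame), len(frame[0])
    bh, bw = src_h // out_h, src_w // out_w   # source pixels per tool pixel
    out = []
    for by in range(out_h):
        row = []
        for bx in range(out_w):
            block = [frame[by * bh + y][bx * bw + x]
                     for y in range(bh) for x in range(bw)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out
```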

The analysis (e.g. AI analysis) may be looking for something simple, such as a black frame. The presence of a black frame or other simply-defined frame content can be determined with a high degree of accuracy by looking at the navigation tool, at a faster processing speed. E.g. a representation in which a 1920x1080 pixel image is represented by a 64x36 pixel image is processed 1920x1080/64/36 = 900 times faster. Even better, a lower temporal resolution navigation tool may be used, e.g. using 36x1 pixels per frame, which gives a reduction in pixels examined of 57,600 times, and hence a processing speed increase of 57,600 times.

These faster processing times are fast enough to perform the processing on even a slim client in JavaScript at a huge multiple of speed of real time, and at minimal cost if performed in the cloud.

In a more sophisticated method, the search (e.g. AI search) of the navigation tool finds candidate frames as described above, and then the candidate frames are investigated at higher resolution, e.g. at 64x36 pixels, and/or using the original frames, for confirmation.

Many tasks (e.g. AI tasks) (e.g. video cut detection, face recognition, player identification in sport, vehicle detection from a drone) can be assisted by this rapid client-side processing.

In addition to video, analysing the audio navigation tool (which typically contains the maximum audio volume in each group of 2^n frames) can be used to very quickly rule out vast tracts of silence in an audio record, as well as to zoom in quickly (e.g. exponentially) on any unexpected sound and identify the corresponding frame, including its video component, for display or for further processing. Therefore a much simpler calculation, than analysing an entire set of frames and/or a set of audio samples, can be calculated extremely cheaply, using much less energy, to identify those areas of interest in a satisfactory way.
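The max-volume hierarchy and the exponential zoom-in can be sketched as follows; the helper names are hypothetical, and this sketch assumes a power-of-two number of frames:

```python
def volume_pyramid(volumes: list[float]) -> list[list[float]]:
    """Build levels holding the maximum volume in each group of 2^n
    frames, so that long silent stretches can be ruled out without
    examining individual frames."""
    levels = [volumes]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([max(prev[i:i + 2]) for i in range(0, len(prev), 2)])
    return levels

def find_loud_frame(levels: list[list[float]], threshold: float):
    """Zoom in exponentially toward a frame louder than the threshold."""
    if levels[-1][0] <= threshold:
        return None                   # the whole record is quiet
    i = 0
    for level in reversed(levels[:-1]):
        i *= 2                        # descend to this level's children
        if level[i] <= threshold:     # left child quiet: the loud one
            i += 1                    # must be on the right
    return i
```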

IMPROVEMENTS TO REPRESENTATIONS OF COMPRESSED VIDEO

This section of this document relates to disclosures made in WO2005048607A1, US9179143B2 and US8711944B2.

There is provided a method of compressing digital data comprising the steps of (i) reading digital data as a series of binary coded words representing a context and a codeword to be compressed; (ii) calculating distribution output data for the input data and assigning variable length codewords to the result; and (iii) periodically recalculating the codewords in accordance with a predetermined schedule, in order to continuously update the codewords and their lengths.

This disclosure relates to a method of processing of digital information such as video information. This digital video information may be either compressed for storage and then later transmission, or may be compressed and transmitted live with a small latency.

Transmission is for example over the internet. There is a need for highly efficient compression techniques to be developed to enable transmission of video or other data in real time over the internet because of the restrictions in the bandwidth. In addition, the increasing need for high volume of content and rising end-user expectations mean that a market is developing for live compression at high frame rate and image size.

An object of this disclosure is to provide such compression techniques.

The video to be compressed can be considered as comprising a plurality of frames, each frame made up of individual picture elements, or pixels. Each pixel can be represented by three components, usually either RGB (red, green and blue) or YUV (luminance and two chrominance values). These components can be any number of bits each, but eight bits of each is usually considered sufficient.

The human eye is more sensitive to the location of edges in the Y values of pixels than the location of edges in U and V. For this reason, the preferred implementation here uses the YUV representation for pixels.

The image size can vary, with more pixels giving higher resolution and higher quality, but at the cost of higher data rate. Where the source video is in PAL format, the image fields have 288 lines with 25 frames per second. Square pixels give a source image size of 384 x 288 pixels. The preferred implementation has a resolution of 376 x 280 pixels using the central pixels of a 384 x 288 pixel image, in order to remove edge pixels which are prone to noise and which are not normally displayed on a TV set.

The images available to the computer generally contain noise so that the values of the image components fluctuate. These source images may be filtered as the first stage of the compression process. The filtering reduces the data rate and improves the image quality of the compressed video.

A further stage analyses the contents of the video frame-by-frame and determines which of a number of possible types each pixel should be allocated to. These broadly correspond to pixels in high contrast areas and pixels in low contrast areas.

The pixels are hard to compress individually, but there are high correlations between each pixel and its near neighbours. To aid compression, the image is split into one of a number of different types of components. The simpler parts of the image split into rectangular components, called "super-blocks" in this application, which can be thought of as single entities with their own structure. These blocks can be any size, but in the preferred implementation described below, the super-blocks are all the same size and are 8 x 8 pixel squares. More structurally complex parts of the image where the connection between pixels further apart is less obvious are split up into smaller rectangular components, called "mini-blocks" in this application.

It is apparent that if each super-block is compressed separately, the errors resulting from the compression process can combine across edges between super-blocks thus illustrating the block-like nature of the compression by highlighting edges between blocks, which is undesirable. To avoid this problem, the mini-blocks are tokenised with an accurate representation and these are compressed in a loss free way.

Each super-block or mini-block is encoded as containing YUV information of its constituent pixels.

This U and V information is stored at lower spatial resolution than the Y information, in one implementation with only one value of each of U and V for every mini-block. The super-blocks are split into regions. The colour of each one of these regions is represented by one UV pair.

Real time filtering

An aim is to remove noise from the input video, as noise is by definition hard to compress. The filtering mechanism takes frames one at a time. It compares the current frame with the previous filtered frame on a pixel-by-pixel basis. The value for the previous pixel is used unless there is a significant difference. This can occur in a variety of ways. In one, the value of the pixel in the latest frame is a long way from the value in the previous filtered frame. In another, the difference is smaller, but consistently in the same direction. In another, the difference is even smaller, but cumulatively, over a period of time, has tended to be in the same direction. In the first two cases, the pixel value is updated to the new value. In the third case, the filtered pixel value is updated by a small amount in the direction of the captured video. The allowable error near a spatial edge is increased depending on the local contrast to cut out the effects of spatial jitter on the input video.
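A simplified sketch of the per-pixel temporal filter described above; the thresholds are illustrative, and the second and third cases are merged into a single drift accumulator for brevity:

```python
def filter_pixel(prev: float, new: float, drift: float,
                 big: float = 24.0, small: float = 8.0):
    """One pixel of the temporal noise filter (illustrative thresholds).

    Returns (filtered value, updated drift accumulator). The previous
    filtered value is kept unless the new value is far away, or smaller
    differences have accumulated consistently in one direction.
    """
    diff = new - prev
    if abs(diff) > big:               # a clear, real change: accept it
        return new, 0.0
    drift += diff                     # track the cumulative direction
    if abs(drift) > small:            # slow, consistent drift detected
        step = 1.0 if drift > 0 else -1.0
        return prev + step, 0.0       # nudge toward the captured video
    return prev, drift                # otherwise keep the old value
```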

Real time motion estimation

The video frames are filtered into "Noah regions". Thus the pixels near to edges are all labelled. In a typical scene, only between 2% and 20% of the pixels in the image turn out to have the edge labelling. There are three types of motion estimation used. In the first, whole frame pan detection using integer number of pixels is implemented.

These motions can be implemented efficiently over the whole image on playback as pixels can be copied to new locations and no blurring is needed. This uses the edge areas from the Noah regions only, as the edges contain the information needed for an accurate motion search. The second is sub-pixel motion removal over the whole image.

This uses the edge areas from the Noah regions only, as the edges contain the information needed for an accurate motion search. The edge pixels in the image, estimated for example from the Noah filtering stage, are matched with copies of themselves with translations of up to e.g. 2 pixels, but accurate to e.g. 1/64 pixel (using a blurring function to smooth the error function) and small rotations. The best match is calculated by a directed search starting at a large scale and increasing the resolution until the required sub-pixel accuracy is attained. This transformation is then applied in reverse to the new image frame and filtering continues as before. These changes are typically ignored on playback. The effect is to remove artefacts caused by camera shake, significantly reducing data rate and giving an increase in image quality. The third type examines local areas of the image. Where a significant proportion of the pixels are updated, for example on an 8x8 pixel block, either motion vectors are tested in this area with patches for the now smaller temporal deltas, or a simplified super-block representation is used giving either 1 or 2 YUVs per block, and patches are made to this.

Real time fade representation

The encoding is principally achieved by representing the differences between consecutive compressed frames. In some cases, the changes in brightness are spatially correlated. In this case, the image is split into blocks or regions, and codewords are used to specify a change over the entire region, with differences with these new values rather than differences to the previous frame itself being used.

Segment Noah regions-find edges

A typical image includes areas with low contrast and areas of high contrast, or edges. The segmentation stage described here analyses the image and decides whether any pixel is near an edge or not. It does this by looking at the variance in a small area containing the pixel. For speed, in the current implementation, this involves looking at a 3x3 square of pixels with the current pixel at the centre, although implementations on faster machines can look at a larger area. The pixels which are not near edges are compressed using an efficient but simple representation which includes multiple pixels, for example 2x2 blocks or 8x8 blocks, which are interpolated on playback. The remaining pixels near edges are represented as either e.g. 8x8 blocks with a number of YUV areas (typically 2 or 3) if the edge is simply the boundary between two or more large regions which just happen to meet here, or as 2x2 blocks with 1 Y and one UV per block in the case that the above simple model does not apply, e.g. when there is too much detail in the area because the objects in this area are too small.
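The 3x3 variance test can be sketched as follows; the threshold value is illustrative:

```python
def near_edge(y: list[list[int]], px: int, py: int,
              threshold: float = 100.0) -> bool:
    """Label a pixel as 'near an edge' when the variance of the 3x3
    neighbourhood of luma values around it exceeds a threshold
    (the threshold here is illustrative, not from the source)."""
    vals = [y[py + dy][px + dx]
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    mean = sum(vals) / 9.0
    var = sum((v - mean) ** 2 for v in vals) / 9.0
    return var > threshold
```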

Miniblockify

The image is made up of regions, which are created from the Noah regions. The relatively smooth areas are represented by spatially relatively sparse YUV values, with the more detailed regions such as the Noah edges being represented by 2x2 blocks which are either uniform YUV, or include a UV for the block and maximum Y and a minimum Y, with a codeword to specify which of the pixels in the block should be the maximum Y value and which should be the minimum. To further reduce the datarate, the Y pairs in the non-uniform blocks are restricted to a subset of all possible Y pairs which is more sparse when the Y values are far apart.

Transitions with variable lengths codewords

Compressing video includes in part predicting what the next frame will be, as accurately as possible from the available data, or context. Then the (small) unpredictable element is what is sent in the bitstream, and this is combined with the prediction to give the result. The transition methods described here are designed to facilitate this process. On compression, the available context and codeword to compress are passed to the system. This then adds this information to its current distribution (which it is found performs well when it starts with no prejudice as to the likely relationship between the context and the output codeword). The distribution output data for this context is calculated and variable length codewords assigned to the outcomes which have arisen. These variable length codewords are not calculated each time the system is queried as the cost/reward ratio makes it unviable, particularly as the codewords have to be recalculated on the player at the corresponding times they are calculated on the compressor. Instead, the codewords are recalculated from time to time. For example, every new frame, or every time the number of codewords has doubled. Recalculation every time an output word is entered for the first time is too costly in many cases, but this is aided by not using all the codeword space every time the codewords are recalculated. Codeword space at the long end is left available, and when new codewords are needed the next one is taken. As these codewords have never occurred up to this point, they are assumed to be rare, and so giving them long codewords is not a significant hindrance. When the codeword space is all used up, the codewords are recalculated. The minimum datarate for Huffman codewords is a very flat and wide minimum, so using the distribution from the codewords which have occurred so far is a good approximation to the optimal. Recalculating the codewords has to happen quickly in a real time system.
The codewords are kept sorted in order of frequency, with the most frequent codewords first. In an example, the sorting is a mixture of bin sort using linked lists, which is O(n), for the rare codewords which change order quite a lot, and bubble sort for the common codewords which by their nature do not change order by very much each time a new codeword is added. The codewords are calculated by keeping a record of the unused codeword space, and the proportion of the total remaining codewords the next data to encode takes. The shortest codeword for which the new codeword does not exceed its correct proportion of the available codeword space is used. There are further constraints: in order to keep the codes as prefix codes and to allow spare space for new codewords, codewords never get shorter in length, and each codeword takes up an integer power of two of the total codeword space. This method creates the new codewords into a lookup table for quick encoding in O(n) where n is the number of sorted codewords.
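A simplified sketch of the periodic recalculation: symbols are sorted by frequency, each codeword takes an integer power of two of the codeword space, and no codeword ever gets shorter than before. This omits the bin/bubble sorting and the reserved long-codeword space, and the names are illustrative:

```python
import math
from collections import Counter

def recalc_lengths(counts: Counter, old_len: dict) -> dict:
    """Reassign codeword lengths from the observed distribution.

    A length L gives a codeword 2**-L of the codeword space, so L is
    chosen as the smallest integer with 2**-L <= observed probability;
    to keep the codes prefix-free while leaving room for new (presumed
    rare) codewords, a codeword is never made shorter than before.
    """
    total = sum(counts.values())
    new_len = {}
    for sym, n in counts.most_common():      # most frequent first
        length = max(1, math.ceil(-math.log2(n / total)))
        new_len[sym] = max(length, old_len.get(sym, 0))
    return new_len
```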

Memory management

To facilitate Java playback, all the memory used is allocated in one block at the start. As garbage collection algorithms on Java virtual machines are unpredictable, and many stop the system for periods which are long in comparison to the length of a video frame, the computer method or apparatus may use its own memory management system. This involves allocating enough memory for e.g. 2 destination codewords for each source codeword when it is first encountered. New transitions are added as and when they occur, and when the available space for them overflows, the old memory is ignored, and new memory of twice the size is allocated. Although up to half the memory may end up unused, the many rare transitions take almost no memory, and the system scales very well and makes no assumption about the distribution of transitions.

Give compressed codeword for this uncompressed codeword

Every time a codeword occurs in a transition for the second or subsequent time, its frequency is updated and it is re-sorted. When it occurs for the first time in this transition however, it must be defined. As many codewords occur multiple times in different transitions, the destination value is encoded as a variable length codeword each time it is used for the first time, and this variable length codeword is what is sent in the bitstream, preceded by a "new local codeword" header codeword. Similarly, when it occurs for the first time ever, it is encoded raw preceded by a "new global codeword" header codeword. These header codewords themselves are variable length and recalculated regularly, so they start off short as most codewords are new when a new environment is encountered, and they gradually lengthen as the transitions and concepts being encoded have been encountered before.

Video compression (cuts)

Cuts are compressed using spatial context from the same frame.

Cuts, RLE uniform shape, else assume independent and context=CUT_CW.

Cuts -> editable, so the coding needs to be efficient. First approximation at lower resolution, e.g. 8x8. Cuts - predict the difference in mini-block codewords from the previous one, and a uniform flag for the current one.

Video compression (deltas)

The deltas can use temporal and spatial context.

Deltas shape - predict the shape from the uniformness of the four neighbours and the old shape. Deltas - predict mini-block codeword differences from the uniformness of this mini-block and the old mini-block in time.

Datarate reductions

Various simple but effective datarate reduction methods are employed. Noise in the input signal can lead to isolated small changes over the image, whose loss would not be noticed. Isolated changed mini-blocks are generally left out from the bitstream, though if the mini-block is sufficiently different they can still be updated. In addition, small changes in colour in high colour areas are generally ignored as these are almost always caused by noise.
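The isolated-change suppression might be sketched like this. The 4-neighbour test and the threshold value are illustrative assumptions, not the patent's exact criterion:

```python
def filter_isolated_changes(changed, diff, threshold=48):
    """Drop changed mini-blocks with no changed 4-neighbour, unless the
    block differs strongly enough to be kept anyway.

    `changed` is a 2-D boolean grid of mini-blocks, `diff` the per-block
    difference magnitude; `threshold` is a hypothetical tuning value.
    """
    h, w = len(changed), len(changed[0])
    kept = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not changed[y][x]:
                continue
            has_neighbour = any(
                0 <= y + dy < h and 0 <= x + dx < w and changed[y + dy][x + dx]
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)))
            # isolated blocks are assumed to be noise unless very different
            kept[y][x] = has_neighbour or diff[y][x] >= threshold
    return kept
```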

Multi-level gap masks: 4x4, 16x16, 64x64

The bulk of the images are represented as mini-blocks (mbs) and the gaps between them. The gaps are spatially and temporally correlated. The spatial correlation is catered for by dividing the image into 4x4 blocks of mbs, representing 64 pixels each, with one bit per mini-block representing whether the mb has changed on this frame. These 4x4 blocks are themselves grouped into 4x4 blocks, with a set bit if any of the mbs represented has changed. Similarly, these are grouped into 4x4 blocks, representing 128x128 pixels, with a set bit if any of the pixels has changed in the compressed representation. It turns out that trying to predict 16 bits at a time is too ambitious, as the system does not have time to learn the correct distributions in a video of typical length. Predicting the masks 4x2 bits at a time works well. The context for this is the corresponding gap masks from the two previous frames. The transition infrastructure above then gives efficient codewords for the gaps at various scales.
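The three-level mask hierarchy can be sketched as follows (a simplified model assuming a square mini-block grid whose side is a multiple of 16; each level's bit is the OR of the 4x4 group below it):

```python
def build_gap_pyramid(changed):
    """Build the multi-level gap mask: level 0 is the per-mini-block
    change grid; each higher level has one bit per 4x4 group of the
    level below, set if anything in that group changed."""
    levels = [changed]
    for _ in range(2):                      # two more levels: 4x4 and 16x16 groups
        prev = levels[-1]
        side = len(prev) // 4
        nxt = [[any(prev[4 * y + dy][4 * x + dx]
                    for dy in range(4) for dx in range(4))
                for x in range(side)]
               for y in range(side)]
        levels.append(nxt)
    return levels
```

A decoder walking the pyramid top-down can skip whole 128x128-pixel areas with a single clear bit.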

Multiple datarates at once

One of the features of internet or intranet video distribution is that the audience can have a wide range of receiving and decoding equipment. In particular, the connection speed may vary widely. In a system such as this, designed for transmission across the internet, it helps to support multiple datarates. So the compression filters the image once, then resamples it to the appropriate sizes, involving for example cropping, so that averaging pixels to make the final image the correct size involves averaging pixels in rectangular blocks of fixed size. There is a sophisticated datarate targeting system which skips frames independently for each output bitstream. The compression is sufficiently fast on a typical modern PC of this time to create modem or midband videos with multiple target datarates. The video is split into files for easy access; these files may typically be 10 seconds long and may start with a key frame. The player can detect whether its pre-load is ahead of or behind target, and load the next chunk at a lower or higher datarate to make use of the available bandwidth. This is particularly important if the serving is from a limited system where multiple simultaneous viewers may wish to access the video at the same time, so that the limit to transmission speed is caused by the server rather than the receiver. The small files will cache well on a typical internet setup, reducing server load if viewers are watching the video from the same ISP, office, or even the same computer at different times.

Key frames

The video may be split into a number of files to allow easy access to parts of the video which are not the beginning. In these cases, the files may start with a key frame. A key frame contains all information required to start decompressing the bitstream from this point, including a cut-style video frame and information about the status of the Transition Tables, such as starting with completely blank tables.

Digital Rights Management (DRM)

DRM is an increasingly important component of a video solution, particularly now that content is so readily accessible on the internet. Data typically included in DRM may be an expiry date for the video, or a restricted set of URLs the video can be played from. Once the compressor itself is sold, the same video may be compressed twice with different DRM data in an attempt to crack the DRM by looking at the difference between the two files. The compression described here is designed to allow small changes to the initial state of the transition or global compression tables to effectively randomise the bitstream. By randomising a few bits each time a video is compressed, the entire bitstream is randomised each time the video is compressed, making it much harder to detect differences in compressed data caused by changes to the information encoded in the DRM.

Miscellaneous

The Y values for each pixel within a single super-block can also be approximated.

In many cases, there is only one or part of one object in a super-block. In these cases, a single Y value is often sufficient to approximate the entire super-block's pixel Y values, particularly when the context of neighbouring super-blocks is used to help reconstruct the image on decompression.

In many further cases, there are only two or parts of two objects in a super-block. In these cases, a pair of Y values is often sufficient to approximate the entire super-block's Y values, particularly when the context of the neighbouring super-blocks is used to help reconstruct the image on decompression. In the cases where there are two Y values, a mask is used to show which of the two Y values is to be used for each pixel when reconstructing the original super-block. These masks can be compressed in a variety of ways, depending on their content, as it turns out that the distribution of masks is very skewed. In addition, masks often change by small amounts between frames, allowing the differences between masks on different frames to be compressed efficiently.
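One plausible way to derive a two-value approximation and its mask, shown as an illustrative sketch (the split-at-the-mean rule is an assumption for illustration, not the patent's stated selection method):

```python
def two_value_mask(block):
    """Approximate a block of Y values with two representatives (the
    mean of the darker half and the mean of the brighter half, split at
    the overall mean) plus a per-pixel bit choosing between them."""
    flat = [y for row in block for y in row]
    mean = sum(flat) / len(flat)
    low = [y for y in flat if y <= mean] or [mean]    # guard: uniform block
    high = [y for y in flat if y > mean] or [mean]
    y0 = round(sum(low) / len(low))
    y1 = round(sum(high) / len(high))
    mask = [[1 if y > mean else 0 for y in row] for row in block]
    return y0, y1, mask
```

Because object edges move slowly, successive masks differ in few bits, which is what makes the inter-frame mask differences compress well.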

Improvements to image quality can be obtained by allowing masks with more than two Y values, although this increases the amount of information needed to specify which Y value to use.

Although this disclosure has been given with particular reference to video data, it will be appreciated that it could also be applied to other types of data such as audio data.

Examples

Video frames of typically 384x288, 376x280, 320x240, 192x144, 160x120 or 128x96 pixels (see e.g. Figure 1) are divided into pixel blocks, typically 8x8 pixels in size (see e.g. Figure 2), and also into pixel blocks, typically 2x2 pixels in size, called mini-blocks (see e.g. Figure 3). In addition, the video frames are divided into Noah regions (see e.g. Figure 4), indicating how complex an area of the image is.

In one implementation, each super-block is divided into regions, each region in each super-block approximating the corresponding pixels in the original image and containing the following information:

1 Y value (typically 8 bits)

1 U value (typically 8 bits)

1 V value (typically 8 bits)

64 bits of mask specifying which YUV value to use when reconstructing this super-block.

In this implementation, each mini-block contains the following information:

2 Y values (typically 8 bits each)

1 U value (typically 8 bits)

1 V value (typically 8 bits)

4 bits of mask specifying which Y value to use when reconstructing this mini-block.
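The mini-block field list above can be illustrated by packing one mini-block into a single integer. The bit layout here is an assumption made for illustration, not the actual bitstream format:

```python
def pack_mini_block(y0, y1, u, v, mask_bits):
    """Pack 2 Y values, 1 U, 1 V (8 bits each) and a 4-bit mask
    (one bit per pixel of the 2x2 mini-block) into one integer."""
    assert all(0 <= c < 256 for c in (y0, y1, u, v))
    mask = 0
    for bit in mask_bits:                 # 4 bits, most significant first
        mask = (mask << 1) | (bit & 1)
    return (((((y0 << 8 | y1) << 8 | u) << 8) | v) << 4) | mask

def unpack_mini_block(word):
    """Inverse of pack_mini_block."""
    mask = word & 0xF
    v = (word >> 4) & 0xFF
    u = (word >> 12) & 0xFF
    y1 = (word >> 20) & 0xFF
    y0 = (word >> 28) & 0xFF
    bits = [(mask >> 3) & 1, (mask >> 2) & 1, (mask >> 1) & 1, mask & 1]
    return y0, y1, u, v, bits
```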

Temporal gaps

If more latency is acceptable, temporal gaps rather than spatial gaps turn out to be an efficient representation. This involves coding each changed mini-block with a codeword indicating the next time (if any) in which it changes.
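Temporal-gap coding for one mini-block can be sketched as follows; the list of frame numbers at which the block changes is turned into next-change gaps (the codeword for "never changes again" is modelled here as None):

```python
def temporal_gaps(change_times):
    """For each frame at which a mini-block changes, return the gap to
    the next change, or None if it never changes again.

    `change_times` is a sorted list of frame numbers for one mini-block.
    """
    gaps = []
    for i, t in enumerate(change_times):
        if i + 1 < len(change_times):
            gaps.append(change_times[i + 1] - t)
        else:
            gaps.append(None)   # stands in for the "no further change" codeword
    return gaps
```

The latency cost is that a block's codeword cannot be emitted until its next change is known.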

Interpolation between Uniform Super-Blocks

Where uniform super-blocks neighbour each other, bilinear interpolation between the Y, U and V values used to represent each block is used to find the Y, U and V values to use for each pixel on playback.
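A sketch of the bilinear reconstruction, computing a patch of values from four corner values (e.g. the Y values of four neighbouring uniform super-blocks); the same computation applies per channel for U and V:

```python
def bilinear_patch(c00, c01, c10, c11, size):
    """Interpolate a size x size patch (size >= 2) from the four corner
    values c00 (top-left), c01 (top-right), c10 (bottom-left),
    c11 (bottom-right)."""
    patch = []
    for y in range(size):
        fy = y / (size - 1)
        row = []
        for x in range(size):
            fx = x / (size - 1)
            top = c00 * (1 - fx) + c01 * fx       # interpolate along top edge
            bottom = c10 * (1 - fx) + c11 * fx    # interpolate along bottom edge
            row.append(round(top * (1 - fy) + bottom * fy))
        patch.append(row)
    return patch
```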

In an example, there is provided a method of processing digital video information for transmission or storage after compression, said method comprising: reading digital data representing individual picture elements (pixels) of a video frame as a series of binary coded words; segmenting the image into regions of locally relatively similar pixels and locally relatively distinct pixels; having a mechanism for learning how contextual information relates to codewords requiring compression and encoding such codewords in a way which is efficient both computationally and in terms of compression rate of the encoded codewords and which dynamically varies to adjust as the relationship between the context and the codewords requiring compression changes and which is computationally efficient to decompress; establishing a reduced number of possible luminance values for each block of pixels (typically no more than four); encoding to derive from the words representing individual pixels further words describing blocks or groups of pixels each described as a single derived word which at least includes a representation of the luminance of a block component of at least eight by eight individual pixels (super-block); establishing a reduced number of possible luminance values for each block of pixels (typically no more than four); encoding to derive from the words representing individual pixels further words describing blocks or groups of pixels each described as a single derived word which at least includes a representation of the luminance of a block component of typically two by two individual pixels (mini-block); establishing a reduced number of possible luminance values for each block of pixels (typically one or two); providing a series of changeable stored masks as a mechanism for indicating which of the possible luminance values are to be used in determining the appropriate luminance value of each pixel for display; comparing and evaluating the words representing corresponding 
portions of one frame with another frame or frames in a predetermined sequential order of the elements making up the groups to detect differences and hence changes; identifying any of the masks which require updating to reflect such differences and choosing a fresh mask as the most appropriate to represent such differences and storing the fresh mask or masks for transmission or storage; using context which will be available at the time of decompression to encode the masks, the changes in Y values, U values, and V values, and the spatial or temporal gaps between changed blocks, combined with the efficient encoding scheme, to give an efficient compressed real time representation of the video; using variable length codewords to represent the result of transitions in a way which is nearly optimal from a compression point of view, and computationally very efficient to calculate.

There is provided a method of compressing digital data comprising the steps of: (i) reading digital data as series of binary coded words representing a context and a codeword to be compressed; (ii) calculating distribution output data for the input data and assigning variable length codewords to the result; and (iii) periodically recalculating the codewords in accordance with a predetermined schedule, in order to continuously update the codewords and their lengths.

The method may be one in which the codewords are recalculated each time the number of codewords has doubled. The method may be one in which the codewords are recalculated for every new frame of data. The method may be one in which some codeword space is reserved at each recalculation so as to allow successive new codewords to be assigned for data of lower frequency.

There is provided a method of processing digital video information so as to compress it for transmission or storage, said method comprising: reading digital data representing individual picture elements (pixels) of a video frame as a series of binary coded words; segmenting the image into regions of locally relatively similar pixels and locally relatively distinct pixels; establishing a reduced number of possible luminance values for each block of pixels (typically no more than four); carrying out an encoding process so as to derive from the words representing individual pixels, further words describing blocks or groups of pixels each described as a single derived word which at least includes a representation of the luminance of a block component of at least eight by eight individual pixels (super-block); establishing a reduced number of possible luminance values for each smaller block of pixels (typically no more than four); carrying out an encoding process so as to derive from the words representing individual pixels, further words describing blocks or groups of pixels each described as a single derived word which at least includes a representation of the luminance of a block component of typically two by two individual pixels (mini-block); establishing a reduced number of possible luminance values for each block of pixels (typically one or two); providing a series of changeable stored masks to indicate which of the possible luminance values are to be used in determining the appropriate luminance value of each pixel for display; comparing and evaluating the words representing corresponding portions of one frame with another frame or frames in a predetermined sequential order of the elements making up the groups to detect differences and hence changes; identifying any of the masks which require updating to reflect such differences and choosing a fresh mask as the most appropriate to represent such differences and storing the fresh mask or masks for transmission or 
storage; using context which will be available at the time of decompression to encode the masks, the changes in Y values (luminance), U values (chrominance), and V values (chrominance) and the spatial or temporal gaps between changed blocks, combined with the efficient encoding scheme, to give an efficient compressed real time representation of the video; and using variable length codewords to represent the result of transitions. The method may be one in which the method further comprises an adaptive learning process for deriving a relationship between contextual information and codewords requiring compression, and a process for dynamically adjusting the relationship so as to optimise the compression rate and the efficiency of decompression.

There is provided a method of compressing digital data for storage or transmission, comprising the steps of

(i) reading inputted digital data as series of binary coded words representing a context and an input codeword to be compressed;

(ii) calculating distribution output data for the inputted digital data and generating variable length prefix codewords for each combination of context and input codeword, and generating a respective sorted Transition Table of variable length prefix codewords for each context, in a manner in which codeword space at the long end is left available to represent new input codewords, which have not yet occurred with corresponding contexts, as they occur; and

(iii) repeating the process of step (ii) from time to time;

(iv) whereby the inputted digital data can be subsequently replayed by recalculating the sorted Transition Table of local codewords at corresponding times in the inputted digital data.

The method may be one in which the codewords are recalculated for every new frame of data. The method may be one in which some codeword space is reserved at each recalculation so as to allow successive new codewords to be assigned for data of lower frequency.

There is provided a method of compressing digital data for storage or transmission, comprising the steps of:

(i) reading digital data as a series of binary coded words representing a context and a codeword to be compressed;

(ii) calculating distribution output data for the input data and generating variable length prefix codewords for each combination of context and input codeword so as to form a respective sorted Transition Table of local codewords for each context, in a manner which reserves logical codeword space at the long end to represent any new input codewords, which have not yet occurred with that context, as they occur for the first time; and

(iii) repeating the process of step (ii) from time to time;

(iv) whereby the input data can be subsequently replayed by recalculating the codeword tables at corresponding times in the input data, wherein the codewords are recalculated each time the number of codewords has doubled.

There is provided a method of compressing digital data for storage or transmission, comprising the steps of:

(i) reading digital data as a series of binary coded words representing a context and a codeword to be compressed;

(ii) calculating distribution output data for the input data and generating variable length prefix codewords for each combination of context and input codeword so as to form a respective sorted Transition Table of local codewords for each context, in a manner which reserves logical codeword space at the long end to represent any new input codewords, which have not yet occurred with that context, as they occur for the first time; and

(iii) repeating the process of step (ii) from time to time;

(iv) whereby the input data can be subsequently replayed by recalculating the codeword tables at corresponding times in the input data, wherein the method further comprises an adaptive learning process for deriving a relationship between contextual information and codewords requiring compression, and a process for dynamically adjusting the relationship so as to optimize the compression rate and the efficiency of decompression.

A METHOD OF COMPRESSING VIDEO DATA AND A MEDIA PLAYER FOR IMPLEMENTING THE METHOD

This section of this document relates to disclosures made in WO2007077447A2 and US8660181B2. There is provided a method of receiving video data comprising the steps of: receiving at least one chunk of video data comprising a number of sequential key video frames where the number is at least two and, constructing at least one delta frame between a nearest preceding key frame and a nearest subsequent key frame from data contained in either or each of the nearest preceding and subsequent frames.

Visual recordings of moving things are generally made up of sequences of successive images. Each such image represents a scene at a different time or range of times. This disclosure relates to such sequences of images such as are found, for example, in video, film and animation.

Video takes a large amount of memory, even when compressed. The result is that video is generally stored remotely from the main memory of the computer. In traditional video editing systems, this would be on hard discs or removable disc storage, which are generally fast enough to access the video at full quality and frame rate. Some people would like to access and edit video content remotely, over the internet, in real time. This disclosure relates to the applications of video editing (important as much video content on the web will have been edited to some extent), video streaming, and video on demand.

At present any media player editor implementing a method of transferring video data across the internet in real time suffers the technical problems that: (a) the internet connection speed available to internet users is, from moment to moment, variable and unpredictable; and (b) that the central processing unit (CPU) speed available to internet users is from moment to moment variable and unpredictable.

For the application of video editing, consistent image quality is very preferable, because many editing decisions are based on aspects of the image, for example, whether the image was taken in focus or out.

It is an object of the present disclosure to alleviate at least some of the aforementioned technical problems. Accordingly this disclosure provides a method of receiving video data comprising the steps of: receiving at least one chunk of video data comprising a number (n) of sequential key video frames where the number (n) is at least two and, constructing at least one delta frame between a nearest preceding key frame and a nearest subsequent key frame from data contained in either, or each, of the nearest preceding and subsequent frames.

Preferably the delta frame is composed of a plurality of component blocks or pixels and each component of the delta frame is constructed according to data indicating it is one of: the same as the corresponding component in the nearest preceding key frame, or the same as the corresponding component in the nearest subsequent key frame, or a new value compressed using some or all of the spatial compression of the delta frame and information from the nearest preceding and subsequent frames. After the step of construction, the delta frame may be treated as a key frame for the construction of one or more further delta frames. Delta frames may continue to be constructed in a chunk until either: a sufficiently good predetermined image playback quality criterion is met or the time constraints of playing the video in real time require the frames to be displayed. The number of key frames in a chunk may be in the range from n=3 to n=10.

Although the method may have other applications, it is particularly advantageous when the video data is downloaded across the internet. In such a case it is convenient to download each key frame in a separate download slot, the number of said download slots equating to the maximum number of download slots supportable by the internet connection at any moment in time. Preferably each slot is implemented in a separate thread. Where it is desired to subsequently edit the video it is preferable that each frame, particularly the key frames, are cached upon first viewing to enable subsequent video editing.

According to another aspect of this disclosure, there is provided a media player arranged to implement the method which preferably comprises a receiver to receive chunks of video data including at least two key frames, and a processor adapted to construct a delta frame sequentially between a nearest preceding key frame and a nearest subsequent key frame. Preferably, a memory is also provided for caching frames as they are first viewed to reduce the subsequent requirements for downloading.

According to a third aspect of this disclosure, there is provided a method of compressing video data so that the video can be streamed across a limited bandwidth connection with no loss of quality on displayed frames which entails storing video frames at various temporal resolutions which can be accessed in a pre-defined order, stopping at any point. Thus multiple simultaneous internet accesses can ensure a fairly stable frame rate over a connection by (within the resolution of the multitasking nature of the machine) simultaneously loading the first or subsequent temporal resolution groups of frames from each of a number of non-intersecting subsets of consecutive video frames until either all the frames in the group are downloaded, or there would probably not be time to download the group, in which case a new group is started.

This disclosure includes a method for enabling accurate editing decisions to be made over a wide range of internet connection speeds, as well as video playback which uses available bandwidth efficiently to give a better experience to users with higher bandwidth. Traditional systems have a constant frame rate, but the present disclosure relates to improving quality by adding extra delta frame data, where bandwidth allows.

A source which contains images making up a video, film, animation or other moving picture is available for the delivery of video over the internet. Images (2, 4, 6...) in the source are digitised and labelled with frame numbers (starting from zero) where later times correspond to bigger frame numbers and consecutive frames have consecutive frame numbers. The video also has audio content, which is split into sections.

The video frames are split into chunks as follows: a value of n is chosen to be a small positive integer (0 < n). In one implementation, n is chosen to be 5. A chunk is a set of consecutive frames of length 2^n. All frames appear in at least one chunk, and the end of each chunk is always followed immediately by the beginning of another chunk. Let "f" represent the frame number in the chunk, where the earliest frame (2) in each chunk has f=0, and the last (8) has f=(2^n)-1 (see e.g. Figure 10).

All f=0 frames in a chunk are compressed as key frames - that is they can be recreated without using data from any other frames. All frames equidistant in time between previously compressed frames are compressed as delta frames recursively as follows: Let frame C (see e.g. Figure 11) be the delta frame being compressed. Then there is a nearest key frame earlier than this frame, and a nearest key frame later than this frame, which have already been compressed. Let us call them E and L respectively. Each frame is converted into a spatially compressed representation, in one implementation comprising rectangular blocks of various sizes with four Y or UV values representing the four corner values of each block in the luminance and chrominance respectively.
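The recursive compression scheme implies a matching order in which frames become decodable: the key frame first, then at each pass the frames midway between already-available frames, doubling the effective frame rate per pass. A sketch (hypothetical helper, frame numbers within one chunk of length 2^n):

```python
def decode_order(n):
    """Return the frame numbers of a chunk of length 2**n in the order
    they become decodable: the key frame, then each midpoint pass."""
    length = 2 ** n
    order = [0]                  # f=0 is the key frame
    step = length // 2
    while step >= 1:
        # each pass adds the frames midway between frames already available
        order.extend(range(step, length, 2 * step))
        step //= 2
    return order
```

Truncating this order at any point still yields an evenly spaced subset of frames, which is what allows slower links to play a reduced but uniform frame rate.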

Frame C is compressed as a delta frame using information from frames E and L (which are known to the decompressor), as well as information as it becomes available about frame C.

In one implementation, the delta frame is reconstructed as follows:

Each component (12) of the image (pixel or block) is represented as either: the same as the corresponding component (10) in frame E; or the same as the corresponding component (14) in frame L; or a new value compressed using some or all of spatial compression of frame C, and information from frames E and L.
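The three-way reconstruction rule can be sketched as follows; the per-component tags and their encoding are simplified placeholders, not the actual bitstream syntax:

```python
def reconstruct_delta(tags, early, late, new_values):
    """Reconstruct frame C component by component: each is copied from
    frame E, copied from frame L, or taken from newly decoded data.

    `tags` holds 'E', 'L' or 'N' per component; `new_values` supplies
    the decoded values for the 'N' components in order.
    """
    new_iter = iter(new_values)
    out = []
    for i, tag in enumerate(tags):
        if tag == 'E':
            out.append(early[i])        # same as nearest earlier frame
        elif tag == 'L':
            out.append(late[i])         # same as nearest later frame
        else:
            out.append(next(new_iter))  # new value, spatially compressed
    return out
```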

Compressing the video data in this way allows the second part of the disclosure to function. This is described next. When transferring data across the internet, using the HTTP protocol used by web browsers, the described compression has advantages, for example enabling access through many firewalls. The two significant factors relevant to this disclosure are latency and bandwidth. The latency here is the time taken between asking for the data and it starting to arrive. The bandwidth here is the speed at which data arrives once it has started arriving. For a typical domestic broadband connection, the latency can be expected to be between 20ms and 1s, and the bandwidth can be expected to be between 256kb/s and 8Mb/s.

The disclosure involves one compression step for all supported bandwidths of connection, so the player (e.g. 16, Figure 12) has to determine the data to request which gives the best playback experience. This may be done as follows:

The player has a number of download slots (20, 22, 24...) for performing overlapping downloads, each running effectively simultaneously with the others. At any time, any of these may be blocked by waiting for the latency or by lost packets. Each download slot is used to download a key frame, and then subsequent files (if there is time) at each successive granularity. When all files pertaining to a particular section are downloaded, or when there would not be time to download a section before it is needed for decompression by the processor (18), the download slot is applied to the next unaccounted for key frame.

In one implementation of the disclosure, each slot is implemented in a separate thread.

A fast link results in all frames being downloaded, but slower links download a variable frame rate at e.g. 1, 1/2, 1/4, 1/8 etc. of the frame rate of the original source video for each chunk. This way the video can play back in real time at full quality, possibly with some sections of the video at a lower frame rate.

In a further implementation, as used for video editing, frames downloaded in this way are cached in a memory (20A) when they are first seen, so that on subsequent accesses, only the finer granularity videos need be downloaded.

The number of slots depends on the latency and the bandwidth and the size of each file, but is chosen to be the smallest number which ensures the internet connection is fully busy substantially all of the time.

In one implementation, when choosing what order to download or access the data in, the audio is given highest priority (with earlier audio having priority over later audio), then the key frames, and then the delta frames (within each chunk) in the order required for decompression with the earliest first.
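The stated priority ordering might be sketched as follows (a simplified model; the real player interleaves this with the slot logic above):

```python
def download_order(audio_sections, key_frames, delta_passes):
    """Return items in download priority order: all audio first
    (earliest first), then key frames, then each granularity pass of
    delta frames in the order required for decompression.

    `delta_passes` is a list of frame lists, coarsest pass first.
    """
    order = [("audio", a) for a in sorted(audio_sections)]
    order += [("key", k) for k in sorted(key_frames)]
    for granularity, frames in enumerate(delta_passes):
        order += [("delta", granularity, f) for f in frames]
    return order
```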

There is provided a method of receiving video data comprising the steps of: receiving at least one chunk of video data comprising a number (n) of sequential key video frames where the number (n) is at least two and, constructing at least one delta frame (C) between a nearest preceding key frame (E) and a nearest subsequent key frame (L) from data contained in either or each of the nearest preceding and subsequent frames.

The method may be one wherein the delta frame (C) is composed of a plurality of component blocks or pixels and each component of the delta frame is constructed according to data indicating it is one of:

(a) the same as the corresponding component in the nearest preceding key frame (E), or

(b) the same as the corresponding component in the nearest subsequent key frame (L), or (c) a new value compressed using some or all of the spatial compression of frame C, and information from the nearest preceding and subsequent frames.

The method may be one wherein after the step of construction, the delta frame is treated as a key frame for the construction of one or more delta frames.

The method may be one wherein delta frames continue to be constructed in a chunk until either: a sufficiently good predetermined image playback quality criterion is met or the time constraints of playing the video in real time require the frames to be displayed.

The method may be one wherein the number of key frames is in the range from n=3 to n=10.

The method may be one comprising downloading the video data across the internet.

The method may be one comprising downloading each key frame in a separate download slot, the number of said download slots equating to the maximum number of download slots supportable by the internet connection at any moment in time.

The method may be one wherein each slot is implemented in a separate thread.

The method may be one wherein each frame is cached upon first viewing to enable subsequent video editing.

The method may be one wherein the key frames are cached.

There is provided a media player configured to implement the method according to any one of the above statements.

The media player may be one having: a receiver to receive chunks of video data including at least two key frames, a processor adapted to construct a delta frame sequentially between a nearest preceding key frame and a nearest subsequent key frame.

There is provided a method of compressing video data so that the video can be streamed across a limited bandwidth connection with no loss of quality on displayed frames, the method comprising storing video frames at various temporal resolutions which can be accessed in a pre-defined order, stopping at any point.

The method may be one where multiple simultaneous internet accesses can ensure a fairly stable frame rate over a connection by simultaneously loading the first or subsequent temporal resolution groups of frames from each of a number of nonintersecting subsets of consecutive video frames until either all the frames in the group are downloaded, or until a predetermined time has elapsed, and then in starting a new group.

There is provided a method of compressing video data with no loss of frame image quality on the displayed frames, by varying the frame rate relative to the original source video, the method comprising the steps of: receiving at least two chunks of uncompressed video data, each chunk comprising at least two sequential video frames and, compressing at least one frame in each chunk as a key frame, for reconstruction without the need for data from any other frames, compressing at least one intermediate frame as a delta frame between a nearest preceding key frame and a nearest subsequent key frame from data contained in either or each of the nearest preceding and subsequent frames, wherein further intermediate frames are compressed as further delta frames within the same chunk, by treating any previously compressed delta frame as a key frame for constructing said further delta frames, and storing the compressed video frames at various mutually exclusive temporal resolutions, which are accessed in a pre-defined order, in use, starting with key frames, and followed by each successive granularity of delta frames, stopping at any point; and whereby the frame rate is progressively increased as more intermediate data is accessed.

The method may be one wherein the delta frame is composed of a plurality of component blocks or pixels and each component of the delta frame is constructed according to data indicating it is one of:

(a) the same as the corresponding component in the nearest preceding key frame, or

(b) the same as the corresponding component in the nearest subsequent key frame, or

(c) a new value compressed using some or all of the spatial compression of the frame, and information from the nearest preceding and subsequent frames.
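By way of illustration only, the per-component choice between cases (a), (b) and (c) may be sketched as follows. The function names are illustrative, and the sketch assumes a component is "the same" when it matches exactly; the disclosure does not specify the comparison criterion, and case (c) here stores the new value verbatim rather than spatially compressing it:

```python
def encode_component(value, prev_value, next_value):
    # (a) reuse the corresponding component of the nearest preceding key frame
    if value == prev_value:
        return ("prev",)
    # (b) reuse the corresponding component of the nearest subsequent key frame
    if value == next_value:
        return ("next",)
    # (c) carry a new value (in practice this would be spatially compressed
    # using information from the neighbouring frames; stored verbatim here)
    return ("new", value)

def decode_component(token, prev_value, next_value):
    # Reconstruct a delta-frame component from its encoded token and the
    # corresponding components of the bounding key frames.
    if token == ("prev",):
        return prev_value
    if token == ("next",):
        return next_value
    return token[1]
```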

The method may be one wherein after the step of construction, the delta frame is treated as a key frame for the construction of one or more delta frames.

The method may be one wherein delta frames continue to be constructed in a chunk until either: a predetermined image playback quality criterion, including a frame rate required by an end-user, is met or the time constraints of playing the video in real time require the frame to be displayed.

The method may be one wherein the number of frames in a chunk is 2^n, and n is in the range from n=3 to n=10.

The method may be one comprising downloading the video data across the internet.

The method may be one comprising downloading each key frame in a separate download slot, the number of said download slots equating to the minimum number to fully utilize the internet connection.

The method may be one wherein each slot is implemented in a separate thread.

The method may be one wherein each frame is cached upon first viewing to enable subsequent video editing.

The method may be one wherein the key frames are cached.

There is provided a method of processing video data comprising the steps of: receiving at least one chunk of video data comprising 2^n frames and one key video frame, and the next key video frame; constructing a delta frame (C) equidistant between a nearest preceding key frame (E) and a nearest subsequent key frame (L) from data that includes data contained in either or each of the nearest preceding and subsequent key frames; constructing additional delta frames equidistant between a nearest preceding key frame and a nearest subsequent key frame from data that includes data contained in either or each of the nearest preceding and subsequent key frames, wherein at least one of the nearest preceding key frame or the nearest subsequent key frame is any previously constructed delta frame; storing the additional delta frames at various mutually exclusive temporal resolutions, which are accessible in a pre-defined order, in use, starting with the key frames, and followed by each successive granularity of delta frames, stopping at any point; and continuing to construct the additional delta frames in a chunk until either a predetermined image playback quality criterion, including a user selected frame rate, is achieved, or a time constraint associated with playing of the chunk of video data in real time requires the frames to be displayed.

The method may be one further comprising downloading the at least one chunk of video data at a frame rate that is less than an original frame rate associated with the received video data.
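By way of illustration only, the pre-defined access order described above (the chunk's bounding key frames first, followed by each successive granularity of equidistant delta frames) may be sketched as follows. The function name and the grouping into lists are illustrative assumptions, not taken from the disclosure:

```python
def temporal_resolution_order(n):
    """For a chunk of 2**n frames bounded by two key frames, return the
    frame indices grouped by mutually exclusive temporal resolution, in
    the order they would be accessed."""
    length = 2 ** n
    groups = [[0, length]]  # the bounding key frames come first
    step = length
    while step > 1:
        step //= 2
        # each delta frame at this granularity sits equidistant between two
        # frames already available from coarser resolutions, which are then
        # treated as key frames for its construction
        groups.append(list(range(step, length, 2 * step)))
    return groups

# e.g. for a chunk of 8 frames (n=3):
# [[0, 8], [4], [2, 6], [1, 3, 5, 7]]
```

Accessing the groups in order and stopping at any point yields a complete, evenly spaced set of frames, so the frame rate progressively increases as more intermediate data is accessed.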

The method may be one further comprising determining a speed associated with the receipt of the at least one image chunk, and only displaying a plurality of constructed frames in accordance with the time constraint and the determined speed.

A METHOD FOR ENABLING EFFICIENT NAVIGATION OF VIDEO

This section of this document relates to disclosures made in EP1738365B1, W02005101408A1 and US8255802B2.

A method is provided of facilitating navigation of a sequence of source images, the method using tokens representing each source image which are scaled versions of each source image and which are arranged adjacently on a display device in a continuous band of token images so that a pointer device can point to a token and the identity of the corresponding image is available for further processing.

Visual recordings of moving things are generally made up of sequences of successive images. Each such image represents a scene at a different time or range of times. This disclosure relates to recordings including sequences of images such as are found, for example, in video, film and animation.

The common video standard PAL used in Europe comprises 25 frames per second. This implies that an hour of video will include nearly 100,000 frames. Other video formats, such as the NTSC standard used in the USA and Japan, have a similar number of frames per hour to PAL.

A requirement for a human operator to locate accurately and to access reliably a particular frame from within many can arise. One application where this requirement arises is video editing. In this case, the need may not just be for accurate access on the scale of individual frames, but also easy access to different scenes many frames apart. In other words, there is a need to be able to access video frames over a range of time scales which may be up to five or six orders of magnitude apart.

The disclosure provided herein includes a method for enabling efficient access to video content over a range of temporal scales.

Assume there is a source which contains images making up a video, film, animation or other moving picture. Images in the source are digitised and labelled with frame numbers where later times correspond to bigger frame numbers and consecutive frames have consecutive frame numbers.

Each image is given an associated token image, which may be a copy of the source image. In practice, these source images may be too big to fit many on a display device such as a computer screen, a smartphone screen, or a tablet screen, at the same time. In this case, the token image will be a reduced size version of the original image. The token images are small enough that a number of token images can be displayed on the display device at the same time. In an application according to this disclosure, this size reduction is achieved by averaging a number of pixels in the source image to give each corresponding pixel in the smaller token images. There are many tools available to achieve this. In this application, there are typically between ten and fifty token images visible at a time.

Referring to Figure 13, in an example, there is provided a computer display whose resolution is 1024x768 pixels, and the images (102) from the source video are digitised at 320x240 pixels, and the tokens (104) representing the source images are 32x24 pixels. In one commercial application, the token images have the same aspect ratio as the original images.
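By way of illustration only, the size reduction by pixel averaging described above (e.g. a 320x240 source image reduced by a factor of 10 in each direction to a 32x24 token) may be sketched as follows, assuming an integer reduction factor and a greyscale image held as a list of rows of pixel values; the function name is illustrative:

```python
def make_token(image, factor):
    """Reduce an image by `factor` in each direction, averaging each
    factor x factor block of source pixels into one token pixel."""
    h, w = len(image), len(image[0])
    token = []
    for ty in range(h // factor):
        row = []
        for tx in range(w // factor):
            # gather the block of source pixels behind this token pixel
            block = [image[ty * factor + dy][tx * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))  # block average
        token.append(row)
    return token
```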

The token images are then combined consecutively with no gaps between them in a continuous band (106) which is preferably horizontal. This band is then displayed on the computer screen, although if the source is more than a few images in length, the band will be wider than the available display area, and only a subset of it will be visible at any one time. The video is navigated to frame accuracy by using a pointing device, such as a mouse, which is pointed at a particular token within the horizontal band. This causes the original image corresponding to this token to be selected. Any appropriate action can then be carried out on the selected frame. For example, the selected frame can then be displayed. In another example, the time code of the selected frame can be passed on for further processing. In a further example, the image pixels of the selected frame can be passed on for further processing.

In a further refinement, in one implementation, when the pointing device points near to the edge (108) or (110) of the displayed subset of the horizontal band, the band automatically and smoothly scrolls so that the token originally being pointed to moves towards the centre of the displayed range. This allows access beyond the original displayed area of the horizontal band.

The above description therefore shows how frame accurate access is simple for short clips. The same principle can be extended to longer sequences of source image frames, as illustrated for example in Figure 14.

Each token is reduced in size, but this time only horizontally. This reduction leaves each new token (112) at least one pixel wide. Where the reduction in size is by a factor of x, the resulting token is called an x-token within this document. So, for example, 2-tokens are half the width of tokens, but the same height. The x-tokens are then displayed adjacent to each other in the same order as the original image frames to create a horizontal band as with the tokens, but with the difference that more of these x-tokens fit in the same space than the corresponding tokens, by a factor of x.

Navigation proceeds as before, the difference being that each x-token is narrower than before, so that more of them are visible than with the original tokens, and a smaller pointer movement is needed to achieve the same movement in frames.

In one such implementation, the space (114) allocated to the horizontal band for tokens and x-tokens is 320 pixels. The tokens (104) are 32x24 pixels, and the x-tokens (112) are created in a variety of sizes down to 1x24 pixels. In the 32-token case, the horizontal band corresponds to 320 frames of video, compared with ten frames for the band of full-width tokens. This range of 320 frames can be navigated successfully with the pointer.
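By way of illustration only, the creation of an x-token may be sketched as follows: each token keeps its full height, but each row is reduced in width by a factor x (down to one pixel wide) by averaging x horizontally adjacent pixels. The function name is illustrative:

```python
def make_x_token(token, x):
    """Reduce a token's width by an integer factor x, preserving height,
    by averaging each run of x adjacent pixels on every row."""
    return [[sum(row[px * x:(px + 1) * x]) // x
             for px in range(len(row) // x)]
            for row in token]

# A 32-pixel-wide token reduced with x=32 becomes one pixel wide, so a
# band of such x-tokens covers 32 times as many frames in the same space.
```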

This design is a significant departure from existing commercial systems where instead of a horizontal band made of all the x-tokens, the corresponding band may contain one token in every x. In this disclosure, subject to the colour resolution of the display device, every pixel in every image contributes some information to each horizontal band. Even with x-tokens only one pixel wide, the position of any cut (116) on the source is visible to frame accuracy, as are sudden changes in the video content.

The x-tokens are fine for navigating short clips, but to navigate longer sources, further horizontal reductions are required, see e.g. Figure 15. In the case where each horizontal pixel on the horizontal display band represents y frames, the horizontal band made of 1 pixel wide x-tokens is squashed horizontally by a factor of y. If y is an integer, this is achieved by combining y adjacent non-intersecting sets of 1 pixel wide x-tokens (by for example averaging) to make a y-token one pixel wide and the same height as the tokens. Significant changes of video content (118, 120) can still be identified, even for quite large values of y.
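By way of illustration only, the squashing of a band of one-pixel-wide x-tokens into y-tokens may be sketched as follows, assuming an integer factor y and averaging as the combining operation; the function name is illustrative:

```python
def make_y_tokens(columns, y):
    """columns: the band as a list of one-pixel-wide x-tokens, each a list
    of pixel values down the token height. Combines y adjacent
    non-intersecting columns by averaging into each one-pixel-wide
    y-token, preserving the token height."""
    height = len(columns[0])
    out = []
    # step through the band in non-intersecting groups of y columns
    for start in range(0, len(columns) - len(columns) % y, y):
        group = columns[start:start + y]
        out.append([sum(col[row] for col in group) // y
                    for row in range(height)])
    return out
```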

In one implementation, values of x and y used are powers of two, and the resulting horizontal display bands represent all scales from 0 frames to 5120 frames. Larger values of y will be appropriate for longer videos.

In the x-tokens and y-tokens, the values of x and y need not be integers, although appropriate weightings between vertical lines within image frames and between image frames will then be needed if image artefacts are to be avoided.

In one implementation, the tokens, x-tokens and y-tokens are created in advance of their use for editing in order to facilitate rapid access to the horizontal bands. The x-tokens and y-tokens are created at multiple resolutions. Switching between horizontal bands representing different scales is facilitated by zoom in and zoom out buttons (122, 124) which move through the range of horizontal contractions available.

There is provided a method of facilitating navigation of a sequence of source images, the method using tokens representing each source image which are scaled versions of each source image and which are arranged adjacently on a display device in a continuous band of token images so that a pointer device can point to a token and the identity of the corresponding image is available for further processing.

The method may be one where one or more new bands can be constructed by squashing the band in the longitudinal direction by one or more factors in each case squashing by a factor which is no wider than the pixel width of the individual tokens making up the band.

The method may be one where neighbouring tokens are first combined to make new tokens corresponding to multiple frames and these new tokens are arranged next to each other in a band.

The method may be one where the widths and heights of different tokens differ.

The method may be one in which the band is arranged horizontally on a display device together with a normal video display generated from the source images.

The method may be one which is so arranged that, when the pointer device points to a token near to the edge of the displayed subset of the continuous band, the band automatically scrolls, so that the token moves towards the centre of the displayed range, thereby allowing access to a region beyond the original displayed area.

A method of facilitating navigation of a sequence of source images (102) the method using tokens (104) representing each source image which are scaled versions of each source image and which are arranged adjacently on a display device in a continuous band (106) of token images so that a pointer device can point to a token and the identity of the corresponding image is available for further processing, whereby one or more new bands can be constructed by reducing the continuous band (106) in the longitudinal direction by reducing only the width of the tokens (104) by a factor x, each new token being an x-token no wider than the pixel width of the individual tokens (104) making up the continuous band and being at least one pixel wide, so as to provide a band having more tokens (112) than the continuous band (106), wherein the method further comprises longitudinally squashing the band having more tokens (112) than the continuous band (106) by a factor y by combining y adjacent non-intersecting sets of x-tokens so as to provide a squashed band of y-tokens (121). A related example is shown in Figure 13. A related example is shown in Figure 14.

A method of facilitating navigation of a sequence of source images, via a display device and under computer control, the method comprising: generating a plurality of token images, each being a digitized representation of a scaled down version of a respective source image, by transforming said source images into token images for display on said display device; creating an arrangement of said token images on the display device in a continuous band of token images arranged adjacently; and responding to a computer controlled pointer device pointing to a token image on the display device by identifying the corresponding image for further processing, the method further comprising transforming the continuous band of token images, each token image having a multi-pixel width and a multi-pixel height, into at least one new squashed band by squashing the token images in a continuous band of token images in the longitudinal direction only, by one or more factors using pixel averaging, to create said at least one new squashed band of squashed token images, whereby each individual squashed token image can be reduced to a maximum of a single pixel width and a multi-pixel height.

Note

It is to be understood that the above-referenced arrangements are only illustrative of the application for the principles of the present invention. Numerous modifications and alternative arrangements can be devised without departing from the spirit and scope of the present invention. While the present invention has been shown in the drawings and fully described above with particularity and detail in connection with what is presently deemed to be the most practical and preferred example(s) of the invention, it will be apparent to those of ordinary skill in the art that numerous modifications can be made without departing from the principles and concepts of the invention as set forth herein.