Title:
METHODS AND SYSTEMS FOR AUTOMATICALLY AND EFFICIENTLY CATEGORIZING, TRANSMITTING, AND MANAGING MULTIMEDIA CONTENTS
Document Type and Number:
WIPO Patent Application WO/2013/185237
Kind Code:
A1
Abstract:
Methods and systems for automatically and efficiently categorizing, securely transmitting, and managing multimedia contents generated from a network connection enabled digital data generating device by way of multimedia transcoding, compression, and/or classification. Content similarity between multimedia data units with one unit corresponding to one image, one video frame, or one audio frame is first detected and identified, and then used to help encode conditionally one unit given another. Based on the identified content similarity and/or the resulting compression efficiency, these multimedia units are automatically categorized in terms of their contents along with their locations, occasions, events, and/or features. Whenever network connections selected by a user are available, these encoded and categorized multimedia data units along with their metadata containing side information including their locations, occasions, events, features, and other information specified by the user are then automatically and securely transmitted to networks, cloud servers, and/or other network connected devices.

Inventors:
YANG DR EN-HUI (CA)
Application Number:
PCT/CA2013/050455
Publication Date:
December 19, 2013
Filing Date:
June 14, 2013
Assignee:
YANG DR EN-HUI (CA)
International Classes:
G06F17/00; G06F7/00; H04N21/436; H04N21/458; H04N21/83
Domestic Patent References:
WO2009097265A1 (2009-08-06)
Foreign References:
US20100332958A1 (2010-12-30)
Other References:
See also references of EP 2862100A4
Attorney, Agent or Firm:
RIDOUT AND MAYBEE LLP (5th Floor, Toronto, Ontario M5H 3E5, CA)
Claims:
WHAT IS CLAIMED IS:

1. A method for managing multimedia content of a device having access to a plurality of successive media data units, the method comprising:

comparing media data units within the plurality of successive media data units to determine similarity; and

when the determined similarity between compared media data units is within a similarity threshold, automatically categorizing the compared media data units within the same category.

2. The method of claim 1 further comprising the device generating at least one of the plurality of successive media data units.

3. The method of claim 1 wherein the generating is implemented from a media capturing component of the device.

4. The method of claim 1 wherein comparing media data units comprises comparing content similarity between the media data units.

5. The method of claim 1 further comprising referencing one of the media data units as a reference media data unit.

6. The method of claim 5 wherein the comparing comprises comparing, in succession, each of the media data units with the reference media data unit.

7. The method of claim 6 further comprising, when one of the compared media data units is not within the similarity threshold, referencing the one of the compared media data units as the reference media data unit.

8. The method of claim 6 further comprising, when the determined similarity of a compared media data unit to the reference media data unit is within the similarity threshold, encoding the compared media data unit in dependence on the reference media data unit.

9. The method of claim 6 further comprising encoding, when the determined similarity of a compared media data unit to the reference media data unit is not within the similarity threshold, without dependence on the reference media data unit, the compared media data unit which is not already in an encoded format.

10. The method of claim 5 wherein the reference media data unit is initialized as a blank media data unit or a previous media data unit.

11. The method of claim 8 further comprising grouping, in succession, each of the encoded and compared media data units together with the reference media data unit.

12. The method of claim 5 wherein the similarity threshold is a compression threshold, and wherein the comparing media data units comprises conditionally encoding a selected media data unit in dependence on the reference media data unit and the determined similarity being determined when the resulting compression efficiency is within a compression threshold.

13. The method of any one of claims 1 to 12 further comprising, prior to comparing of any one of the media data units, decoding the any one of the media data units which is in an encoded format.

14. The method of any one of claims 1 to 13 wherein the multimedia data units each correspond to one image, one video frame or one audio frame.

15. The method of any one of claims 1 to 14 wherein the multimedia data units each have associated side data specifying for the multimedia data unit at least one of a location, occasion, event and feature; wherein automatically categorizing the compared media data units includes sub-categorizing multimedia units that are within the same category according to the side data.

16. The method of any one of claims 1 to 15 wherein the device is network connection enabled, the method further comprising, when a preselected network connection is available, automatically transmitting the categorized multimedia data units to one or more remote computing devices.

17. The method of claim 16, further comprising automatically transmitting the categorized multimedia data units from the one or more remote computing devices to one or more further digital data generating devices.

18. The method of claim 16, wherein the automatically transmitting is part of a synchronizing process between the device and the one or more remote computing devices.

19. A device, comprising:

memory;

a component configured to access a plurality of successive media data units; and

a processor configured to execute instructions stored in the memory in order to:

compare media data units within the plurality of successive media data units to determine similarity, and

when the determined similarity between compared media data units is within a similarity threshold, automatically categorize the compared media data units within the same category.

20. The device of claim 19 further comprising a media capturing component configured to generate the plurality of successive media data units.

21. The device of claim 19, wherein the memory stores the plurality of successive media data units.

22. A non-transitory computer readable medium containing instructions executable by a processor of a device for managing media content of the device, the instructions comprising:

instructions for comparing media data units within the plurality of successive media data units to determine similarity; and

instructions for, when the determined similarity between compared media data units is within a similarity threshold, automatically categorizing the compared media data units within the same category.

23. A device, comprising:

memory;

a component configured to access a plurality of successive media data units; and

a processor configured to execute instructions stored in the memory in order to perform the method of any one of claims 1 to 18.

24. The device of claim 23 further comprising a media capturing component configured to generate the plurality of successive media data units.

25. The device of claim 23, wherein the memory stores the plurality of successive media data units.

26. A non-transitory computer readable medium containing instructions executable by a processor of a device for managing media content of the device, the instructions comprising instructions for performing the method of any one of claims 1 to 18.

Description:
METHODS AND SYSTEMS FOR AUTOMATICALLY AND EFFICIENTLY CATEGORIZING, TRANSMITTING, AND MANAGING MULTIMEDIA CONTENTS

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of priority to U.S. Provisional Patent Application Number 61/660,258 filed June 15, 2012, the contents of which are hereby incorporated by reference.

TECHNICAL FIELD

[0002] At least some example embodiments generally relate to automatic categorization, transmission, and management of multimedia contents, and in particular to methods and systems that automatically and efficiently categorize multimedia contents.

BACKGROUND

[0003] The convergence of wireless communications and mobile computing is happening at a speed much faster than most of us have anticipated. According to a white paper from Cisco entitled "Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2011-2016" (since updated on February 6, 2013 for 2012-2017), the global mobile data traffic in 2011 was 597 petabytes per month, more than eight times the size of the entire global Internet traffic in 2000; again in 2011, the mobile data traffic originating from phones and tablets was 217 petabytes per month, more than 36% of the entire global mobile data traffic. It is expected that the number of mobile-connected devices will soon exceed the world's population. With so many mobile-connected devices, both downlink mobile data traffic and uplink mobile data traffic, particularly multimedia traffic including photos and videos generated by users themselves, will continue to grow phenomenally.

[0004] Although the proliferation of high-end smartphones, tablets, network connection enabled digital cameras and camcorders, and other high-end network connection enabled devices makes it easy for users to generate multimedia contents such as photos, videos, and audios, managing these data and moving them onto networks, cloud servers, home and office computers, and/or other network connected devices is quite challenging at this point. Part of the reason lies in the huge volume of these data; part of the reason lies in the fact that these data are in general unstructured. For example, on tablets, smartphones, and digital cameras, photos, right after being captured, are normally saved separately as independent files, one photo per file, indexed sequentially according to the order in which they were taken, and named with their respective indices, possibly along with information on the location and time at which they were taken. As clearly illustrated in Figure 1, a screen shot taken from a Mac Pro™ computer, these independent files are stored in an unorganized structure. To move these photos from their network connection enabled generating device onto networks, cloud servers, home and office computers, and/or other network connected devices, users could manually do one of the following:

[0005] (1) open a client application on the generating device and then send/upload these photos via the client application to their desired destinations; or

[0006] (2) use a cable such as a universal serial bus (USB) cable to connect the generating device to a personal computer (PC), or take a memory stick out of the generating device and plug it into the PC, and then upload these photos to the PC and other desired destinations.

[0007] Clearly, none of the above approaches is convenient.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] Reference will now be made, by way of example, to the accompanying drawings which show example embodiments, in which:

[0009] Figure 1 illustrates an example user interface screen for photo organization;

[0010] Figure 2A illustrates an example user interface screen which illustrates photo structure for a mobile communication client device;

[0011] Figure 2B illustrates an example user interface screen which illustrates a corresponding photo structure for a server device;

[0012] Figure 3 illustrates a flow diagram of a first example method for managing multimedia content of a device, illustrating a categorization only procedure, in accordance with an example embodiment;

[0013] Figure 4 illustrates a flow diagram of a second example method for managing multimedia content of a device, illustrating a categorization and compression procedure, in accordance with an example embodiment;

[0014] Figure 5 illustrates a flow diagram of a third example method for managing multimedia content of a device, illustrating a categorization and lossless compression procedure, in accordance with an example embodiment;

[0015] Figure 6 illustrates a block diagram of a system for managing multimedia content of a device, to which example embodiments may be applied;

[0016] Figure 7 illustrates a block diagram of a simplified system for managing multimedia content of a device, to which example embodiments may be applied;

[0017] Figure 8 illustrates a block diagram of a JPEG encoder, to which example embodiments may be applied;

[0018] Figure 9 illustrates an algorithm of the first example method of Figure 3, in accordance with an example embodiment;

[0019] Figure 10 illustrates an algorithm of the second example method of Figure 4, in accordance with an example embodiment; and

[0020] Figure 11 illustrates an algorithm of the third example method of Figure 5, in accordance with an example embodiment.

[0021] Similar reference numerals may be used in different figures to denote similar components.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

[0022] To overcome the inconvenience problem associated with manual operations, recent wireless sync services such as iCloud™, Dropbox™, Google+™, SkyDrive™, etc., can automatically upload photos from a specific folder on a smartphone to a cloud server. For example, the Photo Stream™ service of iCloud automatically uploads a newly taken photo stored in the special folder, Photo Stream™, on an iOS™ device (developed and distributed by Apple Inc.) to a cloud server, and also pushes it to users' other registered devices and computers over any available Wi-Fi or Ethernet connection. However, as clearly illustrated in Figures 2A and 2B, screen shots of Photo Stream on an iPhone™ and on iCloud, photos in Photo Stream remain unstructured and unorganized as independent files; they are not automatically categorized either in terms of their contents, locations, occasions, events, and/or features. In addition, the wireless sync services may not be power efficient, bandwidth efficient, and/or storage efficient, since similarity between photos taken consecutively at the same scene, location, or event is not utilized to further reduce the size of these photos.

[0023] It would be advantageous to have methods and systems that automatically and efficiently categorize multimedia contents generated from a network connection enabled digital data generating device in terms of their contents, locations, occasions, events, and/or features, securely transmit them to other network connected devices, and manage them across all these devices.

[0024] At least some example embodiments generally relate to automatic categorization, transmission, and management of multimedia contents, and in particular to methods and systems that automatically and efficiently categorize multimedia contents generated from a network connection enabled digital data generating device in terms of their contents, locations, occasions, events, and/or features, securely transmit them to other network connected devices, and manage them across all these devices.

[0025] In at least some example embodiments, there are provided methods and systems for automatically and efficiently categorizing, securely transmitting, and managing multimedia contents generated from a network connection enabled digital data generating device by way of multimedia transcoding, compression, and/or classification. Content similarity between multimedia data units, with one unit corresponding to one image, one video frame, or one audio frame, is first detected and identified, and then used to help encode conditionally one unit given another; based on the identified content similarity and/or the resulting compression efficiency, these multimedia units are automatically categorized in terms of their contents along with their locations, occasions, events, and/or features; whenever network connections selected by a user are available, these encoded and categorized multimedia data units, along with their metadata containing side information including their locations, occasions, events, features, and other information specified by the user, are then automatically and securely transmitted to networks, cloud servers, and/or home and office computers, and are further pushed to other network connected devices. Depending on the availability of network connections and the power supply of the digital data generating device, the detection and identification of content similarity, encoding, categorization, and transmission can be performed either concurrently or sequentially. Further management performed on the cloud servers, home computers, and/or office computers regarding amalgamation, regrouping, decoupling, insertion, deletion, etc., of these encoded and categorized multimedia data units will be pushed to all connected devices mentioned above to keep them in sync with a service agreement.

[0026] To be specific, at least some example embodiments can use images such as photos and medical images as exemplary multimedia contents to illustrate our example methods and systems that automatically and efficiently categorize multimedia contents generated from a network connection enabled digital data generating device in terms of their contents, locations, occasions, events, and/or features, securely transmit them to other network connected devices, and manage them across all these devices. Nonetheless, at least some example embodiments of the methods and systems apply to other multimedia content types such as video and audio as well with one image replaced by one multimedia data unit, which is a video frame in the case of video and an audio frame in the case of audio.

[0027] On many occasions, a user of a network connection enabled digital data generating device may take a sequence of images consecutively at the same scene, location, and/or event. These images are often compressed already, in either a lossless or a lossy manner, by standard compression methods such as JPEG (e.g., G. K. Wallace, "The JPEG still picture compression standard," Communications of the ACM, Vol. 34, No. 4, pp. 30-44, April 1991) when they are captured by the digital data generating device. As such, using universal lossless compression algorithms, such as the Lempel-Ziv algorithms (e.g., J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Trans. Inform. Theory, vol. 23, pp. 337-343, 1977; and J. Ziv and A. Lempel, "Compression of individual sequences via variable-rate coding," IEEE Trans. Inform. Theory, vol. 24, pp. 530-536, 1978) and the Yang-Kieffer grammar-based algorithms (J. C. Kieffer and E.-H. Yang, "Grammar based codes: A new class of universal lossless source codes," IEEE Trans. Inform. Theory, vol. 46, pp. 737-754, 2000; E.-H. Yang and J. C. Kieffer, "Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform - Part one: Without context models," IEEE Trans. Inform. Theory, vol. 46, pp. 755-788, 2000; and E.-H. Yang and Da-ke He, "Efficient universal lossless compression algorithms based on a greedy sequential grammar transform - Part two: With context models," IEEE Trans. Inform. Theory, Vol. 49, No. 11, pp. 2874-2894, November 2003) directly to further compress them, either individually or collectively, yields little or no compression. On the other hand, since these images are taken at the same scene, location, and/or event, some of them do look similar. In at least some systems of example embodiments, the content similarity between images is first detected and identified, and then used to help encode or re-encode conditionally one image given another image; based on the identified content similarity and/or the resulting compression efficiency, these images are automatically categorized in terms of their contents along with their locations, occasions, events, and/or features; whenever network connections selected by a user are available, these encoded and categorized images, along with their metadata containing side information including their locations, occasions, events, features, and other information specified by the user, are then automatically and securely transmitted to networks, cloud servers, and/or home and office computers, and are further pushed to other network connected devices. Depending on the availability of network connections and the power supply of the digital data generating device, the detection and identification of content similarity, encoding, categorization, and transmission can be performed either concurrently or sequentially. Further management performed on the cloud servers, home computers, and/or office computers regarding amalgamation, regrouping, decoupling, insertion, deletion, etc., of these encoded and categorized images will be pushed to all connected devices mentioned above to keep them in sync with a service agreement.

[0028] Let F_1, F_2, ..., F_i, ..., F_n be a sequence of images captured consecutively by the digital data generating device in the indicated order. As the image index i increases, the content similarity in these images likely appears in a piece-wise manner, since images appearing consecutively according to i are likely taken at the same scene, location, and/or event. Taking advantage of this feature, the example embodiments of the described methods and systems process these images in an image-by-image manner while always maintaining a dynamic representative image R. Initially, the representative image R could be a blank image or the representative image left at the end of an earlier session of the systems. The categorization procedure, when used alone, is described in Algorithm 1 shown in Figure 9 and further illustrated in the flowchart shown in Figure 3.

[0029] In Algorithm 1, initialize the dynamic representative image R and set i = 1. Perform the algorithm while i ≤ n. Verify whether image F_i is in a compressed format: if F_i is compressed, then decode it into its reconstruction G_i; else set G_i = F_i. Compare G_i with R to detect and identify similarity between G_i and R: if G_i and R are similar, then classify F_i into the sub-category represented by R; else the sub-category represented by R is completed, classify F_i into a new sub-category represented by G_i itself, and update R into R = G_i. Increase i by 1. Completed sub-categories may be further categorized according to side information metadata.

[0030] To improve bandwidth efficiency, storage efficiency, and/or radio power efficiency needed for wireless transmission, the categorization procedure described in Algorithm 1 can also be used in conjunction with conditional compression. The resulting concurrent categorization and compression procedure is described in Algorithm 2 shown in Figure 10 and further illustrated in the flowchart shown in Figure 4.
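
The control flow of Algorithm 1, which Algorithms 2 and 3 extend, can be sketched in a few lines of Python. This is a minimal illustration only: the helpers decode and is_similar are placeholders, standing in for the JPEG decoding (Steps D1 to D3) and the similarity detection and identification machinery described later in this disclosure.

```python
def categorize(images, decode, is_similar, R=None):
    """Illustrative sketch of Algorithm 1: scan images in capture order,
    maintaining a dynamic representative image R."""
    sub_categories = []   # completed sub-categories
    current = []          # sub-category represented by the current R
    for F in images:
        G = decode(F)     # decode F into its reconstruction G if compressed
        if R is not None and is_similar(G, R):
            current.append(F)                   # classify F under R
        else:
            if current:
                sub_categories.append(current)  # sub-category of R completed
            current = [F]                       # new sub-category, represented by G
            R = G                               # update the representative image
    if current:
        sub_categories.append(current)
    return sub_categories  # may be further grouped by side information metadata
```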

[0031] In Algorithm 2, initialize the dynamic representative image R and set i = 1. Perform the algorithm while i ≤ n. Verify whether image F_i is in a compressed format: if F_i is compressed, then decode it into its reconstruction G_i; else set G_i = F_i. Compare G_i with R to detect and identify similarity between G_i and R: if G_i and R are similar, then encode G_i conditionally given R, and classify F_i into the sub-category represented by R; else the sub-category represented by R is completed, encode G_i without any help from R, decode the encoded G_i to get the reconstruction Ĝ_i of G_i, classify F_i into a new sub-category represented by G_i, and update R into R = Ĝ_i. Increase i by 1. Completed sub-categories may be further categorized according to side information metadata.

[0032] The encoding of each G_i in Algorithm 2 can be either lossless or lossy. If it is lossless, the classification of each image F_i into each sub-category represented by R can also be determined according to the efficiency of conditional encoding of G_i given R in comparison with the original file size of F_i. Such a variant of the concurrent categorization and lossless compression procedure is described in Algorithm 3 shown in Figure 11 and further illustrated in the flowchart shown in Figure 5.

[0033] In Algorithm 3, initialize the dynamic representative image R and set i = 1. Perform the algorithm while i ≤ n. Verify whether image F_i is in a compressed format: if F_i is compressed, then decode it into its reconstruction G_i; else set G_i = F_i. Encode G_i conditionally given R in a lossless manner. If the byte size of the conditionally encoded G_i divided by that of F_i is less than a threshold S, then classify F_i into the sub-category represented by R; else the sub-category represented by R is completed, discard the conditional encoding of G_i, losslessly encode G_i without any help from R if F_i is not compressed, classify F_i into a new sub-category represented by G_i, and update R into R = G_i. Increase i by 1. Completed sub-categories may be further categorized according to side information metadata.
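
The decision rule of Algorithm 3 can be sketched as follows. This is a hypothetical Python illustration: encode_given (lossless conditional encoding given R), encode_alone (stand-alone lossless encoding), decode, and the default threshold S are placeholder assumptions; each F is assumed to be the raw bytes of a possibly compressed image file. Unlike the full algorithm, the sketch always re-encodes the new representative rather than keeping an already-compressed F_i as-is.

```python
def categorize_and_compress(files, decode, encode_given, encode_alone, S=0.8):
    """Illustrative sketch of Algorithm 3: sub-category membership is decided
    by the byte-size ratio of the conditional encoding to the original file."""
    sub_categories, current, R = [], [], None
    for F in files:                        # F: raw bytes of one image file
        G = decode(F)                      # reconstruction G of F
        code = encode_given(G, R) if R is not None else None
        if code is not None and len(code) / len(F) < S:
            current.append(code)           # keep the conditional encoding under R
        else:                              # sub-category of R is completed
            if current:
                sub_categories.append(current)
            # discard the conditional encoding; encode G without help from R
            current = [encode_alone(G)]
            R = G                          # G represents the new sub-category
    if current:
        sub_categories.append(current)
    return sub_categories
```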

[0034] In both Algorithms 2 and 3, each completed sub-category represented by R can be configured to be either a set of files, one file per image F_i classified into the sub-category represented by R, or a single compressed file consisting of several segments, with the first segment corresponding to the encoded R and each remaining segment corresponding to the conditionally encoded G_i for each image F_i classified into the sub-category represented by R. Since the conditional encoding of G_i within each sub-category represented by R depends only on R, non-linear editing such as image insertion into, deletion from, and/or extraction from the sub-category represented by R can be easily performed even when the sub-category represented by R is saved as a single compressed file.
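
One plausible realization of this single-file configuration is a length-prefixed segment container, sketched below. The on-disk layout is not specified by the disclosure, so the format here (a 4-byte little-endian length before each segment) is a hypothetical choice for illustration. Because every segment after the first depends only on the encoded R, inserting, deleting, or extracting a unit reduces to splicing segments.

```python
import struct

def write_sub_category(path, encoded_R, encoded_units):
    """Write one sub-category as a single file: the encoded representative
    image R first, then one segment per conditionally encoded image."""
    with open(path, "wb") as f:
        for segment in (encoded_R, *encoded_units):
            f.write(struct.pack("<I", len(segment)))  # 4-byte length prefix
            f.write(segment)

def read_sub_category(path):
    """Read the segments back; segments[0] is the encoded R."""
    segments = []
    with open(path, "rb") as f:
        while (header := f.read(4)):
            (length,) = struct.unpack("<I", header)
            segments.append(f.read(length))
    return segments[0], segments[1:]
```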

[0035] When the digital data generating device has enough power supply and the network connections selected by the user are available on the digital data generating device, each encoded G_i, along with its categorization information and its metadata containing side information including its locations, occasions, events, features, and other information specified by the user, is then automatically and securely transmitted to networks, cloud servers, and/or home and office computers, and is further pushed to other network connected devices.

[0036] Figure 6 illustrates the overall architecture of a system 600 in which multimedia data units generated from a network connection enabled digital data generating device 602 are automatically and efficiently categorized in terms of their contents, locations, occasions, events, and/or features, transmitted securely to a cloud server, pushed to other network connected devices, and further managed in sync across all these devices. To describe the system 600 shown in Figure 6 in detail, we once again use images as exemplary multimedia contents. As shown in Figure 6, a client application 606 is provided on the digital data generating device 602. The client application comprises a graphical user interface (GUI) 608 and six modules: a data detection and monitoring module 610, a similarity detection and identification module 612, a multimedia content categorization module 614, a multimedia encoding module 616, a data transfer protocol module 618, and a categorization management module 620.

[0037] The GUI 608 allows a user of the system 600 to configure the system 600 to include, for images to be taken, the location information provided by the Global Positioning System (GPS) (not shown) and other information, such as occasions, events, and messages, that the user can input and wants to be associated with the images to be taken, and to select network connections on the digital data generating device 602 for subsequent automatic transmission of these images along with their respective side information data. Later on, the side information data will be used by the system 600 to automatically categorize these images on top of their content similarity.

[0038] The data generating/capture module 604 captures data and images on the digital data generating device 602. The data detection and monitoring module 610 detects and monitors this capture process. It will ensure that captured images will flow into subsequent modules along with their respective side information metadata according to the order they have been captured. It also acts as an interface between the data generating/capture module 604 and other modules in the client application 606 by buffering captured images that have not been categorized, encoded or re-encoded, and/or transmitted to the server. As long as the buffer is not empty, the categorization, encoding, and transmission processes will continue whenever the digital data generating device 602 has enough power supply and the network connections selected by the user are available on the digital data generating device 602. Examples of the data generating/capture module 604 include microphones, cameras, videocameras, and 3D versions of cameras and videocameras, which may capture a scene with a 3D effect by taking two or more individual but related images of the scene (representing the stereoscopic views) from different points of view, for example.

[0039] Upon receiving a sequence of images F_1, F_2, ..., F_i, ..., F_n in the order they have been captured, the similarity detection and identification module 612, multimedia content categorization module 614, and multimedia encoding module 616 would function according to Algorithms 1 to 3, as the case may be. Assume that the digital data generating device 602 has enough power supply and the network connections selected by the user are available on the digital data generating device 602. The data transfer protocol module 618 would then automatically and securely transmit each categorized and encoded image, along with its categorization information and its associated side information metadata, to the server, which in turn pushes its received data to other network connected devices.

[0040] Since each categorized and encoded image is compressed, secure transmission can be provided by scrambling a small fraction of key compressed bits, which would reduce the computational complexity and power consumption associated with security in comparison with a full-fledged encryption of all data to be transmitted.
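
The disclosure does not fix a particular scrambling scheme, so the sketch below is only one hypothetical way to realize this idea: XOR a small leading fraction of each compressed payload (where headers, motion vectors, and other key bits tend to sit) with a keyed keystream, leaving the bulk untouched. The SHA-256 counter keystream is a stand-in for a vetted cipher, chosen to keep the sketch dependency-free, and is not a security recommendation.

```python
import hashlib

def scramble(payload: bytes, key: bytes, fraction: float = 0.05) -> bytes:
    """Scramble only the leading fraction of a compressed payload.
    XOR is its own inverse, so the same call descrambles."""
    n = max(1, int(len(payload) * fraction))   # number of key bytes to scramble
    stream, counter = b"", 0
    while len(stream) < n:                     # derive a keyed keystream
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    head = bytes(p ^ s for p, s in zip(payload[:n], stream))
    return head + payload[n:]
```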

[0041] After all categorized and encoded images within a sub-category represented by a representative image R are received by the server, along with their categorization information and associated side information metadata, the sub-category represented by R can be configured to be saved as either a set of files, one file per image F_i classified into the sub-category represented by R, or a single compressed file consisting of several segments, with the first segment corresponding to the encoded R and each remaining segment corresponding to the conditionally encoded G_i for each image F_i classified into the sub-category represented by R. On the server, completed sub-categories would be further automatically categorized according to their metadata. Through the categorization management module 620 on the digital data generating device 602, the server would send back this change to the digital data generating device 602 and further push it to other network connected devices. Further management regarding amalgamation, regrouping, decoupling, insertion, deletion, etc., of these encoded and categorized images could also be performed manually on the server, home computers, and/or office computers if needed; changes again will be pushed to all connected devices to keep them in sync with a service agreement.

[0042] For each media type (for example, images are one type and video is another type), there would conceptually be four types of folders to handle all categorized and encoded multimedia data units: an active folder, a permanent folder, a history folder, and a trash folder. Using images as exemplary multimedia contents again, we describe these folders in detail. The active folder stores all categorized and encoded images that have been actively accessed by one or more of the network connected devices during a recent period of time. This folder would be kept in sync across all network connected devices, including the digital data generating device 602, with the server through the categorization management module 620. Inactive categorized and encoded images would be moved to the permanent folder. The full version of all categorized and encoded images within the permanent folder would be kept in and synced across all network connected devices the user selects for this purpose; for all other network connected devices, only a low resolution version (such as a thumbnail version) of all categorized and encoded images within the permanent folder would be kept for information purposes. The history folder would store all images the user has uploaded through the client application to web sites for sharing, publishing, and other multimedia purposes. Finally, the trash folder would store temporarily deleted images, sub-categories, and categories.

[0043] Figure 7 illustrates the overall architecture of a simplified system 700 in which multimedia data units generated from a network connection enabled digital data generating device are automatically and efficiently categorized in terms of their contents, locations, occasions, events, and/or features, transmitted securely to a server/computer, and further managed in sync between the digital data generating device and server/computer. The simplified system 700 is useful when multimedia data units generated from the digital data generating device are intended only to be transferred to the user's own server/computer and managed in sync between the digital data generating device and server/computer. The client application and its corresponding modules on the digital data generating device shown in Figure 7 have the same functionalities as those in Figure 6.

[0044] METHODS FOR CONCURRENT CATEGORIZATION AND LOSSLESS COMPRESSION

[0045] Example methods for concurrent categorization and lossless compression will now be described, in accordance with example embodiments. Using images as exemplary multimedia contents again, in this section we describe methods for implementing Algorithm 3 in the case in which the images F_i, i = 1, 2, ..., n, are JPEG compressed when they are captured by the digital data generating device in the indicated order. Although we focus on Algorithm 3, many steps and procedures presented below also apply to Algorithm 2.

[0046] JPEG is a popular discrete cosine transform (DCT) based still image compression standard. It has been widely used in smartphones, tablets, and digital cameras to generate JPEG format images. As shown in Figure 8, a JPEG encoder 800 consists of three basic steps: forward DCT (FDCT), quantization, and lossless encoding. The encoder first partitions an input image into 8 x 8 blocks and then processes these 8 x 8 image blocks one by one in raster scan order (baseline JPEG). Each block is first transformed from the pixel domain to the DCT domain by an 8 x 8 FDCT. The resulting DCT coefficients are then uniformly quantized using an 8 x 8 quantization table Q, whose entries are the quantization step sizes for each frequency bin. The DCT indices U from the quantization are finally encoded in a lossless manner using run-length coding and Huffman coding. If the original input image is a multiple component image such as an RGB color image, the pipeline process of FDCT, quantization, and lossless encoding is conceptually applied to each of its components (such as its luminance component Y and chroma components Cr and Cb in the case of RGB color images) independently.
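
The FDCT and quantization stages of Figure 8 can be sketched in a few lines of Python (the run-length and Huffman stage is omitted). The quantization table below is the standard JPEG luminance table from Annex K of the standard; scipy's orthonormal DCT is used as a stand-in for the DCT variant of any particular JPEG implementation, so the numbers are illustrative rather than bit-exact.

```python
import numpy as np
from scipy.fft import dctn  # multidimensional type-II DCT

# Standard JPEG luminance quantization table (Annex K of the standard).
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]], dtype=float)

def dct_indices(image: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """FDCT + uniform quantization of Figure 8 on an 8x8-aligned grayscale
    image; returns the 8x8 blocks of DCT indices U fed to lossless coding."""
    h, w = image.shape
    U = np.empty((h, w), dtype=int)
    shifted = image.astype(float) - 128.0        # level shift
    for y in range(0, h, 8):
        for x in range(0, w, 8):
            coeffs = dctn(shifted[y:y+8, x:x+8], norm="ortho")  # FDCT
            U[y:y+8, x:x+8] = np.round(coeffs / Q)  # element-wise quantization
    return U
```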

[0047] A. Single Component Images

[0048] Suppose that each image F_i, i = 1, 2, ..., n, is a JPEG compressed single component image. We first describe methods for implementing key steps in Algorithm 3 in this case.

[0049] 1) Decode F_i into G_i: With reference to Algorithm 3, since each F_i is JPEG compressed, we need to decode F_i into its reconstruction G_i in the pixel domain. This can be achieved by:

[0050] (D1) applying a lossless decoder to F_i to get 8 x 8 blocks of DCT indices U;

[0051] (D2) multiplying each 8 x 8 block of DCT indices by the 8 x 8 quantization table Q element-wise to get an 8 x 8 block of quantized DCT coefficients; and

[0052] (D3) applying an integer inverse DCT, say T^{-1}, to each 8 x 8 block of quantized DCT coefficients to get an 8 x 8 block of reconstructed pixel values.

[0053] All 8 x 8 blocks of reconstructed pixel values together form the reconstructed image G_i for F_i. Note that G_i here is level-shifted implicitly.
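
Steps D2 and D3 can be sketched as the mirror image of the encoder sketch above; the lossless decoding of step D1 (which produces the index blocks U) is omitted, and scipy's orthonormal inverse DCT again stands in for the integer inverse DCT T^{-1}. As noted in [0053], the result stays level-shifted.

```python
import numpy as np
from scipy.fft import idctn  # inverse of the type-II DCT used above

def reconstruct(U: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Steps D2-D3: dequantize each 8x8 block of DCT indices by Q and apply
    the inverse DCT; the blocks together form the (level-shifted) image G."""
    h, w = U.shape
    G = np.empty((h, w))
    for y in range(0, h, 8):
        for x in range(0, w, 8):
            coeffs = U[y:y+8, x:x+8] * Q                   # D2: dequantize
            G[y:y+8, x:x+8] = idctn(coeffs, norm="ortho")  # D3: inverse DCT
    return G
```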

[0054] 2) Similarity Detection and Identification: With reference to Algorithm 3 again, both G_i and the dynamic representative image R at this point are in the pixel domain. To detect and identify the similarity between G_i and R, we compare G_i against R or a filtered R by first partitioning G_i into blocks of size N x N and then, for each of such N x N blocks (hereafter referred to as a target block), searching a region of R or the filtered R to see if there is any N x N block within the searched region which is similar to the target block, where N is an integer multiple of 8 so that each N x N block consists of several 8 x 8 blocks. If a similar block within the searched region is found, then the block with the strongest similarity in some sense within the searched region is selected as a reference block for the target block. The offset position between the reference block and the target block is called an offset vector or motion vector of the target block.

[0055] Specifically, let us use the top-left corner of an image as the origin. Then a pixel in the image can be referred to by its coordinates (x, y), where x (y, respectively) increases as we scan the image from left to right (top to bottom, respectively). Consider an N x N target block B_t with its upper-left corner at (x, y). Then the search process can be performed as follows:

[0056] (S1) In the image R or the filtered R, locate the co-located N x N block to get a starting position (h_m, j_m), where the co-located N x N block is the N x N block in R or the filtered R with its upper-left corner at the same position (x, y). If reference blocks in R or the filtered R have been found for one or more neighboring N x N blocks of the target block in G_i, the starting position can also be estimated from the motion vectors of these neighboring blocks. If the co-located block is used as a starting position, then (h_m, j_m) = (0, 0).

[0057] (S2) Search all N x N blocks B_r in R or the filtered R with their upper-left corners falling into the following set

{(x - h_m - h, y - j_m - j) : -k ≤ h, j ≤ k}        (4.1)

[0058] where k determines the search range, to see if there is, within the searched region, an N x N block similar to the target block according to some metric.

[0059] (S3) If there is, within the searched region, an N x N block similar to the target block, further locate, within the searched region, the N x N block with the strongest similarity, say at location (x - h_m - h*, y - j_m - j*), which is selected as a reference block for the target block. The offset position (h_m + h*, j_m + j*) is the motion vector for the target block.

[0060] (S4) If no N x N blocks within the searched region are similar to the target block, select an all-zero N x N block as a reference block for the target block and the vector (h_m + k + 1, j_m) as an artificial offset position.

[0061] The similarity metric between an N x N block B_r in R or the filtered R and the target block B_t could be based on a cost function defined for the difference B_t - B_r; for example, the cost function could be the L1 norm of B_t - B_r,

C(B_t, B_r) = \sum_{a=0}^{N-1} \sum_{b=0}^{N-1} |B_t(a, b) - B_r(a, b)|        (4.2)

or the energy of B_t - B_r,

C(B_t, B_r) = \sum_{a=0}^{N-1} \sum_{b=0}^{N-1} |B_t(a, b) - B_r(a, b)|^2        (4.3)

[0062] where for any block B, B(a, b) denotes the value of the pixel located at the a-th column and b-th row of B. The similarity metric between B_r and B_t could also be based on a cost function defined for

round(T(B_t)/Q) and round(T(B_r)/Q)        (4.4)

[0063] where T denotes the forward DCT corresponding to the integer inverse DCT T^{-1} and is applied to every 8 x 8 block, the division by Q is an element-wise division, and so is the round operation. No matter which similarity metric or cost function is used, B_r is similar to B_t if C(B_t, B_r) - C(B_t, 0) is less than a threshold, where 0 denotes the all-zero N x N block; otherwise, B_r is deemed not to be similar to B_t.

[0064] Mean-based search: When the search region is large, i.e., when k in (4.1) is large, the computational complexity of Step S2 would be very large if the brute-force full search is adopted. To reduce the computational complexity of Step S2, we here describe an efficient mean-based search method:

[0065] (S2-1) Calculate the mean of the target block B_t, i.e., the mean value of all pixel values within the target block B_t. Since G_i is obtained from the JPEG compressed F_i via Steps D1 to D3, it is not hard to see that the mean of the target block B_t can be easily computed as the average of the quantized DC coefficients (from F_i) of all JPEG 8 x 8 blocks contained within the target block B_t.

[0066] (S2-2) Calculate the mean of each N x N block B_r in R or the filtered R with its upper-left corner falling into the set given by (4.1). For overlapped blocks B_r, their means share a significant number of common pixel values. By avoiding repeated computations, the means of all N x N blocks B_r in R or the filtered R with their upper-left corners falling into the set given by (4.1) can be computed with on the order of (2k + N)^2 additions in total. This compares very favorably with the (2k + 1)^2 (N^2 - 1) additions that are required if the mean of each N x N block B_r with its upper-left corner falling into the set given by (4.1) is computed independently and one by one.

[0067] (S2-3) Search only those N x N blocks B_r in R or the filtered R whose upper-left corners fall into the set given by (4.1) and whose means are close to the mean of the target block B_t, to see if there is an N x N block similar to the target block according to the selected similarity metric. Those N x N blocks B_r in R or the filtered R whose upper-left corners fall into the set given by (4.1), but whose means are far from the mean of the target block B_t, will be skipped.
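
Steps S1 to S4 with the mean-based pruning of S2-1 to S2-3 might look as follows in Python. The integral image realizes the shared-computation idea of S2-2 (every candidate mean is read off in constant time); the L1 cost of (4.2) is used as the similarity metric; and N, k, mean_tol, and sim_tol are illustrative parameters, not values taken from the disclosure.

```python
import numpy as np

def find_reference_block(G, R, x, y, N=16, k=8, mean_tol=8.0, sim_tol=4.0):
    """Illustrative sketch of S1-S4 for the target block of G with upper-left
    corner (x, y), searching R with the co-located block as the starting
    position (h_m = j_m = 0).  Returns (reference_block, offset_vector)."""
    target = G[y:y+N, x:x+N].astype(float)
    target_mean = target.mean()
    H, W = R.shape
    # Integral image: means of all overlapping candidates share computation (S2-2).
    ii = np.pad(R.astype(float), ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    best_cost, best = None, None
    for h in range(-k, k + 1):
        for j in range(-k, k + 1):
            rx, ry = x - h, y - j              # candidate upper-left corner, per (4.1)
            if not (0 <= rx <= W - N and 0 <= ry <= H - N):
                continue
            total = ii[ry+N, rx+N] - ii[ry, rx+N] - ii[ry+N, rx] + ii[ry, rx]
            if abs(total / (N * N) - target_mean) > mean_tol:
                continue                       # S2-3: prune by mean
            cand = R[ry:ry+N, rx:rx+N]
            cost = np.abs(target - cand).sum()  # L1 cost of (4.2)
            if best_cost is None or cost < best_cost:
                best_cost, best = cost, (h, j)
    zero_cost = np.abs(target).sum()           # C(B_t, 0) of [0063]
    if best is None or best_cost - zero_cost >= sim_tol * N * N:
        return np.zeros((N, N)), (k + 1, 0)    # S4: artificial offset position
    h, j = best
    return R[y-j:y-j+N, x-h:x-h+N], best       # S3: strongest similarity
```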

[0068] 3) Lossless conditional encoding of G_i given R: Apply the procedure specified in Steps S1 to S4 to each and every target block in G_i. We then get a reference block for each and every target block in G_i. Let P_i be the image obtained by piecing all these reference blocks together in the same order as their corresponding target blocks would appear in G_i. The image P_i is called a predicted image for G_i from R. Since P_i is determined by comparing G_i against R, the decoder does not know at this point what P_i is. As such, one has to first encode in a lossless manner all motion vectors of all target blocks in G_i and then send the resulting compressed bits to the decoder. Upon receiving these compressed bits, the decoder can recover P_i perfectly with the help of R. The lossless encoding of G_i given R can then be achieved by encoding G_i conditionally given P_i in a lossless manner. Note that both G_i and P_i are implicitly level shifted.

[0069] A typical approach to encoding G_i conditionally given P_i is to encode the difference image G_i - P_i without any loss. However, although conceptually simple and popular, such an approach is not efficient in our case. Indeed, we can do much better by providing a technique dubbed lossless encoding via quantization, which is described next.

[0070] Lossless encoding via quantization: Since G_i is obtained from the JPEG compressed F_i via Steps D1 to D3, it is not hard to see that

G_i = T^{-1}(U Q)        (4.5)

and

round(T(G_i - P_i)/Q) = U - round(T(P_i)/Q)        (4.6)

[0071] where U is the sequence of DCT indices decoded out from F_i, T denotes the forward DCT corresponding to the integer inverse DCT T^{-1} and is applied to every 8 x 8 block, the division by Q is an element-wise division, and so is the round operation. In view of (4.5), U and G_i can be determined from each other. In addition, the JPEG compressed F_i can be fully recovered without any loss from either U or G_i. Therefore, it follows from (4.6) that lossless encoding of G_i or U can be achieved by lossless encoding of the quantized, transformed difference image round(T(G_i - P_i)/Q):

[0072] by adding round(T(P_i)/Q), which can also be computed by the decoder, to the quantized, transformed difference image, we get U back without any loss. In comparison with the direct lossless encoding of the difference image G_i - P_i in either the pixel domain or the transform domain, the lossless encoding via quantization technique, through the lossless encoding of the quantized, transformed difference image, is more efficient since the energy of the quantized, transformed difference image is drastically reduced. Note that the quantization table Q used to quantize the transformed difference image T(G_i - P_i) should be identical to that used to generate the JPEG compressed F_i by the digital data generating device when F_i was first captured.
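
On a single 8 x 8 block, identities (4.5) and (4.6) can be checked numerically, as in the sketch below. It is illustrative only: scipy's orthonormal DCT stands in for the transform pair T and T^{-1}, and the flat quantization table and noisy predictor P are arbitrary test data. The printed comparison shows the energy reduction that makes the quantized, transformed difference cheaper to encode than U itself when P_i predicts G_i well.

```python
import numpy as np
from scipy.fft import dctn, idctn

def residual_indices(U, P, Q):
    """Quantized, transformed difference image of (4.6) for one 8x8 block,
    plus the decoder-side recovery of U by adding round(T(P)/Q)."""
    G = idctn(U * Q, norm="ortho")                     # (4.5): G = T^-1(UQ)
    D = np.round(dctn(G - P, norm="ortho") / Q)        # round(T(G - P)/Q)
    U_back = D + np.round(dctn(P, norm="ortho") / Q)   # + round(T(P)/Q)
    return D.astype(int), U_back.astype(int)

rng = np.random.default_rng(1)
Q = np.full((8, 8), 16.0)
U = rng.integers(-30, 30, (8, 8))             # DCT indices decoded from F_i
G = idctn(U * Q, norm="ortho")
P = G + rng.normal(0.0, 2.0, (8, 8))          # a good predictor of G
D, U_back = residual_indices(U, P, Q)
assert np.array_equal(U_back, U)              # U recovered losslessly, per (4.6)
print(np.abs(D).sum(), "vs", np.abs(U).sum())  # residual indices are far smaller
```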

[0073] Variants of lossless encoding via quantization: One variant of the lossless encoding via quantization technique mentioned above is to directly encode U in a lossless manner with the help of

round(T(P_i)/Q).        (4.7)

[0074] In this case, there is some flexibility in the quantization process used in (4.7). In particular, the quantization process used in (4.7) may not be identical to that used to generate the JPEG compressed F_i by the digital data generating device when F_i was first captured. For example, if most DCT coefficients in an 8 x 8 block in T(G_i) have values greater, by a margin q, than the corresponding DCT coefficients in the corresponding 8 x 8 block in T(P_i), then all DCT coefficients in the corresponding 8 x 8 block in T(P_i) can be shifted to the right by q before they are quantized so that, after quantization, they are closer to the DCT indices U corresponding to the 8 x 8 block in T(G_i).

[0075] 4) Concurrent categorization: After the lossless conditional encoding of G_i or U given R is completed, the decision regarding whether F_i should be assigned to the sub-category represented by R can be based on the total number of compressed bits resulting from the lossless encoding of all motion vectors of all target blocks in G_i and from the lossless conditional encoding of G_i or U given P_i, in comparison with the total number of bits in the JPEG compressed F_i. If the former divided by the latter is less than a threshold, say S, then assign F_i into the sub-category represented by R. Otherwise, assign F_i into a new sub-category represented by R = G_i.

[0076] B. Multiple Component Images

[0077] Suppose now that each image F_i, i = 1, 2, ..., n, is a JPEG compressed multiple component image; further, the number of components and their types are the same for all i = 1, 2, ..., n. For example, when an RGB color image is compressed by JPEG, it is first converted from the RGB color space to the YCrCb space; if the 4:2:0 format is adopted, its chroma components Cr and Cb are further downsampled; then its luminance component Y and downsampled chroma components Cr and Cb are compressed independently by a JPEG encoder and multiplexed together to form a JPEG compressed color image.

[0078] In this case, the methods described in the subsection "A. Single Component Images" herein are applied to each component independently, except for the following example possible changes:

[0079] (C1) To reduce the computational complexity of Steps S1 to S4 and also the number of compressed bits needed for the lossless encoding of the resulting motion vectors, different components can share the motion vectors of their target blocks. Take JPEG compressed color images F_i, i = 1, 2, ..., n, as an example. Once the motion vector of a target block in the luminance component Y of F_i (more precisely, G_i) is determined, it or its scaled version could be used as the motion vector of the corresponding target blocks in both the Cr and Cb chroma components of F_i.

[0080] (C2) In the concurrent categorization step, the decision regarding whether F_i should be assigned to the sub-category represented by R can be based on either a single component of F_i or all components of F_i together. Once again, take JPEG compressed color images F_i, i = 1, 2, ..., n, as an example. One can assign F_i into the sub-category represented by R if the total number of compressed bits resulting from the lossless encoding of all motion vectors of all target blocks in the luminance component Y of G_i and from the lossless conditional encoding of the luminance component Y of G_i given the luminance component Y of P_i is less than the total number of bits in the JPEG compressed luminance component Y of F_i multiplied by a threshold S. Alternatively, one can assign F_i into the sub-category represented by R if the total number of bits in the conditionally encoded G_i (including all of its components) given R is less than the total number of bits in the JPEG compressed F_i (including all components of F_i) multiplied by a threshold S.

[0081] In accordance with an example embodiment, there is provided a method for managing multimedia content of a device having access to a plurality of successive media data units, the method including: comparing media data units within the plurality of successive media data units to determine similarity, and when the determined similarity between compared media data units is within a similarity threshold, automatically categorizing the compared media data units within the same category.

[0082] In accordance with another example embodiment, there is provided a method for encoding media content of a device having access to a plurality of successive media data units, the method including: encoding a selected media data unit of the plurality of successive media data units in dependence on a reference media data unit, and when resulting compression efficiency is not within a compression threshold, referencing the selected media data unit as the reference media data unit.

[0083] There may also be provided the method of the another example embodiment, wherein the encoding comprises conditionally encoding, wherein the method further comprises discarding the conditional encoding when the resulting compression efficiency is not within the compression threshold.

[0084] There may also be provided the method of the another example embodiment, further comprising the device generating at least one of the plurality of successive media data units.

[0085] There may also be provided the method of the another example embodiment, wherein the generating is implemented from a media capturing component of the device.

[0086] There may also be provided the method of the another example embodiment, further comprising performing said selecting on the media data units in succession, and performing said referencing when resulting compression efficiency is not within the compression threshold.

[0087] There may also be provided the method of the another example embodiment, further comprising, when resulting compression efficiency is within a compression threshold, automatically categorizing the selected media data unit and the reference media data unit within the same category.

[0088] There may also be provided the method of the another example embodiment, wherein the multimedia data units each have associated side data specifying for the multimedia data unit at least one of a location, occasion, event and feature; wherein automatically categorizing the compared media data units includes sub-categorizing multimedia units that are within the same category according to the side data.

[0089] There may also be provided the method of the another example embodiment, wherein the reference media data unit is initialized as a blank media data unit or a previous media data unit.

[0090] There may also be provided the method of the another example embodiment, wherein the referencing further comprises copying the reference media data unit to a file.

[0091] There may also be provided the method of the another example embodiment, further comprising, prior to the encoding of any one of the media data units, decoding the any one of the media data units which is in an encoded format.

[0092] There may also be provided the method of the another example embodiment, further comprising, after referencing the selected media data unit, re-encoding, without dependence on the reference media data unit, the selected media data unit which did not require decoding.

[0093] There may also be provided the method of the another example embodiment, wherein the multimedia data units each correspond to one image, one video frame or one audio frame.

[0094] There may also be provided the method of the another example embodiment, wherein the device is network connection enabled, the method further comprising, when a preselected network connection is available, automatically transmitting the categorized multimedia data units to one or more remote computing devices.

[0095] There may also be provided the method of the another example embodiment, wherein the automatically transmitting is part of a synchronizing process between the device and the one or more remote computing devices.

[0096] In accordance with an example embodiment, there is provided a device, including memory, a component configured to access a plurality of successive media data units, and a processor configured to execute instructions stored in the memory in order to perform the described methods.

[0097] In an example embodiment, the device further comprises a media capturing component configured to generate the plurality of successive media data units.

[0098] In an example embodiment, the device further comprises a component configured to enable network connectivity.

[0099] In an example embodiment, the memory stores the plurality of successive media data units.

[00100] In accordance with an example embodiment, there is provided a non-transitory computer-readable medium containing instructions executable by a processor for performing the described methods.

[00101] In the described methods, the boxes may represent events, steps, functions, processes, modules, state-based operations, etc. While some of the above examples have been described as occurring in a particular order, it will be appreciated by persons skilled in the art that some of the steps or processes may be performed in a different order, provided that the result of the changed order of any given step will not prevent or impair the occurrence of subsequent steps.

Furthermore, some of the messages or steps described above may be removed or combined in other embodiments, and some of the messages or steps described above may be separated into a number of sub-messages or sub-steps in other embodiments. Even further, some or all of the steps may be repeated, as necessary. Elements described as methods or steps similarly apply to systems or subcomponents, and vice-versa. Reference to such words as "sending" or "receiving" could be interchanged depending on the perspective of the particular device.

[00102] While some example embodiments have been described, at least in part, in terms of methods, a person of ordinary skill in the art will understand that some example embodiments are also directed to the various components for performing at least some of the aspects and features of the described processes, be it by way of hardware components, software, or any combination of the two, or in any other manner. Moreover, some example embodiments are also directed to a pre-recorded storage device or other similar computer-readable medium including program instructions stored thereon for performing the processes described herein. The computer-readable medium includes any non-transient storage medium, such as RAM, ROM, flash memory, compact discs, USB sticks, DVDs, HD-DVDs, or any other such computer-readable memory devices.

[00103] Although not specifically illustrated, it will be understood that the devices described herein include one or more processors and associated memory. The memory may include one or more application programs, modules, or other programming constructs containing computer-executable instructions that, when executed by the one or more processors, implement the methods or processes described herein.

[00104] The various embodiments presented above are merely examples and are in no way meant to limit the scope of this disclosure. Variations of the innovations described herein will be apparent to persons of ordinary skill in the art, such variations being within the intended scope of the present disclosure. In particular, features from one or more of the above-described embodiments may be selected to create alternative embodiments comprised of a sub-combination of features which may not be explicitly described above. In addition, features from one or more of the above-described embodiments may be selected and combined to create alternative embodiments comprised of a combination of features which may not be explicitly described above. Features suitable for such combinations and sub-combinations would be readily apparent to persons skilled in the art upon review of the present disclosure as a whole. The subject matter described herein intends to cover and embrace all suitable changes in technology.

[00105] All publications described or referenced herein are hereby incorporated by reference in their entirety.