

Title:
METADATA-AIDED REMOVAL OF FILM GRAIN
Document Type and Number:
WIPO Patent Application WO/2023/205144
Kind Code:
A1
Abstract:
A metadata-aided film-grain removal method and corresponding apparatus. An example embodiment enables a video decoder to substantially fully remove the film grain from a digital video signal that has undergone lossy video compression and then video decompression. Different embodiments may rely only on spatial-domain grain-removal processing, only on temporal-domain grain-removal processing, or on a combination of spatial-domain and temporal-domain grain-removal processing. Both spatial-domain and temporal-domain grain-removal processing may use metadata provided by the corresponding video encoder, the metadata including one or more parameters corresponding to the digital film grain injected into the host video at the encoder. Different film-grain-injection formats can be accommodated by the video decoder using signal preprocessing directed at supplying, to the film-grain removal module of the video decoder, an input compatible with the film-grain removal method implemented therein.

Inventors:
SU GUAN-MING (US)
YIN PENG (US)
HUANG TSUNG-WEI (US)
Application Number:
PCT/US2023/018941
Publication Date:
October 26, 2023
Filing Date:
April 18, 2023
Assignee:
DOLBY LABORATORIES LICENSING CORP (US)
International Classes:
H04N19/117; G06T5/00; H04N19/46; H04N19/70; H04N19/85; H04N19/86
Domestic Patent References:
WO2006057994A2, 2006-06-01
WO2004105250A2, 2004-12-02
Foreign References:
KR20190098634A, 2019-08-22
US20060183275A1, 2006-08-17
US6067125A, 2000-05-23
EP1231778A2, 2002-08-14
US20200045231A1, 2020-02-06
Other References:
MCCARTHY (DOLBY) S ET AL: "AHG9: Errata and editorial clarifications for AVC and HEVC film grain characteristics SEI message semantics", no. JVET-T0075, 14 October 2020 (2020-10-14), XP030293508, Retrieved from the Internet [retrieved on 20201014]
DUBOIS E ET AL: "NOISE REDUCTION IN IMAGE SEQUENCES USING MOTION-COMPENSATED TEMPORAL FILTERING", IEEE TRANSACTIONS ON COMMUNICATIONS, IEEE SERVICE CENTER, PISCATAWAY, NJ. USA, vol. COM-32, no. 7, 1 July 1984 (1984-07-01), pages 826 - 831, XP001024707, ISSN: 0090-6778
Attorney, Agent or Firm:
KONSTANTINIDES, Konstantinos et al. (US)
Claims:
CLAIMS

1. A video delivery system capable of digital film-grain removal, the system comprising a video decoder (130) that comprises: an input interface (310) to receive a coded bitstream (122) including a compressed video bitstream (252) and metadata (212, 232), the compressed video bitstream having encoded therein a sequence of images containing digital film grain, the metadata including one or more parameters corresponding to the digital film grain; and a processor (320, 340) configured to: decompress the compressed video bitstream to generate a respective decompressed representation (330) of each of the images, each of the decompressed representations including a respective host-image component and a respective film-grain component; compute, based on the metadata, a respective estimate of the respective film-grain component for each of the decompressed representations using a simulated film-grain image (430) and an approximation of a modulation function for modulating the digital film grain (420); and remove the respective estimate from the respective decompressed representation to produce a corresponding estimate (342) of the respective host-image component.

2. The video delivery system of claim 1, wherein the processor is further configured to: compute the simulated film-grain image using a film-grain model (210) and the metadata

(212).

3. The video delivery system of claim 2, wherein the processor is further configured to: compute, for patches of image pixels, standard deviations corresponding to estimated host-image pixel values; and compute the simulated film-grain image based on the standard deviations.

4. The video delivery system of claim 2 or claim 3, wherein the processor is further configured to adjust the simulated film-grain image using a digital filter to account for distortions caused by at least one of video compression and decompression.

5. The video delivery system of claim 4, wherein the processor is configured to adjust the simulated film-grain image using a Gaussian filter.

6. The video delivery system of claim 4 or claim 5, wherein the processor is configured to adjust the simulated film-grain image using a finite-impulse-response filter.

7. The video delivery system of any of claims 4-6, wherein the processor is configured to adjust the simulated film-grain image using a guided filter.

8. The video delivery system of any of claims 1-7, wherein the processor is further configured to: compute first-order polynomial approximations of a nonlinear luma modulation function; and compute the respective estimate using said approximations.

9. The video delivery system of claim 8, wherein the processor is further configured to use the metadata to compute said approximations.

10. The video delivery system of any of claims 1-9, wherein the processor is further configured to compute the respective estimate using a precomputed look-up table having stored therein parameters of first-order polynomial approximations of a nonlinear luma modulation function.

11. The video delivery system of any of claims 1-10, wherein the processor is further configured to: temporally average a sequence of the respective decompressed representations using a plurality of temporal sliding windows, each of the temporal sliding windows corresponding to a respective image-frame pixel, at least some of the temporal sliding windows having different respective lengths; and select the lengths of the temporal sliding windows based on the metadata.

12. The video delivery system of any of claims 1-11, further comprising a video encoder that comprises: an output interface to output the coded bitstream for the video encoder; and a video-compression module to generate the compressed video bitstream using lossy compression according to a video compression standard.

13. A video delivery system capable of digital film- grain removal, the system comprising a video decoder (130) that comprises: an input interface (310) to receive a coded bitstream (122) including a compressed video bitstream (252) and metadata (212, 232), the compressed video bitstream having encoded therein a sequence of images containing digital film grain, the metadata including one or more parameters corresponding to the digital film grain; and a processor (320, 340) configured to: decompress the compressed video bitstream to generate a respective decompressed representation (330) of each of the images, each of the decompressed representations including a respective host-image component and a respective film-grain component; and temporally average (910) a sequence of the respective decompressed representations using a plurality of temporal sliding windows, each of the temporal sliding windows corresponding to a respective image-frame pixel, at least some of the temporal sliding windows having different respective lengths selected based on the metadata.

14. The video delivery system of claim 13, further comprising a video encoder that comprises: an output interface to output the coded bitstream for the video encoder; and a video-compression module to generate the compressed video bitstream using lossy compression according to a video compression standard.

15. A machine-implemented method of removing digital film grain from video data, the method comprising: receiving a coded bitstream (122) including a compressed video bitstream (252) and metadata (212, 232), the compressed video bitstream having encoded therein a sequence of images containing digital film grain, the metadata including one or more parameters corresponding to the digital film grain; decompressing (320) the compressed video bitstream to generate a respective decompressed representation (330) of each of the images, each of the decompressed representations including a respective host-image component and a respective film-grain component; computing, based on the metadata, a respective estimate of the respective film-grain component for each of the decompressed representations using a simulated film-grain image (430) and an approximation of a modulation function for modulating the digital film grain (420); and removing (340) the respective estimate from the respective decompressed representation to produce a corresponding estimate (342) of the respective host-image component.

Description:
METADATA-AIDED REMOVAL OF FILM GRAIN

1. Cross-Reference to Related Applications

[0001] This application claims the benefit of priority from the following priority applications: U.S. provisional patent application 63/332,332 (reference: D22003USP01), filed 19 April 2022, and European patent application 22168812.0 (reference: D22003EP), filed 19 April 2022, each of which is hereby incorporated by reference in its entirety.

2. Field of the Disclosure

[0002] Various example embodiments relate to image and video processing and, more specifically but not exclusively, to digital film-grain technology.

3. Background

[0003] On physical film, film grain is the random physical texture made from small metallic silver particles found on processed photographic celluloid. In digital photography, the visual and artistic effects of film grain can be simulated by adding a digital grain pattern to a digital image after the image is taken. Because digital film grain may be difficult to encode, e.g., due to its (quasi)random nature, a video encoder may typically remove the film grain during encoding and then, during playback, the corresponding video decoder may synthesize the film grain and add it back in.

BRIEF SUMMARY OF SOME SPECIFIC EMBODIMENTS

[0004] Disclosed herein are various embodiments of a metadata-aided film-grain removal method and corresponding apparatus. An example embodiment enables a video decoder to substantially fully remove the film grain from a digital video signal that has undergone lossy video compression and then video decompression. Different embodiments may rely only on spatial-domain grain-removal processing, only on temporal-domain grain-removal processing, or on a combination of spatial-domain and temporal-domain grain-removal processing. Both spatial-domain and temporal-domain grain-removal processing may use metadata provided by the corresponding video encoder, the metadata including one or more parameters corresponding to the digital film grain injected into the host video at the encoder. Different film-grain-injection formats can be accommodated by the video decoder using signal preprocessing directed at supplying, to the film-grain removal module of the video decoder, an input compatible with the film-grain removal method implemented therein.

[0005] According to an example embodiment, provided is a video delivery system capable of film-grain removal, the system comprising a video decoder that comprises: an input interface to receive a coded bitstream including a compressed video bitstream and metadata, the compressed video bitstream having encoded therein a sequence of images containing digital film grain, the metadata including one or more parameters corresponding to the digital film grain; and a processor configured to: decompress the compressed video bitstream to generate a respective decompressed representation of each of the images, each of the decompressed representations including a respective host-image component and a respective film-grain component; compute, based on the metadata, a respective estimate of the respective film-grain component for each of the decompressed representations; and remove the respective estimate from the respective decompressed representation to produce a corresponding estimate of the respective host-image component.

[0006] According to another example embodiment, provided is a video delivery system capable of film-grain removal, the system comprising a video decoder that comprises: an input interface to receive a coded bitstream including a compressed video bitstream and metadata, the compressed video bitstream having encoded therein a sequence of images containing digital film grain, the metadata including one or more parameters corresponding to the digital film grain; and a processor configured to: decompress the compressed video bitstream to generate a respective decompressed representation of each of the images, each of the decompressed representations including a respective host-image component and a respective film-grain component; and temporally average a sequence of the respective decompressed representations using a plurality of temporal sliding windows, each of the temporal sliding windows corresponding to a respective image-frame pixel, at least some of the temporal sliding windows having different respective lengths selected based on the metadata.

[0007] According to yet another example embodiment, provided is a machine-implemented method of removing film grain from video data, the method comprising: receiving a coded bitstream including a compressed video bitstream and metadata, the compressed video bitstream having encoded therein a sequence of images containing digital film grain, the metadata including one or more parameters corresponding to the digital film grain; decompressing the compressed video bitstream to generate a respective decompressed representation of each of the images, each of the decompressed representations including a respective host-image component and a respective film-grain component; computing, based on the metadata, a respective estimate of the respective film-grain component for each of the decompressed representations; and removing the respective estimate from the respective decompressed representation to produce a corresponding estimate of the respective host-image component.

[0008] According to yet another example embodiment, provided is a non-transitory machine-readable medium, having encoded thereon program code, wherein, when the program code is executed by a machine, the machine implements a method comprising the steps of: receiving a coded bitstream including a compressed video bitstream and metadata, the compressed video bitstream having encoded therein a sequence of images containing digital film grain, the metadata including one or more parameters corresponding to the digital film grain; decompressing the compressed video bitstream to generate a respective decompressed representation of each of the images, each of the decompressed representations including a respective host-image component and a respective film-grain component; computing, based on the metadata, a respective estimate of the respective film-grain component for each of the decompressed representations; and removing the respective estimate from the respective decompressed representation to produce a corresponding estimate of the respective host-image component.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] Other aspects, features, and benefits of various disclosed embodiments will become more fully apparent, by way of example, from the following detailed description and the accompanying drawings, in which:

[0010] FIG. 1 depicts an example process for a video delivery pipeline, in which at least some embodiments can be practiced.

[0011] FIG. 2 is a block diagram illustrating a video encoder that can be used in the process of FIG. 1 according to an embodiment.

[0012] FIG. 3 is a block diagram illustrating a video decoder that can be used in the process of FIG. 1 according to an embodiment.

[0013] FIG. 4 is a processing flowchart illustrating a spatial-domain grain removal process that can be implemented in the video decoder of FIG. 3 according to an embodiment.

[0014] FIG. 5 graphically illustrates example film-grain removal results possibly achievable using a scaled Gaussian filter in the video decoder of FIG. 3 according to first embodiments.

[0015] FIG. 6 graphically illustrates example film-grain removal results possibly achievable using a scaled Gaussian filter in the video decoder of FIG. 3 according to second embodiments.

[0016] FIG. 7 graphically illustrates example film-grain removal results possibly achievable using a scaled Gaussian filter in the video decoder of FIG. 3 according to third embodiments.

[0017] FIG. 8 graphically illustrates example film-grain removal results possibly achievable using a scaled Gaussian filter in the video decoder of FIG. 3 according to fourth embodiments.

[0018] FIG. 9 is a processing flowchart illustrating a temporal-domain grain removal process that can be implemented in the video decoder of FIG. 3 according to an embodiment.

[0019] FIG. 10 shows an example of an intensity-dependent scaling factor that can be accommodated in the video decoder of FIG. 3 using signal preprocessing according to an embodiment.

DETAILED DESCRIPTION

[0020] This disclosure and aspects thereof can be embodied in various forms, including hardware, devices or circuits controlled by computer-implemented methods, computer program products, computer systems and networks, user interfaces, and application programming interfaces; as well as hardware-implemented methods, signal processing circuits, memory arrays, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), and the like. The foregoing is intended solely to give a general idea of various aspects of the present disclosure, and does not limit the scope of the disclosure in any way.

[0021] In the following description, numerous details are set forth, such as optical device configurations, timings, operations, and the like, in order to provide an understanding of one or more aspects of the present disclosure. It will be readily apparent to one skilled in the art that these specific details are merely exemplary and not intended to limit the scope of this application.

[0022] Moreover, while the present disclosure focuses mainly on examples in which the various circuits are used in digital projection systems, it will be understood that these are merely examples. It will further be understood that the disclosed systems and methods can be used in any device in which there is a need to project light, for example, cinema, consumer, and other commercial projection systems, heads-up displays, virtual reality displays, and the like. Disclosed systems and methods may be implemented in additional display devices, such as with an OLED display, an LCD display, a quantum dot display, or the like.

[0023] Herein, the term “dynamic range” (DR) may relate to a capability of the human visual system (HVS) to perceive a range of intensity (e.g., luminance, luma) in an image, e.g., from darkest grays (blacks) to brightest whites (highlights). In this sense, DR relates to a ‘scene- referred’ intensity. DR may also relate to the ability of a display device to adequately and/or approximately render an intensity range of a particular breadth. In this sense, DR relates to a ‘display-referred’ intensity. Unless a particular sense is explicitly specified to have particular significance at any point in the description herein, it should be inferred that the term may be used in either sense, e.g., interchangeably.

[0024] Herein, the term “high dynamic range” (HDR) relates to a DR breadth that spans some 14-15 orders of magnitude of the human visual system (HVS). In practice, the DR over which a human may simultaneously perceive an extensive breadth in intensity range may be somewhat truncated, in relation to HDR. As used herein, the terms “enhanced dynamic range” (EDR) or “visual dynamic range” (VDR) may individually or interchangeably relate to the DR that is perceivable within a scene or image by a human visual system that includes eye movements, allowing for some light adaptation changes across the scene or image.

[0025] In practice, images comprise one or more color components (e.g., luma Y and chroma Cb and Cr), wherein each color component is represented by a precision of n bits per pixel (e.g., n = 8). Using linear luminance coding, images where n ≤ 8 (e.g., color 24-bit JPEG images) are considered images of standard dynamic range, while images where n > 8 may be considered images of enhanced dynamic range. EDR and HDR images may also be stored and distributed using high-precision (e.g., 16-bit) floating-point formats, such as the OpenEXR file format developed by Industrial Light and Magic.

[0026] Herein, the term “metadata” relates to any auxiliary information that is transmitted as part of the coded bitstream and assists a decoder in rendering the corresponding image(s). For television broadcasting and video streaming, video metadata may be used to provide side information about specific video and audio streams or files. Metadata can either be embedded directly into the video or be included as a separate file within a container, such as MP4 or MKV. Metadata may include information about the entire video stream or file or about specific video frames. Created by cameras, encoders, and other video-processing elements (e.g., see 115, 120, FIG. 1), metadata may include, but is not limited to, timestamps, video resolution, digital film-grain parameters, color space or gamut information, reference display parameters, auxiliary signal parameters, file size, closed captioning, audio languages, ad-insertion points, error messages, and so on. Additional examples of metadata pertinent to the disclosed embodiments are described herein below.

[0027] Many consumer desktop displays may support luminance of 200 to 300 cd/m² (nits). Many consumer HDTVs range from 300 to 500 nits, with new models reaching 1000 nits. Such conventional displays typify a lower dynamic range (LDR), also referred to as a standard dynamic range (SDR), in relation to HDR or EDR. As the availability of HDR content grows due to advances in both image-capture equipment (e.g., cameras) and HDR displays (e.g., the PRM-4200 professional reference monitor from Dolby Laboratories), HDR content may be color graded and displayed on HDR displays that support higher dynamic ranges (e.g., from 1,000 nits to 5,000 nits or more).

Video Coding According to Example Embodiments

[0028] FIG. 1 depicts an example process of a video delivery pipeline (100), showing various stages from video capture to video-content display according to an embodiment. A sequence of video frames (102) may be captured or generated using an image-generation block (105). The video frames (102) may be digitally captured (e.g., by a digital camera) or generated by a computer (e.g., using computer animation) to provide video data (107). Alternatively, the video frames (102) may be captured on film by a film camera. Then, the film may be translated into a digital format to provide the video data (107).

[0029] In a production phase (110), the video data (107) may be edited to provide a video production stream (112). The data of the video production stream (112) may then be provided to a processor (or one or more processors, such as a central processing unit, CPU) at a post-production block (115) for post-production editing. The post-production editing of the block (115) may include, e.g., adjusting or modifying colors or brightness in particular areas of an image to enhance the image quality or achieve a particular appearance for the image in accordance with the video creator’s creative intent. This part of post-production editing is sometimes referred to as “color timing” or “color grading.” Other editing (e.g., scene selection and sequencing, image cropping, addition of computer-generated visual special effects, etc.) may be performed at the block (115) to yield a “final” version (117) of the production for distribution. During the post-production editing (115), video images may be viewed on a reference display (125).

[0030] Following the post-production (115), video data of the final version (117) may be delivered to a coding block (120) for being delivered downstream to decoding and playback devices, such as television sets, set-top boxes, movie theaters, and the like. In some embodiments, the coding block (120) may include audio and video encoders, such as those defined by the ATSC, DVB, DVD, Blu-Ray, and other delivery formats, to generate a coded bitstream (122). Some methods described herein below may be performed by the corresponding processor at the coding block (120). In a receiver, the coded bitstream (122) is decoded by a decoding unit (130) to generate a corresponding decoded signal (132) representing a copy or a close approximation of the signal (117). The receiver may be attached to a target display (140) that may have somewhat or completely different characteristics than the reference display (125). In such cases, a display management block (135) may be used to map the decoded signal (132) to the characteristics of the target display (140) by generating a display-mapped signal (137). Some methods described herein below may be performed by the decoding unit (130) and/or display management block (135). Depending on the embodiment, the decoding unit (130) and display management block (135) may include individual processors or may be based on a single integrated processing unit.

[0031] As already indicated above, film grain may provide an effective way to improve the look of digital video in terms of aesthetic feeling and sharpness. Film grain may also be used to alleviate banding artifacts and/or to mask compression artifacts. Film grain may be added in the post-production (115). Conventionally, film grain is removed at the encoder at the coding block (120), and the resulting “clean” video signal is compressed, e.g., using a lossy video-compression codec, to generate the coded bitstream (122). At the corresponding decoder, the coded bitstream (122) is decoded, and film grain is added back using the decoding unit (130) to generate a decoded signal (132) that includes film grain.

[0032] In some use cases, film grain may not be removed at the encoder in the coding block (120) or, alternatively, a basic film grain may be added thereat to the “clean” video signal prior to video compression. This type of processing may be performed, e.g., to ensure that the film-grain feature is present in the coded bitstream (122) and to enable “simpler” decoders to provide film-grain-containing playback even though the decoder itself may not be capable of synthesizing and overlaying film grain. However, if the decoder has the film-grain-synthesizing capability, then the basic film grain of the coded bitstream (122) may be removed at the decoder using the decoding unit (130) and further video processing may be applied thereat. In some use cases, such further video processing may include adding another (e.g., adaptive) type of film grain in accordance with the viewing environment corresponding to the target display (140).

[0033] In one example of the above-indicated use cases, the base layer of bitstream (122) carries an SDR signal, with SDR-specific film grain injected therein. If the received bitstream is to be converted to HDR, then the SDR-specific film grain may be removed at the decoding unit (130) and replaced by a different (e.g., HDR-specific) film grain.

[0034] In another example of the above-indicated use cases, bitstream (122) may have film grain injected therein to reduce the above-mentioned banding artifacts. If the decoding unit (130) operates to convert an 8-bit based profile to a 10-bit based profile, then the corresponding processing may include removing the existing film grain and then applying a de-banding filter. The resulting filtered signal may further be processed to, inter alia, insert a different film grain.

[0035] Example embodiments disclosed herein are generally directed at performing film-grain removal from the received bitstream (122) at the decoding unit (130). Some embodiments may rely only on spatial-domain grain-removal processing or only on temporal-domain grain-removal processing. Some other embodiments may rely on both spatial-domain and temporal-domain grain-removal processing. In an example embodiment, the grain-removal processing is performed using metadata of the received bitstream (122), which may provide useful information about the film grain of the received bitstream, thereby facilitating the grain-removal processing at the decoding unit (130).

[0036] In an example embodiment, the pipeline blocks (115, 120) may generate film grain for the coded bitstream (122) using a reproducible film-grain model, wherein a random or pseudo-random seed and a selected intensity-modulation function may be used. The seed information and the model’s film-grain parameters may be transmitted as metadata in the bitstream (122). As a result, the decoder at the decoding unit (130) may be provided with certain specifics of the stream’s film-grain pattern, which the decoder may beneficially utilize to efficiently generate an accurate approximation of the corresponding “clean” (i.e., grain-free) video signal, e.g., as described in more detail below.

[0037] FIG. 2 is a block diagram illustrating a video encoder (200) according to an embodiment. The encoder (200) can be used, e.g., to implement the coding block (120) of the pipeline (100) (also see FIG. 1). For illustration purposes and without any implied limitations, FIG. 2 uses the reference numerals (117) and (122) to better indicate an example relationship between the block diagrams of FIGs. 1 and 2. A person of ordinary skill in the pertinent art will readily understand that alternative arrangements for the encoder (200) within the coding block (120) are also possible.

[0038] As shown in FIG. 2, the encoder (200) operates to convert an input video signal (117) into an output bitstream (122). In this embodiment, the input video signal (117) does not have film grain encoded therein. The processing performed in the encoder (200) causes the corresponding video signal encoded in the output bitstream (122) to have digital film grain therein. In addition, the encoder (200) typically causes the output bitstream (122) to carry metadata corresponding to and/or characterizing the digital film grain.

[0039] For each image of the output bitstream (122), the encoder (200) uses a film-grain model (210) to generate a film-grain image (220). The film-grain model (210) is also typically “known” to a corresponding video decoder (e.g., see FIG. 3). Specific parameter values of the film-grain model (210) used by the encoder (200) in the generation of the film-grain image (220) may typically be communicated to the corresponding video decoder as film-grain-model metadata (212). As such, the corresponding video decoder can compute an exact copy of the film-grain image (220) (also see FIG. 3).

[0040] A film-grain injection block (240) of the encoder (200) operates to modulate, pixel-by-pixel, the noise strength of the film-grain image (220) based on the corresponding intensity of the input video signal (117). The resulting modulated film-grain image is then added to the host image to produce a corresponding film-grain-injected image. A sequence (242) of such film-grain-injected images is compressed in a video-compression module (250) of the encoder (200), thereby producing a compressed-video bitstream (252). In some embodiments, the video-compression module (250) may perform lossy compression according to a video compression standard. A metadata generator (230), with the parameter inputs (222, 244, 254) indicated in FIG. 2, generates film-grain-removal metadata (232) comprising one or more parameters that can be used for efficient and effective film-grain removal at the corresponding video decoder. Examples of such metadata (232) are described in more detail below. A combiner module (260) operates to appropriately combine the metadata (212, 232) and the compressed video bitstream (252) to generate the output bitstream (122).

[0041] FIG. 3 is a block diagram illustrating a video decoder (300) according to an embodiment. The decoder (300) can be used, e.g., to implement the decoding unit (130) of the pipeline (100) (also see FIG. 1). The decoder (300) operates to convert the received bitstream (122) into a video signal (342). In an example embodiment, the video signal (342) has the above-described digital film grain substantially fully removed therefrom and, as such, may represent a relatively accurate approximation of the source video signal (117) (also see FIG. 2).

[0042] The decoder (300) comprises a separator module (310) configured to appropriately extract the above-described metadata (212, 232) and the compressed video bitstream (252) from the received bitstream (122). As already indicated above, the film-grain model (210) is “known” to (e.g., stored in a memory of) the decoder (300). This knowledge, together with the film-grain-model metadata (212) extracted by the separator module (310), enables the decoder (300) to recompute the film-grain image (220) as indicated in FIG. 3. The film-grain image (220) may be referred to as a simulated film-grain image.

[0043] The decoder (300) further comprises a video-decompression module (320) configured to appropriately decompress the compressed video bitstream (252) received from the separator module (310), thereby generating a decompressed video signal (330). Note that, if lossy compression is used at the video-compression module (250) of the encoder (200), then the decompressed video signal (330) may differ from the sequence (242). Since the film-grain injection performed at the film-grain injection block (240) of the encoder (200) involves modulation of the film-grain image (220) by luma, the actually injected film grain may not be determined from the decompressed video signal (330) and may instead be estimated. This estimation is performed by a film-grain removal module (340) of the decoder (300) and generally relies on the recomputed film-grain image (220), the metadata (232) extracted by the separator module (310) from the received bitstream (122), and the decompressed video signal (330). Example embodiments of the film-grain removal method implemented in the film-grain removal module (340) are described in more detail below.

Film-Grain Injection Process

[0044] Let us denote the $i$-th pixel of the $j$-th host image frame as $s_{ji}$, and the corresponding film-grain pixel as $n_{ji}$. The luma modulation function is denoted as $f(\cdot)$. The modulation function, $f(s_{ji})$, may typically be a non-linear function of the host signal $s_{ji}$. The film-grain-injected pixel, $v_{ji}$, can be expressed as:

$$v_{ji} = s_{ji} + f(s_{ji}) \cdot n_{ji} \tag{1}$$

The film-grain-injected signal after video compression, e.g., by the video-compression module (250) of the encoder (200), is denoted as:

$$\hat{v}_{ji} = g(v_{ji}) \tag{2}$$

where $g(\cdot)$ denotes the compression function. The compressed signal without the film grain is denoted as:

$$\hat{s}_{ji} = g(s_{ji}) \tag{3}$$

As already indicated above, a film-grain removal process may rely only on spatial-domain processing, only on temporal-domain processing, or on a combination of spatial-domain and temporal-domain processing. Example embodiments of the film-grain removal process are described in more detail below.
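To make the notation of Eqs. (1)-(3) concrete, the following Python sketch simulates grain injection on a toy frame. It is an illustration only, not the patented implementation: the linear modulation function, the Gaussian grain pattern, and the seed value are arbitrary stand-ins for the reproducible film-grain model (210) whose parameters would travel as metadata (212).

```python
import numpy as np

def inject_film_grain(s, f, seed=0):
    # Eq. (1): v = s + f(s) * n, with n drawn from a seeded (hence
    # decoder-reproducible) zero-mean grain model.
    rng = np.random.default_rng(seed)      # the seed would be sent as metadata
    n = rng.standard_normal(s.shape)       # simulated film-grain image
    return s + f(s) * n, n

# Toy 10-bit host frame and a linear modulation function f(s) = a + b*s.
s = np.linspace(64, 940, 16).reshape(4, 4)
a, b = 2.0, 0.01                           # would also be sent as metadata
v, n = inject_film_grain(s, lambda x: a + b * x, seed=42)
print(np.round(v - s, 2))                  # the injected grain term f(s)*n
```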

Film-Grain Removal Process: Spatial Domain

[0045] FIG. 4 is a processing flowchart (400) illustrating a spatial-domain grain-removal process that can be implemented in the decoder (300) according to an embodiment. The process (400) comprises a processing block (440), wherein the decompressed video signal (330) is transformed into the output video signal (342) (also see FIG. 3). The processing block (440) also receives inputs from processing blocks (420, 430), which aid the transformation process. In general, the grain-removal process (400) is directed at recovering the host signal $s_{ji}$ from the signal obtained by applying video decompression to the compressed film-grain-injected signal $\hat{v}_{ji}$ (also see Eqs. (1)-(2)). Depending on the embodiment, the host signal actually recovered at the decoder (300) may be a copy of the host signal used for generating the bitstream (122) at the encoder (200) or may be a relatively accurate estimate of that host signal.

[0046] The input provided to the processing block (440) by the processing block (420) is based on the above-mentioned luma modulation function $f(\cdot)$ (410). Pertinent parameters representing the function $f(\cdot)$ may be supplied to the decoder (300), e.g., as part of the film-grain model (210) and/or the metadata (212) (also see FIGs. 2, 3). In an example embodiment, the processing block (420) operates to compute local first-order polynomial approximations of the function $f(\cdot)$, which may then be used in the grain-removal processing implemented in the processing block (440), e.g., as described in more detail below.

[0047] The input provided to the processing block (440) by the processing block (430) is an estimate of the distorted film-grain image (220), wherein the distortion is typically caused by the serial application of the lossy compression (250) and the corresponding decompression (320) (also see FIGs. 2, 3). Examples of how to compute such estimates are described in more detail below.

Lossless Compression

[0048] When the video-compression module (250) of the encoder (200) applies lossless compression, the function $g(\cdot)$ defined by Eqs. (2)-(3) is effectively a bypass function having the following properties:

$$\hat{v}_{ji} = g(v_{ji}) = v_{ji} \tag{4}$$

$$\hat{s}_{ji} = g(s_{ji}) = s_{ji} \tag{5}$$

In some embodiments, the luma modulation function $f(\cdot)$ can be linear. These embodiments are described in reference to Eqs. (6)-(8). In some other embodiments, the luma modulation function $f(\cdot)$ can be nonlinear. The latter embodiments are described in reference to Eqs. (9)-(18).

[0049] A linear function $f(\cdot)$ can be expressed as follows:

$$f(s_{ji}) = a + b \cdot s_{ji} \tag{6}$$

where $a$ and $b$ are constants. Using Eq. (6), Eq. (1) can be rewritten as follows:

$$v_{ji} = s_{ji} + (a + b \cdot s_{ji}) \cdot n_{ji} = (1 + b \cdot n_{ji}) \cdot s_{ji} + a \cdot n_{ji} \tag{7}$$

The original host signal $s_{ji}$ can be recovered using the following equation obtained by rearranging Eq. (7):

$$s_{ji} = \frac{v_{ji} - a \cdot n_{ji}}{1 + b \cdot n_{ji}} \tag{8}$$

Since the constants $a$ and $b$ can be communicated to the decoder (300), e.g., as part of the metadata (212), the film-grain removal module (340) can be programmed to recover the original host signal using Eq. (8).

[0050] When the luma modulation function $f(\cdot)$ is non-linear, a closed-form solution for the film-grain removal, such as that of Eq. (8), may not be available. However, in view of Eq. (8), an approximate solution can be constructed based on local linear approximations of the function $f(\cdot)$ expressed as follows:

$$f(x) \approx a_x + b_x \cdot x \tag{9}$$

where $a_x$ and $b_x$ are the (local) coefficients that depend on $x$. More specifically, for each input codeword $x$, one can compute the local linear function via a first-order polynomial regression based on the neighboring codewords $x_k \in [x_L, x_H]$ and the corresponding modulation-function outputs $y_k = f(x_k)$. The range boundaries $x_L$ and $x_H$ may be determined, e.g., using Eqs. (10)-(11):

$$x_L = \mathrm{clip3}(x - d,\, 0,\, 2^B - 1) \tag{10}$$

$$x_H = \mathrm{clip3}(x + d,\, 0,\, 2^B - 1) \tag{11}$$

where $B$ is the bit depth of the video signal; and $d$ is the neighborhood range. In some embodiments, the value of $d$ can be selected to be close to the maximum film-grain size.

[0051] Let us present the local first-order polynomial parameters $(a_x, b_x)$ as a vector $\mathbf{m}_x$:

$$\mathbf{m}_x = \begin{bmatrix} b_x \\ a_x \end{bmatrix} \tag{12}$$

Then, let us rearrange the input and output of the function $f(\cdot)$ using the following matrix $\mathbf{M}_x$ and vector $\mathbf{v}_x$:

$$\mathbf{M}_x = \begin{bmatrix} x_L & 1 \\ x_L + 1 & 1 \\ \vdots & \vdots \\ x_H & 1 \end{bmatrix} \tag{13}$$

$$\mathbf{v}_x = \begin{bmatrix} f(x_L) \\ f(x_L + 1) \\ \vdots \\ f(x_H) \end{bmatrix} \tag{14}$$

An approximate solution, $\mathbf{m}_x^{\mathrm{opt}}$, for $(a_x, b_x)$ can then be obtained using a suitable least mean squares (LMS) algorithm applied to the following equation:

$$\mathbf{m}_x^{\mathrm{opt}} = \left( \mathbf{M}_x^T \mathbf{M}_x \right)^{-1} \mathbf{M}_x^T \mathbf{v}_x \tag{15}$$

where $T$ in the superscript means transposed. Note that Eqs. (12)-(15) enable, for each possible value $x \in [0, 2^B - 1]$, to have the corresponding pair of parameters $(a_x, b_x)$. In an example embodiment, Eqs. (12)-(15) can be used to build a look-up table (LUT), wherein a sub-table $A(x)$ maps $x$ to $a_x$, and a sub-table $B(x)$ maps $x$ to $b_x$.
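The LUT construction of Eqs. (9)-(15) can be sketched in Python as follows. This is a minimal rendering under the reconstruction above: for each codeword $x$ it regresses $f$ over the clipped neighborhood $[x_L, x_H]$ and stores the first-order coefficients in the sub-tables $A(x)$ and $B(x)$; the quadratic modulation function in the demo line is an arbitrary stand-in.

```python
import numpy as np

def build_luts(f, bit_depth=10, d=8):
    # Eqs. (9)-(15): per-codeword first-order fit f(x) ~ a_x + b_x * x
    # over the neighborhood [x_L, x_H] of Eqs. (10)-(11).
    n_codes = 2 ** bit_depth
    A = np.empty(n_codes)
    B = np.empty(n_codes)
    codes = np.arange(n_codes, dtype=float)
    y = f(codes)
    for x in range(n_codes):
        x_lo = max(x - d, 0)              # clip3(x - d, 0, 2^B - 1)
        x_hi = min(x + d, n_codes - 1)    # clip3(x + d, 0, 2^B - 1)
        xs = codes[x_lo:x_hi + 1]
        M = np.stack([xs, np.ones_like(xs)], axis=1)  # rows [x_k, 1], Eq. (13)
        # Least-squares solve of Eq. (15); solution is [b_x, a_x].
        (b_x, a_x), *_ = np.linalg.lstsq(M, y[x_lo:x_hi + 1], rcond=None)
        A[x], B[x] = a_x, b_x
    return A, B

# Example with a made-up nonlinear luma modulation function.
A, B = build_luts(lambda x: 1.5 + 0.02 * x + 1e-5 * x ** 2)
print(A[512], B[512])
```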

[0052] In analogy with Eq. (7), Eq. (1) can now be approximated by Eq. (16) as follows:

$$v_{ji} \approx \left( 1 + b_{s_{ji}} \cdot n_{ji} \right) \cdot s_{ji} + a_{s_{ji}} \cdot n_{ji} \tag{16}$$

An approximate value of the original host signal $s_{ji}$ can then be obtained via the following equation obtained by rearranging Eq. (16):

$$s_{ji} \approx \frac{v_{ji} - a_{s_{ji}} \cdot n_{ji}}{1 + b_{s_{ji}} \cdot n_{ji}} \tag{17}$$

Note that the right-hand part of Eq. (17) relies on the values of $s_{ji}$ (through the coefficients $a_{s_{ji}}$ and $b_{s_{ji}}$), which may not be available at the decoder. As a result, further approximation is needed to adapt Eq. (17) for use in the program of the decoder (300). In one example embodiment, such further approximation can be obtained by making the following substitution: $s_{ji} \approx v_{ji}$. Qualitatively, this substitution can be justified by the fact that the injected film grain is a relatively weak signal, and the differences between $s_{ji}$ and $v_{ji}$ are typically small. After the substitution, Eq. (17) becomes Eq. (18):

$$\tilde{s}_{ji} = \frac{v_{ji} - A(v_{ji}) \cdot n_{ji}}{1 + B(v_{ji}) \cdot n_{ji}} \tag{18}$$

where $\tilde{s}_{ji}$ denotes the estimate of $s_{ji}$, and $A(\cdot)$ and $B(\cdot)$ are the LUT sub-tables defined above. Since the values of $v_{ji}$ are typically available at the decoder, Eq. (18) can be used in the program of the decoder (300) to obtain estimates of the host image frame $s_{ji}$.
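A minimal decoder-side sketch of Eq. (18), assuming the LUTs $A(\cdot)$ and $B(\cdot)$ have been built as above and the simulated grain image $n$ has been recomputed from the metadata (212). The round-trip check at the bottom uses constant-coefficient (i.e., linear-modulation) LUTs, for which the recovery is exact up to floating-point error (cf. Eq. (8)).

```python
import numpy as np

def remove_grain_spatial(v, n, A, B):
    # Eq. (18): s_tilde = (v - A(v) * n) / (1 + B(v) * n), where n is the
    # decoder-recomputed simulated film-grain image and v stands in for s.
    idx = np.clip(np.rint(v).astype(int), 0, len(A) - 1)
    return (v - A[idx] * n) / (1.0 + B[idx] * n)

# Round-trip check: inject per Eq. (1) with f(s) = 2 + 0.01*s, then remove.
rng = np.random.default_rng(1)
s = rng.uniform(64, 940, size=(4, 4))
n = rng.standard_normal((4, 4))
v = s + (2.0 + 0.01 * s) * n
A = np.full(1024, 2.0)                    # LUTs of a constant-coefficient f
B = np.full(1024, 0.01)
print(np.max(np.abs(remove_grain_spatial(v, n, A, B) - s)))  # ~1e-13
```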

Lossy Compression

[0053] When the video-compression module (250) of the encoder (200) applies lossy compression, the above-described film-grain removal process adapted for lossless compression may not produce optimal results. For example, it can be noted that lossy compression may cause signal distortions, e.g., both the host video signal and the film-grain component may lose some higher frequency components thereof. Since the above-described “lossless” film-grain removal does not account for such compression-caused distortions, the residual film grain can be quite noticeable if such removal is applied in conjunction with lossy compression.

[0054] To address this problem, Eq. (18) can be further modified to arrive at Eq. (19):

$$\hat{s}_{ji} = \frac{\hat{v}_{ji} - A(\hat{v}_{ji}) \cdot \hat{n}_{ji}}{1 + B(\hat{v}_{ji}) \cdot \hat{n}_{ji}} \tag{19}$$

where $\hat{s}_{ji}$ denotes the estimated compressed signal; the film-grain-injected signal after video compression, $\hat{v}_{ji}$, is given by Eq. (2); and the compressed film-grain component, $\hat{n}_{ji}$, is given by Eq. (20):

$$\hat{n}_{ji} = \frac{\hat{v}_{ji} - \hat{s}_{ji}}{A(\hat{v}_{ji}) + B(\hat{v}_{ji}) \cdot \hat{s}_{ji}} \tag{20}$$

Note that Eq. (19) relies on the values of $\hat{n}_{ji}$, which need to be estimated at the decoder. According to example embodiments disclosed herein, such estimation can be performed using a suitable digital filter. Such a digital filter may be, e.g., a Gaussian filter, a finite impulse response (FIR) filter, a guided filter, or another suitable filter. Several illustrative filter embodiments are described below, by way of example. Based on the provided description, a person of ordinary skill in the pertinent art will be able to make and use additional filter embodiments without any undue experimentation.

[0055] According to one example embodiment, the values of $\hat{n}_{ji}$ are estimated using a scaled Gaussian filter, $G(\cdot)$, expressed as follows:

$$\hat{n}_{ji} = k_j \cdot \left( G(\sigma_j) * n_j \right)_i \tag{21}$$

where $k_j$ is a scaling factor; $\sigma_j$ is the filter’s spectral width; and $*$ denotes convolution over the simulated film-grain image $\{n_{ji}\}$. The parameters $k_j$ and $\sigma_j$ can be included in the metadata (232).
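A sketch of the Gaussian variant, using SciPy's gaussian_filter as the low-pass kernel; the exact discrete filter in a deployed decoder may differ. The grain estimate of Eq. (21) feeds the removal step of Eq. (19):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_compressed_grain(n, k_j, sigma_j):
    # Eq. (21): approximate the post-codec grain by low-pass filtering the
    # simulated grain image with a Gaussian of width sigma_j, scaled by k_j.
    # Both parameters would arrive in the film-grain-removal metadata (232).
    return k_j * gaussian_filter(n, sigma=sigma_j)

def remove_grain_lossy(v_hat, n_hat, A, B):
    # Eq. (19): same form as Eq. (18), applied to the decompressed signal
    # v_hat with the estimated compressed grain n_hat.
    idx = np.clip(np.rint(v_hat).astype(int), 0, len(A) - 1)
    return (v_hat - A[idx] * n_hat) / (1.0 + B[idx] * n_hat)
```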

[0056] FIGs. 5-8 graphically illustrate example film-grain removal results that can be achieved using the scaled Gaussian filter of Eq. (21). More specifically, each of FIGs. 5-7 graphically shows the dependence of the peak signal-to-noise ratio (PSNR) on the parameter $\sigma_j$ at a different respective bit rate. FIG. 5 corresponds to the bit rate of 20 Mbps. FIG. 6 corresponds to the bit rate of 10 Mbps. FIG. 7 corresponds to the bit rate of 5 Mbps. The PSNR values corresponding to $\sigma_j = 0$ represent results wherein no filter is applied. FIG. 8 graphically shows the PSNR dependence on the parameter $k_j$ for $\sigma_j = 0.75$ and the bit rate of 20 Mbps. Collectively, the data shown in FIGs. 5-8 indicate that a PSNR gain of up to approximately 2.5 dB is achievable according to these embodiments.

[0057] According to another example embodiment, the values of $\hat{n}_{ji}$ are estimated using an FIR filter implemented in accordance with Eq. (22):

$$\hat{n}_{ji} = \sum_{k \in \omega_{ji}} w_{jk} \cdot n_{jk} \tag{22}$$

where the index $k$ runs over the pixels of the block $\omega_{ji}$; $w_{jk}$ are the filter coefficients; and $\omega_{ji}$ denotes a $B \times B$ pixel block having its center at the $i$-th pixel.

[0058] The optimal filter coefficients $w_{jk}$ can be obtained, e.g., as follows. At the encoder (200), the values of $v_{ji}$ and $s_{ji}$ are known and can be expressed as follows:

$$\hat{v}_{ji} = \hat{s}_{ji} + \left( A(\hat{v}_{ji}) + B(\hat{v}_{ji}) \cdot \hat{s}_{ji} \right) \cdot \hat{n}_{ji} \tag{23}$$

Rearrangement of Eq. (23) provides Eq. (24):

$$\hat{n}_{ji} = \frac{\hat{v}_{ji} - \hat{s}_{ji}}{A(\hat{v}_{ji}) + B(\hat{v}_{ji}) \cdot \hat{s}_{ji}} \tag{24}$$

Using the approximation $g(s_{ji}) \approx s_{ji}$, Eq. (24) is transformed into Eq. (25), which expresses the target signal of the filtered results:

$$\tilde{n}_{ji} = \frac{\hat{v}_{ji} - s_{ji}}{A(\hat{v}_{ji}) + B(\hat{v}_{ji}) \cdot s_{ji}} \tag{25}$$

The problem of optimizing the FIR filter coefficients can then be formulated as follows:

$$\mathbf{w}_j^{\mathrm{opt}} = \arg\min_{\mathbf{w}_j} \sum_i \left( \tilde{n}_{ji} - \sum_{k \in \omega_{ji}} w_{jk} \cdot n_{jk} \right)^2 \tag{26}$$

[0059] An approximate solution to the problem of Eq. (26) can be obtained using an LMS method, e.g., as follows. First, let us arrange sets of the filter coefficients $w_{jk}$ in a vector form:

$$\mathbf{w}_j = \begin{bmatrix} w_{j1} & w_{j2} & \cdots & w_{jB^2} \end{bmatrix}^T \tag{27}$$

The film-grain pixel values of the neighborhood of the $i$-th pixel (with value $n_{ji}$) can similarly be represented in a vector form, $\mathbf{n}_{ji}$, and such vectors of the $P$ neighbor pixels can be stacked to form a corresponding matrix, $\mathbf{\Omega}_j$, expressed as follows:

$$\mathbf{\Omega}_j = \begin{bmatrix} \mathbf{n}_{j1}^T \\ \mathbf{n}_{j2}^T \\ \vdots \\ \mathbf{n}_{jP}^T \end{bmatrix} \tag{28}$$

The corresponding target reference signal, also in a vector form, is given by Eq. (29):

$$\tilde{\mathbf{n}}_j = \begin{bmatrix} \tilde{n}_{j1} & \tilde{n}_{j2} & \cdots & \tilde{n}_{jP} \end{bmatrix}^T \tag{29}$$

An approximate solution, $\mathbf{w}_j^{\mathrm{opt}}$, for the FIR-filter coefficients can be obtained using an LMS algorithm applied to the following equation:

$$\mathbf{w}_j^{\mathrm{opt}} = \left( \mathbf{\Omega}_j^T \mathbf{\Omega}_j \right)^{-1} \mathbf{\Omega}_j^T \tilde{\mathbf{n}}_j \tag{30}$$

The FIR-filter coefficients obtained in this manner from Eq. (30) can then be used in Eq. (22) to compute $\hat{n}_{ji}$. Finally, Eq. (31) can be used to compute the estimated compressed signal $\hat{s}_{ji}$:

$$\hat{s}_{ji} = \frac{\hat{v}_{ji} - A(\hat{v}_{ji}) \cdot \hat{n}_{ji}}{1 + B(\hat{v}_{ji}) \cdot \hat{n}_{ji}} \tag{31}$$
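The encoder-side LMS fit of Eqs. (26)-(30) and the decoder-side application of Eq. (22) might look as follows. This is a sketch under the reconstruction above: the target image would be computed per Eq. (25) from the known $s_{ji}$ and $\hat{v}_{ji}$, and only interior pixels (full $B \times B$ neighborhoods) enter the fit.

```python
import numpy as np
from scipy.ndimage import correlate

def train_fir(n, target, B=3):
    # Encoder-side LMS fit of Eqs. (26)-(30): minimize ||target - Omega @ w||^2,
    # where each row of Omega (Eq. (28)) holds one pixel's BxB simulated-grain
    # neighborhood, and target holds the values of Eq. (25).
    r = B // 2
    H, W = n.shape
    rows, t = [], []
    for i in range(r, H - r):
        for j in range(r, W - r):
            rows.append(n[i - r:i + r + 1, j - r:j + r + 1].ravel())
            t.append(target[i, j])
    Omega = np.asarray(rows)
    w_opt, *_ = np.linalg.lstsq(Omega, np.asarray(t), rcond=None)  # Eq. (30)
    return w_opt.reshape(B, B)

def apply_fir(n, w):
    # Decoder-side Eq. (22): sliding BxB weighted sum over the simulated grain.
    return correlate(n, w, mode="nearest")
```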

[0060] The PSNR gain of FIR-filter embodiments generally increases with an increase of the filter size. However, this improvement may be achieved at the cost of additional implementation overhead.

[0061] With respect to the guided-filter embodiments, we observe that the most noticeable residual film-grain noise may be in the image areas having relatively flat and/or smooth image patterns therein. For example, the pixel values in a flat area may be approximately constant, with local variations being primarily due to the added or incompletely/imperfectly removed film grain. As a preliminary observation, we note that, in such flat areas, the estimated compressed film grain, $\hat{n}_{ji}$, can be approximated using a linear model based on the decoded film-grain-injected signal, $\hat{v}_{ji}$. In an example embodiment, such a model can be expressed as follows:

$$\hat{n}_{ji} = \alpha_{ji} \cdot \hat{v}_{ji} + \beta_{ji} \tag{32}$$

where $(\alpha_{ji}, \beta_{ji})$ are the local linear-model constants within a $B \times B$ local pixel area, $\varphi_{ji}$, of an image subjected to lossy compression.

[0062] Herein, the difference between the original film-grain pattern ($n_{ji}$) and the reconstructed film grain ($\hat{n}_{ji}$) is denoted as $\tau_{ji}$. Therefore, the following is true:

$$\hat{n}_{ji} = n_{ji} + \tau_{ji} \tag{33}$$

Applying an LMS method in a similar manner to that described above, one can find optimal linear-model constants using Eq. (34):

$$\left( \alpha_{ji}^{\mathrm{opt}},\, \beta_{ji}^{\mathrm{opt}} \right) = \arg\min_{\alpha_{ji},\, \beta_{ji}} \sum_{k \in \varphi_{ji}} \left( \left( \alpha_{ji} \cdot \hat{v}_{jk} + \beta_{ji} - n_{jk} \right)^2 + \epsilon \cdot \alpha_{ji}^2 \right) \tag{34}$$

where $\epsilon$ is a regularization parameter. An approximate solution to Eq. (34) is given by Eqs. (35), (36):

$$\alpha_{ji} = \frac{ \frac{1}{B^2} \sum_{k \in \varphi_{ji}} \hat{v}_{jk} \cdot n_{jk} - \mu_{ji} \cdot \bar{n}_{ji} }{ \sigma_{ji}^2 + \epsilon } \tag{35}$$

$$\beta_{ji} = \bar{n}_{ji} - \alpha_{ji} \cdot \mu_{ji} \tag{36}$$

where $\mu_{ji}$ and $\sigma_{ji}^2$ denote the mean and variance of $\hat{v}_{ji}$ in $\varphi_{ji}$, and $\bar{n}_{ji}$ denotes the mean of $n_{ji}$ in $\varphi_{ji}$.

[0063] Next, one can compute the average of the local linear-model constants in a neighborhood, e.g., as follows:

$$\bar{\alpha}_{ji} = \frac{1}{B^2} \sum_{k \in \varphi_{ji}} \alpha_{jk} \tag{37}$$

$$\bar{\beta}_{ji} = \frac{1}{B^2} \sum_{k \in \varphi_{ji}} \beta_{jk} \tag{38}$$

Substituting these values into Eq. (32), one obtains:

$$\hat{n}_{ji} = \bar{\alpha}_{ji} \cdot \hat{v}_{ji} + \bar{\beta}_{ji} \tag{39}$$

In an example embodiment, the computations corresponding to Eq. (39) may use a guided-filter function, $\mathrm{GF}(\cdot)$, known to persons of ordinary skill in the pertinent art. The function $\mathrm{GF}(\cdot)$ can be accelerated using an integral image, which is also known to persons of ordinary skill in the pertinent art. Using this function, the 2D image, $\hat{N}_j$, consisting of the $\hat{n}_{ji}$ values of the $j$-th frame can be computed as follows:

$$\hat{N}_j = \mathrm{GF}\left( \hat{V}_j,\, N_j \right) \tag{40}$$

where $\hat{V}_j$ denotes the 2D image consisting of the $\hat{v}_{ji}$ values of the $j$-th frame and serves as the guidance image; and $N_j$ denotes the 2D image consisting of the $n_{ji}$ values of the $j$-th frame. The neighborhood size $B$ is a parameter of this method. As already indicated above, the value of $B$ may be related to the film-grain size. For example, for a larger film grain, larger $B$ values may be more suitable. The linear-model constants are computed using the film-grain-injected image $\{\hat{v}_{ji}\}$ and the film-grain model $\{n_{ji}\}$, e.g., as outlined above. Finally, Eq. (41) can be used to compute the estimated compressed signal $\hat{s}_{ji}$:

$$\hat{s}_{ji} = \frac{\hat{v}_{ji} - A(\hat{v}_{ji}) \cdot \hat{n}_{ji}}{1 + B(\hat{v}_{ji}) \cdot \hat{n}_{ji}} \tag{41}$$
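For reference, a box-filter form of the guided filter $\mathrm{GF}(\cdot)$ of Eq. (40) is sketched below. It follows the standard formulation of this filter and is not specific to this disclosure; per the reconstruction above, the guidance image is the decoded grain-injected frame and the filtered input is the simulated grain image.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, B=7, eps=1e-2):
    # GF() of Eq. (40) in its standard box-filter form: fit the local linear
    # model of Eq. (32) in each BxB window (Eqs. (34)-(36)), average the
    # per-window constants (Eqs. (37)-(38)), and evaluate Eq. (39).
    # Here guide = V_hat (decoded grain-injected frame), src = N (simulated grain).
    mean = lambda x: uniform_filter(x, size=B, mode="nearest")
    mu, nu = mean(guide), mean(src)
    var = mean(guide * guide) - mu * mu        # variance of guide in the window
    cov = mean(guide * src) - mu * nu          # covariance(guide, src)
    alpha = cov / (var + eps)                  # Eq. (35)
    beta = nu - alpha * mu                     # Eq. (36)
    return mean(alpha) * guide + mean(beta)    # Eqs. (37)-(39)
```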

[0064] It should be noted here that, in general, guided-filter embodiments may fall in between the Gaussian-filter embodiments and the FIR-filter embodiments in terms of the PSNR gain. However, because the guided-filter embodiments work best for the flat and smooth image areas, the improvement that is subjectively perceivable by the viewer may typically be more significant under the guided-filter approach.

Film-Grain Removal Process: Temporal Domain

[0065] In an example embodiment, film grain may have zero-mean characteristics, meaning that the image with film grain and the original image have the same DC value. As a result, temporally averaging the pixel values at a given pixel location within a static scene tends to average the film grain out to “zero.” Certain embodiments of temporal-domain grain removal disclosed herein are designed to take advantage of this property to remove the film grain.

[0066] FIG. 9 is a processing flowchart (900) illustrating a temporal-domain grain-removal process that can be implemented in the decoder (300) according to an embodiment. For illustration purposes and without any implied limitations, the process (900) is shown and described as receiving, as an input thereto, a sequence of images ($342_{j-L}$, ..., $342_j$, ..., $342_{j+L}$) produced by the process (400). However, embodiments of the process (900) are not so limited. For example, in another embodiment, the process (900) may operate directly on a sequence of the images (330) (also see FIG. 3). The sliding-window length, $L$, may depend on the maximum amplitude of the film-grain pattern. In general, larger $L$ values may be more beneficial when the maximum amplitude is higher. In operation, a processing block (910) of the process (900) transforms the input sequence of the images (342) (FIG. 4) or of the images (330) (FIG. 3) into the corresponding output sequence of the images (342'), e.g., as described below, wherein the film-grain pattern in the output sequence is less noticeable than in the input sequence.

[0067] According to an example embodiment, for the $i$-th spatially filtered pixel of the $j$-th frame ($342_j$), the temporal-domain filtering of the process (900) within a temporal sliding window provides the following signals for the sequence (342'):

$$\tilde{s}_{ji} = \frac{1}{w_{ji}^F - w_{ji}^B + 1} \sum_{k = w_{ji}^B}^{w_{ji}^F} \hat{s}_{ki} \tag{42}$$

where the values of $w_{ji}^B$ and $w_{ji}^F$ (the backward and forward window boundaries, respectively) may be adaptively determined based on the two thresholds, $T_{ji}^A$ and $T_{ji}^B$, which are derived from the metadata parameters $\tau^A$ and $\tau^B$ as follows:

$$T_{ji}^A = f(\hat{s}_{ji}) \cdot \tau^A \tag{43}$$

$$T_{ji}^B = f(\hat{s}_{ji}) \cdot \tau^B \tag{44}$$

A motivation behind the use of the function $f(\cdot)$ for modulating the thresholds $T_{ji}^A$ and $T_{ji}^B$ is that the injected film-grain strength is also modulated using the function $f(\cdot)$ (see, e.g., Eq. (1)).

[0068] The value of $w_{ji}^B$ can be determined, e.g., by iteratively checking the condition expressed by Inequality (45) for the sequence ($342_{j-1}$, $342_{j-2}$, ...):

$$\left| \hat{s}_{(j-d)i} - \hat{s}_{ji} \right| \leq T_{ji}^A \tag{45}$$

For each iteration, the parameter $d$ is increased. If Inequality (45) is not satisfied and the previous iterative value of $d$ satisfies this inequality, then $w_{ji}^B = j - d$. To reduce the search time in multiple frames and to reduce the frame buffer, the sliding-window size may be upper limited by the left buffer size $D_L$ and the available frames at the beginning of the video sequence, for example, as follows:

$$w_{ji}^B \geq \max(j - D_L,\, 0) \tag{46}$$

[0069] The value of $w_{ji}^F$ can similarly be determined by iteratively checking the condition expressed by Inequality (47) for the sequence ($342_{j+1}$, $342_{j+2}$, ...):

$$\left| \hat{s}_{(j+d)i} - \hat{s}_{ji} \right| \leq T_{ji}^B \tag{47}$$

For each iteration, the parameter $d$ is increased. If the condition (47) is not met and the previous value of $d$ satisfies this condition, then $w_{ji}^F = j + d$. To reduce the search time in multiple frames and to reduce the frame buffer, the window size may be upper limited by the right buffer size $D_R$ and the available frames at the end of the video sequence, for example, as follows:

$$w_{ji}^F \leq \min(j + D_R,\, F - 1) \tag{48}$$

where $F$ is the number of video frames in the sequence.

[0070] Once the values of $w_{ji}^B$ and $w_{ji}^F$ are determined as indicated above, the temporal averaging of the processing block (910) can be performed in accordance with Eq. (42). Note that the above-described temporal averaging is a pixel-based operation. The buffer sizes $D_L$ and $D_R$ can be provided to the decoder (300) as part of the metadata (232).
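A per-pixel sketch of the adaptive temporal window, under the reconstructed forms of Eqs. (42)-(48); the one-sided stopping conditions below render Inequalities (45) and (47) as reconstructed above and are assumptions, not the verbatim patented test. The thresholds $\tau^A$, $\tau^B$ and buffer sizes $D_L$, $D_R$ would arrive in the metadata (232).

```python
import numpy as np

def temporal_average_pixel(frames, j, i, f_lut, tau_A, tau_B, D_L=8, D_R=8):
    # frames: list of 2D arrays (spatially filtered); i: (row, col) pixel index.
    F = len(frames)
    ref = frames[j][i]
    T_A = f_lut(ref) * tau_A                   # Eq. (43)
    T_B = f_lut(ref) * tau_B                   # Eq. (44)
    lo = j
    while lo - 1 >= max(j - D_L, 0) and abs(frames[lo - 1][i] - ref) <= T_A:
        lo -= 1                                # grow window to the left, Eq. (46)
    hi = j
    while hi + 1 <= min(j + D_R, F - 1) and abs(frames[hi + 1][i] - ref) <= T_B:
        hi += 1                                # grow window to the right, Eq. (48)
    return np.mean([frames[k][i] for k in range(lo, hi + 1)])  # Eq. (42)
```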

Preprocessing of Signals Corresponding to Alternative Film-Grain Models

[0071] Some embodiments may be adapted to be compatible with alternative film-grain models, i.e., film-grain models that are different from the above-described film-grain model. Such embodiments generally rely on signal preprocessing to provide a preprocessed signal compatible with the above-described grain-removal processes. As a non-limiting example of such preprocessing, we describe herein example preprocessing applicable to an MPEG Film Grain Model, which is defined in the corresponding standard(s).

[0072] An example MPEG film-grain model is a piecewise model according to the pixel intensity value. In each piece, different film-grain-model parameters may be used, e.g., different cutoff frequencies, different scaling factors to scale the film grain, etc. Let us assume that there are $T$ such pieces in the $j$-th frame, and the partition points for those pieces are denoted as $\{p_j^t\}$. The film-grain image in the $t$-th piece is denoted as $\{n_{ji}^t\}$, and the scaling factor for the $t$-th piece is denoted as $\alpha_j^t$.

[0073] FIG. 10 shows an example of an intensity-dependent scaling factor that can be accommodated using the above-mentioned preprocessing according to an embodiment. It can be noticed in FIG. 10 that the scaling factor can take several discrete values from the range between zero and thirty. In this case, the film-grain-injected pixel, $v_{ji}$, can be expressed as:

$$v_{ji} = s_{ji} + \alpha_j^t \cdot n_{ji}^t, \quad p_j^t \leq s_{ji} < p_j^{t+1} \tag{49}$$

[0074] Comparing Eq. (49) with Eq. (1), one can notice that the piecewise scaling factor ($\alpha_j^t$) may be considered as a special use case of the non-linear modulation function discussed above. Therefore, the methodology described above in reference to the non-linear modulation function $f(s_{ji})$ is applicable to this special use case. The corresponding LUTs, $A(s_{ji})$ and $B(s_{ji})$ (also see Eqs. (12)-(15)), can be computed, e.g., as follows:

$$A(s_{ji}) = \alpha_j^t, \quad p_j^t \leq s_{ji} < p_j^{t+1} \tag{50}$$

$$B(s_{ji}) = 0 \tag{51}$$
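Under this reading of Eqs. (50)-(51), the piecewise LUTs reduce to a step function and a zero table; the pivot points and factors in the example line are made up, in the spirit of FIG. 10.

```python
import numpy as np

def piecewise_luts(partitions, scales, bit_depth=10):
    # Eqs. (50)-(51) as reconstructed above: for the MPEG-style piecewise
    # model of Eq. (49), A(s) is the per-piece scaling factor and B(s) = 0.
    # partitions: ascending pivot codewords p_j^t; scales: one factor per piece.
    n_codes = 2 ** bit_depth
    A = np.zeros(n_codes)
    B = np.zeros(n_codes)                  # b_x = 0: the scaling has no slope in s
    edges = list(partitions) + [n_codes]
    for t, a in enumerate(scales):
        A[edges[t]:edges[t + 1]] = a       # A(s) = alpha_j^t on piece t
    return A, B

# Example: three pieces with intensity-dependent factors between 0 and 30.
A, B = piecewise_luts(partitions=[0, 256, 768], scales=[8.0, 16.0, 30.0])
```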

[0075] On the other hand, the different cutoff frequencies of the noise pattern in each piece add an additional dimension that needs to be addressed in the preprocessing. Herein, we describe embodiments directed at constructing the full film-grain image, $\{n_{ji}\}$, from multiple film-grain pieces $\{n_{ji}^t\}$. Once the full film-grain image is constructed through preprocessing, the subsequent grain removal may rely on one or more of the above-described grain-removal processes.

[0076] Towards building the full film-grain image, consider the $i$-th pixel of $v_{ji}$ (see Eq. (1)) at the decoder (300). If the $i$-th pixel value is not near one of the boundaries of different intensity pivot points (e.g., see FIG. 10), e.g., the corresponding value is in the range $p_j^t + \Delta \leq \hat{v}_{ji} < p_j^{t+1} - \Delta$, then the film-grain value for the full film-grain image can be selected in accordance with Eq. (52):

$$n_{ji} = n_{ji}^t \tag{52}$$

Herein, $\Delta$ denotes the film-grain amplitude, which is typically relatively small.

[0077] However, when the pixel value is relatively close to the piece boundary, e.g., $p_j^t - \Delta \leq \hat{v}_{ji} < p_j^t + \Delta$, then the following ambiguity needs to be resolved:

$$n_{ji} = n_{ji}^{t-1} \tag{53}$$

or

$$n_{ji} = n_{ji}^t \tag{54}$$

Herein, the values of $n_{ji}^{t-1}$ and $n_{ji}^t$ can be determined based on the received bitstream and metadata; and $\hat{v}_{ji}$ denotes the decoded pixel value. However, the determination of the host pixel value $s_{ji}$ has to deal with two different options, i.e., defined by Eqs. (53) and (54).

[0078] In an example embodiment, the choice between the two possible options can be made by analyzing preprocessing results in a local neighborhood/patch ($\Omega_{ji}$) around the $i$-th pixel. More specifically, the results corresponding to the correctly chosen alternative are expected to be smoother (e.g., exhibit lower residual film-grain noise) than the results corresponding to the incorrect one of the two alternatives. Based on this observation, an example embodiment is configured to compute the standard deviation for each of the two alternatives in such a patch. The standard deviation is expected to be smaller for the correctly chosen one of the two alternatives.

[0079] Eqs. (55)-(56) provide mathematical expressions for the computation of the above-mentioned standard deviations, $\sigma_{ji}^{t-1}$ and $\sigma_{ji}^t$, as follows:

$$\sigma_{ji}^{t-1} = \mathrm{std}_{k \in \Omega_{ji}} \left( \hat{v}_{jk} - \alpha_j^{t-1} \cdot n_{jk}^{t-1} \right) \tag{55}$$

$$\sigma_{ji}^{t} = \mathrm{std}_{k \in \Omega_{ji}} \left( \hat{v}_{jk} - \alpha_j^{t} \cdot n_{jk}^{t} \right) \tag{56}$$

The computed standard deviations $\sigma_{ji}^{t-1}$ and $\sigma_{ji}^t$ can then be used to select the host-image pixel values $s_{ji}$ and the film-grain pixel values $n_{ji}$:

$$t^* = \underset{t' \in \{t-1,\, t\}}{\arg\min}\; \sigma_{ji}^{t'} \tag{57}$$

$$n_{ji} = n_{ji}^{t^*} \tag{58}$$

By carrying out the above-described calculations on a pixel-by-pixel basis, a full film-grain image $\{n_{ji}\}$ can be built according to the decoded values, $\hat{v}_{ji}$, and partition points, $\{p_j^t\}$. The preprocessing results, including the LUTs $A(s_{ji})$, $B(s_{ji})$ (see Eqs. (50), (51)) in some embodiments, can then be used for further processing in accordance with a suitable embodiment of the grain-removal process described above.
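A sketch of the boundary disambiguation, under the reconstructed Eqs. (55)-(58): for a pixel near a pivot, the host-image estimate is formed under each piece hypothesis over a patch, and the hypothesis with the smaller standard deviation wins. The helper function and its arguments are illustrative, not from the source.

```python
import numpy as np

def pick_piece(v_hat, grains, scales, t, patch):
    # Resolve the Eq. (53)/(54) ambiguity: evaluate the candidate host-image
    # pixels under hypothesis t-1 and t over the patch (a tuple of slices),
    # and keep the smoother alternative (smaller std; Eqs. (55)-(57)).
    best = None
    for tt in (t - 1, t):
        s_est = v_hat[patch] - scales[tt] * grains[tt][patch]
        sigma = np.std(s_est)                  # Eqs. (55)-(56)
        if best is None or sigma < best[0]:
            best = (sigma, tt)
    return best[1]                             # chosen piece index t*, Eq. (58)
```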

[0080] According to an example embodiment disclosed above, e.g., in the summary section and/or in reference to any one or any combination of some or all of FIGs. 1-10, provided is an apparatus, comprising a video decoder (e.g., 130, FIG. 1) that comprises: an input interface (e.g., 310, FIG. 3) to receive a coded bitstream (e.g., 122, FIGs. 1, 3) including a compressed video bitstream (e.g., 252, FIG. 3) and metadata (e.g., 212, 232, FIG. 3), the compressed video bitstream having encoded therein a sequence of images containing digital film grain, the metadata including one or more parameters corresponding to the digital film grain; and a processor (e.g., 320, 340, FIG. 3) configured to: (A) decompress (e.g., at 320, FIG. 3) the compressed video bitstream to generate a respective decompressed representation (e.g., 330, FIG. 3) of each of the images, each of the decompressed representations including a respective host-image component and a respective film-grain component; (B) compute, based on the metadata, a respective estimate of the respective film-grain component for each of the decompressed representations; and (C) remove (e.g., at 340, FIG. 3) the respective estimate from the respective decompressed representation to produce a corresponding estimate (e.g., 342, FIG. 4) of the respective host-image component.

[0081] In some embodiments of the above apparatus, the processor is further configured to: compute (e.g., at 210, FIG. 3), based on the metadata, a film-grain image (e.g., 220, FIG. 3); and compute the respective estimate using the film-grain image.

[0082] In some embodiments of any of the above apparatus, the processor is further configured to adjust (e.g., at 430, FIG. 4) the film-grain image to account for distortions caused by at least one of video compression and decompression.

[0083] In some embodiments of any of the above apparatus, the processor is configured to adjust the film-grain image using a Gaussian filter (e.g., as illustrated in FIGs. 5-8).

[0084] In some embodiments of any of the above apparatus, the processor is configured to adjust the film-grain image using a finite-impulse-response filter (e.g., in accordance with Eqs. (22)-(31)).

[0085] In some embodiments of any of the above apparatus, the processor is configured to adjust the film-grain image using a guided filter (e.g., in accordance with Eqs. (32)-(41)).
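For the Gaussian-filter variant of paragraph [0083], one possible (non-limiting) sketch using SciPy is shown below; the value of sigma is an assumed tuning parameter, and the FIR and guided-filter variants of paragraphs [0084]-[0085] would replace the filtering call.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def adjust_grain_image(grain_image, sigma=1.0):
    """Low-pass the simulated film-grain image so that it better matches
    the grain after the distortions of lossy compression and decompression."""
    return gaussian_filter(grain_image.astype(np.float64), sigma=sigma)
```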

[0086] In some embodiments of any of the above apparatus, the processor is further configured to: compute, for patches of image pixels, standard deviations corresponding to estimated host-image pixel values (e.g., in accordance with Eqs. (55)-(56)); and compute the film-grain image based on the standard deviations (e.g., in accordance with Eq. (58)).

[0087] In some embodiments of any of the above apparatus, the processor is further configured to: compute (e.g., at 420, FIG. 4) first-order polynomial approximations of a nonlinear luma modulation function (e.g., f(), Eq. (1)); and compute the respective estimate using said approximations.

[0088] In some embodiments of any of the above apparatus, the processor is further configured to use the metadata to compute said approximations.

[0089] In some embodiments of any of the above apparatus, the processor is further configured to compute the respective estimate using a precomputed look-up table (e.g., 420, FIG. 4) having stored therein parameters of first-order polynomial approximations of a nonlinear luma modulation function (e.g., f(), Eq. (1)).
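One way such a look-up table might be precomputed and used is sketched below; the bin count, intensity range, and function names are illustrative assumptions, and fitting each line through the segment endpoints is merely one possible first-order approximation.

```python
import numpy as np

def build_modulation_lut(f, num_bins=64, s_max=1.0):
    """Approximate a nonlinear luma modulation function f() by a first-order
    polynomial a*s + b on each intensity bin and store the (a, b) pairs."""
    edges = np.linspace(0.0, s_max, num_bins + 1)
    lut = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        a = (f(hi) - f(lo)) / (hi - lo)  # slope through the bin endpoints
        b = f(lo) - a * lo               # intercept
        lut.append((a, b))
    return edges, lut

def eval_modulation(s, edges, lut):
    """Evaluate the piecewise-linear approximation at luma value s."""
    k = min(int(np.searchsorted(edges, s, side="right")) - 1, len(lut) - 1)
    a, b = lut[k]
    return a * s + b
```

For example, eval_modulation(0.3, *build_modulation_lut(np.sqrt)) approximates sqrt(0.3) to within the bin-level linearization error.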

[0090] In some embodiments of any of the above apparatus, the processor is further configured to temporally average (e.g., at 910, FIG. 9; in accordance with Eq. (42)) a sequence of the respective decompressed representations using a plurality of temporal sliding windows, each of the temporal sliding windows corresponding to a respective image-frame pixel, at least some of the temporal sliding windows having different respective lengths (e.g., selected in accordance with Eqs. (45), (47)).

[0091] In some embodiments of any of the above apparatus, the processor is further configured to select the lengths of the temporal sliding windows based on the metadata (e.g., $\tau_A$ and $\tau_B$, Eqs. (43), (44)).

[0092] In some embodiments of any of the above apparatus, the apparatus further comprises a video encoder (e.g., 120, FIG. 1) that comprises: an output interface (e.g., 260, FIG. 2) to output the coded bitstream for the video encoder; and a video-compression module (e.g., 250, FIG. 2) to generate the compressed video bitstream using lossy compression according to a video compression standard.
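A minimal sketch of such pixel-adaptive temporal averaging is given below; it assumes the per-pixel window-length map has already been derived from the metadata (e.g., from $\tau_A$ and $\tau_B$ via the selection rule of Eqs. (45), (47)), and the array shapes and function name are illustrative.

```python
import numpy as np

def temporal_average(frames, window_len):
    """Per-pixel temporal averaging over causal sliding windows of possibly
    different lengths.

    frames: array of shape (T, H, W) of decompressed frames.
    window_len: int array of shape (H, W) with values in [1, T], e.g.,
    derived from the metadata. Returns the averaged frame for time T-1.
    """
    T, H, W = frames.shape
    out = np.empty((H, W), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            L = int(window_len[y, x])
            out[y, x] = frames[T - L:T, y, x].mean()  # causal window of length L
    return out
```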

[0093] According to another example embodiment disclosed above, e.g., in the summary section and/or in reference to any one or any combination of some or all of FIGs. 1-10, provided is an apparatus, comprising a video decoder (e.g., 130, FIG. 1) that comprises: an input interface (e.g., 310, FIG. 3) to receive a coded bitstream (e.g., 122, FIGs. 1, 3) including a compressed video bitstream (e.g., 252, FIG. 3) and metadata (e.g., 212, 232, FIG. 3), the compressed video bitstream having encoded therein a sequence of images containing digital film grain, the metadata including one or more parameters corresponding to the digital film grain; and a processor (e.g., 320, 340, FIG. 3) configured to: decompress (e.g., at 320, FIG. 3) the compressed video bitstream to generate a respective decompressed representation (e.g., 330, FIG. 3) of each of the images, each of the decompressed representations including a respective host-image component and a respective film-grain component; and temporally average (e.g., at 910, FIG. 9; in accordance with Eq. (42)) a sequence of the respective decompressed representations using a plurality of temporal sliding windows, each of the temporal sliding windows corresponding to a respective image-frame pixel, at least some of the temporal sliding windows having different respective lengths selected (e.g., in accordance with Eqs. (45), (47)) based on the metadata (e.g., $\tau_A$ and $\tau_B$, Eqs. (43), (44)).

[0094] In some embodiments of the above apparatus, the apparatus further comprises a video encoder (e.g., 120, FIG. 1) that comprises: an output interface (e.g., 260, FIG. 2) to output the coded bitstream for the video encoder; and a video-compression module (e.g., 250, FIG. 2) to generate the compressed video bitstream using lossy compression according to a video compression standard.

[0095] According to yet another example embodiment disclosed above, e.g., in the summary section and/or in reference to any one or any combination of some or all of FIGs. 1-10, provided is a machine-implemented method of removing film grain from video data, the method comprising the steps of: (A) receiving a coded bitstream (e.g., 122, FIGs. 1, 3) including a compressed video bitstream (e.g., 252, FIG. 3) and metadata (e.g., 212, 232, FIG. 3), the compressed video bitstream having encoded therein a sequence of images containing digital film grain, the metadata including one or more parameters corresponding to the digital film grain; (B) decompressing (e.g., at 320, FIG. 3) the compressed video bitstream to generate a respective decompressed representation (e.g., 330, FIG. 3) of each of the images, each of the decompressed representations including a respective host-image component and a respective film-grain component; (C) computing, based on the metadata, a respective estimate of the respective film-grain component for each of the decompressed representations; and (D) removing (e.g., at 340, FIG. 3) the respective estimate from the respective decompressed representation to produce a corresponding estimate (e.g., 342, FIG. 4) of the respective host-image component.

[0096] In some embodiments of the above method, the method further comprises computing (e.g., at 210, FIG. 3), based on the metadata, a film-grain image (e.g., 220, FIG. 3); and wherein said computing the respective estimate comprises using the film-grain image.

[0097] In some embodiments of any of the above methods, the method further comprises computing (e.g., at 420, FIG. 4) first-order polynomial approximations of a nonlinear luma modulation function (e.g., f(), Eq. (1)); and wherein said computing the respective estimate comprises using said approximations.

[0098] In some embodiments of any of the above methods, the method further comprises temporally averaging (e.g., at 910, FIG. 9; in accordance with Eq. (42)) a sequence of the respective decompressed representations using a plurality of temporal sliding windows, each of the temporal sliding windows corresponding to a respective image-frame pixel, at least some of the temporal sliding windows having different respective lengths (e.g., selected in accordance with Eqs. (45), (47)).

[0099] In some embodiments of any of the above methods, the compressed video bitstream has been generated using lossy compression according to a video compression standard.

[00100] According to yet another example embodiment disclosed above, e.g., in the summary section and/or in reference to any one or any combination of some or all of FIGs. 1-10, provided is a non-transitory machine-readable medium, having encoded thereon program code, wherein, when the program code is executed by a machine, the machine implements a method comprising the steps of: (A) receiving a coded bitstream (e.g., 122, FIGs. 1, 3) including a compressed video bitstream (e.g., 252, FIG. 3) and metadata (e.g., 212, 232, FIG. 3), the compressed video bitstream having encoded therein a sequence of images containing digital film grain, the metadata including one or more parameters corresponding to the digital film grain; (B) decompressing (e.g., at 320, FIG. 3) the compressed video bitstream to generate a respective decompressed representation (e.g., 330, FIG. 3) of each of the images, each of the decompressed representations including a respective host-image component and a respective film-grain component; (C) computing, based on the metadata, a respective estimate of the respective film-grain component for each of the decompressed representations; and (D) removing (e.g., at 340, FIG. 3) the respective estimate from the respective decompressed representation to produce a corresponding estimate (e.g., 342, FIG. 4) of the respective host-image component.

[00101] With regard to the processes, systems, methods, heuristics, etc. described herein, it should be understood that, although the steps of such processes, etc. have been described as occurring according to a certain ordered sequence, such processes could be practiced with the described steps performed in an order other than the order described herein. It further should be understood that certain steps could be performed simultaneously, that other steps could be added, or that certain steps described herein could be omitted. In other words, the descriptions of processes herein are provided for the purpose of illustrating certain embodiments, and should in no way be construed so as to limit the claims.

[00102] Accordingly, it is to be understood that the above description is intended to be illustrative and not restrictive. Many embodiments and applications other than the examples provided would be apparent upon reading the above description. The scope should be determined, not with reference to the above description, but should instead be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. It is anticipated and intended that future developments will occur in the technologies discussed herein, and that the disclosed systems and methods will be incorporated into such future embodiments. In sum, it should be understood that the application is capable of modification and variation.

[00103] All terms used in the claims are intended to be given their broadest reasonable constructions and their ordinary meanings as understood by those knowledgeable in the technologies described herein unless an explicit indication to the contrary is made herein. In particular, use of the singular articles such as “a,” “the,” “said,” etc. should be read to recite one or more of the indicated elements unless a claim recites an explicit limitation to the contrary.

[00104] The Abstract of the Disclosure is provided to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in various embodiments for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments incorporate more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in fewer than all features of a single disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separately claimed subject matter.

[00105] While this disclosure includes references to illustrative embodiments, this specification is not intended to be construed in a limiting sense. Various modifications of the described embodiments, as well as other embodiments within the scope of the disclosure, which are apparent to persons skilled in the art to which the disclosure pertains are deemed to lie within the principle and scope of the disclosure, e.g., as expressed in the following claims.

[00106] Some embodiments may be implemented as circuit-based processes, including possible implementation on a single integrated circuit.

[00107] Some embodiments can be embodied in the form of methods and apparatuses for practicing those methods. Some embodiments can also be embodied in the form of program code recorded in tangible media, such as magnetic recording media, optical recording media, solid state memory, floppy diskettes, CD-ROMs, hard drives, or any other non-transitory machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the patented invention(s). Some embodiments can also be embodied in the form of program code, for example, stored in a non-transitory machine-readable storage medium, including being loaded into and/or executed by a machine, wherein, when the program code is loaded into and executed by a machine, such as a computer or a processor, the machine becomes an apparatus for practicing the patented invention(s). When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.

[00108] Unless explicitly stated otherwise, each numerical value and range should be interpreted as being approximate as if the word "about" or "approximately" preceded the value or range.

[00109] The use of figure numbers and/or figure reference labels in the claims is intended to identify one or more possible embodiments of the claimed subject matter in order to facilitate the interpretation of the claims. Such use is not to be construed as necessarily limiting the scope of those claims to the embodiments shown in the corresponding figures.

[00110] Although the elements in the following method claims, if any, are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.

[00111] Reference herein to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the disclosure. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments necessarily mutually exclusive of other embodiments. The same applies to the term “implementation.”

[00112] Unless otherwise specified herein, the use of the ordinal adjectives “first,” “second,” “third,” etc., to refer to an object of a plurality of like objects merely indicates that different instances of such like objects are being referred to, and is not intended to imply that the like objects so referred-to have to be in a corresponding order or sequence, either temporally, spatially, in ranking, or in any other manner.

[00113] Unless otherwise specified herein, in addition to its plain meaning, the conjunction "if" may also or alternatively be construed to mean "when" or "upon" or "in response to determining" or "in response to detecting," which construal may depend on the corresponding specific context. For example, the phrase "if it is determined" or "if [a stated condition] is detected" may be construed to mean "upon determining" or "in response to determining" or "upon detecting [the stated condition or event]" or "in response to detecting [the stated condition or event]."

[00114] Also for purposes of this description, the terms “couple,” “coupling,” “coupled,” “connect,” “connecting,” or “connected” refer to any manner known in the art or later developed in which energy is allowed to be transferred between two or more elements, and the interposition of one or more additional elements is contemplated, although not required. Conversely, the terms “directly coupled,” “directly connected,” etc., imply the absence of such additional elements.

[00115] As used herein in reference to an element and a standard, the term compatible means that the element communicates with other elements in a manner wholly or partially specified by the standard, and would be recognized by other elements as sufficiently capable of communicating with the other elements in the manner specified by the standard. The compatible element does not need to operate internally in a manner specified by the standard.

[00116] The functions of the various elements shown in the figures, including any functional blocks labeled as "processors" and/or "controllers," may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" or "controller" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, network processor, application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), read-only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage. Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.

[00117] As used in this application, the terms "circuit" and "circuitry" may refer to one or more or all of the following: (a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry); (b) combinations of hardware circuits and software, such as (as applicable): (i) a combination of analog and/or digital hardware circuit(s) with software/firmware and (ii) any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions; and (c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g., firmware) for operation, but the software may not be present when it is not needed for operation. This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in a server, a cellular network device, or other computing or network device.

[00118] It should be appreciated by those of ordinary skill in the art that any block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the disclosure. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

[00119] “BRIEF SUMMARY OF SOME SPECIFIC EMBODIMENTS” in this specification is intended to introduce some example embodiments, with additional embodiments being described in “DETAILED DESCRIPTION” and/or in reference to one or more drawings. “BRIEF SUMMARY OF SOME SPECIFIC EMBODIMENTS” is not intended to identify essential elements or features of the claimed subject matter, nor is it intended to limit the scope of the claimed subject matter.

[00120] Various aspects of the present invention may be appreciated from the following enumerated example embodiments (EEEs):

EEE1. A video delivery system capable of film-grain removal, the system comprising a video decoder that comprises: an input interface to receive a coded bitstream including a compressed video bitstream and metadata, the compressed video bitstream having encoded therein a sequence of images containing digital film grain, the metadata including one or more parameters corresponding to the digital film grain; and a processor configured to: decompress the compressed video bitstream to generate a respective decompressed representation of each of the images, each of the decompressed representations including a respective host-image component and a respective film-grain component; compute, based on the metadata, a respective estimate of the respective film-grain component for each of the decompressed representations; and remove the respective estimate from the respective decompressed representation to produce a corresponding estimate of the respective host-image component.

EEE2. The video delivery system of EEE 1, wherein the processor is further configured to: compute, based on the metadata, a film-grain image; and compute the respective estimate using the film-grain image.

EEE3. The video delivery system of EEE 2, wherein the processor is further configured to: compute, for patches of image pixels, standard deviations corresponding to estimated host- image pixel values; and compute the film-grain image based on the standard deviations.

EEE4. The video delivery system of EEE 2 or EEE 3, wherein the processor is further configured to adjust the film-grain image to account for distortions caused by at least one of video compression and decompression.

EEE5. The video delivery system of EEE 4, wherein the processor is configured to adjust the film-grain image using a Gaussian filter.

EEE6. The video delivery system of EEE 4 or EEE 5, wherein the processor is configured to adjust the film-grain image using a finite-impulse-response filter.

EEE7. The video delivery system of any of EEEs 4-6, wherein the processor is configured to adjust the film-grain image using a guided filter.

EEE8. The video delivery system of any of EEEs 1-7, wherein the processor is further configured to: compute first-order polynomial approximations of a nonlinear luma modulation function; and compute the respective estimate using said approximations.

EEE9. The video delivery system of EEE 8, wherein the processor is further configured to use the metadata to compute said approximations.

EEE10. The video delivery system of any of EEEs 1-9, wherein the processor is further configured to compute the respective estimate using a precomputed look-up table having stored therein parameters of first-order polynomial approximations of a nonlinear luma modulation function.

EEE11. The video delivery system of any of EEEs 1-10, wherein the processor is further configured to: temporally average a sequence of the respective decompressed representations using a plurality of temporal sliding windows, each of the temporal sliding windows corresponding to a respective image-frame pixel, at least some of the temporal sliding windows having different respective lengths; and select the lengths of the temporal sliding windows based on the metadata.

EEE12. The video delivery system of any of EEEs 1-11, further comprising a video encoder that comprises: an output interface to output the coded bitstream for the video encoder; and a video-compression module to generate the compressed video bitstream using lossy compression according to a video compression standard.

EEE13. A video delivery system capable of film-grain removal, the system comprising a video decoder that comprises: an input interface to receive a coded bitstream including a compressed video bitstream and metadata, the compressed video bitstream having encoded therein a sequence of images containing digital film grain, the metadata including one or more parameters corresponding to the digital film grain; and a processor configured to: decompress the compressed video bitstream to generate a respective decompressed representation of each of the images, each of the decompressed representations including a respective host-image component and a respective film-grain component; and temporally average a sequence of the respective decompressed representations using a plurality of temporal sliding windows, each of the temporal sliding windows corresponding to a respective image-frame pixel, at least some of the temporal sliding windows having different respective lengths selected based on the metadata.

EEE14. The video delivery system of EEE 13, further comprising a video encoder that comprises: an output interface to output the coded bitstream for the video encoder; and a video-compression module to generate the compressed video bitstream using lossy compression according to a video compression standard.

EEE15. A machine-implemented method of removing film grain from video data, the method comprising: receiving a coded bitstream including a compressed video bitstream and metadata, the compressed video bitstream having encoded therein a sequence of images containing digital film grain, the metadata including one or more parameters corresponding to the digital film grain; decompressing the compressed video bitstream to generate a respective decompressed representation of each of the images, each of the decompressed representations including a respective host-image component and a respective film-grain component; computing, based on the metadata, a respective estimate of the respective film-grain component for each of the decompressed representations; and removing the respective estimate from the respective decompressed representation to produce a corresponding estimate of the respective host-image component.

EEE16. The method of EEE 15, further comprising computing, based on the metadata, a film-grain image; and wherein said computing the respective estimate comprises using the film-grain image.

EEE17. The method of EEE 15 or EEE 16, further comprising computing first-order polynomial approximations of a nonlinear luma modulation function; and wherein said computing the respective estimate comprises using said approximations.

EEE18. The method of any of EEEs 15-17, further comprising temporally averaging a sequence of the respective decompressed representations using a plurality of temporal sliding windows, each of the temporal sliding windows corresponding to a respective image-frame pixel, at least some of the temporal sliding windows having different respective lengths.

EEE19. The method of any of EEEs 15-18, wherein the compressed video bitstream has been generated using lossy compression according to a video compression standard.

EEE20. A non-transitory machine-readable medium, having encoded thereon program code, wherein, when the program code is executed by a machine, the machine performs operations comprising the method of any of EEEs 15-19.