


Title:
NEURAL NETWORK CODEC WITH HYBRID ENTROPY MODEL AND FLEXIBLE QUANTIZATION
Document Type and Number:
WIPO Patent Application WO/2023/245460
Kind Code:
A1
Abstract:
Innovations in systems, methods, and software for features of a neural image or video codec are described herein. For example, a neural video encoder can receive a current video frame, encode the current video frame to produce encoded data, and output the encoded data as part of a bitstream. As part of the encoding, the encoder can determine a current latent representation for the current video frame, and encode the current latent representation using an entropy model network that includes one or more convolutional layers. As part of encoding the current latent representation, the encoder can estimate statistical characteristics of a quantized version of the current latent representation based at least in part on a previous latent representation for a previous video frame, and entropy code the quantized version of the current latent representation based at least in part on the estimated statistical characteristics.
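
The idea in the abstract of conditioning the entropy model on a previous latent representation can be illustrated with a minimal sketch. It is not the claimed method: the function `temporal_prior` is a hypothetical stand-in for the entropy model network (which the abstract says includes convolutional layers), the discretized Gaussian probability model and the quantization step `step` are assumptions, and all latent values are made-up illustration data. The sketch only shows why statistics predicted from the previous latent can shorten the ideal code length compared with a fixed zero-mean prior.

```python
import math


def gaussian_cdf(x, mean, scale):
    """CDF of a normal distribution with the given mean and scale."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def quantize(latent, step=1.0):
    """Uniform scalar quantization (assumed): scale by 1/step, round to integers."""
    return [round(v / step) for v in latent]


def estimate_bits(symbols, means, scales):
    """Ideal code length in bits of integer symbols under a discretized Gaussian."""
    total = 0.0
    for q, mu, sigma in zip(symbols, means, scales):
        # Probability mass of the unit-width bin centered on the integer symbol.
        p = gaussian_cdf(q + 0.5, mu, sigma) - gaussian_cdf(q - 0.5, mu, sigma)
        total += -math.log2(max(p, 1e-9))  # clamp to avoid log(0)
    return total


def temporal_prior(prev_latent, step=1.0, scale=1.0):
    """Hypothetical stand-in for the entropy model network.

    Predicts the mean of each quantized symbol from the previous frame's
    latent and uses a fixed scale; a real model would learn both.
    """
    means = [v / step for v in prev_latent]
    scales = [scale] * len(prev_latent)
    return means, scales


# Made-up latent values for a previous and a current frame.
prev_latent = [4.2, -3.1, 0.4, 7.8]
curr_latent = [4.6, -2.7, 0.1, 8.3]

symbols = quantize(curr_latent)
means, scales = temporal_prior(prev_latent)

# Code length with statistics estimated from the previous latent.
bits_temporal = estimate_bits(symbols, means, scales)
# Baseline (assumed): fixed zero-mean, unit-scale prior with no temporal input.
bits_static = estimate_bits(symbols, [0.0] * len(symbols), [1.0] * len(symbols))

print(f"temporal prior: {bits_temporal:.2f} bits, static prior: {bits_static:.2f} bits")
```

An actual entropy coder (for example an arithmetic coder) would consume the same per-symbol probabilities; the sketch stops at the ideal code length because that is the quantity the estimated statistics directly affect.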

Inventors:
LI JIAHAO (US)
LI BIN (US)
LU YAN (US)
Application Number:
PCT/CN2022/100259
Publication Date:
December 28, 2023
Filing Date:
June 21, 2022
Assignee:
MICROSOFT TECHNOLOGY LICENSING LLC (US)
LI JIAHAO (CN)
International Classes:
H04N19/124; G06N3/04; H04N19/13; H04N19/186; H04N19/517; H04N19/91
Foreign References:
US20210152831A1 2021-05-20
Other References:
JIAHAO LI ET AL: "Deep Contextual Video Compression", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 30 September 2021 (2021-09-30), XP091061573
XIHUA SHENG ET AL: "Temporal Context Mining for Learned Video Compression", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 27 November 2021 (2021-11-27), XP091104088
REN YANG ET AL: "Learning for Video Compression with Recurrent Auto-Encoder and Recurrent Probability Model", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 6 December 2020 (2020-12-06), XP081893227, DOI: 10.1109/JSTSP.2020.3043590
LU GUO ET AL: "An End-to-End Learning Framework for Video Compression", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE COMPUTER SOCIETY, USA, vol. 43, no. 10, 20 April 2020 (2020-04-20), pages 3292 - 3308, XP011875084, ISSN: 0162-8828, [retrieved on 20210901], DOI: 10.1109/TPAMI.2020.2988453
HAOJIE LIU ET AL: "Neural Video Coding using Multiscale Motion Compensation and Spatiotemporal Context Model", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 9 July 2020 (2020-07-09), XP081718592
HE ET AL.: "Checkerboard context model for efficient learned image compression", PROCEEDINGS OF THE IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2021, pages 14771 - 14780
RANJAN AND BLACK: "Optical flow estimation using a spatial pyramid network", PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2017, pages 4161 - 4170
SHENG ET AL.: "Temporal Context Mining for Learned Video Compression", ARXIV:2111.13850, 2021
CHAN ET AL.: "BasicVSR: The search for essential components in video super-resolution and beyond", PROCEEDINGS OF THE IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2021, pages 4947 - 4956
XUE ET AL.: "Video Enhancement with Task-Oriented Flow", INTERNATIONAL JOURNAL OF COMPUTER VISION (IJCV), vol. 127, no. 8, 2019, pages 1106 - 1125, XP036827686, DOI: 10.1007/s11263-018-01144-2
MERCAT ET AL.: "UVG dataset: 50/120fps 4K sequences for video codec analysis and development", PROCEEDINGS OF THE 11TH ACM MULTIMEDIA SYSTEMS CONFERENCE, 2020, pages 297 - 302, XP058459572, DOI: 10.1145/3339825.3394937
WANG ET AL.: "2016 IEEE International Conference on Image Processing (ICIP)", 2016, IEEE, article "MCL-JCV: a JND-based H.264/AVC video quality assessment dataset", pages: 1509 - 1513
LU ET AL.: "An end-to-end learning framework for video compression", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, vol. 43, no. 10, 2020, pages 3292 - 3308, XP011875084, DOI: 10.1109/TPAMI.2020.2988453
LIN ET AL.: "M-LVC: multiple frames prediction for learned video compression", PROCEEDINGS OF THE IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2020
YANG ET AL.: "Learning for Video Compression with Recurrent Auto-Encoder and Recurrent Probability Model", IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, vol. 15, no. 2, 2021, pages 388 - 401, XP011840033, DOI: 10.1109/JSTSP.2020.3043590
LI ET AL.: "Deep contextual video compression", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, vol. 34, 2021
WANG ET AL.: "Thirty-Seventh Asilomar Conference on Signals, Systems & Computers", vol. 2, 2003, IEEE, article "Multiscale structural similarity for image quality assessment", pages: 1398 - 1402
Attorney, Agent or Firm:
SHIHUI PARTNERS (CN)