


Title:
CONTEXT-BASED LOSSLESS IMAGE COMPRESSION FOR EVENT CAMERA
Document Type and Number:
WIPO Patent Application WO/2023/160789
Kind Code:
A1
Abstract:
Methods for efficient storage and encoding of an event stream (3) from an event camera (1) by either converting the event stream (3) into event frames (8) and generating a combined event frame (11) from a plurality of event frames (8) to be processed as an image; or storing the spatial information (5) and the polarity information (6) from the number of event frames (8) in separately optimized data structures; wherein spatial information (5) is stored in a single Event Map Image (15) and polarity information (6) is merged into a polarity vector (16). The stored event data is encoded using a context-based lossless encoding method, wherein only the pixel positions where at least one event (4) occurs are encoded, according to a category index (20) based on the number of events detected, and an event frame index (23) representing the position of these events in the event frame stack.

Inventors:
SCHIOPU IONUT (SE)
BILCU RADU (SE)
Application Number:
PCT/EP2022/054639
Publication Date:
August 31, 2023
Filing Date:
February 24, 2022
Assignee:
HUAWEI TECH CO LTD (CN)
SCHIOPU IONUT (SE)
International Classes:
H04N19/61; H03M7/40
Domestic Patent References:
WO2019067732A12019-04-04
Other References:
GUILLERMO GALLEGO ET AL: "Event-based Vision: A Survey", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 8 August 2020 (2020-08-08), XP081735140, DOI: 10.1109/TPAMI.2020.3008413
ANH NGUYEN ET AL: "Real-Time 6DOF Pose Relocalization for Event Cameras with Stacked Spatial LSTM Networks", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 22 August 2017 (2017-08-22), XP080968741
MOHAMMAD MOSTAFAVI I S ET AL: "Event-based High Dynamic Range Image and Very High Frame Rate Video Generation using Conditional Generative Adversarial Networks", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 20 November 2018 (2018-11-20), XP081784455, DOI: 10.1109/CVPR.2019.01032
INNOCENTI SIMONE UNDRI ET AL: "Temporal Binary Representation for Event-Based Action Recognition", 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), IEEE, 10 January 2021 (2021-01-10), pages 10426 - 10432, XP033910139, DOI: 10.1109/ICPR48806.2021.9412991
KHAN NABEEL ET AL: "Time-Aggregation-Based Lossless Video Encoding for Neuromorphic Vision Sensor Data", IEEE INTERNET OF THINGS JOURNAL, IEEE, USA, vol. 8, no. 1, 8 July 2020 (2020-07-08), pages 596 - 609, XP011827528, DOI: 10.1109/JIOT.2020.3007866
ANONYMOUS: "Balanced ternary - Wikipedia", 1 February 2022 (2022-02-01), pages 1 - 9, XP055970956, Retrieved from the Internet [retrieved on 20221013]
M. GEHRIG, W. AARENTS, D. GEHRIG, D. SCARAMUZZA: "DSEC: A Stereo Event Camera Dataset for Driving Scenarios", IEEE TRANS. ROBOT. AUTOM., vol. 6, no. 3, July 2021 (2021-07-01), pages 4947 - 4954
Attorney, Agent or Firm:
KREUZ, Georg M. (DE)
Claims:
CLAIMS

1. A method comprising:

- receiving an event stream (3) from an event camera (1); wherein the event stream (3) comprises a plurality of events (4), and each event comprises spatial information (5), timestamp (7), and polarity information (6) associated with a change in brightness;

- converting the event stream (3) received over a first time period into a plurality of event frames (8) by dividing the first time period into a plurality of sub-periods (9), each subperiod (9) corresponding to an event frame (8), and assigning an event frame symbol (10) to each pixel of each event frame (8), each event frame symbol (10) representing the existence and polarity information (6) of an event (4) detected during a respective subperiod (9), and being assigned based on the spatial information (5) and timestamps (7) of detected events (4); and

- generating a combined event frame (11) by merging a number of event frames (8), wherein merging the number of the event frames (8) comprises converting all the event frame symbols (10) assigned to a corresponding pixel of the number of event frames (8) into a combined event frame symbol (12) of the corresponding pixel of the event frame (11).

2. The method according to claim 1, wherein the event frame symbols (10) are merged into combined event frame symbols (12) according to the following formula: CEF = k^(n-1) · EF_(i-(n-1)) + ... + k^2 · EF_(i-2) + k^1 · EF_(i-1) + EF_i, wherein CEF represents the combined event frame symbol (12), EF represents the event frame symbols (10), k is the number of event frame symbols (10), and n is the number of event frames (8) to be merged into the combined event frame (11).

3. The method according to claim 2, wherein the number of event frame symbols (10) is k=3, representing the polarity information (6) as either

- a positive polarity event (4), wherein an increase in brightness is detected at a respective pixel of the event frame (8);

- a negative polarity event (4), wherein a decrease in brightness is detected at a respective pixel of the event frame (8); or - no event (4) detected at a respective pixel of the event frame (8) during a respective sub-period (9); and wherein the number of combined event frame symbols (12) is 3^n, wherein n is the number of event frames (8) to be merged into a combined event frame (11).

4. The method according to claim 3, wherein the number of event frames (8) to be merged into a combined event frame (11) is n=5, and wherein the event frame symbols (10) are merged into combined event frame symbols (12) according to the following formula:

CEF = 3^4 · EF_(i-4) + 3^3 · EF_(i-3) + 3^2 · EF_(i-2) + 3^1 · EF_(i-1) + EF_i.

5. The method according to any one of claims 1 to 4, the method further comprising encoding a plurality of subsequent combined event frames (11), either by

- collecting multiple combined event frames (11) in a video sequence (13) and encoding the video sequence (13) by employing a video coding standard, such as High Efficiency Video Coding, HEVC; or

- encoding each combined event frame (11) as a single raw image (14) by employing a lossless image compression codec, such as Context-based Adaptive Lossless Image Codec, CALIC, or Free Lossless Image Format, FLIF, codec.

6. A method comprising:

- receiving (101) an event stream (3) from an event camera (1); wherein the event stream (3) comprises a plurality of events (4), and each event (4) comprises spatial information (5), timestamp (7), and polarity information (6) associated with a change in brightness;

- converting (102) the event stream (3) received over a first time period into a number of event frames (8) by dividing the first time period into a plurality of sub-periods (9), each sub-period (9) corresponding to an event frame (8), and binning the spatial information (5) and polarity information (6) of detected events (4) into event frames (8) based on the respective timestamps (7) of the events (4); and

- storing (103) the spatial information (5) and the polarity information (6) of the events (4) from the number of event frames (8) in separate data structures optimized for the respective type of information to be stored.

7. The method according to claim 6, wherein storing the spatial information (5) comprises merging spatial information (5) from the number of event frames (8) into a single Event Map Image (15) stored as a set of image bit-planes (25).

8. The method according to claim 7, wherein the spatial information (5) contained in an Event Map Image (15) is stored using a combination of:

- a Binary Map (18) comprising a Binary Map symbol (19) assigned to each pixel of the Binary Map (18) to signal the locations where at least one event (4) has occurred in corresponding pixel of any of the number of event frames (8);

- a category index (20) representing the number of events (4), nrEv, that occurred at a respective pixel of the Binary Map (18) signaled by a Binary Map symbol (19), and

- an event frame index (23) representing the individual event frames (8) in the number n of event frames (8) where any event (4) occurred at a respective pixel of the Binary Map (18) signaled by a Binary Map symbol (19).

9. The method according to claim 8, wherein the Binary Map (18) is encoded using a template context model, wherein

- for each pixel of the Binary Map (18), a causal neighborhood (26) of a respective Binary Map symbol is determined;

- an optimal model order is determined by estimating the codelengths needed to encode the Binary Map (18) using different numbers of neighbors collected in the order from the causal neighborhood (26); and

- the optimal model order is encoded and then used to encode the Binary Map (18) traversed in a raster scan order, by encoding each Binary Map symbol (19) based on the distribution model associated to the context index computed using optimal order neighbors found in its respective causal neighborhood (26).

10. The method according to any one of claims 8 or 9, wherein the category index (20) is represented using category index symbols (21) in an alphabet (28), each category index symbol (21) comprising a number p of bit-planes (25); and wherein encoding the category index (20) comprises encoding the category index symbols (21) bit-plane-by-bit-plane, wherein

- the first bit-plane (25) of the category index (20), denoted BP_0, is encoded in a raster scan, by encoding the first bit in the representation of each category index symbol (21) using a template context model based on a context computed for each bit of the category index symbol (21) determined from its respective causal neighborhood (26) on the first bit-plane BP_0 (25); and

- each subsequent bit-plane (25) of the category index (20), denoted BP_i, i = 1, 2, ..., p - 1, is encoded using a template context model, wherein the respective context is determined using the respective causal neighborhood (26) from the current bit-plane BP_i, and a respective context template (27) from its preceding at least one bit-plane BP_(i-1).

11. The method according to claim 10, wherein for each subsequent bit-plane (25) BP_i having at least two preceding bit-planes (25), the respective context is determined using context templates (27) from the preceding two bit-planes (25) BP_(i-1) and BP_(i-2).

12. The method according to any one of claims 8 to 11, wherein the event frame index (23) comprises event frame index symbols (24) representing the individual event frames (8) in the number of event frames (8) where any event (4) occurred at a respective pixel; and wherein each event frame index (23) is encoded according to the category index (20) of a respective pixel, wherein each category of the category index (20) represents the total number of events (4), nrEv, detected at the respective pixel during the number of event frames (8); by

- dividing an alphabet (28) into a number n of sub-alphabets (29), and associating a sub-alphabet (29) to each category;

- remapping the event frame index symbols (24) to remapped symbols (30) of the corresponding sub-alphabets (29), based on the respective category index (20); and

- associating a category symbol (22) based on the respective category index (20).

13. The method according to claim 12, wherein the sub-alphabets (29) comprise different numbers of symbols, and each sub-alphabet (29) is associated to a category based on the number of symbols needed for remapping the event frame index symbols (24) of a corresponding category.

14. The method according to claim 13, wherein each event frame index (23) comprises C(n, nrEv) symbol combinations, wherein n represents the number of event frames (8) and nrEv corresponds to the category index (20) representing the total number of events (4) that occurred in the n number of event frames (8), and wherein each sub-alphabet (29) therefore comprises all C(n, nrEv) symbols.

15. The method according to any one of claims 12 to 14, wherein the remapped symbols (30) and associated category symbols (22) are merged into a number n of category vectors (31), the number n of category vectors (31) corresponding to the number n of event frames (8), and wherein the first n - 1 category vectors (31) are encoded using an adaptive Markov model, the last category vector (31) being associated with deterministic cases wherein an event (4) occurs in each event frame (8).

16. The method according to any one of claims 6 to 15, wherein storing the polarity information (6) from the number of event frames (8) comprises merging polarity information (6) from each event frame (8) into a polarity vector (16) comprising binary polarity symbols (17) determined based on an increase or decrease in brightness during a detected event (4).

17. The method according to claim 16, wherein the polarity symbols (17) from each polarity vector (16) of the number of event frames (8) are concatenated into a concatenated polarity vector (32).

18. The method according to claim 16, wherein polarity information (6) from the number n of event frames (8) is stored in a number n of respective polarity vectors PV_i (16), by traversing the spatial information (5) of the event frames (8), either bit-plane-by-bit-plane or event frame-by-event frame.

19. The method according to claim 18, wherein the number n of polarity vectors (16) PV_i are encoded, in an order of the respective event frames (8) EF_i, i = 1, ..., n, received over n sub-periods (9), wherein each polarity vector (16) PV_i corresponds to all events (4) that occurred in a corresponding event frame (8) EF_i and is encoded either

- as a vector using adaptive Markov modelling, or

- by traversing the Event Map Image (15) using the Binary Map (18) and encoding each polarity symbol (17) using a template context model, wherein the respective context of each PV_i symbol is determined using its respective causal neighborhood (26) of polarity symbols (17) from the current event frame (8) EF_i, and wherein at least one preceding event frame (8) EF_(i-1) is used as a respective context template (27); and wherein for each event frame (8) EF_i having at least two preceding event frames (8), the respective context for a current polarity symbol (17) in the current polarity vector (16) PV_i is determined using context templates (27) from the preceding two event frames (8) EF_(i-1) and EF_(i-2).

20. The method according to any one of claims 6 to 19, wherein the spatial information (5) and polarity information (6) from the number n of event frames (8) is encoded using a Sparse Coding Mode, SCM, wherein if a total number of events (4) N detected during a number n of sub-periods (9) is below an event threshold, SCM is activated and at least one of the spatial information (5) or the polarity information (6) is encoded using lower-complexity encoding methods; otherwise SCM is not activated and at least one of the spatial information (5) or the polarity information (6) is encoded according to the method of any one of claims 6 to 19.

21. The method according to claim 20, wherein in response to a determination that the total number of events (4) is below a first event threshold N < ET_1, Sparse Coding Mode is activated (104) to encode the spatial information (5) as follows:

- one bit is always used to signal if Sparse Coding Mode is on or off;

- if Sparse Coding Mode is on, then:

-- N is encoded in step (105) using log2(ET_1) bits; and

-- for each event (4), spatial information (5) is encoded in step (106) as follows: x_i using log2(H) bits; y_i using log2(W) bits; and the event frame index (23) is encoded in step (107) using log2(n) bits, wherein H is the event camera sensor (1) height and W is the event camera sensor (1) width.

22. The method according to any one of claims 20 or 21, wherein in response to a determination that the total number of events (4) is below a second event threshold N < ET_2, Sparse Coding Mode is activated in step (108) to encode the polarity information (6) as follows:

- the polarity symbols (17) from each polarity vector (16) of the number of event frames (8) are concatenated in step (109) into a single concatenated polarity vector (32) according to claim 17; and - the concatenated polarity vector (32) is encoded in step (110) using a 0-order Markov model.

23. The method according to any one of claims 6 to 22, wherein the number of event frames (8) is in the range of 1 to 32, depending on the length Δ of the sub-periods (9).

24. A computer-based system comprising

- an event camera (1) configured to record an event stream (3); and

- a processor coupled to a storage device and configured to convert the event stream (3) into a number of event frames (8); wherein the storage device comprises instructions that, when executed by the processor, cause the computer-based system to merge the information from the number of event frames (8) into a combined event frame (11) by executing a method according to any one of claims 1 to 5.

25. A computer-based system comprising:

- an event camera (1) configured to record an event stream (3); and

- a processor coupled to a storage device and configured to convert the event stream (3) into a number of event frames (8); wherein the storage device comprises instructions that, when executed by the processor, cause the computer-based system to process the information from the number of event frames (8) according to the method of any one of claims 6 to 23.

26. A non-transitory computer readable medium having stored thereon program instructions that, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 5.

27. A non-transitory computer readable medium having stored thereon program instructions that, when executed by a processor, cause the processor to perform the method according to any one of claims 6 to 23.

Description:
CONTEXT-BASED LOSSLESS IMAGE COMPRESSION FOR EVENT CAMERA

TECHNICAL FIELD

The disclosure relates in general to the field of digital image processing, and more specifically to the encoding of event data captured by an event camera.

BACKGROUND

An event camera, also known as a neuromorphic camera, silicon retina or dynamic vision sensor, is an imaging sensor that responds to local changes in brightness. Event cameras do not capture images using a shutter as conventional (frame) cameras do. Instead, each pixel inside an event camera operates independently and asynchronously, reporting changes in brightness as they occur, and staying silent otherwise.

During operation, each pixel stores a reference brightness level, and continuously compares it to the current brightness level. If the difference in brightness exceeds a threshold, that pixel resets its reference level and generates an event: a discrete packet that contains the pixel address, timestamp, and polarity (increase or decrease) of a brightness change, or an instantaneous measurement of the illumination level. Thus, event cameras output an asynchronous stream of events triggered by changes in scene illumination.

As a result, only information about variable objects is included within the event stream delivered by a pixel event sensor, and there is no information about homogeneous surfaces or motionless backgrounds.

Such event cameras can offer significant advantages over conventional (color) cameras, namely a high dynamic range and a low latency. In particular, event cameras provide the possibility of a very high temporal resolution as asynchronous events can be triggered at the smallest timestamp distance of 10^-6 s, which is equivalent to achieving a frame rate of up to 10^6 frames per second (fps). Due to these advantages, event cameras can be efficiently applied in the technical fields of object recognition, autonomous vehicles, and robotics, among others. In certain circumstances, such as textured scenes with rapid motion, millions of events are generated per second. In order to process such busy scenes, existing event processes can require massive parallel computations.

One existing event stream encoding solution is the Spike Coding approach, which is inspired by state-of-the-art solutions proposed to encode large amounts of data and exploits the spatial and temporal characteristics of the event location information for compression. This approach includes an adaptive macro-cube partitioning structure, and address- and time-prior modes. The asynchronous events are packed into macro-cubes which encode the spatial, time, and polarity information separately. Consequently, this approach encodes all raw events that occurred in a time interval; however, for some applications not all events are required and a lossy solution is more desirable. Furthermore, when the distance between events is increased and the event information is accumulated, the performance of the Spike Coding approach is affected, which is undesirable. In other words, this solution is designed to deal only with raw asynchronous data, and not with encoding event frames.

Another encoding approach is the time-aggregation-based lossless video encoding algorithm for neuromorphic vision sensor data, where an event sequence is encoded by accumulating the events over a time interval. The event sequence generates two separate frames, one associated with the increase of luminance intensity and one with the decrease, i.e., according to the positive and negative event polarity, respectively. The two frames are then concatenated into a single "superframe" composed of the "positive polarity" frame on the left side and the "negative polarity" frame on the right side. Finally, the superframes are encoded using the High Efficiency Video Coding (HEVC) codec. In the case of higher-resolution sensors, e.g., 640x480 or 1280x720, the captured events are sparsely distributed in the frames, and the superframes are not efficiently encoded by HEVC, since the video codec is designed to encode a different type of information (found in color formats) than event counts. Hence, one major disadvantage is that the method's performance depends on the performance of the selected video codec. Another disadvantage is that the method does not provide a solution for the lossless case, when all events and all their corresponding information must be encoded by the event data codec.

Considering the increasingly high frame rates that event cameras can achieve, whereby the raw representation of event sequences reaches very large data volumes, there is a growing need for a method or system that provides efficient encoding of raw event data received from these event cameras, and in particular for a solution that is suitable for lossless event data compression.

SUMMARY

Accordingly, described herein are systems and methods of efficient data processing for event cameras. Such systems and methods accumulate events received within a time period from an event camera in the form of an event stream, convert the asynchronous events into event frames, and losslessly encode the event frames for further processing by other applications.

The foregoing and other objects are achieved by the features of the independent claims which disclose a novel context-based lossless image compression codec for event camera sensors, and a new lossless coding framework for event data, where spatial information is encoded as a packet of multiple event frames and polarity information is encoded by traversing the spatial information. The disclosed methods provide a novel event data representation, where spatial and polarity information are encoded separately; as well as a novel strategy for encoding event spatial information using a binary map signaling the positions where at least one event occurs in the event frames, the number of events, and a corresponding event frame index. Further provided is a novel event frame-based polarity coding algorithm; and a novel Sparse Coding Mode (SCM) activated under specific event-sparsity constraints.

According to a first aspect, there is provided a method comprising receiving an event stream from an event camera; converting the event stream into a plurality of event frames; and generating a combined event frame by merging a number of event frames. Merging the number of the event frames is done by converting event frame symbols assigned to pixels of the event frames into a combined event frame symbol of the corresponding pixel of the event frame based on the spatial information, timestamp, and polarity information of each event in the event stream.

Generating a combined event frame according to this method provides a more efficient representation of event stream data, since information from multiple event frames is merged into a single image format that can then be further processed, either using a lossless image compression codec by encoding the combined event frames as single images; or using a video codec by collecting multiple combined event frames in a video sequence.

In a possible implementation form of the first aspect the event frame symbols are merged into combined event frame symbols according to the following formula: CEF = k^(n-1) · EF_(i-(n-1)) + ... + k^2 · EF_(i-2) + k^1 · EF_(i-1) + EF_i, wherein CEF represents the combined event frame symbol, EF represents the event frame symbols, k is the number of event frame symbols, and n is the number of event frames to be merged into the combined event frame. Using this mapping formula enables efficient merging of the symbols of the event frames into combined event frame symbols.

In a further possible implementation form of the first aspect the number of event frame symbols is k=3, representing the polarity information as either a positive polarity event, wherein an increase in brightness is detected at a respective pixel of the event frame; a negative polarity event, wherein a decrease in brightness is detected at a respective pixel of the event frame; or no event detected at a respective pixel of the event frame during a respective sub-period. In this implementation form the number of possible combined event frame symbols is 3^n, wherein n is the number of event frames to be merged into a combined event frame. It has been found by the inventors that these particular parameters provide a surprisingly effective method for merging a number of event frames.

In a further possible implementation form of the first aspect the number of event frames to be merged into a combined event frame is n=5, and the event frame symbols are merged into combined event frame symbols according to the following formula:

CEF = 3^4 · EF_(i-4) + 3^3 · EF_(i-3) + 3^2 · EF_(i-2) + 3^1 · EF_(i-1) + EF_i.

It has been found by the inventors that these particular parameters provide a surprisingly effective method for merging a number of event frames.

In a further possible implementation form of the first aspect the method further comprises encoding a plurality of subsequent combined event frames, either by collecting the combined event frames into a video sequence and encoding by employing any video coding standard, such as High Efficiency Video Coding, HEVC; or encoding each combined event frame separately as a raw image by employing any lossless image compression codec, such as Context-based Adaptive Lossless Image Codec, CALIC, or Free Lossless Image Format, FLIF. This enables further processing or storing of the combined event frames.

According to a second aspect, there is provided a method comprising receiving an event stream from an event camera; converting the event stream into a number of event frames by binning the spatial information and polarity information of detected events into event frames based on the respective timestamps of the events; and storing the spatial information and the polarity information from the number of event frames in separate data structures optimized for the respective type of information to be stored.

Storing event frame data in separate data structures optimized for the respective type of information to be stored provides an efficient method not only in terms of storage, but also enables different encoding solutions to be used for further encoding the different types of information (spatial and polarity) extracted from the event frames. The proposed method provides an improved coding performance compared with state-of-the-art methods designed for lossless video coding and lossless image compression. Furthermore, the proposed method also provides an improved performance when losslessly encoding all the asynchronous events received from a sensor.

In a possible implementation form of the second aspect storing the spatial information comprises merging spatial information from the number of event frames into a single Event Map Image stored as a set of image bit-planes, which provides an efficient data structure not only for storage but also for further encoding.

In a further possible implementation form of the second aspect the spatial information contained in an Event Map Image is stored using a combination of a Binary Map comprising a Binary Map symbol assigned to each pixel to signal the locations where at least one event has occurred; a category index representing the number of events, nrEv, that occurred; and an event frame index representing which individual event frame the event occurred at. This spatial event data representation ensures an efficient encoding method, since only those pixel positions are encoded, where at least one event is signaled in an event frame, i.e. the pixel positions with no events are ignored, thereby achieving savings both in data size and also in data processing time.

In a further possible implementation form of the second aspect the Binary Map is encoded using a template context model, wherein for each pixel of the Binary Map, first a causal neighborhood of a respective Binary Map symbol is determined and used to compute the context index. Then an optimal model order is determined by estimating the codelengths needed to encode the Binary Map using different numbers of neighbors collected in a specific order from the causal neighborhood; and the optimal model order is encoded and then used to encode the Binary Map traversed in a raster scan order, by encoding each Binary Map symbol based on its respective causal neighborhood, thereby providing a more efficient encoding method for the event frame spatial information than previously available using prior art encoding methods.

In an embodiment the optimal model order is determined using a model with a maximum order of m causal neighbors, wherein m is searched in the range of 1 to 18. In an example as shown in Fig. 6, m=18. It has been found by the inventors that these particular parameters provide a surprisingly effective method for determining optimal model order.

In a further possible implementation form of the second aspect the category index is represented using category index symbols in an alphabet, each category index symbol comprising a number p of bit-planes. Encoding the category index comprises encoding the category index symbols bit-plane-by-bit-plane, wherein the first bit-plane, denoted BP_0, is encoded in a raster scan, by encoding the first bit in the representation of each category index symbol using a template context model based on a context computed for each bit of the category index symbol determined from its respective causal neighborhood on the first bit-plane; and each subsequent bit-plane, denoted BP_i, is encoded using a template context model, wherein the respective context is determined using the respective causal neighborhood from the current bit-plane BP_i, and a respective context template from its preceding at least one bit-plane BP_(i-1). This provides a more efficient encoding method for the event frame spatial information than previously available using prior art encoding methods.

In an embodiment the number p of bit-planes of the category index symbols is determined as p = ⌈log2(n)⌉, where ⌈ ⌉ denotes the ceiling operator. In a further possible implementation form of the second aspect, for each subsequent bit-plane BP_i having at least two preceding bit-planes, the respective context is determined using context templates from the preceding two bit-planes BP_(i-1) and BP_(i-2). It has been found by the inventors that using exactly two preceding bit-planes provides a surprisingly effective method for encoding the category index.

In an embodiment, for each subsequent bit-plane, the respective causal neighborhood has an order0 length, wherein the context template from its preceding bit-plane BP_(i-1) has an order1 length. It has been found by the inventors that using these lengths provides a surprisingly effective method for encoding the category index.

In a further possible implementation form of the second aspect, the event frame index comprises event frame index symbols representing the individual event frames where any event occurred at a respective pixel; and each event frame index is encoded according to the category index of a respective pixel, wherein each category of the category index represents the total number of events, nrEv, detected at the respective pixel. The event frame index is encoded by first dividing an alphabet into a number n of sub-alphabets, and associating a sub-alphabet to each category; then remapping the event frame index symbols to remapped symbols of the corresponding sub-alphabets, based on the respective category index; and finally associating a category symbol based on the respective category index. Using sub-alphabets assigned to each category provides a very efficient method for event frame index encoding.

In an embodiment the alphabet comprises 2^n possible symbols, where n is the number of event frames encoded. In an example where n=8, the alphabet comprises 256 symbols.

In a further possible implementation form of the second aspect the sub-alphabets comprise different numbers of symbols, and each sub-alphabet is associated to a category based on the number of symbols needed for remapping the event frame index symbols of a corresponding category. Using such sub-alphabets provides a very efficient method for event frame index encoding.

In a further possible implementation form of the second aspect each event frame index comprises C(n, nrEv) possible symbol combinations, wherein n represents the number of event frames and nrEv corresponds to the category index representing the total number of events that occurred in the n number of event frames, and wherein each sub-alphabet therefore comprises all C(n, nrEv) possible symbols. It has been found by the inventors that these particular parameters provide a surprisingly effective method for event frame index encoding.

In a further possible implementation form of the second aspect the remapped event frame index symbols and associated category symbols are merged into a number n of category vectors, the number n of category vectors corresponding to the number n of event frames, and wherein the first n - 1 category vectors are encoded using an adaptive Markov model, the last category vector being associated with deterministic cases wherein an event occurs in each event frame, thereby resulting in a surprisingly effective method for event frame index encoding.

In an embodiment the number of event frames is n=8, wherein the sub-alphabets comprise 8, 28, 56, or 70 symbols respectively, and wherein for determining the optimal order for the adaptive Markov model the maximum order is set to: 5 for the sub-alphabets with 8 symbols, 3 for the sub-alphabets with 28 symbols, and 2 for the sub-alphabets with 56 or 70 symbols. It has been found by the inventors that these particular parameters provide a surprisingly effective method for event frame index encoding.
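By way of illustration only, the following Python sketch computes the sub-alphabet sizes C(n, nrEv) for n=8 (giving 8, 28, 56, and 70 symbols for one to four events) and remaps an 8-bit event frame index to a symbol of the sub-alphabet of its category. The lexicographic ranking used by remap_event_frame_index is an assumption made for the example; the method only requires a one-to-one remapping per category.

from math import comb
from itertools import combinations

def sub_alphabet_sizes(n):
    # Number of remapped symbols per category nrEv = 1..n is C(n, nrEv).
    return {nr_ev: comb(n, nr_ev) for nr_ev in range(1, n + 1)}

def remap_event_frame_index(mask, n):
    # Map an n-bit event frame index (bit i set = event in frame EF_(i+1)) to
    # (category index nrEv, remapped symbol within the category's sub-alphabet).
    # The remapped symbol is the lexicographic rank of the set of frame positions
    # among all C(n, nrEv) possible sets -- an illustrative choice of bijection.
    positions = tuple(i for i in range(n) if mask >> i & 1)
    nr_ev = len(positions)
    rank = list(combinations(range(n), nr_ev)).index(positions)
    return nr_ev, rank

print(sub_alphabet_sizes(8))                   # {1: 8, 2: 28, 3: 56, 4: 70, 5: 56, 6: 28, 7: 8, 8: 1}
print(remap_event_frame_index(0b00000101, 8))  # events in EF_1 and EF_3 -> (2, 1)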

In a further possible implementation form of the second aspect storing the polarity information from the number of event frames comprises merging polarity information from each event frame into a polarity vector comprising binary polarity symbols determined based on an increase or decrease in brightness during a detected event, thereby resulting in a surprisingly effective method for encoding polarity information from event frames.

In a further possible implementation form of the second aspect the polarity symbols from each polarity vector of the number of event frames are concatenated into a concatenated polarity vector, thereby resulting in a very efficient representation for polarity information from event frames.

In an embodiment the polarity symbols from each polarity vector are concatenated into a concatenated polarity vector on the condition of the total number of events associated with a respective Event Map Image being below a threshold event number T_nrEv. In an example T_nrEv = 150. This conditional concatenation of polarity information into a concatenated polarity vector ensures optimal efficiency of the encoding algorithm.

In a further possible implementation form of the second aspect polarity information from the number of event frames is stored in a number of respective polarity vectors, by traversing the spatial information of the event frames, either bit-plane-by-bit-plane or event frame-by-event frame, thereby ensuring optimal efficiency of the polarity data encoding.

In a further possible implementation form of the second aspect the number of polarity vectors are encoded in the order of the respective event frames EF_i, i = 1, ..., n, received over n sub-periods 9, wherein each PV_i corresponds to an EF_i and is encoded either as a vector using adaptive Markov modelling, or by traversing the Event Map Image using the Binary Map and encoding each polarity symbol in PV_i using a template context model, wherein the respective context for a polarity symbol in PV_i is determined using the respective causal neighborhood from the current event frame EF_i, and wherein at least one preceding event frame EF_(i-1) is used as a respective context template; and wherein for each event frame EF_i having at least two preceding event frames, the respective context for a polarity symbol in PV_i is determined using context templates from the preceding two event frames EF_(i-1) and EF_(i-2). This results in a surprisingly effective method for encoding polarity information from event frames.

In an embodiment, for encoding each polarity symbol in each PV_i, the context index is computed based on three parameters: the optimal model order order0 respective to the causal neighborhood, and the optimal model orders order1 and order2 respective to the context templates. In an embodiment, for each event frame having at least two preceding event frames, the maximum model order search for order0, order1, and order2 are set as 7, 6, and 3, respectively. In an embodiment, for an event frame having only one preceding event frame, the maximum model order search for order0 and order1 are 7 and 6 respectively, wherein order2 is not used. In an embodiment, for each event frame having no preceding event frame, the maximum model order search for order0 is set to 10, wherein order1 and order2 are not used. It has been found by the inventors that using these model orders provides a surprisingly effective method for encoding the polarity information from event frames. In an embodiment where the polarity vector is encoded as a vector using adaptive Markov modelling, then (order0, order1, order2) = (0, 0, 0) is encoded to signal the selection of the adaptive Markov modelling method, thereby ensuring optimal efficiency of the polarity data encoding.

In a further possible implementation form of the second aspect the spatial information and polarity information from the number n of event frames is encoded using a Sparse Coding Mode, SCM, wherein if a total number of events N detected during the plurality of sub-periods is below an event threshold, Sparse Coding Mode is activated and at least one of the spatial information or the polarity information is encoded using lower-complexity encoding methods; otherwise Sparse Coding Mode is not activated. Using such a Sparse Coding Mode enables efficient encoding of sparse events in a sub-period 9.

In a further possible implementation form of the second aspect, in response to a determination that the total number of events is below a first event threshold N < ET_1, SCM is activated to encode the spatial information as follows: one bit is always encoded to signal if SCM is on or off; if SCM is on then: N is encoded using log2(ET_1) bits; and for each event e_i spatial information is encoded as follows: x_i using log2(H) bits; y_i using log2(W) bits; and the event frame index is encoded using log2(n) bits, wherein H is the event camera sensor 1 height and W is the event camera sensor 1 width. In an embodiment the first event threshold is in the range of 10 < ET_1 < 50 depending on the event camera 1 resolution W x H. In an example ET_1 = 20. It has been found by the inventors that these particular parameters provide a surprisingly effective method for event frame spatial information encoding.
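As a rough illustration only, the following Python sketch estimates the bit budget of the Sparse Coding Mode spatial encoding described above; using the ceiling of each logarithm and the example threshold ET_1 = 20 are assumptions taken from the surrounding text rather than a normative implementation.

from math import ceil, log2

def scm_spatial_bits(events, n, H, W, ET1=20):
    # events: list of (x, y, frame_index); n: number of event frames in the packet;
    # H, W: event camera sensor height and width; ET1: first event threshold.
    # Returns (scm_active, total_bits); bit widths use ceil(log2(.)) as an assumption.
    N = len(events)
    bits = 1                              # one bit always signals whether SCM is on or off
    if N >= ET1:
        return False, bits                # SCM off: the regular spatial codec is used instead
    bits += ceil(log2(ET1))               # encode N itself
    per_event = ceil(log2(H)) + ceil(log2(W)) + ceil(log2(n))
    bits += N * per_event                 # x_i, y_i and the event frame index for each event
    return True, bits

print(scm_spatial_bits([(10, 20, 0), (11, 20, 3), (300, 150, 7)], n=8, H=480, W=640))  # (True, 72)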

In a further possible implementation form of the second aspect, in response to a determination that the total number of events is below a second event threshold N < ET_2, SCM is activated to encode the polarity information as follows: the polarity symbols from each polarity vector of the number of event frames are concatenated into a single concatenated polarity vector; and the concatenated polarity vector is encoded using a 0-order Markov model. In an embodiment the second event threshold is in the range of 100 < ET_2 < 200, more preferably ET_2 = 150. It has been found by the inventors that these particular parameters provide a surprisingly effective method for event frame polarity information encoding. In a further possible implementation form of the second aspect the number of event frames, n, is in the range of 1 to 32, depending on the length of the sub-periods 9. In an embodiment the length of the sub-periods 9 is short and the number of event frames is in the range of 16 to 32. In another embodiment the length of the sub-periods is longer and the number of event frames is in the range of 1 to 8. In an example the number of event frames is n = 8. It has been found by the inventors that these particular event frame ranges for these particular sub-period lengths provide a surprisingly effective method for event frame encoding.

According to a third aspect, there is provided a computer-based system comprising an event camera configured to record an event stream; and a processor coupled to a storage device and configured to convert the event stream into a number of event frames; wherein the storage device comprises instructions that, when executed by the processor, cause the computer-based system to merge the information from the number of event frames into a combined event frame by executing a method according to any one of the possible implementation forms of the first aspect. The resulting system is surprisingly effective for merging a number of event frames.

According to a fourth aspect, there is provided a computer-based system comprising an event camera configured to record an event stream; and a processor coupled to a storage device and configured to convert the event stream into a number of event frames; wherein the storage device comprises instructions that, when executed by the processor, cause the computer-based system to process the information from the number of event frames according to the method of any one of the possible implementation forms of the second aspect. The resulting system is surprisingly effective for event data processing from a plurality of event frames.

According to a fifth aspect, there is provided a non-transitory computer readable medium having stored thereon program instructions that, when executed by a processor, cause the processor to perform the method according to any one of the possible implementation forms of the first aspect. The resulting computer readable medium is surprisingly effective in use for merging a number of event frames.

According to a sixth aspect, there is provided a non-transitory computer readable medium having stored thereon program instructions that, when executed by a processor, cause the processor to perform the method according to any one of the possible implementation forms of the second aspect. The resulting computer readable medium is surprisingly effective for use in event data processing from a plurality of event frames.

These and other aspects will be apparent from the embodiment(s) described below.

BRIEF DESCRIPTION OF THE DRAWINGS

In the following detailed portion of the present disclosure, the aspects, embodiments and implementations will be explained in more detail with reference to the example embodiments shown in the drawings, in which:

Fig. 1 illustrates receiving an event stream from an event camera in accordance with an example of the embodiments of the disclosure;

Fig. 2 illustrates converting an event stream into a plurality of event frames in accordance with an example of the embodiments of the disclosure;

Fig. 3 illustrates the steps of generating a combined event frame from a number of event frames in accordance with an example of the embodiments of the disclosure;

Fig. 4 illustrates the steps of extracting spatial and polarity information from multiple event frames and storing in separate data structures in accordance with another example of the embodiments of the disclosure;

Fig. 5 illustrates the steps of storing spatial information from an event map image in accordance with another example of the embodiments of the disclosure;

Fig. 6 shows a schematic illustration of determining for the current pixel the causal neighborhood and neighbor order in the current bit-plane (on the left) and the template context and neighbor order from subsequent bit-planes (on the right) in accordance with another example of the embodiments of the disclosure;

Fig. 7 shows a schematic illustration of encoding a polarity vector in accordance with another example of the embodiments of the disclosure;

Fig. 8 shows a schematic illustration of encoding the event frame index based on the corresponding category index, in accordance with another example of the embodiments of the disclosure;

Fig. 9 illustrates an example of remapping event frame index symbols to remapped symbols for the first category index, where nr EV = 1, in accordance with another example of the embodiments of the disclosure;

Fig. 10 shows a flow chart of applying Sparse Coding Mode (SCM) for encoding spatial information and polarity information from event frames in accordance with another example of the embodiments of the disclosure;

Fig. 11 shows compression results over a training dataset obtained using lossless compression methods for three values of the length of the sub-periods in accordance with examples of the embodiments of the disclosure;

Fig. 12A shows relative compression results over a training dataset for the smallest possible length of sub-periods, 10^-6 s; and

Fig. 12B shows event density results measured in Mega events per second for each event sequence in a training dataset for the smallest possible length of sub-periods, 10^-6 s, in accordance with examples of the embodiments of the disclosure.

DETAILED DESCRIPTION

In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant disclosure. However, it should be apparent to those skilled in the art that the present disclosure may be practiced without such details. In other instances, well known methods, procedures, systems, components, and/or circuitry have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present disclosure.

Fig. 1 illustrates receiving an event stream 3 from an event camera 1 in accordance with an example of the present disclosure. The event camera 1 may be installed on an autonomous vehicle and comprises a plurality of pixel sensors 2, each pixel sensor 2 being configured to independently detect an event 4 representative of a change in brightness in a captured scene. An event 4 in the event stream 3 comprises spatial information 5 of a pixel sensor 2 that detected the event 4, a timestamp 7 of the event 4, and polarity information 6 of the event 4 depending on a change in brightness detected by the pixel sensor 2. The event stream 3 in this example is received over a first time period, which is then divided into a plurality of sub-periods 9. The length of these sub-periods 9 may be selected according to the desired frequency of data for the application, and may even be as short as a length of 10^-6 s, which is equivalent to achieving a frame rate of up to 10^6 frames per second (fps).

The number of event frames 8 for example may be in the range of 1 to 32, depending on the length of the sub-periods 9.

In an example where the length of the sub-periods 9 is relatively short, e.g., 10^-6 s, the number of event frames 8 is in the range of 16 to 32. In another example where the length of the sub-periods 9 is longer, the number of event frames 8 is in the range of 1 to 8. In a particular example the number of event frames 8 is n=8.

In some applications, the input data is preferred as an event stream 3 that collects a spatio-temporal neighborhood over a time interval and is then processed to produce an output. In general, however, the event data is usually consumed as an image, where the sequence of asynchronous events is divided into spatio-temporal neighborhoods (i.e., time-volumes) of a sub-period 9.

Fig. 2 illustrates converting such an event stream 3 received over a first time period into event frames 8 by dividing the first time period into a plurality of sub-periods 9, as mentioned above. Each time-volume of size W x H x Δ (where H is the event camera sensor height, W is the event camera sensor width, and Δ is the length of a sub-period 9) generates an event frame (EF) 8. The events 4 that occurred at the same pixel position (x, y) are accumulated by first summing the event-pixel polarity information 6 and then setting the pixel-polarity as the sum's sign, i.e. assigning an event frame symbol 10 to each pixel of the event frame 8, resulting in the final event frame 8 on the right. The raw event frames 8 in one example contain 3 symbols which can be represented using 2 bits per pixel. However, such a representation is inefficient. A proposed solution to improve the representation of event data is to merge the information found in multiple event frames 8 into a single combined event frame 11.
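The accumulation step just described can be sketched in Python as follows; the symbol mapping {no event: 0, positive: 1, negative: 2} is an illustrative choice, since the description only requires three distinct event frame symbols.

import numpy as np

def events_to_frame(events, H, W):
    # events: iterable of (x, y, polarity) for one sub-period, with polarity in {+1, -1}.
    # Returns an H x W event frame with symbols {0: no event, 1: positive, 2: negative}.
    acc = np.zeros((H, W), dtype=np.int32)
    for x, y, p in events:
        acc[y, x] += p                    # sum the event polarities per pixel
    frame = np.zeros((H, W), dtype=np.uint8)
    frame[acc > 0] = 1                    # sign of the sum is positive: brightness increased
    frame[acc < 0] = 2                    # sign of the sum is negative: brightness decreased
    return frame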

Fig. 3 illustrates generating such a combined event frame 11 by merging a number of event frames 8 in accordance with an example of the present disclosure. Merging the event frames 8 is done by converting all the event frame symbols 10 assigned to a corresponding pixel of the event frames 8 into a combined event frame symbol 12 of the corresponding pixel of the combined event frame 11.

The event frame symbols 10 are merged into combined event frame symbols 12 according to the following formula:

CEF = k^(n-1) · EF_(i-(n-1)) + ... + k^2 · EF_(i-2) + k^1 · EF_(i-1) + EF_i, wherein CEF represents the combined event frame symbol 12, EF represents the event frame symbols 10, k is the number of event frame symbols 10, and n is the number of event frames 8 to be merged into the combined event frame 11.

In an example, the number of event frame symbols 10 is k=3, for example {0, 1, 2}, representing the polarity information 6 as one of either:

- a positive polarity event 4, wherein an increase in brightness is detected at a respective pixel of the event frame 8;

- a negative polarity event 4, wherein a decrease in brightness is detected at a respective pixel of the event frame 8; or

- no event 4 detected at a respective pixel of the event frame 8 during a respective sub-period 9.

The number of possible combined event frame symbols 12 in this example is 3^n, wherein n is the number of event frames 8 to be merged into a combined event frame 11. In an example, as illustrated in Fig. 3, the number of event frames 8 to be merged into a combined event frame 11 is n=5, and the event frame symbols 10 are merged into combined event frame symbols 12 according to the following formula:

CEF = 3^4 · EF_(i-4) + 3^3 · EF_(i-3) + 3^2 · EF_(i-2) + 3^1 · EF_(i-1) + EF_i.

However, the number n of event frames 8 can be different from 5. For example, the number n of event frames 8 can be smaller than 5, but it will result in inefficient encoding. On the other hand, if the number n of event frames 8 is larger than 5, then the combined event frames 11 will use more than 8 bits to represent the combined event frame symbols 12, which is also undesirable since the image and video codecs are designed to usually accept as input an 8-bit data representation.
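A minimal sketch of the merging formula above, reusing the {0, 1, 2} symbol mapping assumed in the earlier sketch; with n = 5 frames the combined symbols lie in the range 0 to 242 and fit into 8 bits per pixel.

import numpy as np

def combine_event_frames(frames):
    # frames: list of n equally sized event frames with symbols in {0, 1, 2}, oldest first.
    # Implements CEF = 3^(n-1)*EF_(i-(n-1)) + ... + 3*EF_(i-1) + EF_i per pixel.
    cef = np.zeros_like(frames[0], dtype=np.int64)
    for frame in frames:                  # the oldest frame receives the highest power of 3
        cef = cef * 3 + frame
    return cef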

Finally, as illustrated in Fig. 3, the combined event frames 11 can be further encoded by collecting the combined event frames 11 into a video sequence 13 and encoding by employing any video coding standard, such as the High Efficiency Video Coding (HEVC) codec. According to an alternative example, each combined event frame 11 can be encoded separately as a single raw image 14 by employing any lossless image compression codec, such as the Context-based Adaptive Lossless Image Codec (CALIC), or the Free Lossless Image Format (FLIF) codec.

Fig. 4 illustrates the steps of extracting spatial 5 and polarity 6 information from multiple event frames 8 and storing in separate data structures in accordance with another example of the present disclosure.

In this example, the initial step of receiving an event stream 3 from an event camera 1 and converting the event stream 3 received over a first time period into a number of event frames 8 corresponding to selected sub-periods 9 is executed as described before, i.e. the spatial information 5 and polarity information 6 of detected events 4 are binned into event frames 8 based on the respective timestamps 7 of the events 4. After the event frames 8 are generated, the spatial information 5 and polarity information 6 are stored in separate data structures as will be explained below.

In this example, storing the spatial information 5 comprises merging spatial information 5 from the number of event frames 8 into a single Event Map Image 15 stored as a set of image bit-planes 25; while storing the polarity information 6 comprises merging polarity information 6 from each event frame 8 into a polarity vector 16 comprising binary polarity symbols 17 determined based on an increase or decrease in brightness during a detected event 4.

The polarity vectors 16 may be generated by traversing the spatial information 5 of the event frames 8, either bit-plane-by-bit-plane or event frame-by-event frame.

As illustrated in Fig. 4, the polarity symbols 17 from each polarity vector 16 may then further be concatenated into a concatenated polarity vector 32. This concatenation into a concatenated polarity vector 32 may be dependent on the condition of the total number of events associated with a respective Event Map Image 15 being below a threshold event number, such as the second event threshold ET_2, as will be explained below in more detail in connection with the Sparse Coding Mode. The threshold event number is calculated based on the total number of events in all event frames 8 that are merged. In an example, as will be explained later, the second event threshold is ET_2 = 150, i.e. below 150 events the polarity symbols 17 from polarity vectors 16 are concatenated into the concatenated polarity vector 32, which is encoded by activating SCM 108 for polarity information.
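A small sketch of building the per-frame polarity vectors and concatenating them when the packet is sparse enough; it reuses the illustrative symbol mapping of the earlier sketches, and the threshold value ET_2 = 150 is the example given above.

def polarity_vectors(frames):
    # frames: event frames with symbols {0: no event, 1: positive, 2: negative}.
    # Each PV_i lists a binary polarity symbol (1 = increase, 0 = decrease) for every
    # pixel of EF_i where an event occurred, traversed in raster scan order.
    return [[1 if s == 1 else 0 for s in frame.ravel() if s != 0] for frame in frames]

def maybe_concatenate(pvs, ET2=150):
    # Concatenate the polarity vectors into a single concatenated polarity vector
    # only when the total number of events is below the second event threshold ET_2.
    total_events = sum(len(pv) for pv in pvs)
    if total_events < ET2:
        return [s for pv in pvs for s in pv]
    return None                           # otherwise the per-frame vectors are kept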

Fig. 5 shows the steps of storing spatial information 5 in accordance with an example of the present disclosure. As illustrated, the spatial information 5 contained in an Event Map Image 15 is stored using a combination of a Binary Map 18, a category index 20, and an event frame index 23.

The Binary Map 18 comprises Binary Map symbols 19 assigned to each pixel of the Binary Map 18 to signal the locations where at least one event 4 has occurred in the corresponding pixel of any of the number of event frames 8.

The category index 20 represents the number of events 4, nrEv, that occurred at a respective pixel of the Binary Map 18 signaled by a Binary Map symbol 19. The category index 20 is represented using category index symbols 21, each category index symbol 21 comprising a number p of bit-planes.

In an example the number p of bit-planes of the category index symbols 21 is determined as p = ⌈log2(n)⌉, where ⌈ ⌉ denotes the ceiling operator. For example, if there are n = 8 event frames, then p = 3 bit-planes are used to represent the category index symbols 21 since ⌈log2(8)⌉ = 3. If more event frames 8 are used, then more than 3 bit-planes are used to represent the number of events.

Finally, the event frame index 23 represents the individual event frames 8 in the number n of event frames 8 where an event 4 occurred at a respective pixel of the Binary Map 18 that is signaled by a Binary Map symbol 19, i.e. the positions of these individual event frames 8 in the event frame stack. The event frame index 23 may comprise event frame index symbols 24 representing the individual event frames 8 in the number of event frames 8 where any event 4 occurred.

In the illustrated example, the number of events 4 that occurred at a respective pixel of the Binary Map 18 is nrEv = 2, which is represented in the event frame index 23 as the binary (base 2) number 00000101₂, thus indicating that the events occurred in the first and third event frames 8, EF_1 and EF_3.
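For illustration only (a hypothetical helper, under the assumption that bit position t − 1 of the event frame index corresponds to event frame EF_t), this bitmask can be decoded in Python as follows:

def decode_event_frame_index(efi, n=8):
    # return the 1-based event frame positions marked in the n-bit index,
    # together with nrEv, the number of events at this pixel
    frames = [t + 1 for t in range(n) if efi & (1 << t)]
    return frames, len(frames)

print(decode_event_frame_index(0b00000101))   # -> ([1, 3], 2): events in EF_1 and EF_3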

Figs. 6-9 illustrate the compression methods used for encoding the spatial information 5 and polarity information 6 which are stored separately, as explained before.

For encoding different types of spatial and polarity information, novel encoding methods have been developed, which will be explained below.

The Binary Map 18 is encoded using a so-called template context model, wherein for each pixel of the Binary Map 18 a causal neighborhood 26 of the respective Binary Map symbol 19 is determined. The left side of Fig. 6 illustrates an example of a causal neighborhood and neighbor order for BP_t, the current bit-plane 25, when encoding the polarity information. The Binary Map 18 contains Binary Map symbols 19 in the alphabet {0, 1}, which can be represented using a single bit-plane. The Binary Map 18 thus has only one bit-plane 25 and therefore uses only the causal neighborhood 26, i.e. only neighboring pixels that have already been decoded. The causal neighborhood 26 and the found optimal model order, order0, are then used to compute the context index of the binary distribution model that is used to encode the current Binary Map symbol 19.

In particular, an optimal model order, order0, is determined by estimating the codelengths needed to encode the Binary Map 18 using different numbers of neighbors, collected in the neighbor order of the causal neighborhood 26. Finally, the found optimal model order is encoded and then used to encode the Binary Map 18, traversed in raster scan order, by encoding each Binary Map symbol 19 based on the context computed from the first order0 neighbors in the causal neighborhood 26 of the current pixel.

In an example the optimal model order is determined using a model with a maximum order of m causal neighbors, the order being searched in the range of 1 to 18. In the example shown in Fig. 6, m = 18, i.e. up to 18 causal neighbors are used to compute the context index of the binary distribution model used to encode the current Binary Map symbol 19.
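The template context model can be sketched in Python as follows; the particular neighbor offsets, the Laplace count estimator, and the exhaustive order search are simplifying assumptions made only for this illustration and do not reproduce the exact codec:

import numpy as np

# hypothetical causal-neighbor offsets (row, col), listed by neighbor order;
# only previously decoded pixels (rows above, or to the left on the same row) are used
NEIGHBOURS = [(0, -1), (-1, 0), (-1, -1), (-1, 1), (0, -2), (-2, 0),
              (-2, -1), (-1, -2), (-2, 1), (-1, 2), (0, -3), (-3, 0),
              (-3, -1), (-1, -3), (-2, 2), (-2, -2), (-3, 1), (-1, 3)]

def context_index(bmap, r, c, order):
    # pack the first `order` causal neighbors of pixel (r, c) into one context index
    ctx = 0
    for k, (dr, dc) in enumerate(NEIGHBOURS[:order]):
        rr, cc = r + dr, c + dc
        bit = int(bmap[rr, cc]) if 0 <= rr < bmap.shape[0] and 0 <= cc < bmap.shape[1] else 0
        ctx |= bit << k
    return ctx

def codelength_bits(bmap, order):
    # ideal adaptive code length of the whole map for a given model order
    counts = np.ones((1 << order, 2))              # Laplace-initialised per-context counts
    bits = 0.0
    for r in range(bmap.shape[0]):
        for c in range(bmap.shape[1]):
            s = int(bmap[r, c])
            ctx = context_index(bmap, r, c, order)
            bits -= np.log2(counts[ctx, s] / counts[ctx].sum())
            counts[ctx, s] += 1
    return bits

def best_order(bmap, max_order=18):
    # exhaustive search for the order giving the shortest estimated codelength
    return min(range(1, max_order + 1), key=lambda k: codelength_bits(bmap, k))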

As described before, the category index 20 is represented using category index symbols 21, wherein each category index symbol 21 comprises a number p of bit-planes 25. The category index 20 represents the number of events 4, nrEv, i.e. the number of '1's in the binary representation of the symbols in the Event Map Image 15, and is represented using p-bit symbols in the alphabet {0, 1, ..., 2^p − 1}, which are encoded bit-plane-by-bit-plane.

The first bit-plane 25 of the category index 20, denoted BP_1, is encoded in raster scan order, similarly to the Binary Map 18, by encoding the first bit in the representation of each category index symbol 21 using a template context model as explained above, based on a context computed for each bit of the category index symbol 21 from its respective causal neighborhood 26 on the first bit-plane 25 of the category index 20, BP_1.

Following this, each subsequent bit-plane 25, denoted BP_l, l = 2, ..., p, is encoded by determining the respective context using the respective causal neighborhood 26 from the current bit-plane BP_l, and a respective context template 27 from at least one preceding bit-plane BP_(l-1), as illustrated in Fig. 6 for the current bit-plane BP_t and the preceding bit-plane BP_(t-1) when encoding the polarity information 6 bit-plane-by-bit-plane.

For each subsequent bit-plane 25 BP_l having at least two preceding bit-planes 25, the respective context can be determined using context templates 27 from the two preceding bit-planes 25, BP_(l-1) and BP_(l-2). This is similar to the polarity information encoding illustrated in Fig. 7, which is explained below. In an example, for each subsequent bit-plane 25, the respective causal neighborhood 26 has an order0 length, while the context template 27 from its preceding bit-plane BP_(l-1) has an order1 length.

The second bit-plane, denoted BP_2, is thus encoded in a way where the context is formed using the causal neighborhood 26 from BP_2, having an order0 length, and the context template 27 from BP_1, having an order1 length. Similarly, the third bit-plane, denoted BP_3, is encoded using this method, where the context is formed using the causal neighborhood 26 from BP_3 and the context templates 27 from BP_2 and BP_1.
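A corresponding sketch in Python of how the context may be formed across bit-planes (again with hypothetical neighbor and template offsets, and assuming the bit-planes are 2-D NumPy arrays) combines the order0 causal bits of the current bit-plane with order1 and order2 template bits from the previously decoded bit-planes:

# hypothetical offsets: causal neighbors on the current (partially decoded) bit-plane,
# and a template on already-decoded planes, which may include the co-located pixel
# and spatially "future" positions since those planes are fully available
CAUSAL   = [(0, -1), (-1, 0), (-1, -1), (-1, 1), (0, -2), (-2, 0), (-2, -1)]
TEMPLATE = [(0, 0), (0, -1), (-1, 0), (0, 1), (1, 0), (-1, -1)]

def pick(bp, r, c, dr, dc):
    rr, cc = r + dr, c + dc
    return int(bp[rr, cc]) if 0 <= rr < bp.shape[0] and 0 <= cc < bp.shape[1] else 0

def crossplane_context(curr_bp, prev_bps, r, c, order0, orders):
    # order0 causal bits from the current plane, then order1/order2 template bits
    # from up to two previously decoded bit-planes (orders = (order1, order2))
    bits = [pick(curr_bp, r, c, dr, dc) for dr, dc in CAUSAL[:order0]]
    for bp, k in zip(prev_bps, orders):
        bits += [pick(bp, r, c, dr, dc) for dr, dc in TEMPLATE[:k]]
    return sum(b << i for i, b in enumerate(bits))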

Fig. 7 illustrates encoding the number of polarity vectors 16 in the order of the respective event frames 8 received over the n sub-periods 9, EF_t, t = 1, ..., n, wherein each polarity vector 16, PV_t, corresponds to an event frame EF_t. The polarity vectors 16, as explained before, comprise binary polarity symbols 17, and can be encoded as a vector using adaptive Markov modelling, for example using a maximum order of 14.

Alternatively, the polarity vectors 16 can also be encoded by traversing the Event Map Image 15 using the Binary Map 18 and encoding each polarity symbol 17 using a template context model as shown in Figs. 6 and 7, wherein the respective context for a PV_t is determined using the respective causal neighborhood 26 from the current event frame EF_t, and wherein at least one preceding event frame EF_(t-1) is used as a respective context template 27; and wherein for each event frame EF_t having at least two preceding event frames 8, the respective context for a PV_t is determined using context templates 27 from the two preceding event frames EF_(t-1) and EF_(t-2).

Consequently, in the case of encoding the polarity vectors 16 the causal neighborhood 26 contains polarity information 6, in contrast to encoding the Event Map Image 15, where the causal neighborhood 26 contains spatial information 5. Although each type of information uses a causal neighborhood 26, each neighbor is represented by the corresponding type of information (spatial or polarity).

For encoding each PV_t, the respective causal neighborhood 26 may have an optimal model order order0, wherein the respective context templates 27 may have respective optimal model orders order1 and order2. In an example, for each event frame 8 having at least two preceding event frames 8, the maximum model orders for order0, order1, and order2 are set to 7, 6, and 3, respectively. For an event frame 8 having only one preceding event frame 8, the maximum model orders for order0 and order1 may be 7 and 6, respectively, in which case order2 is not used. For each event frame 8 having no preceding event frame 8, the maximum model order for order0 may be set to 10, wherein order1 and order2 are not used.

In an example where the polarity vector 16 is encoded as a vector using adaptive Markov modelling, (order0, order1, order2) = (0, 0, 0) is encoded to signal the selection of the adaptive Markov modelling method.
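The adaptive Markov modelling of a binary polarity vector can be illustrated by the following Python sketch, which estimates the ideal code length using Laplace-smoothed adaptive counts; the actual codec drives an entropy coder, and the estimator and order search shown here are assumptions made only for this example:

from collections import defaultdict
import math

def markov_codelength(bits, order):
    # ideal adaptive code length (in bits) of a binary vector under an
    # order-`order` adaptive Markov model with Laplace-smoothed counts
    counts = defaultdict(lambda: [1, 1])          # context -> [count of 0s, count of 1s]
    total = 0.0
    for i, s in enumerate(bits):
        ctx = tuple(bits[max(0, i - order):i])    # the previous `order` symbols
        c0, c1 = counts[ctx]
        total -= math.log2((c1 if s else c0) / (c0 + c1))
        counts[ctx][s] += 1
    return total

def best_markov_order(bits, max_order=14):
    # the model order is searched up to a maximum of 14, as in the example above
    return min(range(max_order + 1), key=lambda k: markov_codelength(bits, k))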

Fig. 8 shows a schematic illustration of the encoding of the event frame index 23 in accordance with an example of the present disclosure. As explained before, the event frame index 23 comprises event frame index symbols 24 representing the individual event frames 8 in the number of event frames 8 where any event 4 occurred.

Encoding of each event frame index 23 depends on the category index 20 of the respective pixel, wherein each category of the category index 20 represents the total number of events 4, nrEv, detected at the respective pixel during the number of event frames 8. To do this, as shown in Fig. 8, an alphabet 28 of 2^n symbols is divided into a number n of sub-alphabets 29, wherein n is the number of event frames 8 encoded. A sub-alphabet 29 is then associated with each category, and a number of C(n, nrEv) event frame index symbols 24 are remapped into C(n, nrEv) remapped symbols 30 of the corresponding sub-alphabet 29, based on nrEv, the respective category index 20, as shown in Fig. 8. Finally, a category symbol 22 is associated based on the nrEv index of the respective category index 20, wherein the category symbol 22 is represented using the binary representation of nrEv − 1 on p bit-planes.

In an example the alphabet 28 comprises 2^n possible symbols, where n is the number of event frames encoded. In the illustrated example shown in Fig. 8, where n = 8, the alphabet 28 comprises 256 symbols. The sub-alphabets 29 thus comprise different numbers of symbols, as shown, each sub-alphabet 29 being associated with a category based on the number of symbols needed for remapping the event frame index symbols 24 of the corresponding category. As shown in Fig. 8, each event frame index 23 may comprise C(n, nrEv) possible symbol combinations, wherein n represents the number of event frames 8 and nrEv corresponds to the category index 20 representing the total number of events 4 that occurred in the n event frames 8. Each sub-alphabet 29 therefore comprises all C(n, nrEv) possible symbols, as will be explained below in more detail.

In the illustrated example the number of event frames 8 is n = 8. In effect, this means that eight categories are found in the Event Map Image 15, as eight event frames (EF) 8 are stored, i.e. 1, 2, ..., 8 events may occur at the current pixel position. The EF index 23 is encoded using an alphabet associated with each category, whereby the first category (where nrEv = 1) signals that one event 4 occurs in the EF stack; therefore, the symbols {1, 2^1, 2^2, ..., 2^7} in the alphabet 28 are remapped into a C(8, 1) = 8-symbol alphabet ({0, 1, 2, ..., 7}), and a category symbol computed as nrEv − 1 on p = ⌈log2(n)⌉ = 3 bit-planes, i.e. 0₁₀ = 000₂, is associated. The second category (where nrEv = 2) signals that two events 4 occur in the EF stack; therefore, the symbols in the alphabet 28 which contain in their binary representation 2 bits set to '1' (e.g. {3, 5, ..., 192}) are remapped into a C(8, 2) = 28-symbol alphabet, and the category symbol 1₁₀ = 001₂ is associated. The third category (where nrEv = 3) signals that three events 4 occur in the EF stack; therefore, the symbols in the alphabet 28 which contain in their binary representation 3 bits set to '1' (e.g. {7, 11, ..., 224}) are remapped into a C(8, 3) = 56-symbol alphabet, and the category symbol 2₁₀ = 010₂ is associated. Following this approach, the 4th, 5th, 6th, and 7th categories remap the symbols in the alphabet 28 into a C(8, 4) = 70-, C(8, 5) = 56-, C(8, 6) = 28-, and, respectively, C(8, 7) = 8-symbol alphabet, and the category symbols 3₁₀ = 011₂, 4₁₀ = 100₂, 5₁₀ = 101₂, and, respectively, 6₁₀ = 110₂ are associated. Finally, the 8th category signals that an event occurs in each EF, which is a deterministic case as all bits are set to 1; the category symbol 7₁₀ = 111₂ is associated.

According to another example (not shown), if for example n = 10 event frames 8 were selected to encode, then the alphabet 28 would comprise 2^10 = 1024 symbols, with a total number of events 4 of at most nrEv = 10; thus p = 4 bit-planes would be needed to encode the category index 20, and there would be C(10, 1), C(10, 2), and so on, symbols for the respective sub-alphabets 29. To summarize, the bit-planes that are encoded in the method include 1 bit-plane for the Binary Map 18; p = ⌈log2(n)⌉ bit-planes for the category index 20; and n bit-planes for the polarity information, i.e. in total 1 + ⌈log2(n)⌉ + n bit-planes. For n = 8 this means 1 + 3 + 8 = 12 bit-planes, whereas for example in the case of n = 4 it would mean 1 + 2 + 4 = 7 bit-planes in total.
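The remapping of event frame index symbols 24 into the per-category sub-alphabets 29 can be sketched in Python as follows; the rank order inside a sub-alphabet (increasing numerical order here) and the function name are assumptions made only for this illustration:

def remap_event_frame_index(efi, n=8):
    # category symbol = nrEv - 1; the remapped symbol is the rank of `efi`
    # among the C(n, nrEv) n-bit values having the same number of set bits
    nr_ev = bin(efi).count("1")
    sub_alphabet = [s for s in range(1, 1 << n) if bin(s).count("1") == nr_ev]
    return nr_ev - 1, sub_alphabet.index(efi)

# 0b00000101 (events in EF_1 and EF_3) belongs to the second category (nrEv = 2),
# whose sub-alphabet has C(8, 2) = 28 symbols
print(remap_event_frame_index(0b00000101))        # -> (1, 1)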

Fig. 9 illustrates the step of remapping the event frame index symbols 24 of the first category to remapped symbols 30 in accordance with an example of the present disclosure.

The remapped symbols 30 and associated category symbols 22 are then merged into a number n of category vectors 31, the number n of category vectors 31 corresponding to the number n of event frames 8. The first n − 1 category vectors 31 can be encoded using an adaptive Markov model, whereas the last category vector 31 is associated with the deterministic case wherein an event 4 occurs in each event frame 8. For determining the optimal order for the adaptive Markov model, the maximum order search is set to: 5 for the sub-alphabets 29 with 8 symbols, 3 for the sub-alphabets 29 with 28 symbols, and 2 for the sub-alphabets 29 with 56 or 70 symbols. Experiments show that a large percentage of symbols fall into the first 2-3 categories, which demonstrates the efficiency of the proposed method.

Fig. 10 shows a flow chart of applying Sparse Coding Mode (SCM) for encoding spatial information 5 and polarity information 6 from event frames 8 in accordance with another example of the present disclosure. When the selected length of the sub-periods 9 (Δ) is very small, e.g. Δ = 10^-6 s, only a few events 4 occur in each event frame 8. In such a case, the algorithms proposed above may be too complex to employ and may result in a higher bitrate. Hence, the SCM is activated depending on N, the total number of events in the Event Map Image 15.

In particular, if the total number N of events 4 detected during the plurality of sub-periods 9 is below an event threshold, Sparse Coding Mode is activated and at least one of the spatial information 5 or the polarity information 6 is encoded using lower complexity encoding methods. Otherwise, i.e. if the total number N of events 4 detected during the plurality of sub-periods 9 is above the event threshold, Sparse Coding Mode is not activated and at least one of the spatial information 5 or the polarity information 6 is encoded according to the methods described before.

The initial steps 101-103 are similar or identical to the steps described before, i.e. in step 101 an event stream 3 is received from an event camera 1; in step 102 the event stream 3 is converted into a number of event frames 8; and in step 103 the spatial information 5 and the polarity information 6 from the event frames 8 are extracted and stored in separate data structures.

Following this, in response to a determination that the total number of events 4 is below a first event threshold, N < ET1, Sparse Coding Mode is activated in step 104 to encode the spatial information 5. During this encoding, one bit is always used to signal whether Sparse Coding Mode is on or off; if Sparse Coding Mode is on, then in step 105 N is encoded using ⌈log2(ET1)⌉ bits. Finally, in step 106, for each event 4 the spatial information 5 is encoded as follows: x_t using ⌈log2(H)⌉ bits and y_t using ⌈log2(W)⌉ bits; and the event frame index 23 is encoded in step 107 using ⌈log2(n)⌉ bits, wherein H is the event camera sensor 1 height and W is the event camera sensor 1 width.

The first event threshold may be in the range of 10 < ET1 < 50, depending on the event camera resolution W × H. In an example ET1 = 20.

The polarity information 6 encoding is also dependent on the Sparse Coding Mode activation. In particular, in a parallel step 108, in response to a determination that the total number of events 4 is below a second event threshold, N < ET2, SCM is activated to encode the polarity information 6. In step 109, the polarity symbols 17 from each polarity vector 16 are concatenated into a single concatenated polarity vector 32 (as described before); and in step 110 the concatenated polarity vector 32 is encoded using a 0-order Markov model.

The second event threshold may be in the range of 100 < ET2 < 200, more preferably ET2 = 150.
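The Sparse Coding Mode decision for spatial information and its fixed-length bit budget can be summarized by the following Python sketch; the bit layout follows steps 104-107 as reconstructed above and should be read as an assumption-laden illustration rather than a normative implementation:

import math

def scm_spatial_bits(n_events, h, w, n, et1=20):
    # bits spent by Sparse Coding Mode on spatial information: one SCM flag bit,
    # N coded on ceil(log2(ET1)) bits, then x, y and the event frame index coded
    # as fixed-length fields for each of the N events
    if n_events >= et1:
        return None                               # SCM off: context-based coding is used
    bits = 1 + math.ceil(math.log2(et1))
    per_event = (math.ceil(math.log2(h)) +
                 math.ceil(math.log2(w)) +
                 math.ceil(math.log2(n)))
    return bits + n_events * per_event

# e.g. 12 events on a 480 x 640 sensor with n = 8 event frames and ET1 = 20:
# 1 + 5 + 12 * (9 + 10 + 3) = 270 bits
print(scm_spatial_bits(12, h=480, w=640, n=8))    # -> 270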

Figs. 11 and 12 show lossless compression results over a training dataset obtained using different codecs in accordance with examples of the present disclosure.

The experimental evaluation of the encoding methods was performed using the ETH_Training dataset of 82 event sequences, having a 640 × 480 event camera resolution, as disclosed in the publication by M. Gehrig, W. Aarents, D. Gehrig and D. Scaramuzza, "DSEC: A Stereo Event Camera Dataset for Driving Scenarios," IEEE Trans. Robot. Autom., vol. 6, no. 3, pp. 4947-4954, Jul. 2021. The proposed encoding methods as explained above were implemented in C and losslessly encode a sequence of 8 event frames (EFs). Four frame rates are studied: (i) Δ = 5.555 ms (180 fps); (ii) Δ = 1 ms (1,000 fps); (iii) Δ = 0.1 ms (10,000 fps); and (iv) Δ = 10^-6 s (1,000,000 fps), i.e. all events acquired by the sensor are collected by the EFs. For Δ = 5.555 ms, 1 ms, and 0.1 ms, the Raw data size is reported using a representation of 2 bits per EF pixel. For Δ = 10^-6 s, the Raw data size is reported using a representation of 8 Bytes (B) per event, as in the sensor specification. For Δ = 5.555 ms, the full event sequence is encoded, while for Δ = 1 ms, 0.1 ms, and 10^-6 s, the first 20 s, 2 s, and, respectively, 20 ms are encoded from each event sequence.

The proposed method's performance is compared with the following state-of-the-art methods, which can be applied for lossless encoding of events represented as combined event frames 11: (1) the HEVC standard using the FFmpeg implementation; (2) the Context-based Adaptive Lossless Image Codec (CALIC); and (3) the Free Lossless Image Format (FLIF) codec. The lossless compression results are compared using: (a) the Compression Ratio (CR), defined as the ratio between the compressed size and the Raw data size, and (b) the Relative Compression (RC), defined as the ratio between the compressed size and the Proposed method size.

Fig. 11 shows compression results over the ETH_Training dataset. Relative Compression (RC) results are shown for different sub-period 9 lengths in the following sub-figures: (a) Δ = 5.555 ms; (b) Δ = 1 ms; and (c) Δ = 0.1 ms. Compression Ratio (CR) results are shown for different sub-period 9 lengths in the following sub-figures: (d) Δ = 5.555 ms; (e) Δ = 1 ms; and (f) Δ = 0.1 ms. The proposed method provides an average improvement in performance of 20.66% compared with FLIF, and 70.68% compared with HEVC. When Δ = 0.1 ms, the proposed representation is up to 274 times more efficient than the asynchronous raw data representation.

Fig. 12A shows the lossless compression results over the ETH_Training dataset for a sub-period 9 length of Δ = 10^-6 s. Fig. 12B shows the event density results, measured in Mega events per second (Mev/s), for each event sequence in the ETH_Training dataset for the same sub-period 9 length of Δ = 10^-6 s. The results show that the proposed representation is up to 5.8 times more efficient than the sensor representation. The proposed method's performance is correlated with the event density in each sequence, as in ETH_Training a larger number of Mega events per second (Mev/s) is captured when recording from a car moving at high speed.

In conclusion, the disclosed methods propose an efficient context-based lossless image codec for encoding event data, and a more efficient way to store the event data. The proposed encoding methods employ an approach which encodes the positions where at least one event occurs, followed by the EF index in the EF stack. The experimental evaluation using four frame rates shows an improved average performance of up to 20.66% compared with FLIF, and 70.68% compared with HEVC. When all events are collected by event frames, the proposed representation is up to 5.8 times more efficient than the sensor event representation.

Although not shown in the figures, the present disclosure also extends to a computer-based system comprising an event camera 1 as described in the examples before, configured to record an event stream 3; and at least one processor coupled to a storage device and configured to convert the event stream 3 into a number of event frames 8 as described before. This storage device may further comprise instructions that, when executed by the processor, cause the computer-based system to either merge the information from the number of event frames 8 into a combined event frame 11, or process the information from the number of event frames 8, by executing a method according to one of the above examples.

The processor accordingly may process images and/or data relating to one or more functions described in the present disclosure. In some embodiments, the processor may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction-set computer (RISC), a microprocessor, or the like, or any combination thereof. In some embodiments the processor may be further configured to control a display device of the computer-based system (not shown) to display any of the raw or encoded data received from the event camera 1. The display may include a liquid crystal display (LCD), a light emitting diode (LED)-based display, a flat panel display or curved screen, a rollable or flexible display panel, a cathode ray tube (CRT), or a combination thereof.

In some embodiments the processor may be further configured to control an input device (not shown) for receiving a user input. An input device may be a keyboard, a touch screen, a mouse, a remote controller, a wearable device, or the like, or a combination thereof. The input device may include a keyboard, a touch screen (e.g., with haptics or tactile feedback, etc.), a speech input, an eye tracking input, a brain monitoring system, or any other comparable input mechanism. The input information received through the input device may be communicated to the processor for further processing. Another type of the input device may include a cursor control device, such as a mouse, a trackball, or cursor direction keys to communicate direction information and command selections to, for example, the processor and to control cursor movement on a display device.

The storage device can be configured to store data directly obtained from the event camera 1 and any further camera; and/or processed data from the processor. In some embodiments, the storage device may store images received from the respective cameras and/or processed images received from the processor with different formats including, for example, bmp, jpg, png, tiff, gif, pcx, tga, exif, fpx, svg, psd, cdr, ped, dxf, ufo, eps, ai, raw, WMF, or the like, or any combination thereof. In some embodiments, the storage device may store algorithms to be applied in the processor, such as an encoding algorithm as described in any of the examples above. In some embodiments, the storage device may include a mass storage, a removable storage, a volatile read-and-write memory, a read-only memory (ROM), or the like, or any combination thereof. Exemplary mass storage may include a magnetic disk, an optical disk, a solid-state drive, etc.

Although also not illustrated explicitly, the storage device and the processor can also be implemented as part of a server that is in data connection with the event camera 1 or a client device (such as an autonomous vehicle) that is connected to the event camera 1 through a network, wherein the client device can send event streams over the network to the server. The server may then run the encoding algorithms as explained before. In some embodiments, the network may be any type of a wired or wireless network, or a combination thereof. Merely by way of example, the network may include a cable network, a wire line network, an optical fiber network, a telecommunication network, an intranet, an Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public telephone switched network (PSTN), a Bluetooth network, a ZigBee network, a near field communication (NFC) network, or the like, or any combination thereof.

The various aspects and implementations have been described in conjunction with various embodiments herein. However, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject-matter, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.

The reference signs used in the claims shall not be construed as limiting the scope.