

Title:
SIGNAL CODING BASED ON INTERPOLATION BETWEEN KEYFRAMES
Document Type and Number:
WIPO Patent Application WO/2023/217677
Kind Code:
A1
Abstract:
An input signal such as a haptic signal is decomposed into multiple frequency bands. A low frequency band of the signal is encoded by information comprising a set of keyframes and a type of interpolation chosen amongst a set comprising linear, cubic, quadratic, cubic Bezier, quadratic Bezier, BSpline, Nurbs or Akima interpolations. The information allows a reconstruction of such input signal. The application to haptic signals is described. Encoding and decoding methods as well as encoding and decoding apparatuses are described.

Inventors:
GALVANE QUENTIN (FR)
GUILLOTEL PHILIPPE (FR)
LE CARPENTIER JEAN-MAXIME (FR)
Application Number:
PCT/EP2023/062044
Publication Date:
November 16, 2023
Filing Date:
May 05, 2023
Assignee:
INTERDIGITAL CE PATENT HOLDINGS SAS (FR)
International Classes:
G06F3/01
Foreign References:
US20170090577A12017-03-30
US20180232194A12018-08-16
Other References:
QUENTIN GALVANE ET AL: "[Haptics] Considerations for a WD on Haptics", no. m59357, 29 April 2022 (2022-04-29), XP030301552, Retrieved from the Internet [retrieved on 20220429]
Attorney, Agent or Firm:
INTERDIGITAL (FR)
Claims:
CLAIMS

1. A method for encoding a signal, comprising

- obtaining a set of keyframes, a keyframe being characterized by an amplitude and a temporal reference,

- obtaining a type of interpolation selected amongst a list of interpolation methods, and

- encoding information representative of the signal comprising the set of keyframes and the type of interpolation.

2. The method of claim 1, wherein the type of interpolation is selected amongst linear, cubic, quadratic, Bezier cubic, Bezier quadratic, BSpline, Nurbs or Akima.

3. The method of any of claims 1 or 2 wherein the signal is a haptic signal, and the information representative of the haptic signal is associated with information representative of a haptic effect.

4. The method of any of claims 1 to 3, wherein the haptic signal comprises multiple frequency bands and the information representative of the haptic signal is defined for one of the frequency bands.

5. The method of any of claims 1 to 4, wherein the set of keyframes and type of interpolation are determined using a graphical user interface.

6. The method of any of claims 1 to 5, wherein the set of keyframes and type of interpolation are determined according to a curve fitting algorithm applied on an obtained input signal.

7. A method for decoding a signal, comprising

- obtaining information representative of a signal, the information comprising a set of keyframes and a type of interpolation, a keyframe being characterized by an amplitude and a temporal reference,

- determining a signal based on the set of keyframes interpolated according to the type of interpolation, and

- providing the signal.

8. The method of claim 7, wherein the type of interpolation is selected amongst linear, cubic, quadratic, Bezier cubic, Bezier quadratic, BSpline, Nurbs or Akima.

9. The method of any of claims 7 or 8 wherein the signal is a haptic signal, and the information representative of the haptic signal is associated with information representative of a haptic effect.

10. The method of any of claims 7 to 9, wherein the haptic signal comprises multiple frequency bands and the information is defined for one of the frequency bands.

11. A device for encoding a signal comprising a processor configured to:

- obtain a set of keyframes, a keyframe being characterized by an amplitude and a temporal reference,

- obtain a type of interpolation selected amongst a list of interpolation methods, and

- encode information representative of the signal comprising the set of keyframes and the type of interpolation.

12. The device of claim 11, wherein the type of interpolation is selected amongst linear, cubic, quadratic, Bezier cubic, Bezier quadratic, BSpline, Nurbs or Akima.

13. The device of any of claims 11 or 12 wherein the signal is a haptic signal, and the information representative of the haptic signal is associated with information representative of a haptic effect.

14. The device of any of claims 11 to 13, wherein the haptic signal comprises multiple frequency bands and the information representative of the haptic signal is defined for one of the frequency bands.

15. The device of any of claims 11 to 14, wherein the set of keyframes and type of interpolation are determined using a graphical user interface.

16. The device of any of claims 11 to 14, wherein the set of keyframes and type of interpolation are determined according to a curve fitting algorithm applied on an obtained input signal.

17. A device for decoding a signal comprising a processor configured to:

- obtain information representative of a signal, the information comprising a set of keyframes and a type of interpolation, a keyframe being characterized by an amplitude and a temporal reference,

- determine a signal based on the set of keyframes interpolated according to the type of interpolation, and

- provide the signal.

18. The device of claim 17, wherein the type of interpolation is selected amongst linear, cubic, quadratic, Bezier cubic, Bezier quadratic, BSpline, Nurbs or Akima.

19. The device of any of claims 17 or 18 wherein the signal is a haptic signal, and the information representative of the haptic signal is associated with information representative of a haptic effect.

20. The device of any of claims 17 to 19, wherein the haptic signal comprises multiple frequency bands and the information is defined for one of the frequency bands.

21. A non-transitory computer readable medium comprising information representative of the haptic effect generated according to the method of any of claims 1 to 5 when executed by a processor.

22. A computer program comprising program code instructions for implementing the method according to any of claims 1 to 10 when executed by a processor.

23. A non-transitory computer readable medium comprising program code instructions for implementing the method according to any of claims 1 to 10 when executed by a processor.

Description:
SIGNAL CODING BASED ON INTERPOLATION BETWEEN KEYFRAMES

TECHNICAL FIELD

At least one of the present embodiments generally relates to signal encoding and more particularly to a method and device for accurately coding a signal based on interpolation between keyframes. Embodiments describe the application to haptic signals.

BACKGROUND

Fully immersive user experiences are proposed to users through immersive systems based on feedback and interactions. The interaction may use conventional ways of control that fulfill the needs of the users. Current visual and auditory feedback provide satisfying levels of realistic immersion. Additional feedback can be provided by haptic effects that allow a human user to perceive a virtual environment with his senses and thus get a better experience of the full immersion with improved realism. However, haptics is still one area of potential progress to improve the overall user experience in an immersive system.

Conventionally, an immersive system may comprise a 3D scene representing a virtual environment with virtual objects localized within the 3D scene. To improve the user interaction with the elements of the virtual environment, haptic feedback may be used through stimulation of haptic actuators. Such interaction is based on the notion of “haptic objects” that correspond to physical phenomena to be transmitted to the user. In the context of an immersive scene, a haptic object makes it possible to provide a haptic effect by defining the stimulation of appropriate haptic actuators to mimic the physical phenomenon on the haptic rendering device. Different types of haptic actuators make it possible to reproduce different types of haptic feedback.

An example of a haptic object is an explosion. An explosion can be rendered through vibrations and heat, thus combining different haptic effects on the user to improve the realism. An immersive scene typically comprises multiple haptic objects, for example using a first haptic object related to a global effect and a second haptic object related to a local effect.

The principles described herein apply to any immersive environment using haptics such as augmented reality, virtual reality, mixed reality, or haptics-enhanced video (or omnidirectional/360° video) rendering, for example, and more generally apply to any haptics-based user experience. A scene for such examples of immersive environments is thus considered an immersive scene.

Haptics refers to the sense of touch and includes two dimensions, tactile and kinesthetic. The first relates to tactile sensations such as friction, roughness, hardness, temperature and is felt through the mechanoreceptors of the skin (Merkel cell, Ruffini ending, Meissner corpuscle, Pacinian corpuscle). The second is linked to the sensation of force/torque, position, motion/velocity provided by the muscles, tendons and the mechanoreceptors in the joints. Haptics is also involved in the perception of self-motion since it contributes to the proprioceptive system (i.e., perception of one’s own body). Thus, the perception of acceleration, speed or any body model could be assimilated as a haptic effect. The frequency range is about 0 to 1 kHz depending on the type of modality. Most existing devices able to render haptic signals generate vibrations. Examples of such haptic actuators are linear resonant actuator (LRA), eccentric rotating mass (ERM), and voice-coil linear motor. These actuators may be integrated into haptic rendering devices such as haptic suits but also smartphones or game controllers.

To encode haptic signals, several formats have been defined related to either a high-level description using XML-like formats (for example MPEG-V), parametric representation using JSON-like formats such as Apple Haptic Audio Pattern (AHAP) or Immersion Corporation’s HAPT format, or waveform encoding (IEEE 1918.1.1 ongoing standardization for tactile and kinesthetic signals). The HAPT format has been recently included in the MPEG ISOBMFF file format specification (ISO/IEC 14496 part 12). Moreover, GL Transmission Format (glTF™) is a royalty-free specification for the efficient transmission and loading of 3D scenes and models by applications. This format defines an extensible, common publishing format for 3D content tools and services that streamlines authoring workflows and enables interoperable use of content across the industry.

Moreover, a new haptic file format is being defined within the MPEG standardization group and relates to a coded representation for haptics. The Reference Model of this format is not yet released but is referenced herein as RM0. With this reference model, the encoded haptic description file can be exported either as a JSON interchange format (for example a .gmpg file) that is human readable or as a compressed binary distribution format (for example a .mpg) that is particularly adapted for transmission towards haptic rendering devices.

SUMMARY

Embodiments are related to the encoding of an input signal (for example a haptic signal), particularly for a low frequency band of the signal when decomposed into multiple frequency bands, and enable the selection of a diverse variety of types of interpolation between keyframes (i.e., control points) to define a curve representing the signal, the interpolation being chosen amongst a set of curves comprising linear, cubic, quadratic, cubic Bezier, quadratic Bezier, BSpline, Nurbs or Akima interpolations.

A first aspect of at least one embodiment is directed to a method for encoding a signal, comprising: obtaining a set of keyframes, a keyframe being characterized by an amplitude and a temporal reference, obtaining a type of interpolation selected amongst a list of interpolation methods, and encoding information representative of the signal comprising the set of keyframes and the type of interpolation.

A second aspect of at least one embodiment is directed to a method for decoding a signal, comprising: obtaining information representative of a signal, the information comprising a set of keyframes and a type of interpolation, a keyframe being characterized by an amplitude and a temporal reference, determining a signal based on the set of keyframes interpolated according to the type of interpolation, and providing the signal.

A third aspect of at least one embodiment is directed to a device for encoding a signal comprising a processor configured to: obtain a set of keyframes, a keyframe being characterized by an amplitude and a temporal reference, obtain a type of interpolation selected amongst a list of interpolation methods, and encode information representative of the signal comprising the set of keyframes and the type of interpolation.

A fourth aspect of at least one embodiment is directed to a device for decoding a signal, comprising a processor configured to: obtain information representative of a signal, the information comprising a set of keyframes and a type of interpolation, a keyframe being characterized by an amplitude and a temporal reference, determine a signal based on the set of keyframes interpolated according to the type of interpolation, and provide the signal.

A fifth aspect of at least one embodiment is directed to a computer program comprising program code instructions executable by a processor, the computer program implementing at least the steps of a method according to the first or second aspect.

A sixth aspect of at least one embodiment is directed to a computer program product stored on a non-transitory computer readable medium and comprising program code instructions executable by a processor, the computer program product implementing at least the steps of a method according to the first or second aspect.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 illustrates a block diagram of an example of a system in which various aspects and embodiments are implemented.

Figure 2 illustrates an example of structure for describing an immersive scene according to one embodiment.

Figure 3 illustrates an example of structure for the interchange file format describing an immersive scene.

Figure 4 illustrates an example of signal coded using two haptic bands.

Figure 5A illustrates a conventional method for encoding a low-level haptic signal in two bands of frequencies.

Figure 5B illustrates a conventional method for decoding a low-level haptic signal in two bands of frequencies.

Figure 6A illustrates an example of input signal.

Figures 6B and 6C respectively show the decomposition into the low frequency band and the high frequency band.

Figure 6D shows the result of the analysis of the low frequency band and the identification of the keyframes, herein identified by crosses, corresponding to the extreme values of the curve.

Figure 6E illustrates the discrepancies in the reconstruction of the low frequency signal.

Figure 6F illustrates both the input signal of the encoder and the signal reconstructed on the decoder side.

Figure 7 illustrates an example implementation of a user interface for a haptic signal generator according to embodiments.

Figure 8A illustrates an example of interpolation based on Bezier curves.

Figure 8B illustrates an example of interpolation based on B-Splines.

Figure 8C illustrates an example of interpolation based on Nurbs.

Figure 8D illustrates an example of interpolation based on Akima.

Figure 9A illustrates an example of reconstructed signal using a cubic interpolation according to at least one embodiment.

Figure 9B illustrates an example of reconstructed signal using Akima interpolation according to at least one embodiment.

Figure 10 illustrates the haptic signal reconstructed according to the description of Table 2 using Akima interpolation.

Figure 11 illustrates the difference between haptic signals based on different interpolations according to an embodiment where curve fitting is used.

DETAILED DESCRIPTION

Figure 1 illustrates a block diagram of an example of immersive system in which various aspects and embodiments are implemented. In the depicted immersive system, the user Alice uses the haptic rendering device 100 to interact with a server 180 hosting an immersive scene 190 through a communication network 170. This immersive scene 190 may comprise various data and/or files representing different elements (scene description 191, audio data, video data, 3D models, and haptic description file 192) required for its rendering. The immersive scene 190 may be generated under control of an immersive experience editor 110 that makes it possible to arrange the different elements together and design an immersive experience. Appropriate description files and various data files representing the immersive experience are generated by an immersive scene generator 111 (a.k.a. encoder) and encoded in a format adapted for transmission to haptic rendering devices. The immersive experience editor 110 is typically run on a computer that will generate the immersive scene to be hosted on the server. For the sake of simplicity, the immersive experience editor 110 is illustrated as being directly connected through the dotted line 171 to the immersive scene 190. In practice, the immersive scene 190 is hosted on the server 180 and the computer running the immersive experience editor 110 is connected to the server 180 through the communication network 170. The immersive experience editor comprises a haptic signal generator 112 according to at least one embodiment. In at least one embodiment, the haptic signal generator comprises a graphical user interface or a dedicated authoring tool configured to select a set of keyframes and a type of curve for encoding the haptic signal. In at least one embodiment, the haptic signal generator encodes a haptic signal based on a configuration file comprising a set of keyframes and a type of curve for encoding the haptic signal. In at least one embodiment, the haptic signal generator comprises a curve fitting algorithm configured to generate a set of keyframes and a type of curve for encoding the haptic signal corresponding to a frequency band of a selected (i.e., captured) haptic signal.

The haptic rendering device 100 comprises a processor 101. The processor 101 may be a general-purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), a state machine, and the like. The processor may perform data processing such as haptic signal decoding, input/output processing, and/or any other functionality that enables the device to operate in an immersive system.

The processor 101 may be coupled to an input unit 102 configured to convey user interactions. Multiple types of inputs and modalities can be used for that purpose. A physical keypad or a touch sensitive surface are typical examples of input adapted to this usage although voice control could also be used. In addition, the input unit may also comprise a digital camera able to capture still pictures or video in two dimensions or a more complex sensor able to determine the depth information in addition to the picture or video and thus able to capture a complete 3D representation. The processor 101 may be coupled to a display unit 103 configured to output visual data to be displayed on a screen. Multiple types of displays can be used for that purpose such as a liquid crystal display (LCD) or organic light-emitting diode (OLED) display unit. The processor 101 may also be coupled to an audio unit 104 configured to render sound data to be converted into audio waves through an adapted transducer such as a loudspeaker for example. The processor 101 may be coupled to a communication interface 105 configured to exchange data with external devices. The communication preferably uses a wireless communication standard to provide mobility of the haptic rendering device, such as cellular (e.g., LTE) communications, Wi-Fi communications, and the like. The processor 101 may access information from, and store data in, the memory 106, which may comprise multiple types of memory including random access memory (RAM), read-only memory (ROM), a hard disk, a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, or any other type of memory storage device. In embodiments, the processor 101 may access information from, and store data in, memory that is not physically located on the device, such as on a server, a home computer, or another device.

The processor 101 is coupled to a haptic unit 107 configured to provide haptic feedback to the user, the haptic feedback being described in the haptic description file 192 that is related to the scene description 191 of an immersive scene 190. The haptic description file 192 describes the kind of feedback to be provided according to the syntax described further hereinafter. Such description file is typically conveyed from the server 180 to the haptic rendering device 100. The haptic unit 107 may comprise a single haptic actuator or a plurality of haptic actuators located at a plurality of positions on the haptic rendering device. Different haptic units may have a different number of actuators and/or the actuators may be positioned differently on the haptic rendering device.

In at least one embodiment, the processor 101 is configured to render a haptic signal according to embodiments described further below, in other words to apply a low-level signal to a haptic actuator to render the haptic effect. Such low-level signal may be represented using different forms, for example by metadata or parameters in the description file or by using a digital encoding of a sampled analog signal (e.g., PCM or LPCM).
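
As an illustration of the last point, the following is a minimal sketch (in Python) of how a sampled low-level signal could be converted to LPCM samples; the normalization of the signal to [-1, 1] and the 16-bit depth are assumptions of the sketch, not requirements of the described embodiments.

```python
import numpy as np

def to_lpcm16(signal):
    """Minimal sketch: quantize a low-level signal normalized to [-1, 1]
    into 16-bit linear PCM samples (bit depth chosen for illustration only)."""
    x = np.clip(np.asarray(signal, dtype=float), -1.0, 1.0)
    return np.round(x * 32767.0).astype(np.int16)
```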

The processor 101 may receive power from the power source 108 and may be configured to distribute and/or control the power to the other components in the device 100. The power source may be any suitable device for powering the device. As examples, the power source may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), and the like), solar cells, fuel cells, and the like.

While the figure depicts the processor 101 and the other elements 102 to 108 as separate components, it will be appreciated that these elements may be integrated together in an electronic package or chip. It will be appreciated that the haptic rendering device 100 may include any sub-combination of the elements described herein while remaining consistent with an embodiment. The processor 101 may further be coupled to other peripherals or units not depicted in figure 1 which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity. For example, the peripherals may include sensors such as a universal serial bus (USB) port, a vibration device, a television transceiver, a hands-free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like. For example, the processor 101 may be coupled to a localization unit configured to localize the haptic rendering device within its environment. The localization unit may integrate a GPS chipset providing longitude and latitude position regarding the current location of the haptic rendering device but also other motion sensors such as an accelerometer and/or an e-compass that provide localization services.

Typical examples of haptic rendering device 100 are haptic suits, smartphones, game controllers, haptic gloves, haptic chairs, haptic props, motion platforms, etc. However, any device or composition of devices that provides similar functionalities can be used as haptic rendering device 100 while still conforming with the principles of the disclosure.

In at least one embodiment, the device does not include a display unit but includes a haptic unit. In such embodiment, the device does not render the scene visually but only renders haptic effects. However, the device may prepare data for display so that another device, such as a screen, can perform the display. Example of such devices are haptic suits or motion platforms.

In at least one embodiment, the device does not include a haptic unit but includes a display unit. In such embodiment, the device does not render the haptic effect but only renders the scene visually. However, the device may prepare data for rendering the haptic effect so that another device, such as a haptic prop, can perform the haptic rendering. Examples of such devices are smartphones, head-mounted displays, or laptops.

In at least one embodiment, the device does not include a display unit nor does it include a haptic unit. In such embodiment, the device does not visually render the scene and does not render the haptic effects. However, the device may prepare data for display so that another device, such as a screen, can perform the display and may prepare data for rendering the haptic effect so that another device, such as a haptic prop, can perform the haptic rendering. Examples of such devices are computers, game consoles, optical media players, or set-top boxes.

In at least one embodiment, the immersive scene 190 and associated elements are directly hosted in memory 106 of the haptic rendering device 100 allowing local rendering and interactions. In a variant of this embodiment, the device 100 also comprises the immersive experience editor 110 allowing a fully standalone operation, for example without needing any communication network 170 and server 180.

Although the different elements of the immersive scene 190 are depicted in figure 1 as separate elements, the principles described herein apply also in the case where these elements are directly integrated in the scene description and not separate elements. Any mix between two alternatives is also possible, with some of the elements integrated in the scene description and other elements being separate files.

Figure 2 illustrates an example of process for encoding an immersive description file. This encoding process 200 is for example implemented as a module of the immersive scene generator 111 of an immersive experience editor 110 and typically performed on a computer generating the files describing the immersive scene. It may also be implemented on a computer, or a specific hardware platform dedicated to encoding immersive description files. The inputs are a metadata file 201 and at least one low-level haptic signal file 203. The metadata file 201 is for example based on the ‘OHM’ haptic object file format. The signal files represent analog signals to be applied to haptic actuators and are conventionally encoded using pulse code modulation (PCM), for example based on the WAV file format. The descriptive files 202 are for example based on the AHAP or HAPT file formats.

Metadata is extracted, in step 210, from the metadata file 201, making it possible to identify the descriptive files and/or signal files. Descriptive files are analyzed and transcoded in step 211. In step 212, signal files are processed to decompose the signal into frequency bands and keyframes or wavelets, as further described in figure 4.

The interchange file 204 is then generated in step 220, in compliance with the data format according to one of the embodiments described herein. The interchange file 204 may be compressed in step 230 to be distributed in a transmission-friendly form such as the distribution file 205, more compact than the interchange file format.

The interchange file 204 can be a human readable file for example based on glTF, XML or JSON formats. The distribution file 205 is a binary encoded file for example based on MPEG file formats adapted for streaming or broadcasting to a decoder device.

Figure 3 illustrates an example of structure for the interchange file format describing an immersive scene. The data structure 300 represents the immersive scene 190. It can be decomposed into a set of layers. At the upper layer, metadata 301 describe high-level metadata information regarding the overall haptic experience defined in the data structure 300 and a list of avatars 302 (i.e., body representation) later referenced in the file. These avatars 302 make it possible to specify a target location of haptic stimuli on the body. The haptic effects are described through a list of perceptions 310, 31N. These perceptions correspond to haptic signals associated with specific perception modalities (such as vibration, force, position, velocity, temperature, etc.). A perception comprises metadata 320 to describe the haptic content of the signal, devices 321 to describe specifications of the haptic devices for which the signal was designed and a list of haptic tracks 331, 33N. A haptic track comprises metadata 340 to describe the content of the track, the associated gain value, a mixing weight, body localization information and a reference to haptic device specification (defined at the perception level). The track finally contains a list of haptic bands 351, 35N, each band defining a subset of the signal within a given frequency range. For example, the haptic band 351 may correspond to the range of frequencies from 0 to 50Hz while the haptic band 35N may correspond to the range of frequencies over 2kHz. A haptic band comprises band data 360 to describe the frequency range of the band, the type of encoding modality (Vectorial or Wavelet), the type of band (Transient, Curve and Wave) and optionally the type of curve (Cubic, Linear or unknown) or the window length. A haptic band is defined by a list of haptic effects 371, 37N. Finally, a haptic effect comprises a list of keyframes 391, 39N and effect data 380, a keyframe being defined by a position (i.e., a temporal reference), a frequency and an amplitude. The effect data describes the type of base signal selected amongst Sine, Square, Triangle, SawToothUp, and SawToothDown and also provides temporal references such as timestamps. The low-level haptic signal can then be reconstructed by combining the keyframes of the haptic effects in the different bands, as illustrated in the example of figure 4.
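
For illustration purposes only, the hierarchy described above can be mirrored by a few simple data classes. The following Python sketch is a simplified, non-normative rendition; the field names are placeholders chosen for readability and do not reproduce the exact MPEG syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Keyframe:
    position: float                      # temporal reference
    amplitude: float
    frequency: Optional[float] = None    # used by Transient / Vectorial Wave bands

@dataclass
class Effect:
    base_signal: str = "Sine"            # Sine, Square, Triangle, SawToothUp, SawToothDown
    keyframes: List[Keyframe] = field(default_factory=list)

@dataclass
class Band:
    lower_frequency: float
    upper_frequency: float
    band_type: str = "Curve"             # Transient, Curve or Wave
    curve_type: Optional[str] = None     # e.g., "Cubic" or "Linear" in the existing format
    effects: List[Effect] = field(default_factory=list)

@dataclass
class Track:
    description: str
    gain: float = 1.0
    mixing_weight: float = 1.0
    bands: List[Band] = field(default_factory=list)

@dataclass
class Perception:
    modality: str                        # vibration, force, temperature, ...
    tracks: List[Track] = field(default_factory=list)
```

A decoder would walk this hierarchy band by band, reconstructing each band from its effects and keyframes before summing the bands.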

Figure 4 illustrates an example of signal coded using two haptic bands. With this technique, a low-level haptic signal is encoded using two frequency bands, a low frequency band 410 and a high frequency band 420, each of them defining a part of the signal in a given frequency range. In this example, the low frequency band corresponds to frequencies below 72.5 Hz while the high frequency band corresponds to frequencies equal to or higher than 72.5 Hz. On the rendering side, the device combines the two parts together (i.e., adding them together) to generate the final haptic signal 440.

The data for a frequency band may be reconstructed based on keyframes and according to a type of haptic band selected amongst Transient, Curve and Wave bands. Additionally, for Wave bands, two types of encoding modalities can be used: Vectorial or Wavelet. Each band is composed of a series of Effects and each Effect is defined by a list of Keyframes that are represented as dots in the figure. The data contained in the effects and keyframes is interpreted differently for different types of haptic bands and encoding modalities.

For a Transient band, each effect stores a set of keyframes defining a position, an amplitude, and a frequency. A keyframe represents a transient event. The signal may be reconstructed using the type of periodic base signal specified in the effect metadata with the amplitude specified in the keyframe and the period given by the frequency of the keyframe. A transient event is a very short signal generated for a few periods only. The number of generated periods is determined by the decoder.
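
A minimal sketch of such a Transient band reconstruction is given below, assuming keyframe positions expressed in seconds, a sine base signal and a decoder choice of two generated periods per transient.

```python
import numpy as np

def render_transient_band(keyframes, sampling_rate=8000.0, n_periods=2, duration=None):
    """Illustrative Transient band reconstruction: each keyframe (position in
    seconds, amplitude, frequency in Hz) triggers a short burst of the base
    signal (a sine here) at the keyframe's amplitude and frequency. The number
    of generated periods is a decoder choice; two periods are assumed here."""
    positions = [p for p, _, _ in keyframes]
    if duration is None:
        duration = max(positions) + 0.1
    out = np.zeros(int(duration * sampling_rate))
    for position, amplitude, frequency in keyframes:
        n = int(n_periods / frequency * sampling_rate)   # samples in the burst
        t = np.arange(n) / sampling_rate
        burst = amplitude * np.sin(2.0 * np.pi * frequency * t)
        start = int(position * sampling_rate)
        stop = min(start + n, len(out))
        out[start:stop] += burst[:stop - start]
    return out
```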

For a Curve band, each effect stores a set of keyframes defining a position (i.e., a temporal reference) and an amplitude. The keyframes represent control points of a curve and an interpolation is performed to generate the curve from the control points. The type of interpolation function is either cubic or linear and is specified in the metadata of the band (380 in Figure 3). The signal may be reconstructed by performing an interpolation between the amplitudes of keyframes according to their temporal references.
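
A minimal sketch of this Curve band reconstruction, restricted to the existing Linear/Cubic choices and assuming keyframe temporal references expressed in seconds:

```python
import numpy as np
from scipy.interpolate import CubicSpline, interp1d

def render_curve_band(keyframes, curve_type="Cubic", sampling_rate=8000.0):
    """Illustrative Curve band reconstruction: the keyframes (temporal
    reference in seconds, amplitude) are used as control points and
    interpolated with the curve type carried in the band metadata."""
    x = np.array([p for p, _ in keyframes])
    y = np.array([a for _, a in keyframes])
    f = interp1d(x, y, kind="linear") if curve_type == "Linear" else CubicSpline(x, y)
    t = np.arange(x[0], x[-1], 1.0 / sampling_rate)
    return t, f(t)
```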

For Vectorial Wave bands, the effect stores a set of keyframes defining a position (i.e., a temporal reference), an amplitude and a frequency. In this case, the signal is generated using the type of periodic base signal specified in the effect metadata with the amplitude specified in the keyframe and the period given by the frequency of the keyframe. The SPIHT wavelet encoding scheme (http://www.sihconimaging.com/SPIHT.htm) may be used for the Wavelet band, or other types of wavelet encoding. For example, for the Wavelet band, the effect may store the contents of one wavelet block. It contains a keyframe for every coefficient of the wavelet transformed and quantized signal, indicating the amplitude value of the wavelet. The coefficients are scaled to a range of [-1,1]. Additionally, the original maximum amplitude is stored in a keyframe, as well as the maximum number of used bits. In this case, the signal may be reconstructed using the coefficients to perform an inverse wavelet transform. The frequency band decomposition may use a Low Pass Filter and a High Pass Filter to split the signal into a low frequency band and a high frequency band. The two bands are then processed differently. Various methods can be used for the encoding of the high frequency part. A first solution is to split the high frequency signal into smaller fixed length windows and use a Short-time Fourier Transform (STFT) to decompose the signal in the frequency spectrum. Another solution is to use wavelet transforms to encode the high frequencies. The data structure illustrated in figure 3 allows multiple bands with different frequency ranges to be defined. These bands are used to store the coefficients of the Fourier or Wavelet Transforms.
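
A minimal sketch of such a two-band decomposition is shown below, using a Butterworth low-pass/high-pass pair as a stand-in. The 72.5 Hz cutoff matches the example of figure 4; the filter family, order and zero-phase filtering are assumptions of the sketch, not requirements of the format.

```python
from scipy.signal import butter, filtfilt

def split_bands(signal, sampling_rate, cutoff_hz=72.5, order=4):
    """Illustrative frequency band decomposition into a low and a high band."""
    b_lo, a_lo = butter(order, cutoff_hz, btype="low", fs=sampling_rate)
    b_hi, a_hi = butter(order, cutoff_hz, btype="high", fs=sampling_rate)
    low_band = filtfilt(b_lo, a_lo, signal)
    high_band = filtfilt(b_hi, a_hi, signal)
    return low_band, high_band

# On the rendering side, the final signal is obtained by simply adding the
# reconstructed bands back together: reconstructed = low_band + high_band.
```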

For the low frequency part of the signal, the data of this frequency band is stored through a list of keyframe points defined by a timestamp and an amplitude. The data also contains information relative to the type of interpolation used to reproduce the signal of this band. The keyframes (i.e., control points) defining the low frequency band are obtained by simply extracting the local extrema of the low frequency signal.
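
A minimal sketch of this keyframe extraction for the low frequency band, using local extrema; keeping the first and last samples so that the curve spans the whole duration is an assumption of the sketch.

```python
import numpy as np
from scipy.signal import argrelextrema

def extract_keyframes(low_band, sampling_rate):
    """Illustrative keyframe extraction: the control points are the local
    extrema of the low frequency band, returned as (timestamp, amplitude)."""
    maxima = argrelextrema(low_band, np.greater)[0]
    minima = argrelextrema(low_band, np.less)[0]
    idx = np.unique(np.concatenate(([0], maxima, minima, [len(low_band) - 1])))
    return [(i / sampling_rate, float(low_band[i])) for i in idx]
```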

In the example of the figure, the low frequency band 410 is defined as a Curve band using a single effect 411. Such representation is particularly adapted to the low frequency part of the signal. The effect 411 is defined by the keyframes 4111, 4112, 4113, 4114, 4115, 4116, 4117, 4118, 4119. The signal for the low frequency band is generated by a cubic interpolation between these keyframes. The high frequency band 420 is defined by 4 effects 421, 422, 423, 424. These effects are defined as Vectorial bands. The effect 421 is based on 4 keyframes 4211, 4212, 4213, 4214. The effect 422 is based on 11 keyframes (not numbered on the figure). The effect 423 is based on 2 keyframes 4231 and 4232. The effect 424 is based on 2 keyframes 4241 and 4242.

While the description is based on a set of two bands defining a range for low frequencies and a range for high frequencies, the principles also apply in the case where more than two ranges of frequencies are used. In this case, the low frequency band becomes the lowest frequency band, and the high frequency band becomes the highest frequency band. The lowest frequency band may for example be encoded using a curve band using a single effect, as represented by the low frequency band 410 of figure 4. Other frequency bands may be encoded with any of the other types of encoding, for example using a vectorial wave band based on wavelets, as represented by the high frequency band 420 of figure 4 but using multiple instances of encoding, one for each band of frequencies. One advantage of this solution with regard to the structure is that the signal data is easy to package and particularly convenient for streaming purposes. Indeed, with such a linear structure, the data can be easily broken down into small consecutive packages and does not require complicated data pre-fetching operations. The signal is easily reconstructed by patching the packages back together to ensure a smooth playback of the signal. It may also be reconstructed by only taking the low frequency part and reconstructing a lower quality (but potentially sufficient) signal without taking into account the high frequency band.

The further sections of this document describe the encoding of a haptic signal based on a manually generated signal or on PCM waveform signals, for example carried by input WAV files. In this context, the haptic signal describes a single perception modality and even if the file contains multiple tracks, the encoder will process each track separately. Therefore, for the sake of clarity, the remainder of the disclosure will describe the coding of a single track.

Figure 5A illustrates a conventional method for encoding a low-level haptic signal in two bands of frequencies. This corresponds to the signal processing step 212 of figure 2 and is for example implemented by an encoder such as the immersive scene generator 111 of figure 1. Given an input PCM signal, the encoder starts the process 500 by performing a frequency band decomposition. Using a Low Pass Filter and a High Pass Filter, the encoder splits, in step 510, the signal into low frequency bands 511 and high frequency bands 512. In step 520, the encoder analyses each low frequency band and extracts data 521 representing the low frequency bands, and in step 530 analyses each high frequency band and extracts data 531 representing the high frequency bands. The extracted data are then formatted according to the structure of figure 3 in the formatting step 220 of figure 2.

In a typical example implementation, there is a single low frequency band that is encoded using a Curve band, so that the LF data comprises a set of keyframes extracted in step 520 and there is a single high frequency band that is encoded using a vectorial wave band so that the HF data comprises a set of wavelets extracted in step 530, as described above in figure 4.

This hybrid format combining Curve bands and Wave bands is interesting and makes it possible to store low frequency signals very easily. This is especially convenient for synthetic signals that were produced through haptic authoring tools (in particular kinesthetic signals).

Figure 5B illustrates a conventional method for decoding a low-level haptic signal in two bands of frequencies. This process 550 is for example implemented in a haptic rendering device 100 of figure 1 and typically executed by the processor 101 of such device. The processor receives encoded data 551 generated according to figure 5A, for example formatted according to the data structure described in figure 3. In step 560, the processor reconstructs the signal 552 corresponding to the low frequency band signal. In step 570, the processor reconstructs the signal 553 corresponding to the high frequency band signal. By adding the signals 552 and 553 together, the processor reconstructs the haptic signal 554 corresponding to the encoded data 551.

Figures 6A to 6F illustrate the differences induced by the keyframe interpolation on an example of input signal. Figure 6A shows the input signal. Unfortunately, the limited resolution of the picture in the document does not allow the fine details to be conveyed, especially details of the high frequencies. Figures 6B and 6C respectively show the decomposition into the low frequency band and the high frequency band. Figure 6D shows the result of the analysis of the low frequency band and the identification of the keyframes, herein identified by crosses, corresponding to the extreme values of the curve. These values would then be provided to the decoder, i.e., a haptic rendering device, for decoding along with the type of interpolation to be used, selected between cubic and linear. Figure 6E illustrates the discrepancies in the reconstruction of the low frequency signal. It shows the low frequency band part of the input signal represented by a dotted line and the low frequency signal as it may be generated using the selected interpolation of the keyframes, represented by the solid line. The figure only shows a subset of the complete signal but with better resolution in order to be able to visualize the difference between the input signal and its reconstructed version, which is clearly visible. Figure 6F shows the input signal of the encoder, represented by a solid line, and the output signal, i.e., the signal reconstructed on the decoder/rendering side, represented as a dotted line, thus comprising both the low and high frequency bands. This figure makes it possible to visualize the differences between the signal intended to be rendered and the signal that will be provided to the haptic actuators. It particularly comprises phase offsets that may hinder the haptic experience since they may not be perfectly synchronized with the haptic scene. For example, the slopes of the reconstructed signal at temporal markers around 2.1, 2.7, and 3.3 are shifted by the reconstruction so that they will be rendered in advance with regard to the input signal and thus temporally out of synchronization with the expected rendering. Although the example of this figure shows small delays, other types of signals may suffer from greater delays. Synchronization between a haptic effect and the corresponding change in the immersive scene is critical to ensure a satisfying user experience. Even a small delay may be noticed by the users. In addition, more severe coding artefacts from a higher compression rate or a simpler interpolation function (i.e., linear) will result in a very poor rendering of the haptic experience.

Embodiments described hereafter have been designed with the foregoing in mind and propose to encode a signal such as a low-level haptic signal decomposed into a set of frequency bands in a more accurate and/or more diverse way than the conventional techniques, thus allowing improved fidelity of the rendering with respect to the expected rendering (i.e., the input signal to be encoded). The accuracy improvement is obtained thanks to a greater diversity regarding the type of interpolation used for the definition of the haptic signal.

Embodiments propose a new method to accurately encode a haptic signal decomposed into a set of frequency bands by describing the part of the haptic signal for a frequency band as a set of keyframes to be interpolated according to a type of curve selected from a set of curves comprising linear, cubic, quadratic, cubic Bezier, quadratic Bezier, BSpline, Nurbs or Akima.

In at least one embodiment, the haptic signal is manually generated, for example through a graphical user interface or a configuration file, by obtaining a set of keyframes and a selected type of curve. This allows perfect encoding since the encoded signal corresponds exactly to the signal as designed by the creator.

In at least one embodiment, the haptic signal is generated by approximating a captured haptic signal, the approximation being done for example through curve fitting algorithms, so that the haptic signal is defined as a set of keyframes and a selected type of curve. This makes it possible to find a compromise between the cost of curve fitting and accuracy. Indeed, some interpolation methods are more accurate but need more computation to define the appropriate parameters.

At least one embodiment proposes to define a haptic signal based on an interpolation method selected amongst Linear, Cubic, Quadratic, Cubic Bezier, Quadratic Bezier, BSpline, Nurbs, and Akima. This choice of potential interpolation methods makes it possible to improve the accuracy of the encoding of low frequencies and makes the format compatible with conventional curve editing tools. This embodiment is implemented in an immersive scene description (300 in figure 3) comprising a haptic band for which the interpolation for a curve type of band is specified by a “curve_type” element according to the JSON schema of Table 1. The proposed changes to the specifications are illustrated in Table 1 with the human readable interchange format but may also be applied to the binary version.

Table 1

Given this modification, the encoder can be improved by using different types of curve representation (i.e., different interpolations). In the case where the haptic signal is manually generated, for example through a graphical user interface or a configuration file, the diverse variety of interpolations makes it possible to perfectly fit the intent of the signal creator, giving more freedom to the haptic signal synthetic creation process. In addition, it makes it possible to adapt the signal encoding and decoding process to the expected computation resources. Indeed, an Akima interpolation is much more complex than a linear interpolation and thus uses more memory and more computing resources. A simple haptic encoding or rendering device may consume an important part of its resources to reconstruct a signal using an Akima interpolation, probably at the cost of a less fluid user experience during the computation. For this reason, the diverse set of interpolations available allows the creator to find a good compromise between complexity and accuracy.
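
For illustration, the following minimal Python sketch shows how a decoder might map the extended “curve_type” values onto interpolation functions over the decoded keyframes. SciPy interpolators are used here as stand-ins; the Bezier and Nurbs variants require explicit control-point and weight handling and are therefore omitted from this sketch, and nothing here is normative.

```python
import numpy as np
from scipy.interpolate import (Akima1DInterpolator, CubicSpline,
                               interp1d, make_interp_spline)

def build_interpolator(curve_type, times, amplitudes):
    """Illustrative mapping from a 'curve_type' value to an interpolation
    function y = f(t) over the keyframes (times, amplitudes)."""
    kind = curve_type.lower()
    if kind == "linear":
        return interp1d(times, amplitudes, kind="linear")
    if kind == "quadratic":
        return interp1d(times, amplitudes, kind="quadratic")
    if kind == "cubic":
        return CubicSpline(times, amplitudes)
    if kind == "bspline":
        return make_interp_spline(times, amplitudes, k=3)   # cubic spline in B-spline form
    if kind == "akima":
        return Akima1DInterpolator(times, amplitudes)
    raise ValueError(f"curve_type not handled in this sketch: {curve_type}")

# Example use on a decoded low frequency band:
# f = build_interpolator(band.curve_type, times, amplitudes)
# samples = f(np.arange(times[0], times[-1], 1.0 / sampling_rate))
```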

Figure 7 illustrates an example implementation of a user interface for a haptic signal generator according to embodiments. It makes it possible to generate a haptic signal according to an embodiment where the haptic signal is manually generated but also according to an embodiment where the haptic signal is generated using curve fitting, i.e., trying to accurately represent an input signal. This example screen 700 is for example controlled by the haptic signal generator 112 of figure 1.

In an embodiment where the haptic signal is manually generated, the user enters a set of keyframes 711, 712, 713, 714, 715, 716, 717, 718, 719 in the area on the right side according to the corresponding time of occurrence along the horizontal temporal axis and intensity along the vertical axis. A contextual menu 730 allows the user to add a new keyframe, move an existing keyframe or delete one of the existing keyframes. In the area on the left side, the set of interpolation methods is shown and one of them is selected (linear 701 in the example). The corresponding haptic signal is simulated and may be generated once the curve is completed, for example using the generate button 750. In the example, the dashed line 720 represents the simulation of a haptic signal corresponding to the set of keyframes 711, 712, ..., 719 using the linear type of curve, as selected by element 701. If the user selects another type of interpolation, moves the position of a keyframe, or removes a keyframe, the dashed line is updated accordingly, allowing a very intuitive haptic signal creation process. The generated signal is associated with a haptic effect, as described above with respect to figures 3 and 4.

In an embodiment where the haptic signal is generated based on curve fitting, the user starts by selecting an input file comprising a signal to encode using the load button 770. A file selection window is displayed allowing the user to browse through files and select the desired file. The input file is then analyzed by a curve fitting algorithm to determine a set of keyframes and, using the selected interpolation, to simulate a curve representing the input signal. The user may then modify the simulated curve by adding a keyframe, moving a keyframe, deleting a keyframe or by changing the type of interpolation. In this embodiment, an additional field (not illustrated) may be displayed to inform the user of the level of accuracy of the simulation with regards to the input signal. It may take the form of a conventional peak signal-to-noise ratio (PSNR) for example.

The curve fitting mechanism uses an input signal to be matched. In at least one embodiment, the processor executing the curve fitting extracts the extreme values of the input signal, applies all interpolations available and selects the most accurate, i.e., the simulated curve with minimal error compared to the input signal.
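
A minimal sketch of that selection step is given below, reusing the hypothetical build_interpolator() helper sketched earlier; the mean square error criterion and the candidate list are assumptions of the sketch.

```python
import numpy as np

CANDIDATES = ("linear", "quadratic", "cubic", "bspline", "akima")

def select_best_interpolation(times, amplitudes, signal_t, signal_y,
                              candidates=CANDIDATES):
    """Illustrative curve fitting step: interpolate the extracted keyframes
    with every candidate method and keep the one with minimal mean square
    error against the input signal."""
    best_type, best_mse = None, np.inf
    t = np.clip(signal_t, times[0], times[-1])   # stay inside the keyframe range
    for curve_type in candidates:
        f = build_interpolator(curve_type, times, amplitudes)
        mse = float(np.mean((f(t) - signal_y) ** 2))
        if mse < best_mse:
            best_type, best_mse = curve_type, mse
    return best_type, best_mse
```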

In the case of complex and long signals, that process may take some time. In at least one embodiment, a pre-processing is performed on the input signal to select the most appropriate interpolation, without having to test all types of interpolation.

Figure 8A illustrates an example of interpolation based on Bezier curves. A Bezier curve is defined by a set of control points P0 through Pn, where n is called the order of the curve (n = 1 for linear, 2 for quadratic, 3 for cubic, etc.). The first and last control points are always the endpoints of the curve; however, the intermediate control points (if any) generally do not lie on the curve. The figure shows the case of quadratic Bezier interpolation where three points are used (P0, P1, P2) and where the following parametric function is defined:

B(t) = (1 - t)^2 P0 + 2(1 - t) t P1 + t^2 P2, for t in [0, 1]
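
For illustration, a direct evaluation of this parametric function, with the control points taken as (time, amplitude) pairs (an assumption of the sketch):

```python
import numpy as np

def quadratic_bezier(p0, p1, p2, n_samples=100):
    """Evaluates B(t) = (1 - t)^2 P0 + 2(1 - t) t P1 + t^2 P2 for t in [0, 1].
    Each control point is a 2D (time, amplitude) pair in this sketch."""
    t = np.linspace(0.0, 1.0, n_samples)[:, None]
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))
    return (1.0 - t) ** 2 * p0 + 2.0 * (1.0 - t) * t * p1 + t ** 2 * p2
```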

Figure 8B illustrates an example of interpolation based on B-Splines. A B-Spline (basis spline) function is a kind of generalization of Bezier curves. It is based on a combination of flexible sections controlled by a number of control points creating smooth curves. A B-spline of order n is a piecewise polynomial function of degree n-1 in a variable x. It is defined over 1+n locations called knots (control points), which must be in non-descending order. The shape of the curve only depends on the position of the knots. Modifying one knot will only change the curve locally. The figure shows the case of a B-Spline of order 3, i.e., a cubic B-spline.

Figure 8C illustrates an example of interpolation based on Nurbs. Nurbs stands for Non-Uniform Rational Basis Spline and is a generalization of B-Splines. A Nurbs curve is defined by its order, a set of weighted control points, and a knot vector. The order of a Nurbs curve defines the number of nearby control points that influence any given point on the curve. A Nurbs curve is represented mathematically by a polynomial of degree one less than the order of the curve. Hence, second-order curves (which are represented by linear polynomials) are called linear curves, third-order curves are called quadratic curves, and fourth-order curves are called cubic curves. The control points determine the shape of the curve. The knot vector is a sequence of parameter values that determines where and how the control points affect the curve.

Figure 8D illustrates an example of interpolation based on Akima. The Akima method is also based on piecewise polynomials but differs from the spline by the conditions imposed at the data points. Akima interpolation uses only values from neighboring knot points in the construction of the coefficients of the interpolation polynomial between any two knot points. Therefore, there is no large system of equations to solve and the Akima spline avoids unphysical wiggles in regions where the second derivative in the underlying curve is rapidly changing. It gives good fits to curves where the second derivative is rapidly varying.

Figures 9A and 9B illustrate examples of reconstructed signals using different types of interpolation according to an embodiment where the haptic signal is generated by approximating an input signal using curve fitting. An input signal is obtained and a curve that fits the input signal is determined. A set of control points is first determined, for example using the extreme values of the signal, and a corresponding curve is defined by using an interpolation as mentioned above. Depending on the input signal, the choice of control points and the type of interpolation, the resulting curve fits the input curve more or less closely. In the encoding process 500 of figure 5A, the data representing the determined curve are then encoded and provided to the decoding process 550 of figure 5B. Figures 9A and 9B show the differences between an example of input signal in solid lines and the corresponding signal as reconstructed by the decoding process. These illustrations only represent the signal for a low frequency band.

Figure 9A illustrates an example of reconstructed signal using a cubic interpolation according to at least one embodiment. Cubic spline interpolation is a way to find a curve that fits data points using a polynomial of degree three. Splines are smooth, continuous polynomials on a given path. A cubic spline has a cubic function for each data interval which is constrained to satisfy the C0, C1 and C2 continuity conditions. The same control points as the quadratic interpolation can be used.

Figure 9B illustrates an example of reconstructed signal using an Akima spline interpolation according to at least one embodiment. The Akima method is also based on piecewise polynomials but differs from the spline by the conditions imposed at the data points. In this method, the interpolation function is a cubic polynomial, the coefficients of which are fixed between every pair of successive data points by the condition that the function passes through the points with specified slopes. These slopes are determined by a local procedure, the slope at a given data point being a weighted average of the slopes of the line segments connecting that point with those on either side.
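
The behaviour described above can be observed with a small comparison on made-up keyframes (the values below are purely illustrative): around an abrupt amplitude change, a global cubic spline tends to overshoot while the Akima spline stays close to the flat segments.

```python
import numpy as np
from scipy.interpolate import Akima1DInterpolator, CubicSpline

# Purely illustrative keyframes: a flat segment followed by a step to a plateau.
times      = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
amplitudes = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

t = np.linspace(0.0, 0.5, 500)
cubic_curve = CubicSpline(times, amplitudes)(t)          # overshoots around the step
akima_curve = Akima1DInterpolator(times, amplitudes)(t)  # hugs the flat segments
```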

Table 2 illustrates a description of a haptic kinesthetic effect according to at least one embodiment using Akima interpolation. The effect comprises only a low-frequency band. It uses 4 keyframes to describe its shape, which is reconstructed using Akima interpolation. Figure 10 illustrates the haptic signal reconstructed according to the description of Table 2 using Akima interpolation.

Table 2

Figure 11 illustrates the difference between haptic signals based on different interpolations according to an embodiment where curve fitting is used. The figure zooms into a small portion of an input signal and shows the corresponding reconstructed signal after being encoded using different types of interpolation. It shows that Akima interpolation is the best solution to represent this example of curve.

Table 3 shows the mean square error between the reconstructed signal and the input signal on the whole signal duration using respectively linear, cubic and Akima interpolation, with reference to the signal of Figure 6A.

Table 3

In an embodiment, the encoding principle of the first or second method is applied to perform the encoding of an audio signal. Such audio signal may represent any type of audio communication such as a background soundtrack, a sound effect (e.g., explosion) or a voice communication between two users. The audio signal may be part of an immersive scene or can be independent from any immersive scene but use the same format as described in figure 3. In addition, an audio signal is sometimes used to render a haptic signal, after a low pass filtering stage. This encoding technique may be particularly interesting for low frequencies such as an audio signal for a subwoofer. All the encoding principles are the same as described above in the context of low-level haptic signals but applied to a more general audio signal or a set of signals (for example: stereo, 5.1 multi-channel audio, etc.). Indeed, a low-level haptic signal is very similar to an audio signal and shares the same characteristics. Such embodiment could therefore be applied to any audio distribution system and the resulting encoded data could be stored on a removable media (for example: memory card, USB stick, hard disk drive, solid-state disk, optical media, etc.) or transmitted over a communication network.

When multiple frequency bands are encoded using keyframes, the principles described in the first or second embodiment are used for each of the frequency bands encoded using keyframes. Resulting residual signals may be encoded separately as different frequency bands or combined in a single frequency band.

Although embodiments have been described mainly using a decomposition into two frequency bands, the principles of the first and second embodiment easily apply to an application where the decomposition uses more than two frequency bands.

Although different embodiments have been described separately, any combination of the embodiments together can be done while respecting the principles of the disclosure. Although embodiments are related to haptic effects, the person skilled in the art will appreciate that the same principles could apply to other effects such as the sensorial effects for example and thus would comprise smell, taste, temperature, emotions, intensity highlights, etc. Appropriate syntax would thus determine the appropriate parameters related to these effects.

Reference to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, mean that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.

Additionally, this application or its claims may refer to “determining” various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.

Additionally, this application or its claims may refer to “obtaining” various pieces of information. Obtaining is, as with “accessing”, intended to be a broad term. Obtaining the information may include one or more of, for example, receiving the information, accessing the information, or retrieving the information (for example, from memory or optical media storage). Further, “obtaining” is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.

It is to be appreciated that the use of any of the following “and/or” and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.