


Title:
ENCODING OF A THREE-DIMENSIONAL REPRESENTATION OF A USER AND DECODING OF THE SAME
Document Type and Number:
WIPO Patent Application WO/2022/242880
Kind Code:
A1
Abstract:
There are provided mechanisms for encoding a 3D representation of a user. A method is performed by an encoder module. The method comprises obtaining a 3D representation of a first user from a first user device. The 3D representation is associated with a voice activity indicator that per time unit of the 3D representation indicates whether the first user is engaged in vocal communication or not. The method comprises encoding the 3D representation per time unit according to an encoding level. The encoding level per time unit is a function of the voice activity indicator per said time unit. The method comprises providing, over a data network and towards a decoder module, the encoded 3D representation.

Inventors:
AKAN ESRA (DE)
EL ESSAILI ALI (DE)
TYUDINA NATALYA (DE)
EWERT JÖRG CHRISTIAN (DE)
Application Number:
PCT/EP2021/063685
Publication Date:
November 24, 2022
Filing Date:
May 21, 2021
Assignee:
ERICSSON TELEFON AB L M (SE)
International Classes:
H04N21/233; H04N21/2343; H04N21/81
Foreign References:
US20110103468A12011-05-05
Attorney, Agent or Firm:
ERICSSON (SE)
Claims:
CLAIMS

1. A method for encoding a three-dimensional, 3D, representation of a user, the method being performed by an encoder module (200), the method comprising: obtaining (S102) a 3D representation of a first user from a first user device (110a), wherein the 3D representation is associated with a voice activity indicator that per time unit of the 3D representation indicates whether the first user is engaged in vocal communication or not; encoding (S106) the 3D representation per time unit according to an encoding level, wherein the encoding level per time unit is a function of the voice activity indicator per said time unit; and providing (S108), over a data network and towards a decoder module (300), the encoded 3D representation.

2. The method according to claim 1, wherein, for at least part of the 3D representation, the 3D representation is encoded with a higher quality when the voice activity indicator indicates that the first user is engaged in vocal communication than when the voice activity indicator indicates that the first user is not engaged in vocal communication.

3. The method according to claim 1 or 2, wherein the method further comprises: segmenting (S104) the 3D representation per time unit into segments according to a segmentation level, wherein the segmentation level per time unit is a function of the voice activity indicator per said time unit, and wherein the encoding is performed on the 3D representation after having been segmented.

4. The method according to claim 3, wherein the encoding level varies among the segments per time unit.

5. The method according to claim 4, wherein, when the voice activity indicator indicates that the first user is engaged in vocal communication, the encoding level for segments associated with speech articulation of the first user is higher than for segments not associated with speech articulation of the first user.

6. The method according to claim 3, wherein, for at least part of the 3D representation, the 3D representation is segmented into a higher number of segments when the voice activity indicator indicates that the first user is engaged in vocal communication than when the voice activity indicator indicates that the first user is not engaged in vocal communication.

7. The method according to any preceding claim, wherein parameters of the encoding are provided as configuration data, and wherein the method further comprises: providing (S110), over the data network, the configuration data towards the second user device (110b).

8. The method according to a combination of claim 3 and claim 7, wherein the configuration data further comprises parameters of the segmenting.

9. The method according to any preceding claim, wherein the 3D representation is any of: a point cloud, a polygon mesh, a triangulated mesh.

10. The method according to any preceding claim, wherein the voice activity indicator is obtained from image frames corresponding to the 3D representation or from an audio signal that is timewise synchronized with the 3D representation.

11. The method according to any preceding claim, wherein the time unit has a length that is dependent on any of: parameters of the data network, number of second user devices (110b) intended to obtain a decoded 3D representation of the 3D representation from the decoder module (300), data network conditions of the second user devices (110b).

12. A method for decoding an encoded three-dimensional, 3D, representation of a user, the method being performed by a decoder module (300), the method comprising: obtaining (S202), over a data network and from an encoder module (200), an encoded 3D representation of a first user, wherein the encoded 3D representation is associated with a voice activity indicator dependent encoding level indicative of per time unit whether the first user is engaged in vocal communication or not; decoding (S206) the encoded 3D representation per time unit according to the encoding level per time unit into a decoded 3D representation of the first user; and providing (S214) the decoded 3D representation towards a second user device (110b).

13. The method according to claim 12, wherein, for at least part of the encoded 3D representation, the encoded 3D representation is decoded with a higher quality when the encoding level is indicative of that the first user is engaged in vocal communication than when the encoding level is indicative of that the first user is not engaged in vocal communication.

14. The method according to claim 12 or 13, wherein the method further comprises: obtaining (S204), over the data network and from the encoder module (200), configuration data comprising the encoding level.

15. The method according to any of claims 12 to 14, wherein the encoded 3D representation has been encoded into segments of the encoded 3D representation after having been segmented according to a voice activity indicator dependent segmentation level, wherein decoding the 3D representation is performed per segment of the encoded 3D representation, resulting in decoded segments of the 3D representation, and wherein the method further comprises: combining (S208) the decoded segments of the 3D representation after having decoded all segments of the encoded 3D representation.

16. The method according to a combination of claim 14 and claim 15, wherein the configuration data further comprises the segmentation level.

17. The method according to any of claims 12 to 16, wherein the method further comprises: segmenting (S210) the decoded 3D representation per time unit into decoded segments according to a segmentation level per time unit being a function of the encoding level per said time unit; and post-processing (S212) at least some of the decoded segments before providing the decoded 3D representation towards the second user device (110b).

18. The method according to claim 17, wherein, for at least part of the decoded 3D representation, the decoded 3D representation is segmented into a higher number of segments when the segmentation level is indicative of that the first user is not engaged in vocal communication than when the segmentation level is indicative of that the first user is engaged in vocal communication.

19. The method according to claim 17 or claim 18, wherein the post-processing comprises replacing at least some of the decoded segments with content from a 3D model.

20. The method according to any of claims 12 to 19, wherein the 3D representation is any of: a point cloud, a polygon mesh, a triangulated mesh.

21. An encoder module (200) for encoding a three-dimensional, 3D, representation of a user, the encoder module (200) comprising processing circuitry (210), the processing circuitry being configured to cause the encoder module (200) to: obtain a 3D representation of a first user from a first user device (110a), wherein the 3D representation is associated with a voice activity indicator that per time unit of the 3D representation indicates whether the first user is engaged in vocal communication or not; encode the 3D representation per time unit according to an encoding level, wherein the encoding level per time unit is a function of the voice activity indicator per said time unit; and provide, over a data network and towards a decoder module (300), the encoded 3D representation.

22. An encoder module (200) for encoding a three-dimensional, 3D, representation of a user, the encoder module (200) comprising: an obtain module (210a) configured to obtain a 3D representation of a first user from a first user device (110a), wherein the 3D representation is associated with a voice activity indicator that per time unit of the 3D representation indicates whether the first user is engaged in vocal communication or not; an encode module (210c) configured to encode the 3D representation per time unit according to an encoding level, wherein the encoding level per time unit is a function of the voice activity indicator per said time unit; and a provide module (210d) configured to provide, over a data network and towards a decoder module (300), the encoded 3D representation.

23. The encoder module (200) according to claim 21 or 22, further being configured to perform the method according to any of claims 2 to 11.

24. A decoder module (300) for decoding an encoded three-dimensional, 3D, representation of a user, the decoder module (300) comprising processing circuitry (310), the processing circuitry being configured to cause the decoder module (300) to: obtain, over a data network and from an encoder module (200), an encoded 3D representation of a first user, wherein the encoded 3D representation is associated with a voice activity indicator dependent encoding level indicative of per time unit whether the first user is engaged in vocal communication or not; decode the encoded 3D representation per time unit according to the encoding level per time unit into a decoded 3D representation of the first user; and provide the decoded 3D representation towards a second user device (110b).

25. A decoder module (300) for decoding an encoded three-dimensional, 3D, representation of a user, the decoder module (300) comprising: an obtain module (310a) configured to obtain, over a data network and from an encoder module (200), an encoded 3D representation of a first user, wherein the encoded 3D representation is associated with a voice activity indicator dependent encoding level indicative of per time unit whether the first user is engaged in vocal communication or not; a decode module (310c) configured to decode the encoded 3D representation per time unit according to the encoding level per time unit into a decoded 3D representation of the first user; and a provide module (310g) configured to provide the decoded 3D representation towards a second user device (110b).

26. The decoder module (300) according to claim 24 or 25, further being configured to perform the method according to any of claims 13 to 19.

27. A computer program (1220a) for encoding a three-dimensional, 3D, representation of a user, the computer program comprising computer code which, when run on processing circuitry (210) of an encoder module (200), causes the encoder module (200) to: obtain (S102) a 3D representation of a first user from a first user device (110a), wherein the 3D representation is associated with a voice activity indicator that per time unit of the 3D representation indicates whether the first user is engaged in vocal communication or not; encode (S106) the 3D representation per time unit according to an encoding level, wherein the encoding level per time unit is a function of the voice activity indicator per said time unit; and provide (S108), over a data network and towards a decoder module (300), the encoded 3D representation.

28. A computer program (1220b) for decoding an encoded three-dimensional, 3D, representation of a user, the computer program comprising computer code which, when run on processing circuitry (310) of a decoder module (300), causes the decoder module (300) to: obtain (S202), over a data network and from an encoder module (200), an encoded 3D representation of a first user, wherein the encoded 3D representation is associated with a voice activity indicator dependent encoding level indicative of per time unit whether the first user is engaged in vocal communication or not; decode (S206) the encoded 3D representation per time unit according to the encoding level per time unit into a decoded 3D representation of the first user; and provide (S214) the decoded 3D representation towards a second user device (110b).

29. A computer program product (1210a, 1210b) comprising a computer program (1220a, 1220b) according to at least one of claims 27 and 28, and a computer readable storage medium (1230) on which the computer program is stored.

Description:
ENCODING OF A THREE-DIMENSIONAL REPRESENTATION OF A USER AND DECODING OF THE SAME

TECHNICAL FIELD

Embodiments presented herein relate to a method, an encoder module, a computer program, and a computer program product for encoding a three-dimensional representation of a user. Embodiments presented herein further relate to a method, a decoder module, a computer program, and a computer program product for decoding an encoded three-dimensional representation of a user.

BACKGROUND

For video communication services, there is always a challenge to obtain good performance and capacity for a given communications protocol, its parameters and the physical environment in which the video communication service is deployed.

As a non-limiting illustrative example, video conferencing has become an important tool of daily life. In the business environment, it enables a more effective collaboration between remote locations as well as the reduction of travelling costs. In the private environment, video conferencing makes possible a closer, more personal communication between related people. In general, although two-dimensional (2D) video conferencing systems provide a basic feeling of closeness between participants, the user experience could still be improved by supplying a more realistic/immersive feeling to the conferees. Technically, this could be achieved, among others, with the deployment of three-dimensional (3D) video techniques, which add depth perception to the user visual experience and also provide a better understanding of the scene proportions. In this respect, 3D video or 3D experience commonly refers to the possibility of, for a viewer, getting the feeling of depth in the scene or, in other words, to get a feeling for the viewer to be in the scene. In technical terms, this may generally be achieved both by the type of capture equipment (i.e. the cameras) and by the type of rendering equipment (i.e. the display) that are deployed in the system.

3D video conferencing may be enabled in many different forms. To this effect, 3D equipment such as stereo cameras and 3D displays have been deployed. The incorporation of 3D video into video conferencing systems leads to new usage possibilities as well as to new scenarios where the immersive feeling provided could be beneficial. Also, it leads to potential problems in the video format and codec selections. For example, as the number of users participating in a 3D video conference increases, so does the amount of data that needs to be transmitted between the users. This could potentially result in bottlenecks being created in 3D video conferencing systems, or other types of 3D communication systems, such as peer-to-peer 3D communication systems with two or more participating users, 3D communication systems used in enterprise or industrial environments, 3D communication systems used in healthcare applications, and 3D communication systems for consumer electronics devices. Hence, there is a need for improved communication of 3D representations of users from one user device to another user device.

SUMMARY

An object of embodiments herein is to provide efficient communication of 3D representations of users from one user device to another user device where the above issues are avoided, or at least mitigated or reduced.

According to a first aspect there is presented a method for encoding a 3D representation of a user. The method is performed by an encoder module. The method comprises obtaining a 3D representation of a first user from a first user device. The 3D representation is associated with a voice activity indicator that per time unit of the 3D representation indicates whether the first user is engaged in vocal communication or not. The method comprises encoding the 3D representation per time unit according to an encoding level. The encoding level per time unit is a function of the voice activity indicator per said time unit. The method comprises providing, over a data network and towards a decoder module, the encoded 3D representation.

According to a second aspect there is presented an encoder module for encoding a 3D representation of a user. The encoder module comprises processing circuitry. The processing circuitry is configured to cause the encoder module to obtain a 3D representation of a first user from a first user device. The 3D representation is associated with a voice activity indicator that per time unit of the 3D representation indicates whether the first user is engaged in vocal communication or not. The processing circuitry is configured to cause the encoder module to encode the 3D representation per time unit according to an encoding level. The encoding level per time unit is a function of the voice activity indicator per said time unit. The processing circuitry is configured to cause the encoder module to provide, over a data network and towards a decoder module, the encoded 3D representation.

According to a third aspect there is presented an encoder module for encoding a 3D representation of a user. The encoder module comprises an obtain module configured to obtain a 3D representation of a first user from a first user device. The 3D representation is associated with a voice activity indicator that per time unit of the 3D representation indicates whether the first user is engaged in vocal communication or not. The encoder module comprises an encode module configured to encode the 3D representation per time unit according to an encoding level. The encoding level per time unit is a function of the voice activity indicator per said time unit. The encoder module comprises a provide module configured to provide, over a data network and towards a decoder module, the encoded 3D representation.

According to a fourth aspect there is presented a computer program for encoding a 3D representation of a user. The computer program comprises computer program code which, when run on processing circuitry of an encoder module, causes the encoder module to perform a method according to the first aspect.

According to a fifth aspect there is presented a method for decoding an encoded 3D representation of a user. The method is performed by a decoder module. The method comprises obtaining, over a data network and from an encoder module, an encoded 3D representation of a first user. The encoded 3D representation is associated with a voice activity indicator dependent encoding level indicative of per time unit whether the first user is engaged in vocal communication or not. The method comprises decoding the encoded 3D representation per time unit according to the encoding level per time unit into a decoded 3D representation of the first user. The method comprises providing the decoded 3D representation towards a second user device.

According to a sixth aspect there is presented a decoder module for decoding an encoded 3D representation of a user. The decoder module comprises processing circuitry. The processing circuitry is configured to cause the decoder module to obtain, over a data network and from an encoder module, an encoded 3D representation of a first user. The encoded 3D representation is associated with a voice activity indicator dependent encoding level indicative of per time unit whether the first user is engaged in vocal communication or not. The processing circuitry is configured to cause the decoder module to decode the encoded 3D representation per time unit according to the encoding level per time unit into a decoded 3D representation of the first user. The processing circuitry is configured to cause the decoder module to provide the decoded 3D representation towards a second user device.

According to a seventh aspect there is presented a decoder module for decoding an encoded 3D representation of a user. The decoder module comprises an obtain module configured to obtain, over a data network and from an encoder module, an encoded 3D representation of a first user. The encoded 3D representation is associated with a voice activity indicator dependent encoding level indicative of per time unit whether the first user is engaged in vocal communication or not. The decoder module comprises a decode module configured to decode the encoded 3D representation per time unit according to the encoding level per time unit into a decoded 3D representation of the first user. The decoder module comprises a provide module configured to provide the decoded 3D representation towards a second user device.

According to an eighth aspect there is presented a computer program for decoding an encoded 3D representation of a user. The computer program comprises computer program code which, when run on processing circuitry of a decoder module, causes the decoder module to perform a method according to the fifth aspect.

According to a ninth aspect there is presented a computer program product comprising a computer program according to at least one of the fourth aspect and the eighth aspect and a computer readable storage medium on which the computer program is stored. The computer readable storage medium could be a non-transitory computer readable storage medium.

Advantageously, these aspects provide efficient communication of 3D representations of users from one user device to another user device where the above issues are avoided, or at least mitigated or reduced. Advantageously, these aspects enable the network resources and bandwidth required for real-time communication of 3D representations over a data network to be reduced.

Advantageously, these aspects enable optimization of encoding and decoding of 3D representations using a voice activity indicator as an input. Advantageously, these aspects allow replacing parts of a 3D representation with 3D generated models based on voice activity.

Advantageously, these aspects can be implemented in an edge/cloud computation service, thereby making the herein disclosed embodiments agnostic to the type of user devices (and user interfaces) that are used, such as the type of 3D camera, 3D display, etc.

Advantageously, these aspects can be implemented in 3D video conferencing systems, or other types of 3D communication systems, such as peer-to-peer 3D communication systems with two or more participating users, 3D communication systems used in enterprise or industrial environments, 3D communication systems used in healthcare applications, and 3D communication systems for consumer electronics devices.

Other objectives, features and advantages of the enclosed embodiments will be apparent from the following detailed disclosure, from the attached dependent claims as well as from the drawings. Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the element, apparatus, component, means, module, step, etc." are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, module, step, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.

BRIEF DESCRIPTION OF THE DRAWINGS

The inventive concept is now described, by way of example, with reference to the accompanying drawings, in which:

Fig. 1 is a schematic diagram illustrating a communication system according to embodiments;

Fig. 2 is a schematic illustration of communication between different entities in a communication system according to embodiments;

Figs. 3 and 4 are flowcharts of methods according to embodiments;

Figs. 5, 6, and 7 are signalling diagrams according to embodiments;

Fig. 8 is a schematic diagram showing functional units of an encoder module according to an embodiment;

Fig. 9 is a schematic diagram showing functional modules of an encoder module according to an embodiment;

Fig. 10 is a schematic diagram showing functional units of a decoder module according to an embodiment;

Fig. 11 is a schematic diagram showing functional modules of a decoder module according to an embodiment; and

Fig. 12 shows one example of a computer program product comprising computer readable means according to an embodiment.

DETAILED DESCRIPTION

The inventive concept will now be described more fully hereinafter with reference to the accompanying drawings, in which certain embodiments of the inventive concept are shown. This inventive concept may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and will fully convey the scope of the inventive concept to those skilled in the art. Like numbers refer to like elements throughout the description. Any step or feature illustrated by dashed lines should be regarded as optional.

Fig. 1 is a schematic diagram illustrating a communication system 100 where embodiments presented herein can be applied. The communication system 100 could form part of a 3D video conferencing system. The communication system 100 comprises a first user device 110a at a first location, a second user device 110b at a second location, and a third user device 110c at a third location. Each user device 110a:110c is operated by a respective user; a first user 140a, a second user 140b, and a third user 140c. At 150a is schematically illustrated that the first user 140a is engaged in vocal communication (e.g., speaking). The users of the user devices 110a:110c could be participants of a 3D video conference. Each user device 110a:110c could be any of: a portable wireless device, mobile station, mobile phone, user equipment (UE), smartphone, laptop computer, tablet computer, or the like and support any wireless and/or wired communication protocol. As the skilled person understands, the communication system 100 could comprise a plurality of user devices 110a:110c. The user devices 110a:110c are configured to communicate with each other (and optionally also with at least one cloud computation service 180a:180c, 190) via links 160a:160c and over a data network 170. The data network could be either a wireless network, a wired network, or a combination thereof. In case the data network is at least partly wireless, it might support any cellular or non-cellular access technology. The at least one cloud computation service 180a:180c, 190 could be divided into several edge cloud computation services 180a:180c and one centralized cloud computation service 190. Each user device 110a:110c is equipped with, or operatively connected to, a 3D camera 130a:130c for capturing a sequence of images of the user 140a:140c of the user device 110a:110c. Each sequence of images provides a 3D representation of the user. The 3D representation of each user is communicated from that user's device over the data network 170 to the other user devices 110a:110c for display at a user interface 120a:120c. The user interface 120a:120c could be a 3D display and be part of the user device 110a:110c or be operatively connected to the user device 110a:110c. The user interface 120a:120c could be part of a display device, such as a 3D screen, or wearable computer-capable glasses (also referred to as smartglasses or smart glasses).

As disclosed above, there is a need for improved communication of 3D representations of users from one user device to another user device. This issue will be illustrated with reference to Fig. 2. Fig. 2 is a schematic illustration of communication between some of the entities in the communication system 100 of Fig. 1. It is assumed that a 3D representation of a user is to be communicated to at least one other user. For simplification of notation but without loss of generality, it is in Fig. 2 assumed that the 3D representation of the first user 140a is to be provided to the second user 140b. The 3D representation is communicated from the 3D camera 130a to the first user device 110a where encoding (and possibly other processing) of the 3D representation is performed. A thus encoded 3D representation is then provided over the data network 170 to an edge cloud computation service 180b. At the edge cloud computation service 180b decoding (and possibly other processing) and then rendering of the encoded 3D representation is performed. The thus decoded and rendered 3D representation is then provided to the second user device 110b for display at the second user interface 120b. Hence, in this scheme, no consideration is made with respect to the content that is captured at the 3D camera 130a. The same compression parameters are used for the entire body of the first user 140a, which results in inefficient consumption of the network resources. Further, the same compression parameters are used regardless of the status of the participating users 140a:140c. In this respect, in the example of a 3D video conferencing system, participating users 140a:140c do not speak at the same time, but still the 3D representations of all users 140a:140c are, according to the scheme in Fig. 2, encoded in the same way and using the same compression parameters. This scheme thus results in a waste of network resources.

The embodiments disclosed herein therefore relate to mechanisms for encoding a 3D representation of a user and decoding an encoded 3D representation of a user. In order to obtain such mechanisms there is provided an encoder module 200, a method performed by the encoder module 200, and a computer program product comprising code, for example in the form of a computer program, that when run on processing circuitry of the encoder module 200, causes the encoder module 200 to perform the method. In order to obtain such mechanisms there is further provided a decoder module 300, a method performed by the decoder module 300, and a computer program product comprising code, for example in the form of a computer program, that when run on processing circuitry of the decoder module 300, causes the decoder module 300 to perform the method.

The herein disclosed embodiments are based on that the encoding (and thus also the corresponding decoding) of the 3D representation is content-dependent. Especially, the herein disclosed embodiments are based on that the encoding (and thus also the corresponding decoding) of the 3D representation can be adapted based on whether the user is speaking or not. This enables network resources to be saved, and bottlenecks to be avoided (or at least reduced), since the encoding level can be decreased (and hence the compression ratio can be increased) when encoding 3D representations of a user that does not speak, or is otherwise not engaged in vocal communication.

Reference is now made to Fig. 3 illustrating a method for encoding a 3D representation of a user as performed by the encoder module 200 according to an embodiment.

S102: The encoder module 200 obtains a 3D representation of a first user 140a from a first user device 110a. In this respect, the 3D representation might originate from a 3D camera 130a having captured a sequence of images of the first user 140a, or from a device that has obtained the sequence of images of the first user 140a from the 3D camera 130a. The 3D representation is associated with a voice activity indicator that per time unit of the 3D representation indicates whether the first user 140a is engaged in vocal communication or not. In this respect, vocal communication could be any of: speaking, singing, etc. Aspects of the voice activity indicator will be disclosed below.

The 3D representation is then encoded. How to encode the 3D representation is at least based on the voice activity indicator.

S106: The encoder module 200 encodes the 3D representation per time unit according to an encoding level. The encoding level per time unit is a function of the voice activity indicator per time unit. Hence, at which encoding level to encode the 3D representation is given by the voice activity indicator.

S108: The encoder module 200 provides, over a data network 170 and towards a decoder module 300, the encoded 3D representation. In this respect, the data network 170 could be either a wireless network, a wired network, or a combination thereof. In case the data network 170 is at least partly wireless, it might support any cellular or non-cellular access technology.

Embodiments relating to further details of encoding a 3D representation of a user as performed by the encoder module 200 will now be disclosed.

There could be different types of 3D representations. In some non-limiting examples, the 3D representation is any of: a point cloud, a polygon mesh, a triangulated mesh.

As disclosed above, the voice activity indicator indicates, per time unit of the 3D representation, whether the first user 140a is engaged in vocal communication or not. That is, the voice activity indicator is binary-valued, and one value of the voice activity indicator is provided per each such time unit. As further disclosed above, the encoder module 200 encodes the 3D representation per time unit according to an encoding level (where the encoding level per time unit is a function of the voice activity indicator per time unit). Hence, which encoding level to use when encoding the 3D representation for a certain time unit depends on the voice activity indicator for that certain time unit. Thus, the shorter the time unit is, the more often a new value of the voice activity indicator is obtained and hence the more often a new decision needs to be made regarding which encoding level to use when encoding the 3D representation.

There could be different ways to determine the length of the time unit. In some embodiments, the time unit has a length that is dependent on any of: parameters of the data network 170, the number of second user devices 110b intended to obtain a decoded 3D representation of the 3D representation from the decoder module 300, data network conditions of the second user devices 110b. In some non-limiting examples, as the number of user devices 110a:110c increases, a longer time unit can be used to reduce the computational complexity of the communication system 100. As data network conditions are degraded for one user device 110a:110c, a shorter time unit can be used for this user device 110a:110c, in order to make up for extra delay caused by degrading network conditions.
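
To make the relationship between the voice activity indicator and the encoding level concrete, the following is a minimal, hypothetical Python sketch of an encoder loop that selects one encoding level per time unit of the 3D representation as a function of the voice activity indicator for that time unit. The function names, the two-level quality scale, and the generic encode_fn callback are assumptions made for illustration and are not defined by the present disclosure.

```python
# Hypothetical sketch: per-time-unit selection of an encoding level driven by
# a binary voice activity indicator. Names and level values are illustrative.
from dataclasses import dataclass
from typing import Callable, Iterable, List

HIGH_QUALITY = 1   # more features preserved, lower compression ratio
LOW_QUALITY = 0    # fewer features preserved, higher compression ratio

@dataclass
class TimeUnit:
    frames: list          # 3D representation data for this time unit (e.g. point-cloud frames)
    voice_active: bool    # voice activity indicator for this time unit

def select_encoding_level(voice_active: bool) -> int:
    """The encoding level per time unit is a function of the voice activity indicator."""
    return HIGH_QUALITY if voice_active else LOW_QUALITY

def encode_representation(units: Iterable[TimeUnit],
                          encode_fn: Callable[[list, int], bytes]) -> List[bytes]:
    """Encode each time unit according to its voice-activity-dependent encoding level."""
    encoded = []
    for unit in units:
        level = select_encoding_level(unit.voice_active)
        encoded.append(encode_fn(unit.frames, level))
    return encoded
```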

There could be different sources of the voice activity indicator. In some examples, the voice activity indicator could be obtained by image analysis of image frames (as captured by the 3D camera 130a and depicting the first user 140a) corresponding to the 3D representation. Such image analysis could involve identifying lip, or mouth, movement of the first user 140a and mapping the identified lip, or mouth, movement to the voice activity indicator. That is, if the image analysis of the lip, or mouth, movement reveals that the first user 140a is engaged in vocal communication (e.g., speaking) then the voice activity indicator is set and otherwise the voice activity indicator is not set. In some examples, the voice activity indicator could be obtained by audio analysis of an audio signal (as timewise synchronized with the 3D representation) of the first user 140a. Such audio analysis could involve identifying time instances when the first user 140a is engaged in vocal communication (e.g., speaking). That is, if the audio analysis reveals that the first user 140a is engaged in vocal communication (e.g., speaking) then the voice activity indicator is set and otherwise the voice activity indicator is not set. Thus, in some embodiments, the voice activity indicator is obtained from image frames corresponding to the 3D representation and/or from an audio signal that is timewise synchronized with the 3D representation. The voice activity indicator might be determined by the encoder module 200 itself or be obtained together with the 3D representation from another device.

As disclosed above, the encoding level per time unit is a function of the voice activity indicator per time unit. There may be different ways to select this function. In some aspects, the 3D representation is encoded with high quality when the voice activity indicator indicates that the first user 140a is speaking or in other ways engaged in vocal communication. Particularly, in some embodiments, for at least part of the 3D representation, the 3D representation is encoded with a higher quality when the voice activity indicator indicates that the first user 140a is engaged in vocal communication than when the voice activity indicator indicates that the first user 140a is not engaged in vocal communication. Hence, more features are preserved, for at least part of the 3D representation, during the encoding when the voice activity indicator indicates that the first user 140a is engaged in vocal communication than when the voice activity indicator indicates that the first user 140a is not engaged in vocal communication.
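
As an illustration of the audio-based source of the voice activity indicator described above, the sketch below derives one binary indicator value per time unit from an audio signal that is timewise synchronized with the 3D representation, using a simple short-term energy threshold. The threshold, the framing, and the numpy-based implementation are assumptions made for this example; any voice activity detector could be used instead.

```python
# Hypothetical sketch: deriving a binary voice activity indicator per time unit
# from a synchronized audio signal using a short-term energy threshold.
import numpy as np

def voice_activity_per_time_unit(audio: np.ndarray,
                                 sample_rate: int,
                                 time_unit_s: float,
                                 energy_threshold: float = 1e-3) -> list:
    """Return one True/False voice activity value per time unit of the audio signal."""
    samples_per_unit = max(1, int(sample_rate * time_unit_s))
    indicators = []
    for start in range(0, len(audio), samples_per_unit):
        chunk = audio[start:start + samples_per_unit].astype(np.float64)
        if chunk.size == 0:
            break
        energy = float(np.mean(chunk ** 2))   # short-term energy of this time unit
        indicators.append(energy > energy_threshold)
    return indicators
```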

In some aspects, segmentation of the 3D representation is performed before the 3D representation is encoded. However, as will be further disclosed below, in other aspects, segmentation of the 3D representation is performed after the encoded 3D representation has been decoded. In general terms, segmentation refers to dividing the 3D representation into segments based on feature extraction. The different segments could be labelled to identify regions of interest. Segmentation thus enables different encoding levels to be used for different parts (as defined by the segments) of the 3D representation. In particular, in some embodiments, the encoder module 200 is configured to perform (optional) step S104:

S104: The encoder module 200 segments the 3D representation per time unit into segments according to a segmentation level. The segmentation level per time unit is a function of the voice activity indicator per time unit. The encoding is then performed on the 3D representation after having been segmented.

In some embodiments, the encoding level varies among the segments per time unit. This enables segments corresponding to regions of interest to be encoded with higher quality than other segments. The encoding level can thus be individually determined for each segment. In this respect, different body segmentation sets can be defined, each set having a different level of detail (lips, eyes, head, full-body, etc.).

The segmentation level thus determines which body segmentation set to use (based on the required quality). Segments corresponding to features of the first user 140a such as “lips”, “mouth area”, and/or “face”, etc. could thus be encoded with higher quality when the voice activity indicator indicates that the first user 140a is engaged in vocal communication than when the voice activity indicator indicates that the first user 140a is not engaged in vocal communication.

In this respect, “lips”, “mouth area”, “face”, etc. are examples of features that are associated with speech articulation (movement of the lips, mouth, etc.). In particular, in some embodiments, when the voice activity indicator indicates that the first user 140a is engaged in vocal communication, the encoding level for segments associated with speech articulation of the first user 140a is higher than for segments not associated with speech articulation of the first user 140a. For an active speaker, the segments of the 3D representation representing body parts related to speech articulation might thus be encoded with higher quality compared to the segments of the 3D representation representing body parts not related to speech articulation. That is, in some embodiments, for at least part of the 3D representation, the 3D representation is segmented into a higher number of segments when the voice activity indicator indicates that the first user 140a is engaged in vocal communication than when the voice activity indicator indicates that the first user 140a is not engaged in vocal communication.
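
The per-segment behaviour can be illustrated with the following sketch, which assigns an encoding level to each labelled body segment for the current time unit: when the voice activity indicator is set, segments associated with speech articulation (lips, mouth area, face) receive a higher encoding level than the remaining segments. The segment labels and the numeric level values are illustrative assumptions only.

```python
# Hypothetical sketch: per-segment encoding levels as a function of the voice
# activity indicator. Segment labels and level values are illustrative only.
SPEECH_ARTICULATION_SEGMENTS = {"lips", "mouth_area", "face"}

def encoding_levels_for_segments(segment_labels, voice_active: bool) -> dict:
    """Map each segment label to an encoding level for the current time unit."""
    levels = {}
    for label in segment_labels:
        if voice_active and label in SPEECH_ARTICULATION_SEGMENTS:
            levels[label] = 2   # highest quality for speech-articulation segments
        elif voice_active:
            levels[label] = 1   # medium quality for the rest of the body
        else:
            levels[label] = 0   # lowest quality when the user is silent
    return levels

# Example: an active speaker segmented into four segments
print(encoding_levels_for_segments(["lips", "face", "torso", "arms"], True))
# -> {'lips': 2, 'face': 2, 'torso': 1, 'arms': 1}
```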

In some aspects, the decoder module 300 is made aware of parameters used by the encoder module 200 for encoding the 3D representation. In particular, parameters of the encoding might be provided as configuration data and be provided to the second user device 110b. In particular, in some embodiments, the encoder module 200 is configured to perform (optional) step S110:

S110: The encoder module 200 provides, over the data network 170, the configuration data towards the second user device 110b.
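
As a purely illustrative sketch of what such configuration data might look like on the wire (the field names and the JSON serialization are assumptions, not defined by the disclosure), the encoding parameters could be collected into a small structure; the segmentation parameters discussed in the next paragraph could be carried in the same structure.

```python
# Hypothetical sketch: configuration data carrying encoding parameters towards
# the decoder side. Field names and JSON serialization are illustrative only.
import json

def build_configuration_data(time_unit_s: float,
                             encoding_levels: list) -> bytes:
    """Collect encoder-side parameters into a payload for the decoder side."""
    config = {
        "time_unit_s": time_unit_s,              # length of one time unit
        "encoding_level_per_time_unit": encoding_levels,
        # segmentation parameters (segment labels, segmentation level, ...)
        # could be added here when segmentation is used
    }
    return json.dumps(config).encode("utf-8")
```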

Further, also parameters relating to the segmentation might be provided to the second user device 110b as configuration data. In particular, in some embodiments, the configuration data (as provided in S110) further comprises parameters of the segmenting.

Reference is now made to Fig. 4 illustrating a method for decoding an encoded 3D representation of a user as performed by the decoder module 300 according to an embodiment.

As disclosed above, the encoder module 200 provides, over a data network 170 and towards the decoder module 300, the encoded 3D representation. It is assumed that the decoder module obtains the encoded 3D representation.

S202: The decoder module 300 obtains, over the data network 170 and from the encoder module 200, the encoded 3D representation of a first user 140a. Properties of the data network 170 have been disclosed above. As disclosed above, the encoded 3D representation is associated with a voice activity indicator dependent encoding level indicative of per time unit whether the first user 140a is engaged in vocal communication or not.

The decoder module 300 then performs the corresponding decoding of the encoded 3D representation.

S206: The decoder module 300 decodes the encoded 3D representation per time unit according to the encoding level per time unit into a decoded 3D representation of the first user 140a.

S214: The decoder module 300 provides the decoded 3D representation towards a second user device 110b.

The thus decoded 3D representation of the first user 140a can thereby be provided towards a user interface for display to another user (as interacting with the second user device 110b).

Embodiments relating to further details of decoding an encoded 3D representation of a user as performed by the decoder module 300 will now be disclosed.

As disclosed above, in some non-limiting examples, the 3D representation is any of: a point cloud, a polygon mesh, a triangulated mesh.

As disclosed above, in some aspects, the 3D representation is encoded with high quality when the voice activity indicator indicates that the first user 140a is speaking or in other ways engaged in vocal communication. The same principles apply to the decoding. Hence, in some embodiments, for at least part of the encoded 3D representation, the encoded 3D representation is decoded with a higher quality when the encoding level is indicative of that the first user 140a is engaged in vocal communication than when the encoding level is indicative of that the first user 140a is not engaged in vocal communication.

As disclosed above, in some embodiments, the encoder module 200 provides, over the data network 170, the configuration data towards the second user device 110b. Hence, in some embodiments, the decoder module 300 is configured to perform (optional) step S204:

S204: The decoder module 300 obtains, over the data network 170 and from the encoder module 200, configuration data comprising the encoding level.

As disclosed above, in some aspects, segmentation of the 3D representation is performed before the 3D representation is encoded. That is, in some embodiments, the encoded 3D representation has been encoded into segments of the encoded 3D representation after having been segmented according to a voice activity indicator dependent segmentation level. Decoding the 3D representation is then performed per segment of the encoded 3D representation, resulting in decoded segments of the 3D representation. The decoder module 300 is then configured to perform (optional) step S208:

S208: The decoder module 300 combines the decoded segments of the 3D representation after having decoded all segments of the encoded 3D representation.
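
A minimal decoder-side counterpart might look as follows: every segment is decoded at its signalled encoding level, and the decoded segments are combined only after all segments of the time unit have been decoded. The decode_fn and combine_fn callbacks are assumptions made for illustration.

```python
# Hypothetical sketch: decoding an encoded 3D representation per segment and
# combining the decoded segments once all segments have been decoded.
def decode_and_combine(encoded_segments: dict,
                       encoding_levels: dict,
                       decode_fn,
                       combine_fn):
    """Decode every segment at its signalled encoding level, then combine them."""
    decoded_segments = {
        label: decode_fn(payload, encoding_levels[label])
        for label, payload in encoded_segments.items()
    }
    # Combining is only done after all segments of the time unit are decoded.
    return combine_fn(decoded_segments)
```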

As further disclosed above, also parameters relating to the segmentation might be provided to the second user device 110b as configuration data. In particular, in some embodiments, the configuration data (as obtained in S204) further comprises the segmentation level.

As further disclosed above, in some aspects, segmentation of the 3D representation is performed after the 3D representation has been decoded. In particular, in some embodiments, the decoder module 300 is configured to perform (optional) step S210:

S210: The decoder module 300 segments the decoded 3D representation per time unit into decoded segments according to a segmentation level per time unit being a function of the encoding level per time unit.

Similar principles as disclosed above for the segmentation when performed before the 3D representation is encoded can also be applied for the segmentation when performed after the 3D representation has been decoded. In this respect, segmentation as applied after the 3D representation has been decoded enables the post-processing to be selectively applied to different parts (as defined by the segments) of the 3D representation. In particular, in some embodiments, for at least part of the decoded 3D representation, the decoded 3D representation is segmented into a higher number of segments when the segmentation level is indicative of that the first user 140a is not engaged in vocal communication than when the segmentation level is indicative of that the first user 140a is engaged in vocal communication. This is because post-processing will then be performed on these segments. Particularly, once segmentation has been performed, post-processing can be selectively performed on at least some of the decoded segments.

S212: The decoder module 300 post-processes at least some of the decoded segments before providing the decoded 3D representation towards the second user device 110b.

There could be different ways in which the at least some of the decoded segments are post-processed. In some embodiments, the post-processing comprises replacing at least some of the decoded segments with content from a 3D model. In some examples, low-quality segments are thereby replaced with a high-quality 3D model (e.g. avatar parts) around regions of interest, for example for segments of the 3D representation representing body parts related to speech articulation. This high-quality 3D model can be optimized by the decoder module 300 and does not need to be transferred from the encoder module 200 over the data network 170.
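
The replacement step can be sketched as follows; the avatar store, the quality criterion, and the region-of-interest set are assumptions chosen for the example and are not mandated by the disclosure.

```python
# Hypothetical sketch: post-processing that replaces low-quality decoded
# segments around regions of interest with content from a local 3D model
# (e.g. avatar parts). The avatar store and quality criterion are illustrative.
LOW_QUALITY = 0

def post_process(decoded_segments: dict,
                 encoding_levels: dict,
                 avatar_parts: dict,
                 regions_of_interest: set) -> dict:
    """Replace low-quality segments in regions of interest with avatar parts."""
    result = dict(decoded_segments)
    for label in regions_of_interest:
        if (encoding_levels.get(label, LOW_QUALITY) == LOW_QUALITY
                and label in avatar_parts):
            result[label] = avatar_parts[label]   # local model, never sent over the network
    return result
```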

A first particular embodiment for encoding a 3D representation of a user and for decoding the encoded 3D representation of the user based on at least some of the above disclosed embodiments will now be disclosed in detail with reference to the signalling diagram of Fig. 5. In this embodiment, segmentation (as in step S104) and encoding (as in step S106) is performed at a first user device 110a, which thus implements at least part of the functionality of the encoder module 200. Decoding (as in step S206) and combining (as in step S208) is performed at an edge cloud computation service 180b, which thus implements at least part of the functionality of the decoder module 300.

S301: A 3D representation, in terms of a point cloud, of a first user 140a is obtained at a first user device 110a from a 3D camera 130a.

S302: Image and/or audio frames as time-synchronized with the 3D representation are obtained at the first user device 110a from the 3D camera 130a.

S303: The image and/or audio frames are input to a voice activity detector at the first user device 110a to provide a voice activity indicator.

S304: Which encoding level to encode the 3D representation and which segmentation level to segment the 3D representation are at the first user device 110a determined as function of the voice activity indicator.

S305: Configuration data specifying parameters of the encoding and parameters of the segmenting is provided from the first user device 110a over a data network 170 to an edge cloud computation service 180b.

S306: The 3D representation is segmented and then encoded at the first user device 110a according to the determined segmentation level and encoding level. The 3D representation is segmented into a higher number of segments when the voice activity indicator indicates that the first user 140a is engaged in vocal communication than when the voice activity indicator indicates that the first user 140a is not engaged in vocal communication. For at least part of the 3D representation, the 3D representation is encoded with a higher quality when the voice activity indicator indicates that the first user 140a is engaged in vocal communication than when the voice activity indicator indicates that the first user 140a is not engaged in vocal communication.

S307: The thus segmented and encoded 3D representation is provided, as a compressed point cloud, from the first user device 110a over the data network 170 to the edge cloud computation service 180b.

S308: The segmented and encoded 3D representation is decoded at the edge cloud computation service 180b. Decoded segments of the 3D representation are combined after all segments of the encoded 3D representation have been decoded.

S309: Possible post-processing of the thus decoded 3D representation is performed at the edge cloud computation service 180b.

S310: The decoded (and possibly post-processed) 3D representation is provided, as a point cloud, from the edge cloud computation service 180b to a second user device 110b.

S311: The second user device 110b provides the decoded 3D representation for rendering at a second user interface 120b for display of the 3D representation to a second user 140b.

A second particular embodiment for encoding a three-dimensional representation of a user and for decoding the encoded three-dimensional representation of the user based on at least some of the above disclosed embodiments will now be disclosed in detail with reference to the signalling diagram of Fig. 6. In this embodiment, segmentation (as in step S104) and encoding (as in step S106) is performed at a centralized cloud computation service 190, which thus implements at least part of the functionality of the encoder module 200. Decoding (as in step S206) and combining (as in step S208) is performed at an edge cloud computation service 180b, which thus implements at least part of the functionality of the decoder module 300.

S401: A 3D representation, in terms of a point cloud, of a first user 140a is obtained at a first user device 110a from a 3D camera 130a.

S402: Image and/or audio frames as time-synchronized with the 3D representation are obtained at a centralized cloud computation service 190 from the 3D camera 130a.

S403: The image and/or audio frames are input to a voice activity detector at the centralized cloud computation service 190 to provide a voice activity indicator.

S404: Which encoding level to encode the 3D representation and which segmentation level to segment the 3D representation are at the centralized cloud computation service 190 determined as function of the voice activity indicator.

S405: Configuration data specifying parameters of the encoding and parameters of the segmenting is provided from the centralized cloud computation service 190 over a data network 170 to the first user device 110a and an edge cloud computation service 180b.

S406: The 3D representation is segmented and then encoded at the first user device 110a according to the determined segmentation level and encoding level. The 3D representation is segmented into a higher number of segments when the voice activity indicator indicates that the first user 140a is engaged in vocal communication than when the voice activity indicator indicates that the first user 140a is not engaged in vocal communication. For at least part of the 3D representation, the 3D representation is encoded with a higher quality when the voice activity indicator indicates that the first user 140a is engaged in vocal communication than when the voice activity indicator indicates that the first user 140a is not engaged in vocal communication.

S407: The thus segmented and encoded 3D representation is provided, as a compressed point cloud, from the first user device 110a over the data network 170 to the edge cloud computation service 180b.

S408: The segmented and encoded 3D representation is decoded at the edge cloud computation service 180b. Decoded segments of the 3D representation are combined after all segments of the encoded 3D representation have been decoded.

S409: Possible post-processing of the thus decoded 3D representation is performed at the edge cloud computation service 180b.

S410: The decoded (and possibly post-processed) 3D representation is provided, as a point cloud, from the edge cloud computation service 180b to a second user device 110b.

S411: The second user device 110b provides the decoded 3D representation for rendering at a second user interface 120b for display of the 3D representation to a second user 140b.

A third particular embodiment for encoding a three-dimensional representation of a user and for decoding the encoded three-dimensional representation of the user based on at least some of the above disclosed embodiments will now be disclosed in detail with reference to the signalling diagram of Fig. 7.

In this embodiment, encoding (as in step S106) is performed at a first user device 110a, which thus implements at least part of the functionality of the encoder module 200. Decoding (as in step S206), segmenting (as in step S210) and post-processing (as in step S212) is performed at an edge cloud computation service 180b, which thus implements at least part of the functionality of the decoder module 300.

S501: A 3D representation, in terms of a point cloud, of a first user 140a is obtained at a first user device 110a from a 3D camera 130a.

S502: Image and/or audio frames as time-synchronized with the 3D representation are obtained at a centralized cloud computation service 190 from the 3D camera 130a.

S503: The image and/or audio frames are input to a voice activity detector at the centralized cloud computation service 190 to provide a voice activity indicator.

S504: The voice activity indicator is provided from the centralized cloud computation service 190 over a data network 170 to the first user device 110a and an edge cloud computation service 180b.

S505: Which encoding level to encode the 3D representation is at the first user device 110a determined as function of the voice activity indicator.

S506: The 3D representation is encoded at the first user device 110a according to the determined encoding level. For at least part of the 3D representation, the 3D representation is encoded with a higher quality when the voice activity indicator indicates that the first user 140a is engaged in vocal communication than when the voice activity indicator indicates that the first user 140a is not engaged in vocal communication.

S507: The thus encoded 3D representation is provided, as a compressed point cloud, from the first user device 110a over the data network 170 to the edge cloud computation service 180b.

S508: The encoded 3D representation is decoded at the edge cloud computation service 180b.

S509: The segmentation level with which to segment the 3D representation is determined at the edge cloud computation service 180b as a function of the voice activity indicator.

S510: The decoded 3D representation is segmented into decoded segments at the edge cloud computation service 180b according to the determined segmentation level. Post-processing of at least some of the decoded segments is performed at the edge cloud computation service 180b. The post-processing comprises replacing at least some of the decoded segments with content from a 3D model. Low-quality segments are thereby replaced with a high-quality 3D model (e.g. avatar parts) around regions of interest, for example for segments of the 3D representation representing body parts (e.g., the lips or mouth) related to speech articulation.

S511: The decoded and post-processed 3D representation is provided, as a point cloud, from the edge cloud computation service 180b to a second user device 110b.
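Returning to the post-processing in S510, the sketch below replaces decoded segments that correspond to regions of interest with content from a pre-stored high-quality 3D model (e.g. avatar parts). The segment labels and the model_parts lookup are assumptions introduced for this example only.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]

def replace_with_model_parts(segments: Dict[str, List[Point]],
                             model_parts: Dict[str, List[Point]],
                             regions_of_interest: List[str]) -> Dict[str, List[Point]]:
    """Swap decoded segments for high-quality model parts where a region of interest matches."""
    result = dict(segments)
    for label in regions_of_interest:
        if label in model_parts:
            # E.g. replace a low-quality "mouth" segment with avatar mouth geometry.
            result[label] = model_parts[label]
    return result
```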

S512: The second user device 110b provides the decoded 3D representation for rendering at a second user 140b interface for display of the 3D representation to a second user 140b.

Fig. 8 schematically illustrates, in terms of a number of functional units, the components of an encoder module 200 according to an embodiment. Processing circuitry 210 is provided using any combination of one or more of a suitable central processing unit (CPU), multiprocessor, microcontroller, digital signal processor (DSP), etc., capable of executing software instructions stored in a computer program product 1210a (as in Fig. 12), e.g. in the form of a storage medium 230. The processing circuitry 210 may further be provided as at least one application specific integrated circuit (ASIC), or field programmable gate array (FPGA).

Particularly, the processing circuitry 210 is configured to cause the encoder module 200 to perform a set of operations, or steps, as disclosed above. For example, the storage medium 230 may store the set of operations, and the processing circuitry 210 may be configured to retrieve the set of operations from the storage medium 230 to cause the encoder module 200 to perform the set of operations. The set of operations may be provided as a set of executable instructions. The processing circuitry 210 is thereby arranged to execute methods as herein disclosed.

The storage medium 230 may also comprise persistent storage, which, for example, can be any single one or combination of magnetic memory, optical memory, solid state memory or even remotely mounted memory.

The encoder module 200 may further comprise a communications interface 220 for communications with other entities, functions, nodes, modules, and devices, as schematically illustrated in Figs. 1 and 2. As such the communications interface 220 may comprise one or more transmitters and receivers, comprising analogue and digital components.

The processing circuitry 210 controls the general operation of the encoder module 200 e.g. by sending data and control signals to the communications interface 220 and the storage medium 230, by receiving data and reports from the communications interface 220, and by retrieving data and instructions from the storage medium 230. Other components, as well as the related functionality, of the encoder module 200 are omitted in order not to obscure the concepts presented herein.

Fig. 9 schematically illustrates, in terms of a number of functional modules, the components of an encoder module 200 according to an embodiment. The encoder module 200 of Fig. 9 comprises a number of functional modules: an obtain module 210a configured to perform step S102, an encode module 210c configured to perform step S106, and a provide module 210d configured to perform step S108. The encoder module 200 of Fig. 9 may further comprise a number of optional functional modules, such as any of a segment module 210b configured to perform step S104 and a provide module 210e configured to perform step S110. In general terms, each functional module 210a:210e may be implemented in hardware or in software. Preferably, one or more or all functional modules 210a:210e may be implemented by the processing circuitry 210, possibly in cooperation with the communications interface 220 and/or the storage medium 230. The processing circuitry 210 may thus be arranged to fetch from the storage medium 230 instructions as provided by a functional module 210a:210e and to execute these instructions, thereby performing any steps of the encoder module 200 as disclosed herein.

Fig. 10 schematically illustrates, in terms of a number of functional units, the components of a decoder module 300 according to an embodiment. Processing circuitry 310 is provided using any combination of one or more of a suitable central processing unit (CPU), multiprocessor, microcontroller, digital signal processor (DSP), etc., capable of executing software instructions stored in a computer program product 1210b (as in Fig. 12), e.g. in the form of a storage medium 330. The processing circuitry 310 may further be provided as at least one application specific integrated circuit (ASIC), or field programmable gate array (FPGA).

Particularly, the processing circuitry 310 is configured to cause the decoder module 300 to perform a set of operations, or steps, as disclosed above. For example, the storage medium 330 may store the set of operations, and the processing circuitry 310 may be configured to retrieve the set of operations from the storage medium 330 to cause the decoder module 300 to perform the set of operations. The set of operations may be provided as a set of executable instructions. The processing circuitry 310 is thereby arranged to execute methods as herein disclosed.

The storage medium 330 may also comprise persistent storage, which, for example, can be any single one or combination of magnetic memory, optical memory, solid state memory or even remotely mounted memory.

The decoder module 300 may further comprise a communications interface 320 for communications with other entities, functions, nodes, modules, and devices, as schematically illustrated in Figs. 1 and 2. As such the communications interface 320 may comprise one or more transmitters and receivers, comprising analogue and digital components.

The processing circuitry 310 controls the general operation of the decoder module 300 e.g. by sending data and control signals to the communications interface 320 and the storage medium 330, by receiving data and reports from the communications interface 320, and by retrieving data and instructions from the storage medium 330. Other components, as well as the related functionality, of the decoder module 300 are omitted in order not to obscure the concepts presented herein.

Fig. 11 schematically illustrates, in terms of a number of functional modules, the components of a decoder module 300 according to an embodiment. The decoder module 300 of Fig. 11 comprises a number of functional modules: an obtain module 310a configured to perform step S202, a decode module 310c configured to perform step S206, and a provide module 310g configured to perform step S214. The decoder module 300 of Fig. 11 may further comprise a number of optional functional modules, such as any of an obtain module 310b configured to perform step S204, a combine module 310d configured to perform step S208, a segment module 310e configured to perform step S210, and a process module 310f configured to perform step S212. In general terms, each functional module 310a:310g may be implemented in hardware or in software. Preferably, one or more or all functional modules 310a:310g may be implemented by the processing circuitry 310, possibly in cooperation with the communications interface 320 and/or the storage medium 330. The processing circuitry 310 may thus be arranged to fetch from the storage medium 330 instructions as provided by a functional module 310a:310g and to execute these instructions, thereby performing any steps of the decoder module 300 as disclosed herein.
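Purely as an illustration of how the functional modules of Fig. 11 might be composed in software, the following sketch chains one callable per step. The interfaces shown are assumptions made for this example and do not reflect any particular claimed implementation.

```python
from typing import Callable, Optional

class DecoderModule:
    def __init__(self,
                 obtain: Callable,
                 decode: Callable,
                 provide: Callable,
                 segment: Optional[Callable] = None,
                 process: Optional[Callable] = None):
        self.obtain = obtain      # obtain module 310a, step S202
        self.decode = decode      # decode module 310c, step S206
        self.provide = provide    # provide module 310g, step S214
        self.segment = segment    # segment module 310e, step S210 (optional)
        self.process = process    # process module 310f, step S212 (optional)

    def run_time_unit(self, source):
        encoded = self.obtain(source)
        representation = self.decode(encoded)
        if self.segment is not None:
            representation = self.segment(representation)
        if self.process is not None:
            representation = self.process(representation)
        self.provide(representation)
```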

The encoder module 200 and/or the decoder module 300 may each be provided as a standalone device or as a part of at least one further device. Further, functionality of the encoder module 200 and/or the decoder module 300 may be distributed between at least two devices, or nodes. Thus, a first portion of the instructions performed by the encoder module 200 and/or the decoder module 300 may be executed in a first device, and a second portion of the instructions performed by the encoder module 200 and/or the decoder module 300 may be executed in a second device; the herein disclosed embodiments are not limited to any particular number of devices on which the instructions performed by the encoder module 200 and/or the decoder module 300 may be executed. For example, at least part of the functionality of the encoder module 200 and/or the decoder module 300 may be implemented in the centralized cloud computation service 190, and/or at least part of the functionality of the encoder module 200 and/or the decoder module 300 may be implemented in each of the edge cloud computation services 180a:180c. Hence, the methods according to the herein disclosed embodiments are suitable to be performed by an encoder module 200 and/or a decoder module 300 residing in a cloud computational environment. Therefore, although a single processing circuitry 210, 310 is illustrated in Figs. 8 and 10, the processing circuitry 210, 310 may be distributed among a plurality of devices, or nodes. The same applies to the functional modules 210a:210e, 310a:310g of Figs. 9 and 11 and the computer programs 1220a, 1220b of Fig. 12.

Fig. 12 shows one example of a computer program product 1210a, 1210b comprising computer readable means 1230. On this computer readable means 1230, a computer program 1220a can be stored, which computer program 1220a can cause the processing circuitry 210 and thereto operatively coupled entities and devices, such as the communications interface 220 and the storage medium 230, to execute methods according to embodiments described herein. The computer program 1220a and/or computer program product 1210a may thus provide means for performing any steps of the encoder module 200 as herein disclosed. On this computer readable means 1230, a computer program 1220b can be stored, which computer program 1220b can cause the processing circuitry 310 and thereto operatively coupled entities and devices, such as the communications interface 320 and the storage medium 330, to execute methods according to embodiments described herein. The computer program 1220b and/or computer program product 1210b may thus provide means for performing any steps of the decoder module 300 as herein disclosed.

In the example of Fig. 12, the computer program product 1210a, 1210b is illustrated as an optical disc, such as a CD (compact disc) or a DVD (digital versatile disc) or a Blu-Ray disc. The computer program product 1210a, 1210b could also be embodied as a memory, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or an electrically erasable programmable read-only memory (EEPROM) and more particularly as a non-volatile storage medium of a device in an external memory such as a USB (Universal Serial Bus) memory or a Flash memory, such as a compact Flash memory. Thus, while the computer program 1220a, 1220b is here schematically shown as a track on the depicted optical disc, the computer program 1220a, 1220b can be stored in any way which is suitable for the computer program product 1210a, 1210b.

The inventive concept has mainly been described above with reference to a few embodiments. However, as is readily appreciated by a person skilled in the art, other embodiments than the ones disclosed above are equally possible within the scope of the inventive concept, as defined by the appended patent claims.