

Title:
METHOD AND DEVICE FOR DISPLAYING VISUAL CONTENT AND FOR PROCESSING AN IMAGE
Document Type and Number:
WIPO Patent Application WO/2019/034556
Kind Code:
A1
Abstract:
A method or device for displaying visual content to provide part of an image with other devices, the method comprising receiving location data representative of the display location of at least one user device within a zone; determining, responsive to the received location data by the at least one user device, visual content for display by each user device; and transmitting to the user device(s) data representative of the visual content for display.

Inventors:
GUILLOTEL PHILIPPE (FR)
FLEUREAU JULIEN (FR)
BENCSIK ANDREI (FR)
ALLIE VALERIE (FR)
Application Number:
PCT/EP2018/071760
Publication Date:
February 21, 2019
Filing Date:
August 10, 2018
Assignee:
INTERDIGITAL CE PATENT HOLDINGS (FR)
International Classes:
G06T7/11; G06F3/14; G06T3/00
Foreign References:
US20160306602A1 (2016-10-20)
US20130109364A1 (2013-05-02)
Other References:
TAKAOKA, T.: "Efficient Algorithms for the Maximum Subarray Problem by Distance Matrix Multiplication", Electronic Notes in Theoretical Computer Science, Elsevier, Amsterdam, NL, vol. 61, January 2002, pages 1-10, XP004847027, ISSN: 1571-0661
Attorney, Agent or Firm:
HUCHET, Anne et al. (FR)
Claims:
CLAIMS

1. A method of providing visual data for display by a user device within a zone to form an image with other devices within the zone, the method comprising:

receiving location data representative of the display location of at least one user device within a zone;

determining, responsive to the received location data by the at least one user device, visual content for display by each user device;

transmitting to the user device(s) data representative of the visual content for display.

2. A device for providing visual data for display by a user device within a zone to form an image, the device comprising memory and at least one processor configured to

receive location data representative of the display location of at least one user device;

determine, responsive to the received location data by the at least one user device, visual content for display by each user device; and

transmit to the user device(s) data representative of the visual content for display.

3. A method according to claim 1 or a device according to claim 2, wherein the location data is representative of a position in a row of a set of rows, the method comprising allocating to each position of each row a value representative of whether or not the position is occupied by a user receiving visual content for display.

4. A method according to claim 3 or a device according to claim 3, wherein a border of the image is determined from the allocated rows of values, wherein an edge comprises a position at the end of a set of consecutive positions providing a maximum sum of values.

5. A method or device according to claim 3 or 4, wherein a border of the image is determined from the allocated columns of values, wherein an edge comprises a position at the end of a set of consecutive positions in a column providing a maximum sum of values.

6. A method or device according to any one of claims 3 to 5, wherein for each row a set of columns is determined corresponding to the determined borders for the respective row.

7. A method or device according to any of claims 3 to 6, wherein the maximum sum is determined using Kadane's algorithm.

8. A method or device according to any one of claims 3 to 7, further comprising:

detecting a pair of borders corresponding to the extreme occupied positions in a row.

9. A method or device according to any one of claims 3 to 8, further comprising:

capturing an image of a scene of visual content displayed by a plurality of user devices; and

obtaining a geometrical transformation based on at least one of a model of the captured scene, the location of a camera capturing the scene and the location of the plurality of user devices.

10. A method according to claim 9 comprising, or a device according to claim 9 configured for:

performing projective mapping to obtain the desired shape of the image when captured from a viewpoint.

11. A method or device according to claim 10, comprising:

warping the captured image and reshaping according to the desired image shape.

12. A method according to any one of claims 1 and 3 to 11, or the device of any one of claims 2 to 11, wherein the location of the at least one user device is detected as a reference location and the location of other user devices is determined with respect to the location of the at least one user device.

13. A method according to any one of claims 1 and 3 to 12, or the device of any one of claims 2 to 12, further comprising:

capturing an image of a scene of visual content displayed by a plurality of user devices; and

obtaining a geometrical transformation based on at least one of a model of the captured scene, the location of a camera capturing the scene and the location of the plurality of user devices.

14. A method or device according to claim 13, comprising:

performing projective mapping to obtain the desired shape of the image when captured from a viewpoint.

15. A method or device according to claim 14, comprising:

warping the captured image and reshaping according to the desired image shape.

16. A method or device according to any one of claims 13 to 15, comprising:

obtaining a gamut transformation to align the differences of colors and light displayed by the user devices within the scene.

17. A method or device according to any one of claims 13 to 16, comprising:

deforming the image according to the changes in the scene.

18. A method or device according to any one of claims 13 to 17, comprising:

mapping detected pixels of displayed visual content to known positions of user devices in the scene.

19. A method or device according to any one of claims 13 to 18, comprising:

remapping detected pixels of displayed visual content to known positions of user devices in the scene when the camera capturing the scene is moved to a different viewpoint.

20. A computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to any one of claims 1 and 3 to 19 when loaded into and executed by the programmable apparatus.

Description:
METHOD AND DEVICE FOR DISPLAYING VISUAL CONTENT AND FOR PROCESSING AN IMAGE

TECHNICAL FIELD

The present disclosure relates to a method and device for displaying visual content to provide part of an image. The present disclosure further relates to a method and device for providing visual data for display by a user device to form an image according to a processed image.

BACKGROUND

Choreographed images can be generated for example at a sports stadium or the like by persons located at particular positions displaying a color, for example, to form an image, text or any other form of display. A problem with such generated images is that they are static and cannot be easily modified.

Embodiments of the present disclosure have been devised with the foregoing in mind.

SUMMARY

In a general form the disclosure concerns a method or a device configured for displaying visual content to form part of an image based on the location of the user device within a zone. An aspect of the disclosure concerns rendering an image based on the location of a plurality of user devices with respect to one another within a geographical zone.

According to a first aspect of the present disclosure there is provided a method of providing visual data for display by a user device within a zone to form an image with other devices within the zone, the method comprising: receiving location data representative of the display location of at least one user device within a zone; determining, responsive to the received location data by the at least one user device, visual content for display by each user device; transmitting to the user device(s) data representative of the visual content for display.

Another aspect provides a device for providing visual data for display by a user device within a zone to form an image, the device comprising memory and at least one processor configured to receive location data representative of the display location of at least one user device; determine, responsive to the received location data by the at least one user device, visual content for display by each user device; and transmit to the user device(s) data representative of the visual content for display.

In an embodiment, the location data is representative of a position in a row of a set of rows, the method comprising allocating to each position of each row a value representative of whether or not the position is occupied by a user receiving visual content for display.

In an embodiment, a border of the image is determined from the allocated rows of values, wherein an edge comprises a position at the end of a set of consecutive positions providing a maximum sum of values.

In an embodiment a border of the image is determined from the allocated columns of values, wherein an edge comprises a position at the end of a set of consecutive positions in a column providing a maximum sum of values.

In an embodiment, for each row a set of columns is determined corresponding to the determined borders for the respective row.

In an embodiment, the maximum sum is determined using Kadane's algorithm.

In an embodiment, the method includes or the device is configured for detecting a pair of borders corresponding to the extreme occupied positions in a row.

In an embodiment, the method includes or the device is configured for capturing an image of visual content displayed by a plurality of user devices; and obtaining a geometrical transformation based on at least one of a model of the captured scene, the location of a camera capturing the scene and the location of the plurality of user devices.

In an embodiment, the method includes or the device is configured for performing projective mapping to obtain the desired shape of the image when captured from a viewpoint.

In an embodiment, the method includes or device is configured for warping the captured image and reshaping according to the desired image shape.

In an embodiment, the method or the device wherein the location of the at least one user device is detected as a reference location and the location of other user devices is determined with respect to the location of the at least one user device.

In an embodiment, the method includes or the device is configured for capturing an image of a scene of visual content displayed by a plurality of user devices; and obtaining a geometrical transformation based on at least one of a model of the captured scene, the location of a camera capturing the scene and the location of the plurality of user devices.

In an embodiment, the method includes or the device is configured for performing projective mapping to obtain the desired shape of the image when captured from a viewpoint.

In an embodiment, the method includes or the device is configured for warping the captured image and reshaping according to the desired image shape.

In an embodiment, the method includes or the device is configured for obtaining a gamut transformation to align the differences of colors and light displayed by the user devices within the scene.

In an embodiment, the method includes or the device is configured for deforming the image according to the changes in the scene.

In an embodiment, the method includes or the device is configured for mapping detected pixels of displayed visual content to known positions of user devices in the scene.

In an embodiment, the method includes or the device is configured for remapping detected pixels of displayed visual content to known positions of user devices in the scene when the camera capturing the scene is moved to a different viewpoint.

Some processes implemented by elements of the invention may be computer implemented. Accordingly, such elements may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", "module" or "system". Furthermore, such elements may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.

Since elements of the present invention can be implemented in software, the present invention can be embodied as computer readable code for provision to a programmable apparatus on any suitable carrier medium. A tangible carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid-state memory device and the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will now be described, by way of example only, and with reference to the following drawings in which:

FIG. 1 is a schematic block diagram of system architecture in which one or more embodiments may be implemented;

FIG. 2 is a communication diagram illustrating steps of a method to provide visual content in accordance with an embodiment;

FIG. 3 is a communication diagram illustrating steps of a method implemented by a manager device in accordance with an embodiment;

FIG. 4 schematically illustrates a sub-array representative of place occupation in accordance with an embodiment;

FIG. 5 graphically illustrates an example of occupied place layout and corresponding image in accordance with an embodiment;

FIG. 6A is an example of an image border and a user border in accordance with an embodiment;

FIG. 6B is an example of two image/user borders in accordance with an embodiment;

FIGS. 7-8 are examples of output images in accordance with an embodiment;

FIG. 9 is an image of a stadium in which embodiments may be implemented;

FIGS. 10A and 10B illustrate an output image before and after correction, respectively, in accordance with an embodiment;

FIG. 11 is a communication diagram illustrating steps of a method in accordance with embodiments;

FIG. 12A is an output image before correction in accordance with an embodiment;

FIG. 12B is an output image after correction in accordance with an embodiment;

FIG. 13 is an example of modules of a device according to an embodiment;

FIG. 14 is an example of displayed reference images to detect devices;

FIG. 15 is a communication diagram illustrating steps of a method of locating user devices in accordance with an embodiment;

FIG. 16 is a communication diagram illustrating steps of a method in accordance with an embodiment; and

FIG. 17 is an example of an output image displayed by two devices.

DETAILED DESCRIPTION

Figure 1 is a schematic block diagram of a system architecture in which one or more embodiments of the disclosure may be implemented.

The system of Figure 1 comprises a server 100 connected by a network, for example, a wireless network 50, to a plurality of mobile user devices 200 and a manager device 300.

The server 100 in the present embodiment comprises a web server that includes or has access to a database to store data. The stored data may include, for example, identification information concerning the user devices 200. Some stored data may include data representative of credentials of a manager device 300 managing the image to be displayed by the mobile user devices 200. In some embodiments, the stored data may include visual content for transmission to the user devices 200. The server 100 also includes one or more processors for performing processes to provide visual content to the user devices 200, for display of an output image. Processes may be performed in response to actions made by the user devices 200 and/or manager devices 300. Such processes may include determining the visual content, for example, video or images to be displayed at time t by the user devices 200. In some embodiments, the server 100 uses UDP to stream pixel colors to user terminals for display. A platform implementing a language such as Golang may be used.
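
By way of illustration only, the following Go sketch shows how a single pixel color could be streamed to one user device over UDP, as described above. The device address, port and 3-byte RGB payload format are assumptions made for this sketch and are not part of the disclosure.

// Minimal sketch: send one RGB color sample to a user device over UDP.
// The address, port and payload layout are illustrative assumptions.
package main

import (
    "log"
    "net"
)

func sendColor(deviceAddr string, r, g, b byte) error {
    conn, err := net.Dial("udp", deviceAddr) // e.g. "192.0.2.17:9000" (hypothetical address)
    if err != nil {
        return err
    }
    defer conn.Close()
    _, err = conn.Write([]byte{r, g, b}) // one color sample per datagram
    return err
}

func main() {
    if err := sendColor("192.0.2.17:9000", 0, 85, 164); err != nil {
        log.Fatal(err)
    }
}

In practice the server would loop over all registered user devices and stream updated colors as the selected content changes.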

The Manager device 300 is an entity that can determine the output image to be displayed by a group of user devices in a given zone. Fixed or mobile terminals may be used to implement the functions of a manager device. In some embodiments the Manager device 300 may use a smart phone application to sign up, login and select content, then upload it. Once an output image for display is selected and uploaded, the Manager device 300 can adjust certain parameters that affect the output image or upload new visual content. As mentioned, data identifying managers can be stored in the database of the server 100.

It will be appreciated that in some embodiments of the disclosure, the manager device may not be provided as a separate entity to the server transmitting the visual content. In some embodiments, the manager device may be part of the server system.

User devices 200 are configured to display visual content to form an overall output image with the other User devices 200 in the zone. In some embodiments, the visual content forming the output image is determined by the manager device. The visual content may be any displayable visual content such as an image, text, colors, etc. The visual content displayed by a User device 200 may represent one or more pixels of an overall output image to be viewed. The user device may be a mobile terminal device with a display, such as, a smartphone or tablet. To implement the process the user can log into an application such as an application on a smart phone device. The user may choose a manager, for example, via http, from a list, and start receiving a stream of content from the server, for example, color data, representing the part of the image the particular user device is being used to display. Users can change Managers at will. For example, a manager may be associated with supporters of a sports team that may be at a sports stadium and the user devices of the supporters at the stadium can be used to generate a given overall image. In other embodiments, the user device may connect directly to the server 100 without the intermediary of the manager device. In order to generate the overall desired output image from the user devices, factors such as viewing distance and lighting may be taken into account.

Figure 2 illustrates steps of a method of providing visual content to form an output image using a plurality of user display devices in accordance with an embodiment of the invention.

In an initial step, a user device 200 provides location data to the server 100 identifying the location of the user device 200, in a given viewing zone. In this particular embodiment, the protocol applied is based on the following types of information provided by the user: seat number, row number, and in some cases Manager ID. The seat number and row number provide location information, if the user is in a stadium-type environment, for example. This information may also be available automatically through other means (ticket, tag scan, calibration phase using ArUco tags...). In response to this information visual data is sent from the server to the user terminal 200 for display by the user terminal, for example color data.

The user device 200 can display the received visual content after reception, and while no further information is received from the server 100, or from the manager 300.

In more detail with reference to Figure 2, at step S201 the user device 200 provides location information to the server 100. The location information is input through an application on the user device 200 and transmitted to the server 100, at step S202.

The transmitted location information, in this example for a user in a sports stadium, includes seat number, row number and manager ID. It will be appreciated that in some embodiments the manager ID is not needed and in other embodiments any type of location information can be sent to the server to enable the server to identify the location of the user device 200 in the given zone. Based on the received location information from the user device 200, the server transmits visual content, in this example color data, to the user device 200 for display by the user device 200 at the identified location in the stadium. The initial color data may be followed by further color data, e.g. by streaming color data in step S204.
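
As a purely illustrative sketch of the registration in steps S201-S202, the location information could be carried as a small JSON message posted to the server; the field names, endpoint URL and use of HTTP here are assumptions, not details taken from the disclosure.

// Hypothetical client-side registration carrying seat, row and manager ID.
package main

import (
    "bytes"
    "encoding/json"
    "log"
    "net/http"
)

// Registration mirrors the location information described above (steps S201-S202).
type Registration struct {
    Seat      int    `json:"seat"`
    Row       int    `json:"row"`
    ManagerID string `json:"manager_id,omitempty"` // optional in some embodiments
}

func main() {
    body, err := json.Marshal(Registration{Seat: 12, Row: 7, ManagerID: "team-blue"})
    if err != nil {
        log.Fatal(err)
    }
    // The endpoint URL is purely illustrative.
    resp, err := http.Post("http://server.example/register", "application/json", bytes.NewReader(body))
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()
    log.Println("registered, status:", resp.Status)
}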

Figure 3 illustrates steps of a method implemented by a manager device in accordance with embodiments.

The protocol in this embodiment is based on information to identify the manager, for example: a username and password input by the manager.

An application provides a selection of available output images to be displayed by user devices in a viewing zone during an event. The output images may be created beforehand. The manager may select the visual content for display. Once an image is selected, the server 100 processes it, determines which color should be sent to each user device 200, and sends it to each user device.

In detail, with reference to Figure 3, in step S301 the manager accesses an application for implementing the method. The application may be provided on a mobile user device or any other type of communication data device. Access may be made by using a username and password for authentication. In step S302, an HTTP call is made for selection of visual content for display of the output image. In step S303, the request is dispatched to a database which returns the requested visual content to the server 100. It will be appreciated that the database may be remote or local to the server 100.

It will be appreciated that the visual content for selection may include an image, a video, color formed by an image or video, a sound, a haptic signal (vibration for instance) or any combination of the above.

A task of the server is to compute an image for display by the user devices 200 depending on the positions of the user devices. Factors that may be taken into account are a sparse grid layout (for example stadium seat positioning), empty seats, or locations taken by users not participating in the image formation. A larger output image with some holes may be preferable to a smaller more perfect output image.

A cost or weighting is attributed to each place (e.g. an identifiable stadium seat) to form a usable image. The weighting attributed to a seat is a variable that a manager device can choose, or in some embodiments it may be determined by the server and may be based on how sensitive the current image is. For example, an image of a flag would be able to accept a sparser crowd, e.g., include more empty places (seats), than textual visual content in the same scenario.

In this embodiment, a row (or column) in a stadium is represented as a 1D array of integer values (a sub-array), as illustrated in Figure 4. Each unoccupied/not participating place is represented as a negative value in the array (for example -1), and an occupied/participating place is represented as a positive value (for example 3).

Preferably, the maximum number of consecutive participating places would be taken to provide the desired effect: an image that spans as large an area as possible, accepting some empty places here and there.

Thus, looking at how the places are modelled, the maximum sum of a contiguous sub-array (consecutive places) would be preferred. Positive values, i.e. occupied/participating places, add to the sum. Thus, the idea is to take as many places as possible, but an unoccupied or non-participating place penalizes the algorithm, so such places are only included if they are surrounded by taken/participating places.

Various methods exist for getting the desired effect, but a simple and efficient solution - O(n) - is provided by applying Kadane's algorithm.

The steps are shown in an example of simple pseudo-code:

func Kadane:
    max_so_far = 0
    max_ending_here = 0
    // loop for each element of the array
    Loop el in array:
        max_ending_here = max_ending_here + el
        if (max_ending_here < 0)  // reset
            max_ending_here = 0
        if (max_so_far < max_ending_here)  // take new max
            max_so_far = max_ending_here
    return max_so_far

The algorithm can be extended to also provide the start and end values of the sub-array providing the maximum sum of values, i.e., to identify the first and last position of the contiguous sub-array forming the maximum sum.

func Kadane_extended:
    max_so_far = 0
    max_ending_here = 0
    start = 0
    end = 0
    s = 0  // s is a "barrier" marking the candidate start position
    Loop cnt in array:
        max_ending_here = max_ending_here + array[cnt]
        if (max_so_far < max_ending_here)  // take new max and record its borders
            max_so_far = max_ending_here
            start = s
            end = cnt
        if (max_ending_here < 0)  // move away from negative sums
            max_ending_here = 0
            s = cnt + 1  // next position after the barrier
    return max_so_far, start, end

The start and end values can be used to indicate borders of the overall image, i.e., where the output image starts and ends. The algorithm is extended to a 2D array of values representing whether places are occupied or unoccupied by participating users apt to display visual content.

The left and right columns are selected one by one and the maximum contiguous sum of rows for every left and right column pair is found. The top and bottom row numbers for every fixed left and right column pair are to be determined. To find the top and bottom row numbers, the sum of elements in every row from left to right is computed and these sums are stored in a temporary array, temp. Thus temp[i] has the sum of elements from left to right in row i. If Kadane's algorithm is applied on temp to get the maximum sum subarray of temp, this maximum sum would be the maximum possible sum with left and right as boundary columns. To get the overall maximum sum, this sum is compared to the maximum sum so far. For brevity, just the important part of the code is listed:

// create an array to store the sum until that row
tmp = array()  // one entry per row, initialized to 0
for rightCol in range(leftCol, maxCols)
    // take each row and compute the sum
    for i in range(rows)
        tmp[i] += mat[i][rightCol]
    // get the max subarray for tmp using Kadane
    curSum, begin, end = MaxSumSubArray(tmp)
    // if we have a higher max sum
    if curSum > maxSum
        maxSum = curSum
        rowBegin = begin
        rowEnd = end
        colBegin = leftCol
        colEnd = rightCol
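
For concreteness, the pseudo-code above can be assembled into a single routine. The following Go sketch combines the extended Kadane scan with the column-pair loop to return the row and column borders of the best sub-rectangle; the function and variable names are chosen for this example and do not come from the disclosure.

// Sketch of the full 2D search: for every pair of boundary columns, collapse the
// rows into a 1D array of sums and run the extended Kadane scan on it.
package main

import "fmt"

// kadane returns the maximum sum of a contiguous sub-array together with its
// first and last indices (the "borders" discussed above).
func kadane(a []int) (maxSum, start, end int) {
    maxEndingHere, s := 0, 0
    for i, v := range a {
        maxEndingHere += v
        if maxSum < maxEndingHere {
            maxSum, start, end = maxEndingHere, s, i
        }
        if maxEndingHere < 0 { // move away from negative sums
            maxEndingHere = 0
            s = i + 1 // next position after the barrier
        }
    }
    return
}

// maxSubmatrix returns the borders of the sub-rectangle with the maximum sum, where
// occupied/participating places carry positive values and empty places negative ones.
func maxSubmatrix(mat [][]int) (rowBegin, rowEnd, colBegin, colEnd int) {
    rows, cols := len(mat), len(mat[0])
    maxSum := 0
    for leftCol := 0; leftCol < cols; leftCol++ {
        tmp := make([]int, rows) // running row sums between leftCol and rightCol
        for rightCol := leftCol; rightCol < cols; rightCol++ {
            for i := 0; i < rows; i++ {
                tmp[i] += mat[i][rightCol]
            }
            if curSum, begin, end := kadane(tmp); curSum > maxSum {
                maxSum = curSum
                rowBegin, rowEnd, colBegin, colEnd = begin, end, leftCol, rightCol
            }
        }
    }
    return
}

func main() {
    grid := [][]int{ // 3 = occupied/participating place, -1 = empty place
        {-1, 3, 3, -1},
        {3, 3, -1, 3},
        {-1, 3, 3, 3},
    }
    fmt.Println(maxSubmatrix(grid)) // row and column borders of the image
}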

Users adhering to a manager can be tracked by keeping a list of user pointers in each Manager object. The positions of users (row/seat) may also be tracked so that the image may be enlarged when more participating users arrive at places in the given zone. The idea is to keep a border that can uniquely position the image (rectangles may be used).

Two borders may be followed: one for the users, referred to as the User Border, and one for the image, referred to as the Image Border.

Figure 5 graphically illustrates an example of an imaginary layout (filled in signifies a participating occupied place) and how an image, for example, the French national flag, would be mapped to such places. The inner border (B1) corresponds to the Image Border and the outer border (B2) corresponds to a User Border. Tracking the image border helps to resize the image being shown. Thus, in Figure 5, if the algorithm finds a better configuration, the Image Border may be expanded. The surface formed by the Image Border is less than or equal to that of the User Border.

The User Border can be used to determine if users associated with different Managers overlap. This may be detected by checking their borders. Then, the sending of visual content to those user devices that cause the overlap may be disabled, to reduce the risk of overlap. Figure 6A illustrates a user border and an image border in a stadium output image, and Figure 6B illustrates two groups of image/user borders in stadium output images associated with two different managers.
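
As a minimal sketch of the overlap test mentioned above, the User Borders can be modelled as axis-aligned rectangles of row/seat indices; the type and field names below are illustrative assumptions.

// Illustrative overlap test between two User Borders modelled as axis-aligned
// rectangles of (row, seat) indices.
package main

import "fmt"

type Border struct {
    RowBegin, RowEnd   int // inclusive row range
    SeatBegin, SeatEnd int // inclusive seat range
}

// overlaps reports whether two borders share at least one place.
func overlaps(a, b Border) bool {
    return a.RowBegin <= b.RowEnd && b.RowBegin <= a.RowEnd &&
        a.SeatBegin <= b.SeatEnd && b.SeatBegin <= a.SeatEnd
}

func main() {
    managerA := Border{RowBegin: 0, RowEnd: 10, SeatBegin: 0, SeatEnd: 30}
    managerB := Border{RowBegin: 8, RowEnd: 15, SeatBegin: 25, SeatEnd: 60}
    // true here: content for the overlapping devices may be disabled
    fmt.Println(overlaps(managerA, managerB))
}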

Feedback may be sent to the user (for example, haptic or sound feedback) to alert the user to the problem.

For the client side, the application on the user device can be easy to setup and use.

The client does not need to know where the other participating user terminals are located or what they are doing; the only thing the user needs to know is where he is and what image he wants to be part of.

After indicating their place and Manager, users may proceed to contribute to the formation of an output image. The mood of the image shown may be changed. For example, the display could start with a video, then maybe change to another one if the context changes. An image may be generated to congratulate a winning team or to encourage a losing one.

Figures 7 and 8 illustrate examples of resulting output images of the algorithm simulated in a stadium environment. The simulated image is formed from 1467 user devices, with 98% of places occupied/participating, viewed at a distance of 52 m in a stadium of 1500 seats. The images are taken at night. In Figure 7 tablets are used as user devices. In Figure 8 smart phones are used as user devices. It will be appreciated that instead of one color, the information could be extended to an image or a video.

A further embodiment sets out to address problems relating to the shape of environments where the output images are being generated, such as stadiums, and the light levels. An organized grid may be easily modelled as a 2D array, but problems may arise due to the space between places and some misalignment along one of the axes (say the Z axis). With reference to the layout of Figure 9, it can be seen that as we climb along the rows the screens move backwards and the image is deformed (warped). Another issue is the angle from which the scene is captured for the output image. The view will be warped if the viewing point is not perfectly aligned in front of the image.

Another problem is the different colors that come with different user devices. Not all phone or tablet screens provide the same colors and brightness. The screens can have different color spaces and may or may not be calibrated. Brightness may be an issue when the user devices are used next to one another to display visual content.

For such an embodiment, a proposed solution is to compute i) a geometry transformation based on a 3D model of the scene being captured with a pre-defined position of the camera from which an image of the scene is being captured, and ii) a gamut transformation to align the different adjacent screen colors and light as much as possible. The process may involve warping the image and giving it a different shape to obtain the desired shape when viewed from a certain viewpoint (angle and distance). This is called projective mapping. Moving the camera would introduce deformations of the image. Consequently, if the position of the camera is known, inverse mapping can be used to account for that movement. To do this, rules may be set for the application and model.
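
Projective mapping of this kind can be expressed as a 3x3 homography applied in homogeneous coordinates. The Go sketch below is a generic illustration, not the disclosed implementation: it only shows how a pixel position is warped once a homography matrix is available; deriving the matrix itself would come from the scene model and camera position described above, and the matrix used in main is an arbitrary example value.

// Applying a known 3x3 homography H to a 2D point in homogeneous coordinates.
package main

import "fmt"

type Homography [3][3]float64

// Apply maps (x, y) through the homography and divides by the homogeneous coordinate.
func (h Homography) Apply(x, y float64) (float64, float64) {
    u := h[0][0]*x + h[0][1]*y + h[0][2]
    v := h[1][0]*x + h[1][1]*y + h[1][2]
    w := h[2][0]*x + h[2][1]*y + h[2][2]
    return u / w, v / w
}

func main() {
    // Arbitrary example matrix representing a mild perspective tilt.
    h := Homography{
        {1.0, 0.1, 5.0},
        {0.0, 1.2, 2.0},
        {0.0002, 0.0001, 1.0},
    }
    fmt.Println(h.Apply(100, 50)) // where the pixel lands after warping
}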

On the server side, a model of the stadium created beforehand is used. A file can be created in which the positions of the seats and the camera are indicated. For example: seat 5A is at (X: 1, Y: 5, Z: 13) and the camera is at (X: 25, Y: 10, Z: -52). Which seats are unoccupied and which are occupied and participating can be known, so as to indicate which pixels are ON and which are OFF by looping through a pixel matrix.
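
Such a layout file could, for instance, be a small JSON document; the schema below (seat labels mapped to world coordinates, plus a camera position) is only an assumption made for illustration, reusing the example values given above.

// Hypothetical layout file for a (very small) stadium model: seat and camera
// positions in world coordinates.
package main

import (
    "encoding/json"
    "fmt"
    "log"
)

type Position struct{ X, Y, Z float64 }

type Layout struct {
    Camera Position            `json:"camera"`
    Seats  map[string]Position `json:"seats"` // keyed by seat label, e.g. "5A"
}

func main() {
    data := []byte(`{
        "camera": {"X": 25, "Y": 10, "Z": -52},
        "seats":  {"5A": {"X": 1, "Y": 5, "Z": 13}}
    }`)
    var l Layout
    if err := json.Unmarshal(data, &l); err != nil {
        log.Fatal(err)
    }
    fmt.Println("seat 5A is at", l.Seats["5A"], "and the camera is at", l.Camera)
}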

Using the given information, a camera may be used to take a snapshot of how the image is "rendered" at a given time. This render can be of a lower resolution than the main camera's (for example by using a second camera), but its resolution should be at least higher than the number of users in the grid. The latter constraint is very easy to satisfy, since even a VGA (640x480) camera would provide more than enough pixels to cover hundreds of thousands of people. Using this render, each pixel can be inspected (we know where the pixels are projected).

Even without using the camera and a renderer, a given mapping may be created in this particular situation because everything is considered fixed and known (stadium, seats and even camera position for an initial prototype).

In a second step, the camera can be moved and the mapping dynamically updated accordingly. This would allow for a higher flexibility. The information required by the server for changing the mapping is the camera world position, everything else being fixed. The frequency of the update may be selected so as to reduce the processor usage. The information can be conveyed, for example, using a simple HTTP POST request handled by the server.
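
The camera-position update could be received, for example, by a small HTTP handler such as the Go sketch below; the route, payload fields and response behaviour are assumptions for illustration, and the actual recomputation of the mapping is only indicated by a placeholder comment.

// Sketch of a server-side handler for a camera-position update sent by HTTP POST.
package main

import (
    "encoding/json"
    "log"
    "net/http"
)

type CameraPose struct {
    X, Y, Z float64 // camera world position; everything else is considered fixed
}

func main() {
    http.HandleFunc("/camera", func(w http.ResponseWriter, r *http.Request) {
        var pose CameraPose
        if err := json.NewDecoder(r.Body).Decode(&pose); err != nil {
            http.Error(w, "bad camera pose", http.StatusBadRequest)
            return
        }
        // Placeholder: here the server would recompute the projective mapping
        // for the new viewpoint, at a limited update frequency.
        log.Printf("recomputing mapping for camera at %+v", pose)
        w.WriteHeader(http.StatusNoContent)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}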

There may also be an option in the application for the Manager to choose whether or not he desires such a projection mapping to happen, depending on the content (some content might not be suitable for warping).

Figure 10A illustrates an example of an image before correction, and Figure 10B illustrates the image after correction.

In some embodiments, a calibration step may be included in which all active screens display a pattern (for example a checkerboard, ArUco markers, etc.) that helps the server locate the beginning of the image (e.g., a very bright white image, or a checkerboard pattern). The projective mapping is part of a feedback loop that can adjust different parameters, as illustrated in Figure 11. This closed loop of steps S1101 to S1103 uses the camera and the fixed stadium parameters to change how the image looks in a dynamic fashion.

Of course, the image should change each time the camera moves. However, this would place a strain on the performance of the server. A decision can be made on the frequency of the update. There may also be a threshold for how much the camera must move to trigger an update.

Screens of mobile terminals are very diverse due to different technologies (LCD, LED) and various improvements over the years, glass fabrication processes, factory calibration (or lack thereof), maximum screen brightness, etc. There are so many variations that a single color can look very different on two different screens.

Figure 12A illustrates an image taken of a number of phones with display screens. Phones in the same group (1, 2 and 3) should show the same color. Looking at the captured image of the phones it can be seen that in group 1, the two left-most phones have LED screens so they appear very similar in the captured image. The other phone of group 1 is an LCD and the difference is very notable.

Such an occurrence would make output images differ from the desired overall output image. A camera can take a snapshot of the current image being displayed (such as shown in Figure 12A) and, using this information, the screens may be corrected and mapped to a common gamut to address the problem. It is supposed that the approximate distance between seats is known. Since the layout of places is known, a search area for a certain pixel can be found.

An image of the zone with the user devices displaying visual content, e.g. color, can be captured and sent to the server. On the server side, the relevant part of the image can be cropped out. A comparison of the desired image and the real image can then be performed on a pixel level. By comparing each pixel, it is known how much the color being sent should be adjusted.
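
The per-device adjustment could be as simple as biasing the next value sent by the difference between the requested and the captured color. The Go sketch below illustrates that idea only; the additive model and names are assumptions, and a real correction would likely map devices to a common gamut rather than add per-channel offsets.

// Illustrative per-device color correction: compare the color a device was asked to
// show with the color the camera actually captured, and bias future values accordingly.
package main

import "fmt"

type RGB struct{ R, G, B float64 }

// correction returns the offset to add to future colors sent to one device.
func correction(desired, captured RGB) RGB {
    return RGB{desired.R - captured.R, desired.G - captured.G, desired.B - captured.B}
}

// clamp keeps a channel value within the displayable range.
func clamp(v float64) float64 {
    if v < 0 {
        return 0
    }
    if v > 255 {
        return 255
    }
    return v
}

// apply adds the correction to a color and clamps each channel.
func apply(c, corr RGB) RGB {
    return RGB{clamp(c.R + corr.R), clamp(c.G + corr.G), clamp(c.B + corr.B)}
}

func main() {
    corr := correction(RGB{255, 0, 0}, RGB{230, 10, 12}) // this screen shows red slightly off
    fmt.Println(apply(RGB{200, 100, 50}, corr))          // the adjusted value to stream next
}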

The corrected image of Figure 12A is shown in Figure 12B, for example.

For further improvements, some information from the user's phone (panel type, for example) may be obtained. This would allow for an even more capable correction. Taking the example of the tablet, knowing its size, its brightness could be adjusted to be a bit lower than that of an adjacent device, since its larger surface would compensate. Another possibility is to perform some pre-calibration offline. The transformations can be dynamically adapted using another camera capturing the rendered scene at a given position and comparing it to the original image. The applied deformation will be done so that the image is correctly displayed at the camera position.

The position of the multiple screens can be dynamic, requiring the transformation to be dynamically adapted.

Figure 13 is a schematic block diagram of a processing system 1300 for implementing one or more embodiments of the invention. The processing system may be implemented at the server 100, the user device 200 or the manager device 300.

The processing system comprises memory 1350, a memory controller (not shown) and processing circuitry 1340 comprising one or more processing units (CPU(s)) for processing data. The one or more processing units 1340 run various software programs and/or sets of instructions stored in the memory 1350 to perform various functions for the processing system 1300 and to process data.

Other modules may be included such as an operating system module, for controlling general system tasks (e.g., power management, memory management) and for facilitating communication between the various hardware and software components of the processing system, and a communication module 1320 for controlling and managing communication with other devices via I/O interface ports or wireless communications by antenna.

In Figure 13, the illustrated modules correspond to functional modules, which may or may not correspond to distinguishable physical units. For example, a plurality of such modules may be associated in a unique component or circuit, or correspond to software functionalities. Moreover, a module may potentially be composed of separate physical entities or software functionalities.

A functional module may be implemented as a hardware circuit comprising for example VLSI circuits or gate arrays, discrete semiconductor devices such as logic chips, transistors, etc. A module may also be implemented in a programmable hardware device such as field programmable gate arrays, programmable array logic, programmable logic devices, or the like.

Modules may also be implemented in software for execution by various types of processors. A module or component of executable code may comprise one or more physical or logical blocks of computer instructions and may, for instance, be organized as an object, procedure or function. Executables of a module may not necessarily be located together. Moreover, modules may also be implemented as a combination of software and one or more hardware devices. For example, a module may be embodied in the combination of a processor and a set of operational data on which it operates. Moreover, a module may be implemented in the combination of an electronic signal communicated via transmission circuitry.

User devices 200 (FIG. 1) are also configured to render multimedia data together to form a coordinated multimedia output, for example, to display visual content to form an overall output image. In some embodiments, the visual content is determined by the manager device. The visual content may be any displayable visual content such as an image, text, colors, etc. The visual content displayed by a user device may represent one or more pixels of an overall image to be viewed. In some embodiments, the multimedia content to be rendered comprises audio content or audio/video content.

The idea is to use a set of available screens to form a bigger screen, each screen being part of a bigger image. It is assumed that the screens can be moved in 3D space and that their pose/orientation can change. However, distortion of the image/video as viewed from one position should be minimized. In embodiments, the position of the screens may be detected by a computer vision algorithm using a specific visual pattern (fiducial markers) to localize each screen dynamically. In embodiments, the method includes detecting the position of the user devices rendering the content. In some embodiments, the size of a screen is detected and in some embodiments the orientation/pose of the screen of the user device is determined.

To determine the orientation of a display device a process of pose estimation is applied.

The process of estimating an object's pose is used for many purposes, including robotics, augmented reality and computer vision. The idea behind this is finding the relationship between world coordinates and pixel coordinates (3D -> 2D), and the solution is to use fiducial markers to simplify the task. Using computer vision techniques and knowing the marker shape, it is possible to compute the pose of those markers. These markers come in different forms, for example squares, and other unique shapes can be designed. The shape allows for some level of error correction because of the simplicity of the drawings (as illustrated in Figure 14). One example of a library that uses such markers is ArUco.

In a particular embodiment, a user of a user terminal can log into an application such as an application on a smart phone device, and may choose a manager. Users open the application and send a request to connect to an available Manager. In other embodiments, this step is not necessary in order to receive the multimedia content, for example, the multimedia content may be provided directly by the server.

Markers are displayed on each user device participating in the rendering of the overall multimedia content. These markers can be selected by the server for example, in response to a request. In some embodiments, the markers also uniquely identify each device in the scene. The manager device can be used to capture a photo of the current layout of the user devices. An image of the displayed markers is obtained and sent to the server 100 (Figure 1 ). In some embodiments, the image is processed directly by the manager device.

Processing of the image includes detecting the markers in the image, and in some embodiments computing the intrinsic and extrinsic parameters of the camera. This is an initial step to determine how many devices there are and where they are located. An example is illustrated in Figure 14. The layout illustrated in Figure 14 is just an example, and it will be appreciated that the devices do not require to be next to each other. For accurate results the camera is first calibrated. This is a one-time step and should be done using all devices that are going to participate.

Figure 15 illustrates steps of a method in accordance with an embodiment of the invention to provide pose estimation.

In step S1501, an image of the devices displaying the markers is captured by the manager device and transmitted to the server for processing in step S1502. The server accesses an image processing algorithm in step S1503 to perform pose estimation. The pose estimation step S1504 comprises computing camera parameters and the positions of the user devices, which are returned to the server in step S1505.

From the user side, the user can connect to the server, select a Manager for certain content and wait for the content to be streamed. Before receiving the visual content forming the image, the user device will display the marker for pose estimation and then start listening on a predefined port for receiving the relevant image.

Figure 17 illustrates an example, the French national flag being displayed on two devices adjacent one another.

Once the server receives the image containing the markers, a marker, for example, the biggest or most prominent marker, may be selected as a landmark (origin) and its pixel coordinates computed. If ArUco is used, for example, image coordinates of the four corners can be obtained. Then, the same information can be computed for the other markers in the scene. Using this information, the coordinates of all the screens can be obtained in the same coordinate space. It is thus possible to split the original content for display according to the screens' position, orientation and size. A convex hull is obtained and can be adapted to any image. Once the image is fitted, the appropriate pixels are transmitted to the appropriate device.
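
A very simplified illustration of the splitting step follows: each detected screen footprint (expressed in the same coordinate space as the landmark) is normalized against the bounding box of all screens and mapped to the corresponding crop of the source content. The axis-aligned model, the names and the example rectangles are assumptions; pose/orientation is ignored here for brevity.

// Simplified sketch of splitting one source image across detected screens.
package main

import (
    "fmt"
    "image"
)

// Screen is a detected device with its axis-aligned footprint in camera coordinates.
type Screen struct {
    ID   string
    Rect image.Rectangle
}

// cropFor maps a screen's footprint to the corresponding rectangle of the source image.
func cropFor(s Screen, all, src image.Rectangle) image.Rectangle {
    fx := func(x int) int { return src.Min.X + (x-all.Min.X)*src.Dx()/all.Dx() }
    fy := func(y int) int { return src.Min.Y + (y-all.Min.Y)*src.Dy()/all.Dy() }
    return image.Rect(fx(s.Rect.Min.X), fy(s.Rect.Min.Y), fx(s.Rect.Max.X), fy(s.Rect.Max.Y))
}

func main() {
    screens := []Screen{
        {"left", image.Rect(0, 0, 400, 600)},
        {"right", image.Rect(420, 0, 820, 600)},
    }
    all := image.Rect(0, 0, 820, 600)   // bounding box of all detected screens
    src := image.Rect(0, 0, 1920, 1080) // the content to be split, e.g. a flag image
    for _, s := range screens {
        fmt.Println(s.ID, "displays", cropFor(s, all, src))
    }
}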

Once the calibration and pose estimation are done, the Manager application may be stopped. The content of choice will be streamed and the users can enjoy a larger screen. At the same time, if there is a change in layout, the Manager device 300 may track it in real time, film the scene, take another snapshot, etc. This would allow free movement and could enable interesting effects.

Another way to track the phone is to use its IMU (Inertial Measurement Unit) or any other external tracking technology (optical, radio...).

An application may provide a selection of available images to be displayed by user devices at a location during an event. That information may be created beforehand. The manager may select the visual content for display. Once selected, the server application processes the image, determines which visual content should be sent to each user, and sends it to each user device.

In detail, with reference to Figure 16, in step S1601 the manager accesses the application. The application may be provided on a mobile user device or any other type of communication data device. The user may sign in using a username and password for authentication. In step S1602, an HTTP call is made to select visual content for display. In S1603, the request is dispatched to a database which returns the requested visual content to the server 100 (Figure 1 ). It will be appreciated that the database may be remote or local to the server.

It will also be appreciated that the visual content for selection may include an image, a video, color formed by an image or video, a sound, a haptic signal (vibration for instance) or any combination of the above.

A task of the server is to compute an image for display by the user terminals depending on the positions of the users. Factors that may be taken into account are a sparse grid layout (for example, stadium seat positioning), empty seats, or locations taken by users not participating in the image formation. A larger image with some holes may be preferable to a smaller more perfect image.

After indicating their place and Manager, users are one click away from contributing to the formation of an image. The mood of the image shown may be changed. For example, the display could start with a video, then maybe change to another video, if the context changes. An image may be generated to congratulate a winning team, or to encourage a losing one.

The position of the multiple screens can be dynamic, requiring the transformation to be dynamically adapted.

Although the present disclosure has been described hereinabove with reference to specific embodiments, the present disclosure is not limited to the specific embodiments, and modifications will be apparent to a skilled person in the art which lie within the scope of the present disclosure.

For instance, while the foregoing examples have been described with respect to visual content, it will be appreciated that embodiments may apply to other forms of multimedia content.

Moreover, while the examples in the present disclosure relate to a sports stadium environment, it will be appreciated that embodiments may be applied to other environments such as a concert, a theater, an arena sport or a show, a public live event and the like.

Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the invention, that being determined solely by the appended claims. In particular, the different features from different embodiments may be interchanged, where appropriate.