
Title:
A SYSTEM
Document Type and Number:
WIPO Patent Application WO/2021/191126
Kind Code:
A1
Abstract:
A system (300) comprising: an output device (302) configured to provide an audio and / or visual stimulation (114) to a user (304); and one or more biometric sensors (306) that are configured to provide biometric-signalling (308), which is representative of body measurements of the user (304) while they are exposed to the audio and / or visual stimulation (114). The system (300) further comprises a processor (310) configured to: process the biometric-signalling (308) in order to determine an interest-level-score (222); and provide a control-signal (312) to the output device (302) based on the interest-level-score (222), wherein the control-signal (312) is for adjusting the audio and / or visual stimulation (114) that is provided by the output device (302).

Inventors:
GHADIRZADEH ALI (SE)
BJÖRKMAN MÅRTEN (SE)
JENSFELT DANICA KRAGIC (SE)
Application Number:
PCT/EP2021/057214
Publication Date:
September 30, 2021
Filing Date:
March 22, 2021
Assignee:
CROSEIR AB (SE)
International Classes:
G06F3/01; A61B5/24; G06F18/00; G06N3/02; G06V40/16
Foreign References:
US20140347265A12014-11-27
US20140223462A12014-08-07
US20110159467A12011-06-30
Other References:
Yang, Euijung et al., "The Emotional, Cognitive, Physiological, and Performance Effects of Variable Time Delay in Robotic Teleoperation", International Journal of Social Robotics, vol. 9, no. 4, 8 May 2017, pages 491-508, ISSN: 1875-4791, DOI: 10.1007/s12369-017-0407-x
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I. and Abbeel, P., "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets", Advances in Neural Information Processing Systems, 2016, pages 2172-2180
"Eye Tracking in User Experience Design", ISBN: 9780124081383
Attorney, Agent or Firm:
CLARK, David Julian (GB)
Claims:
CLAIMS

1. A system (100) comprising: an output device (102) configured to provide an audio and / or visual stimulation (114) to a user (104); one or more biometric sensors (106) that are configured to provide biometric-signalling (108), which is representative of body measurements of the user (104) while they are exposed to the audio and / or visual stimulation (114); and a processor (110) configured to: process the biometric-signalling (108) in order to determine an interest-level-score (222); and provide a control-signal (112) to the output device (102) based on the interest-level-score (222), wherein the control-signal (112) is for adjusting the audio and / or visual stimulation (114) that is provided by the output device (102).

2. The system of claim 1, wherein the processor (210) is configured to iteratively process the biometric-signalling (208) and provide an updated control-signal (212) until: a target interest-level-score is reached; a pre-defined number of iterations are performed; or a rate of change of the interest-level-score between multiple iterations is less than a target value.

3. The system of claim 2, wherein the processor (210) is configured to provide an output-signal (224) that is representative of: the audio and / or visual stimulation that is provided by the output device as part of the last iteration; or the audio and / or visual stimulation that is associated with the highest interest-level-score.

4. The system of claim 1, wherein the processor (210) is configured to: iteratively process the biometric-signalling (208) to determine a plurality of interest-level-scores (222), wherein each interest-level-score (222) is associated with an instance of the audio and / or visual stimulation; iteratively provide a plurality of control-signals (212) to the output device based on associated ones of the plurality of interest-level-scores (222); determine one of the interest-level-scores (222) as a selected-interest-level-score by applying a function to the plurality of interest-level-scores (222); and provide an output-signal (224) that is representative of the instance of the audio and / or visual stimulation that is associated with the selected-interest-level-score.

5. The system of claim 3 or claim 4, wherein the output-signal (224) comprises: an identifier of the instance of the audio and / or visual stimulation that is associated with the selected-interest-level-score; and / or the instance of the audio and / or visual stimulation that is associated with the selected-interest-level-score.

6. The system of any preceding claim, wherein: the one or more biometric sensors comprise one or more electroencephalography (EEG) sensors (306), and the biometric-signalling comprises EEG-signalling (308), which is representative of electrical activity in a user’s brain while they are exposed to the audio and / or visual stimulation.

7. The system of any preceding claim, wherein: the one or more biometric sensors (106) comprise one or more pupil size sensors, and the biometric-signalling (108) comprises pupil-size-signalling, which is representative of the size / dilation of the user’s pupil while they are exposed to the audio and / or visual stimulation.

8. The system of any preceding claim, wherein the processor (210) is configured to iteratively process the biometric-signalling (208) and provide a new control-signal periodically, for example every 0.5 seconds.

9. The system of any preceding claim, wherein: the system is a facial reconstruction system; the output device (502) is configured to provide a visual stimulation to the user (504) that represents a person’s face; and the control-signal is for adjusting one or more features of the person’s face that is provided by the output device (502).

10. The system of any one of claims 1 to 8, wherein: the system is a machine support system; the output device (602) is configured to provide a visual stimulation to the user (604) that comprises a visual representation of how a subsequent operation can be performed by at least one machine (638) in the machine support system; and the control-signal is for adjusting one or more features of the visual representation of how the subsequent operation can be performed.

11. The system of claim 10, wherein the processor is configured to: determine a selected-visual-stimulation as: the visual stimulation that is provided by the output device as part of the last iteration; or the visual stimulation that is associated with the highest interest-level-score; and provide a machine-control-signal (640) for automatically controlling the at least one machine (638) based on the selected-visual-stimulation.

12. The system of any preceding claim, wherein the processor comprises: a biometric processor (316) that is configured to process the biometric-signalling (108) in order to determine the interest-level-score (222) by applying a deep artificial neural network.

13. The system of any preceding claim, wherein the processor comprises: an optimizer (326) that is configured to generate a latent variable value (327) based on the interest-level-score (222) as part of providing the control-signal (312).

14. The system of claim 13, wherein the optimizer (326) is configured to: receive descriptive data (334); and process the descriptive data (334) when determining the latent variable value (327), such that the latent variable is prohibited from taking one or more values that are inconsistent with the descriptive data (334).

15. The system of claim 13 or claim 14, wherein the processor further comprises: a latent-variable generative adversarial network model (328) that is configured to generate the control signal based on the latent variable (327).

16. A computer implemented method comprising: processing (750) biometric-signalling (108) in order to determine an interest-level-score (222), wherein the biometric-signalling (108) is representative of body measurements of a user (104) while they are exposed to audio and / or visual stimulation (114); and providing (752) a control-signal (112) to an output device (102) based on the interest-level-score (222), wherein the control-signal (112) is for adjusting the audio and / or visual stimulation that is provided by the output device (102).

Description:
A SYSTEM

The present disclosure relates to a system for providing audio and / or visual stimulation to a user.

According to a first aspect of the present disclosure there is provided a system comprising: an output device configured to provide an audio and / or visual stimulation to a user; one or more biometric sensors that are configured to provide biometric-signalling, which is representative of body measurements of the user while they are exposed to the audio and / or visual stimulation; and a processor configured to: process the biometric-signalling in order to determine an interest-level-score; and provide a control-signal to the output device based on the interest-level-score, wherein the control-signal is for adjusting the audio and / or visual stimulation that is provided by the output device.

Such a system can advantageously adjust audio-visual content in response to a determined interest-level-score to iteratively optimize the audio-visual content. The system can enable new problems to be solved, such as searching for visual content in a user’s brain, which was not previously possible.

The processor may be configured to iteratively process the biometric-signalling and provide an updated control-signal until: a target interest-level-score is reached; a pre-defined number of iterations are performed; or a rate of change of the interest-level-score between multiple iterations is less than a target value.

The processor may be configured to provide an output-signal that is representative of: the audio and / or visual stimulation that is provided by the output device as part of the last iteration; or the audio and / or visual stimulation that is associated with the highest interest-level-score.

The processor may be configured to: iteratively process the biometric-signalling to determine a plurality of interest-level-scores, wherein each interest-level-score is associated with an instance of the audio and / or visual stimulation; iteratively provide a plurality of control-signals to the output device based on associated ones of the plurality of interest-level-scores; determine one of the interest-level-scores as a selected-interest-level-score by applying a function to the plurality of interest-level-scores; and provide an output-signal that is representative of the instance of the audio and / or visual stimulation that is associated with the selected-interest-level-score.

The output-signal may comprise: an identifier of the instance of the audio and / or visual stimulation that is associated with the selected-interest-level-score; and / or the instance of the audio and / or visual stimulation that is associated with the selected-interest-level-score.

The output device may be a multimedia device, such as a multimedia interface.

The one or more biometric sensors may comprise one or more electroencephalography (EEG) sensors. The biometric-signalling may comprise EEG-signalling, which is representative of electrical activity in a user’s brain while they are exposed to the audio and / or visual stimulation. The one or more EEG sensors may be attachable to the user’s scalp or forehead.

The one or more biometric sensors may comprise one or more pupil size sensors. The biometric-signalling may comprise pupil-size-signalling, which is representative of the size / dilation of the user’s pupil while they are exposed to the audio and / or visual stimulation.

The processor may be configured to iteratively process the biometric-signalling and provide a new control-signal periodically, for example every 0.5 seconds.

The system may be a facial reconstruction system. The output device may be configured to provide a visual stimulation to the user that represents a person’s face. The control-signal may be for adjusting one or more features of the person’s face that is provided by the output device.

The system may be a machine support system. The output device may be configured to provide a visual stimulation to the user that comprises a visual representation of how a subsequent operation can be performed by at least one machine in the machine support system. The control-signal may be for adjusting one or more features of the visual representation of how the subsequent operation can be performed.

The processor may be configured to: determine a selected-visual-stimulation as: the visual stimulation that is provided by the output device as part of the last iteration; or the visual stimulation that is associated with the highest interest-level-score; and provide a machine-control-signal for automatically controlling the at least one machine based on the selected-visual-stimulation.

The output device may be provided as part of a smart phone.

The processor may comprise: a biometric processor that is configured to process the biometric-signalling in order to determine the interest-level-score by applying a deep artificial neural network.

The processor may comprise: a generative model that is configured to generate the control signal (which is used for adjusting the audio and / or visual stimulation) by processing a latent variable input which uniquely or stochastically characterizes the generated output stimuli. Examples of these generative models are generative adversarial networks and Variational Autoencoders. In this way, the processor further comprises: a latent-variable generative adversarial network model (or other generative model) that is configured to generate the control signal based on the latent variable.

The processor may further comprise: an optimizer that is configured to generate the latent variable values based on the interest-level-score as part of providing the control-signal. The optimizer may be configured to: receive descriptive data; and process the descriptive data when determining the latent variable. In this way, the latent variable can be prohibited from taking one or more values that are inconsistent with the descriptive data.

There is also disclosed a computer implemented method comprising: processing biometric-signalling in order to determine an interest-level-score, wherein the biometric-signalling is representative of body measurements of a user while they are exposed to audio and / or visual stimulation; and providing a control-signal to an output device based on the interest-level-score, wherein the control-signal is for adjusting the audio and / or visual stimulation that is provided by the output device.

There may be provided a computer program, which, when run on a computer, causes the computer to configure any apparatus, including a system, processor, controller, robot, or device disclosed herein, or to perform any method disclosed herein. The computer program may be a software implementation, and the computer may be considered as any appropriate hardware, including a digital signal processor, a microcontroller, and an implementation in read only memory (ROM), erasable programmable read only memory (EPROM) or electronically erasable programmable read only memory (EEPROM), as non-limiting examples. The software may be an assembly program.

The computer program may be provided on a computer readable medium, which may be a physical computer readable medium such as a disc or a memory device, or may be embodied as a transient signal. Such a transient signal may be a network download, including an internet download. There may be provided one or more non-transitory computer-readable storage media storing computer-executable instructions that, when executed by a computing system, cause the computing system to perform any method disclosed herein.

One or more embodiments will now be described by way of example only with reference to the accompanying drawings in which:

Figure 1 shows an example embodiment of a system that provides audio and / or visual stimulation to a user;

Figure 2 shows an example embodiment of a processor, which can be used in the system of Figure 1;

Figure 3 illustrates a further example embodiment of a system;

Figures 4A, 4B and 4C illustrate three views of an example embodiment of a headset that can be used as part of a system described herein;

Figure 5 shows an example embodiment of a system that can search for faces;

Figure 6 shows an example embodiment of a system that can function as a human-machine interface and control a robot; and

Figure 7 illustrates schematically an example embodiment of a computer implemented method.

Figure 1 shows an example embodiment of a system 100. The system 100 includes an output device 102 that can provide an audio and / or visual stimulation 114 to a user 104. In some examples, the output device 102 can be a portable electronic device such as a smart phone, tablet computer or laptop computer. The output device 102 may provide stimulation 114 to the user 104 using a single medium of expression (such as audio or visual), or stimulation using a plurality of media of expression.

The system 100 also includes one or more biometric sensors 106. Each biometric sensor 106 can provide biometric-signalling 108, which is representative of body measurements of the user 104 while they are exposed to the audio and / or visual stimulation 114. Various examples of biometric-signalling 108 are described below. In an example where the output device 102 is a display screen that provides an image of a person’s face as visual stimulation 114, the biometric-signalling 108 can be representative of the user’s response to the face that is being displayed to them.

The system further includes a processor 110. The processor 110 can process the biometric-signalling 108 in order to determine an interest-level-score (not shown). An example of how such an interest-level-score can be determined is provided below. The interest-level-score is representative of the user’s 104 interest in the audio and / or visual stimulation 114 to which they are being exposed. The processor 110 can then provide a control-signal 112 to the output device 102 based on the interest-level-score. The control-signal 112 is for adjusting the audio and / or visual stimulation that is provided by the output device 102. In this way, the system 100 can automatically adjust the audio and / or visual stimulation 114 that is provided to the user 104 based on the user’s interest in the previously presented stimulation. This can result in the audio and / or visual stimulation being iteratively adjusted based on the user’s 104 interest.

As will be discussed in detail below, the system 100 can iteratively search in the user’s 104 brain for audio-visual associations that maximize a given objective function of the interest-level-score. This can be achieved by the system 100 providing stimulus 114 to the user 104, and then refining that stimulus 114 to match an unknown target.

Figure 2 shows an example embodiment of a processor 210, which can be used in the system of Figure 1. The processor 210 includes an interest calculator 216, a control-signal calculator 218 and a loop controller 220.

The processor 210 receives biometric-signalling 208, such as from the one or more biometric-sensors that are shown in Figure 1. The interest calculator 216 iteratively processes the biometric-signalling 208 to determine a plurality of interest-level-scores 222. As discussed with reference to Figure 1, each interest-level-score 222 is associated with an instance of the audio and / or visual stimulation to which they were being exposed when the biometric-signalling 208 was recorded. The interest calculator 216 iteratively provides the plurality of interest-level-scores 222 to the control-signal calculator 218. The control-signal calculator 218 can then iteratively provide a plurality of control-signals 212 to the output device (not shown) based on associated ones of the plurality of interest-level-scores 222. In this way, for each iteration, the processor 210 calculates an interest-level-score 222 for a current instance of the audio and / or visual stimulation, and determines a control-signal 212 for adjusting the audio and / or visual stimulation for the next iteration.

In this example, the processor 210 also includes a loop controller 220. The loop controller 220 receives the plurality of interest-level-scores 222, one for each iteration. In some examples, the loop controller 220 can store the plurality of interest-level-scores 222 for subsequent processing. In other examples the loop controller 220 can process the received interest-level-scores 222 “on the fly” as they are received. The loop controller 220 can perform one or more of the functionalities that are described below. It will be appreciated that the functionality of the loop controller 220 may be provided by a single processor or may be distributed across a plurality of processors.

The loop controller 220 can automatically control the interest calculator 216 and / or the control-signal calculator 218 such that the processor 210 iteratively processes the biometric-signalling 208 and provides an updated control-signal 212 until: a target interest-level-score is reached; a pre-defined number of iterations are performed; or a rate of change of the interest-level-score between multiple iterations is less than a target value.

For example, the loop controller 220 can compare a current interest-level-score 222 with a target interest-level-score. If the interest-level-score 222 is less than the target interest-level-score, then the loop controller 220 may control the control-signal calculator 218 and the interest calculator 216 such that they generate a new control-signal 212 and calculate a new interest-level-score 222. That is, the processor 210 performs another iteration if the interest-level-score is less than a target interest-level-score. If the interest-level-score is greater than or equal to the target interest-level-score, then the loop controller 220 may determine that no further iterations are required. Therefore, the loop controller 220 can control the control-signal calculator 218 and / or the interest calculator 216 such that no further iterations are performed once the target interest-level-score is reached.

Optionally, the loop controller 220 can maintain a count of the number of iterations that have been performed in a current operation. The loop controller 220 can compare the count with a pre-defined number of iterations. If the count is less than the pre-defined number of iterations, then the loop controller 220 may control the control-signal calculator 218 and the interest calculator 216 such that they generate a new control-signal 212 and calculate a new interest-level-score 222. If the count equals the pre-defined number of iterations, then the loop controller 220 can control the control-signal calculator 218 and / or the interest calculator 216 such that no further iterations are performed.

In some examples, the loop controller 220 may calculate a rate of change of the interest-level-score 222 between multiple iterations. This may simply be the difference between a current interest-level-score 222 and an immediately preceding interest-level-score 222, or may involve a more sophisticated function based on a plurality of preceding interest-level-scores 222. The loop controller 220 can compare the determined rate of change of the interest-level-score with a target value, and take similar action to that discussed above based on the result of the comparison. In this way the loop controller 220 can control the control-signal calculator 218 and / or the interest calculator 216 such that they stop performing further iterations when the rate of change drops below a target level.
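
By way of a non-limiting illustration only, the three stopping criteria described above might be implemented as in the following Python sketch, in which compute_interest and next_control_signal are hypothetical stand-ins for the interest calculator 216 and the control-signal calculator 218, and all thresholds are illustrative assumptions rather than values taken from this disclosure:

```python
# Illustrative sketch of the loop controller's stopping criteria.
# compute_interest() and next_control_signal() are hypothetical stand-ins
# for the interest calculator 216 and the control-signal calculator 218.
def run_loop(compute_interest, next_control_signal,
             target_score=0.9, max_iterations=100, min_rate_of_change=1e-3):
    scores = []
    control_signal = None
    for _ in range(max_iterations):                  # pre-defined number of iterations
        score = compute_interest(control_signal)     # decode the biometric-signalling
        scores.append(score)
        if score >= target_score:                    # target interest-level-score reached
            break
        if len(scores) >= 2 and abs(scores[-1] - scores[-2]) < min_rate_of_change:
            break                                    # rate of change below target value
        control_signal = next_control_signal(score)  # provide an updated control-signal
    return scores, control_signal
```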

In the example of Figure 2, the loop controller 220 determines one of the interest-level-scores 222 for a plurality of iterations as a selected-interest-level-score by applying a function to the plurality of interest-level-scores 222. Applying such a function may involve selecting the highest interest-level-score as the selected-interest-level-score. Alternatively, applying such a function may involve selecting: the lowest interest-level-score as the selected-interest-level-score; or the interest-level-score that is closest to a target-interest-score, as the selected-interest-level-score. It will be appreciated that the nature of the function will depend on the particular application with which the processor 210 is being used.

The loop controller 220 can optionally provide an output-signal 224 that is representative of the instance of the audio and / or visual stimulation that is associated with the selected-interest-level-score. The output-signal 224 may comprise an identifier (for example a filename or a description of the stimulation) of the instance of the audio and / or visual stimulation that is associated with the selected-interest-level-score. Additionally or alternatively, the output-signal 224 may include the instance of the audio and / or visual stimulation. In this way the output-signal 224 can include the results of the search for a previously unknown target, which in some examples is the audio and / or visual stimulation that is associated with the highest interest-level-score.

In some examples, the processor 210 can provide an output-signal 224 that is representative of the audio and / or visual stimulation that is provided by the output device as part of the last iteration. In examples where the iterations cease when an acceptable interest-level-score 222 is reached, such an output-signal 224 can correspond to a desired stimulation.

Figure 3 illustrates a further example embodiment of a system 300. The system 300 can perform a closed-loop iterative brain search for audio-visual associations using EEG devices.

The system 300 includes a wearable product that has Electroencephalography (EEG) sensors 306. The EEG sensors are examples of biometric sensors. As shown in the figure, and as is well known in the art, the EEG sensors 306 can be placed on a user’s 304 head to measure his / her brain activity and provide EEG-signalling 308 (which can also be referred to as EEG-data). The EEG-signalling 308 is an example of biometric-signalling (biometric-data), and is representative of electrical activity in the user’s 304 brain while they are exposed to the audio and / or visual stimulation. The system 300 also includes a multimedia (visual and audio) interface 302. The multimedia interface 302 is an example of an output device. The multimedia interface 302 can be a smart phone, a virtual or augmented reality device, for example, to provide audio-visual stimuli to the user 304.

The system 300 includes a processor, which in this example is the processor 310. The processor 310 is an Artificial Intelligence (AI) based system which searches for audio-visual associations in the user’s 304 brain by maximizing the positive response (interest level) to iteratively produced synthetic audio-visual stimuli, responses that are decoded from the EEG-signalling 308. In neuroscience, such positive response signals can be referred to, or determined from, event-related potentials (ERPs) or P300 waves. Advantages of using EEG sensors 306 to measure positive responses can include (1) fast feedback from the EEG-signalling 308, for instance in less than a second, while beneficially not distracting the user 304 as it does not require attention, and (2) the non-invasive nature of the process.

In Figure 3, three software modules of the embedded processor 310 are shown: (1) an AI-based biometric processor 316 which processes the EEG-signalling 308 to infer the user’s 304 response to the stimulation provided by the multimedia interface 302; (2) an AI-based generative model 328, which generates a control-signal 312 to cause the multimedia interface 302 to provide synthetic audio-visual content to the user 304; and (3) an AI-based optimizer 326, which optimizes for the human response by feeding the generative model 328 with an appropriate input.

The EEG sensors 306 (hardware) measure brain signals to generate the EEG-signalling 308. The EEG-signalling 308 is processed by the biometric processor 316 (software) to calculate a measured interest level 322 (which is an example of an interest-level-score). Such an interest level 322 can also be referred to as a positive biometric response. The biometric processor 316 can continuously calculate a measured interest level 322 by decoding the EEG-signalling 308, or it can periodically calculate the measured interest level 322.

The biometric processor is a deep artificial neural network (deep network) which consists of a number of trainable parameters. In some examples, the biometric processor implements a regression model. The trained model receives the raw biometric signals, e.g., raw EEG data, over a short period of time (such as a few hundred milliseconds) as an input. The trained model can then output an interest score value, which reflects how close the current stimulus (that the person is exposed to) is to the target stimulus that the system is searching for.

The model of such a biometric processor can be trained using a training dataset that includes pairs of true input-output data. The training dataset can be constructed by asking a number of human participants to remember a given stimulus. For the example of a facial reconstruction task, the stimulus would be a human face. Therefore, in this example, each participant will remember a given face (target face). The participant will then be exposed to a set of synthetic human faces, some of which are similar to the target face. The similarity is quantified based on the distance between a latent variable value corresponding to the given face and a latent variable value corresponding to the target face. The latent space here is the latent space of the generative model 328 that generates the synthetic facial images. Therefore, each training pair in the training dataset is constructed as the following: (i) the input data is the sequence of biometric measures, such as a few hundred milliseconds of EEG-signalling 308, while the participant is exposed to the stimulus; and (ii) the output data is a similarity measure between the generated stimulus and the target stimulus, e.g., quantified by a distance metric, e.g., Euclidean distance, between the latent variable values corresponding to the generated and the target stimuli. Alternatively, the output data for the training can be provided by an operator based on their subjective opinion of the similarity between the stimuli (such as the similarity between: (i) the target face; and (ii) the synthetic human face that was displayed to the participant when the input data was recorded). Once a sufficiently large training dataset is constructed, the trainable parameters of the network are updated such that, for every training pair, when the training input is set as the input of the network, the produced output of the network is as close as possible to the output of the corresponding training pair. This paradigm is known as supervised learning.
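
As a non-limiting sketch of the supervised learning paradigm described above, the following Python code trains a small regression network that maps a short window of raw EEG samples to a similarity score. A PyTorch implementation is assumed, and the network shape, window length and channel count are illustrative assumptions rather than values taken from this disclosure:

```python
import torch
import torch.nn as nn

WINDOW_SAMPLES = 128   # e.g. a few hundred milliseconds of EEG at 256 Hz (assumed)
N_CHANNELS = 2         # e.g. two forehead electrodes (assumed)

# A small regression network: raw EEG window in, similarity score out.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(N_CHANNELS * WINDOW_SAMPLES, 64),
    nn.ReLU(),
    nn.Linear(64, 1),  # a single interest / similarity score
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train(dataset, epochs=10):
    # dataset yields (eeg_window, similarity) training pairs, where similarity
    # is e.g. the distance between the latent variables of the generated and
    # target stimuli, or an operator-provided label.
    for _ in range(epochs):
        for eeg_window, similarity in dataset:
            prediction = model(eeg_window).squeeze(-1)
            loss = loss_fn(prediction, similarity)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```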

The optimizer model 326 (software) receives the measured interest level 322, and iteratively optimizes the stimuli to increase this response (interest level 322). The optimizer model 326 then feeds the generative model 328 (software) (in this example with a latent variable 327) such that it generates an appropriate control-signal 312 for the multimedia interface 302. In this way, the generative model 328 can generate and adjust audio-visual synthetic data for providing to the user 304 as stimulation. The multimedia interface 302 (hardware) then displays the generated content to the user 304. In this example, all the software processes are performed by the processor 310 (hardware).

As will be discussed below, the optimizer 326 can optionally receive descriptive data, in this example textual data 334. The optimizer can also process such textual data 334 when determining the latent variable value, for instance to prohibit the latent variable from taking one or more values that are inconsistent with the textual data 334. That is, the textual data 334 can be used to apply restrictions to the position in a latent space that can be represented by the latent variable 327.

The optimizer 326 can optimize the generated stimuli to get closer to the target stimuli based on a gradient-based approach (such as stochastic gradient descent), or based on a gradient-free approach (such as the Nelder-Mead optimization algorithm), or using Reinforcement Learning (RL), as non-limiting examples. It will be appreciated that any algorithm can be used that optimizes the measured interest level 322 for any specific application.
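
As a non-limiting illustration of the gradient-free option, the following sketch uses SciPy’s Nelder-Mead implementation to search the latent space for a value that maximizes the measured interest level. The callback measure_interest_for_latent is hypothetical: it stands in for generating the stimulus from a latent value, displaying it, and decoding the interest level from the biometric-signalling:

```python
import numpy as np
from scipy.optimize import minimize

LATENT_DIM = 20  # a relatively low-dimensional latent variable (illustrative)

def search_latent(measure_interest_for_latent):
    # measure_interest_for_latent(z) is a hypothetical callback that would
    # generate the stimulus from latent z, display it, record the
    # biometric-signalling, and return the decoded interest level.
    def objective(z):
        # Nelder-Mead minimizes, so negate the interest level to maximize it.
        return -measure_interest_for_latent(z)

    z0 = np.zeros(LATENT_DIM)  # e.g. start from the mean of the latent prior
    result = minimize(objective, z0, method="Nelder-Mead",
                      options={"maxiter": 200, "xatol": 1e-2})
    return result.x            # latent value with the highest measured interest
```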

In some examples the generative model 328 can be a latent-variable generative adversarial network (GAN) model. The system can generate synthetic images by training the latent-variable generative model. The latent variable of the generative model is a parameter to generate different faces. The generative model itself can be implemented based on generative adversarial networks (GAN), or Variational Autoencoders (VAE), or flow-based generative models, or generally any approach that can train a generative model to generate high-dimensional data (such as images, as represented by the control signal 312), based on a low-dimensional latent variable (such as the latent variable 327 that is output by the optimizer 326). An example of a generative adversarial network (GAN) is the InfoGAN method (Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I. and Abbeel, P., 2016. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2172-2180).
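
The following is a minimal, purely illustrative sketch of such a latent-variable generator, mapping a low-dimensional latent variable to a high-dimensional image. The architecture below is not the InfoGAN architecture and is not taken from this disclosure; it only illustrates the latent-to-image mapping that any of the named generative models provides:

```python
import torch
import torch.nn as nn

LATENT_DIM = 20  # low-dimensional latent variable (illustrative)

# Minimal generator: latent variable in, flattened 64x64 RGB image out.
generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, 64 * 64 * 3),
    nn.Tanh(),  # pixel values in [-1, 1]
)

z = torch.randn(1, LATENT_DIM)            # sample a latent variable
image = generator(z).view(1, 3, 64, 64)   # reshape into an image tensor
```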

Every time a new face is shown to the user, the user’s response is measured by reading from the user’s brain while he / she is looking at the image. The response is treated as a measure that quantifies how similar the generated (synthetic) face is to the target face, i.e., the face that the user is remembering. In line with the above description, in order to measure the response, the biometric processor 316 (which may be implemented as a deep neural network) has been trained such that it calculates a single output value (the interest level / positive biometric response 322) corresponding to the user’s response for the measured EEG-signalling 308.

The closed-loop and iterative nature of the process that is shown in Figure 3 can increase the positive response of the user’s 304 brain (as represented by the measured interest level 322) in several steps. In this way the closed-loop system can search in the user’s 304 brain for some audio-visual content. The loop can be repeated, for example periodically such as every 0.5 seconds, by performing the following steps: (1) the user 304 observes the new audio-visual content that is provided by the multimedia device 302; (2) the user’s brain signals are recorded as EEG-signalling 308; (3) the user’s 304 brain response is measured and quantified from the EEG-signalling 308 to provide a measured interest level 322; (4) an optimization algorithm is applied to the interest level 322 to generate a latent variable 327 that is intended to optimize the interest level 322; and (5) a control-signal 312 for the new content is generated by the generative model 328 based on the latent variable 327, and provided to the multimedia interface 302 such that it displays the new content.
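
One pass through this loop could be sketched as follows, with each module represented by a hypothetical function; the function names and the wiring are illustrative assumptions, not taken from this disclosure:

```python
import time

def closed_loop_step(record_eeg, biometric_processor, optimizer_step,
                     generative_model, display):
    # One iteration of the closed loop of Figure 3 (illustrative only).
    eeg = record_eeg()                    # (2) record brain signals as EEG-signalling
    interest = biometric_processor(eeg)   # (3) quantify the measured interest level
    latent = optimizer_step(interest)     # (4) optimize the latent variable
    content = generative_model(latent)    # (5) generate the new content
    display(content)                      # (1) user observes the new content
    return interest

def run(modules, period=0.5, iterations=100):
    # Repeat periodically, for example every 0.5 seconds.
    for _ in range(iterations):
        closed_loop_step(*modules)
        time.sleep(period)
```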

One example application for the systems described herein is to search for images of faces in a user’s brain as he / she tries to remember a person. The user can be asked to look at random pictures of a person. Then, a system described herein can generate visual stimulation that reconstructs the face image merely by measuring the user’s EEG brain data while the user views different synthetic faces generated by the system. In this way, a measured interest level can be optimized in order to result in a synthetically generated face that should be a good match for the face that the user is remembering.

As discussed above, one or more of the examples disclosed herein can advantageously generate synthetic audio-visual content using state-of-the-art AI methods, such as generative adversarial networks (GAN). A latent variable generative model can be considered as a function that maps a latent random variable into an output audio/image. When combined with an optimization algorithm, e.g., reinforcement learning, such models can be used to iteratively optimize the generated content. Such examples can enable new problems to be solved, such as searching for visual content in a person’s brain, which was not previously possible.

Figures 4A, 4B and 4C illustrate three views of an example embodiment of a headset that can be used as part of a system described herein. The headset can include two EEG sensors that include contact electrodes that are attached to the user’s scalp or face (such as the participant’s forehead). In this way, a non-invasive system for providing EEG-signalling can be used. In this example the electrodes are in contact with the user such that they can monitor the activity in the brain.

Figure 5 shows an example embodiment of a system that can search for faces, in order to match a synthetically generated image of a face with a user’s 504 recollection of the face. Features of Figure 5 that are also shown in Figure 3 have been given corresponding reference numbers in the 500 series, and will not necessarily be described in detail here.

The system includes a display device 502 (which is an example of an output device) that sequentially provides synthetically generated images of a face to the user 504. In the same way as discussed above, the display device provides the image based on a control signal 512 received from a generative model 528. It has been found that images of a face can be represented by a latent variable 527, which has a relatively low number of dimensions and can therefore be processed sufficiently quickly to enable the processes to run in real-time. In this way, at least some of the systems described herein can advantageously process very high-dimensional biometric signalling (such as EEG-signalling), determine a one-dimensional interest level 522, calculate a relatively low-dimensional latent variable (for instance 20-dimensional), and convert it into a high-dimensional audio or visual stimulation (such as the image of a face). In this way, beneficially, the optimization and generation of audio / video stimulation can be performed on low-dimensional input signalling such that the system can operate efficiently in terms of required processing resources and processing time.

In this example, a text description 534 of a face is provided to the optimization algorithm 526. The text description 534 can be used as part of a start-up routine so that the initially displayed image on the display device 502 represents a good starting point for the subsequent iterations. For instance, a text description 534 of a “40-year-old man” can be provided; in which case a stock image of a 40-year-old man’s face can be provided as an initial image on the display device 502. Additionally or alternatively, the optimization algorithm 526 can use the text description 534 such that the determined latent variable 527 cannot be given a value that is inconsistent with the text description 534. For instance, if the text description indicates that the target face is a man’s face, then the optimization algorithm may only generate latent variables 527 that are used by the generative model 528 to generate an image of a man’s face, and not a woman’s face. This may be implemented by the optimization algorithm 526 restricting the latent space in which the optimizer can operate to exclude all latent variables that are characterised as women’s faces.
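
One simple, purely illustrative way to implement such a restriction is to project candidate latent variables back into an allowed region of the latent space. In the sketch below, the mapping from the text description to particular latent dimensions and bounds is entirely hypothetical:

```python
import numpy as np

# Hypothetical constraint derived from a text description such as
# "40-year-old man": assume, for illustration only, that latent dimension 0
# encodes the constrained attribute and must stay within given bounds.
CONSTRAINTS = {0: (0.5, 3.0)}  # dimension -> (lower bound, upper bound)

def project_latent(z, constraints=CONSTRAINTS):
    # Project a candidate latent variable back into the allowed region, so
    # that it cannot take values inconsistent with the descriptive data.
    z = np.asarray(z, dtype=float).copy()
    for dim, (lo, hi) in constraints.items():
        z[dim] = np.clip(z[dim], lo, hi)
    return z
```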

In this way, the system of Figure 5 can be considered as a facial reconstruction system (or a facial composite system), which includes a display device 502 that provides a visual stimulation that represents a person’s face to the user 504. The system can then generate a control-signal for adjusting one or more features of the person’s face that is provided by the display device 502.

Figure 6 shows an example embodiment of a system that can function as a human-machine interface in order to control a robot 638. The control is based on matching images that are displayed to a user 604 with the user’s thoughts about what they would like the robot 638 to do. Again, features of Figure 6 that are also shown in Figure 3 have been given corresponding reference numbers in the 600 series, and will not necessarily be described in detail here.

In a similar way to that described above, the system includes an interest detector 616 that provides an interest-level-score 622 to an optimizer 626. The optimizer 626 provides a signal (such as a latent variable) to a generative model 628, which causes a display device 602 to display an image (or a sequence of images, such as a video) to the user 604. The image or video is of an operation that can be performed by the robot 638. Such an operation may be to pick up a screwdriver, as shown in Figure 6. It will be appreciated that any potential operation, or sequence of operations, that can be performed by the robot can be displayed to the user 604. For instance, the system may be able to determine a finite list of operations that can be performed by the robot 638 based on its current operational state. The generative model 628 can then sequentially display images of the potential operations to the user.

In this example, the generative model 628 provides an output-signal 624 to a robot controller 636. In the same way as described above with reference to Figure 2, the output-signal 624 may correspond to the image or video that attracted the highest interest of the user 604, according to the recorded EEG-signalling 608. The output-signal 624 can represent the operation or operations that were displayed to the user 604 on the display device 602 when the highest interest-level-score was detected.

The robot controller 636 can process the output-signal 624 and provide a robot-control-signal 640 to the robot 638. In one example, the robot controller 636 may use a database or look-up table to determine an appropriate robot-control-signal 640 based on the received output-signal 624. For instance, the output-signal 624 may represent an identifier of a robotic operation, and the database / look-up table may provide the required robot-control-signal 640 that will cause the robot 638 to perform the intended robotic operation.
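
By way of a non-limiting sketch, such a look-up table might be implemented as follows; the operation identifiers and command values here are invented for illustration and are not taken from this disclosure:

```python
# Hypothetical look-up table mapping an operation identifier, carried by the
# output-signal, to the robot-control-signal for that operation.
ROBOT_COMMANDS = {
    "pick_up_screwdriver":  {"joint_targets": [0.1, 0.5, -0.3], "gripper": "close"},
    "throw_to_plastic_bin": {"joint_targets": [0.8, 0.2, 0.0], "gripper": "open"},
    "throw_to_iron_bin":    {"joint_targets": [-0.8, 0.2, 0.0], "gripper": "open"},
}

def robot_control_signal(output_signal_id):
    # Resolve the identifier from the output-signal into a robot-control-signal.
    return ROBOT_COMMANDS[output_signal_id]
```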

In this way, the system of Figure 6 can be considered as a machine support system, which includes a display device 602 that provides a visual stimulation of how a subsequent operation can be performed by at least one machine (such as the robot 638) in the machine support system. The system also generates a control-signal 612 for adjusting one or more features of the visual representation of how the subsequent operation can be performed. Optionally, the system can determine a selected-visual-stimulation (as represented by the output-signal 624 in Figure 6) as: (i) the visual stimulation that is provided by the display device 602 as part of the last iteration; or (ii) the visual stimulation that is associated with the highest interest-level-score. The system can then provide a machine-control-signal (as represented by the robot-control-signal 640) for automatically controlling the at least one machine / robot 638 based on the selected-visual-stimulation.

In this way, if a machine (robot 638) cannot determine how to do its task, the system can formulate a question by forming a visual image of different ways in which the task can be performed. The visual image is displayed to the human user 604. Based on the interest-level feedback from the human, the system can update the visual image that is displayed to the user 604 until a satisfactory image is found. The resulting image should indicate the correct way for the machine (robot 638) to perform the task. Advantageously, the system can then select the visual image, from the plurality of images that were displayed to the user, that attracted the highest interest, and automatically control the machine (robot 638) based on the selected image.

In this example, a text description 634, which can be for example voice commands, can be provided to the optimization algorithm 626. Such a text description 634 can be used in the same way as described above with reference to Figure 5.

By way of non-limiting example: assume a recycling robot 638 is not sure whether an object should go to a plastic bin or to an iron bin. A system associated with the robot 638 formulates these options as a plurality of visual images. The images in this case can be: (i) the object being thrown to the plastic bin; and (ii) the object being thrown to the iron bin. The human user 604 views several of those images very quickly and the system can determine the action with the highest interest score. Then the system can generate an appropriate robot-control-signal 640 that is sent to the robot 638 so that it can take the appropriate action.

In some examples, the display device 602 of Figure 6 that displays potential operations that can be performed by the robot 638 can be an augmented reality (AR) display device. For instance, AR glasses can be used to display the potential operations to the user 604.

In some examples, the robot 638 can be a prosthetic limb that is attached to the user 604. In this way, the user’s response to information that is presented to them by an output device can be used to automatically control the prosthetic limb.

It will be appreciated from the above description that the systems described herein can be used in one or more of the following applications: (a) Searching for the most satisfying visual designs (industrial design, fashion, web, etc.), considering a group of participants (a focus group).

(b) Searching for images of human faces in a person’s brain for criminology applications such as identifying a suspect.

(c) Searching for audio-visual synthetic patterns to regulate brain state as a non-invasive therapy tool.

(d) Human-robot collaboration to control behavior of a robot by maximizing the response of the human user continually during the collaboration.

(e) Gaming applications to control the game using EEG brain signals by maximizing the interest level.

The examples that are described above mainly relate to use of EEG-signalling as the biometric-signalling. However, it will be appreciated that any biometric-signalling that can be used to determine an interest-level-score that is representative of a person’s interest can be used. For instance, the one or more biometric sensors may include a pupil size sensor that provides pupil-size-signalling. The pupil size sensor may include a camera that obtains images of a user’s eye. The pupil-size-signalling can be representative of the pupil size / dilation of the user’s eye while they are being exposed to the audio and / or visual stimulation. One example of a known way of determining such pupil parameters is described in Chapter 4 (Pupil dilation) in the book “Eye Tracking in User Experience Design” (Paperback ISBN: 9780124081383). In some examples, the processor of a system described herein can use both EEG-signalling and pupil-size-signalling to determine an interest-level-score.
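
Where both EEG-signalling and pupil-size-signalling are used, one non-limiting way to combine them is a late fusion of per-modality interest estimates; the weighting in the sketch below is an illustrative assumption, not something specified by this disclosure:

```python
def fused_interest(eeg_score, pupil_score, eeg_weight=0.7):
    # Illustrative late fusion of two interest estimates; the weighting is an
    # assumption and would in practice be tuned or learned.
    return eeg_weight * eeg_score + (1.0 - eeg_weight) * pupil_score
```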

Figure 7 illustrates schematically an example embodiment of a computer implemented method.

At step 750, the method involves processing biometric-signalling in order to determine an interest-level-score. As discussed in detail above, the biometric-signalling is representative of body measurements of a user while they are exposed to audio and / or visual stimulation.

At step 752, the method involves providing a control-signal to an output device based on the interest-level-score. Again, as discussed in detail above, the control-signal is for adjusting the audio and / or visual stimulation that is provided by the output device. This automatic adaptation of the audio and / or visual stimulation, based on the interest-level-score, can advantageously enable sophisticated searching operations to be performed that can be used to dynamically determine an output-signal that is not necessarily one of a predetermined, finite, group of candidate audio and / or visual stimulations.