Title:
METHOD AND DEVICE FOR UPDATING A MODEL
Document Type and Number:
WIPO Patent Application WO/2022/207680
Kind Code:
A1
Abstract:
A computer-implemented method of updating a model for determining a property of a sensory stimulus and/or a measured recording of brain activity is disclosed. The method comprises the steps of: i) providing an initial model based on a set of parameters comprising at least one supervised parameter, and at least a measured recording of brain activity, whether or not in response to a sensory stimulus, or at least one feature derived from such a measured recording; ii) predicting at least one output label or quantity related to the sensory stimulus and/or the measured recording using the initial model; iii) generating an updated model by recalculating at least one supervised parameter of the initial model, wherein the recalculation is based on at least the at least one predicted output label or quantity; and iv) repeating steps i) to iii) at least once with the updated model in place of the initial model.

Inventors:
BERTRAND ALEXANDER (BE)
FRANCART TOM (BE)
GEIRNAERT SIMON (BE)
HEINTZ NICOLAS (BE)
Application Number:
PCT/EP2022/058357
Publication Date:
October 06, 2022
Filing Date:
March 30, 2022
Assignee:
UNIV LEUVEN KATH (BE)
International Classes:
A61B5/12; A61B5/00; A61B5/38; G10L17/00; G10L21/0208; G10L21/0272; G10L25/03; G10L25/30; G10L25/51; G10L25/66; H04R25/00
Foreign References:
US20100257128A1, 2010-10-07
Other References:
SINA MIRAN ET AL: "Real-Time Tracking of Selective Auditory Attention From M/EEG: A Bayesian Filtering Approach", FRONTIERS IN NEUROSCIENCE, vol. 12, 1 May 2018 (2018-05-01), CH, XP055768360, ISSN: 1662-4548, DOI: 10.3389/fnins.2018.00262
MASOUD GERAVANCHIZADEH ET AL: "Selective auditory attention detection based on effective connectivity by single-trial EEG", JOURNAL OF NEURAL ENGINEERING, vol. 17, no. 2, 1 April 2020 (2020-04-01), pages 026021, XP055768371, ISSN: 1741-2560, DOI: 10.1088/1741-2552/ab7c8d
GEIRNAERT S ET AL.: "Fast EEG-based decoding of the directional focus of auditory attention using common spatial patterns", BIORXIV 2020.06.16.154450
M. J. MONESI ET AL.: "An LSTM Based Architecture to Relate Speech Stimulus to EEG", ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2020, May 2020, IEEE Inc., pages 941-945
W. BIESMANS ET AL.: "Auditory-inspired speech envelope extraction methods for improved EEG-based auditory attention detection in a cocktail party scenario", IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 25, April 2017, pages 402-412
Attorney, Agent or Firm:
DENK IP BV (BE)
Claims:

1. A computer-implemented method of updating a model for determining a property of a sensory stimulus and/or a measured recording of brain activity, the method comprising:
i) providing an initial model based on a set of parameters comprising at least one supervised parameter, and at least a measured recording of brain activity, whether or not in response to a sensory stimulus, or at least one feature derived from a measured recording of brain activity, whether or not in response to a sensory stimulus;
ii) using the initial model to predict at least one label or quantity related to the sensory stimulus and/or the measured recording;
iii) generating an updated model by recalculating at least one supervised parameter of the initial model, wherein the recalculation is based on at least the at least one predicted label or quantity of step ii); and
iv) repeating steps i) to iii) at least once with the updated model in place of the initial model.

2. The method according to claim 1 wherein the model is a model for determining a direction of an attended speaker relative to a subject and wherein the at least one predicted label or quantity used to recalculate at least one parameter of the initial model comprises a directionality label.

3. The method according to claim 1 wherein step i) further comprises providing a sensory stimulus or at least one feature derived therefrom and wherein step ii) comprises predicting at least one label or quantity of the measured recording and/or the sensory stimulus using the initial model.

4. The method according to claim 3 wherein step ii) comprises using the initial model to determine at least one predicted feature of the sensory stimulus and/or the measured recording and associating at least one label or quantity with the sensory stimulus and/or the measured recording based on the at least one predicted feature.

5. The method according to claim 4, wherein the model is a model for determining an attended speaker or at least one property thereof among multiple speakers included in the sensory stimulus.

6. The method according to claim 5, wherein step i) comprises providing the sensory stimulus or at least one feature derived from or related to the sensory stimulus, wherein step ii) comprises predicting the attended speaker or features thereof based on the measured recording of brain activity and the sensory stimulus or features derived from or related to the sensory stimulus, and wherein the label denotes the attended speaker.

7. The method according to claim 1, wherein the model is a model for determining whether the measured recording of brain activity is a response to a given sensory stimulus, wherein step i) comprises providing a sensory stimulus or stimulus feature derived therefrom, wherein step ii) comprises predicting whether the measured recording is a response to the provided sensory stimulus, and wherein the label denotes whether the measured recording is a response to the provided sensory stimulus.

8. The method according to claim 7 wherein, if the result of step ii) is a prediction that the measured recording is a response to the given sensory stimulus, step iii) comprises modifying one or more parameters of the initial model, and wherein, if the result of step ii) is a prediction that the measured recording is not a response to the given sensory stimulus, step iii) comprises setting the updated parameters of the updated model to be the same as the parameters of the initial model.

9. The method according to any preceding claim, wherein the initial model is a subject-independent model or a random model or a subject-dependent model.

10. The method according to any preceding claim wherein a plurality of iterations of steps i) to iii) are performed and wherein, for a given iteration in the plurality of iterations, the measured recording of brain activity and the sensory stimulus, if provided, is different to at least one recording of brain activity and sensory stimulus, if provided, which were provided in previous iteration(s).

11. The method according to any preceding claim wherein a plurality of iterations of steps i) to iii) are performed and wherein the measured recording of brain activity and the sensory stimulus, if provided, used in a given iteration at least partially overlaps with a measured recording of brain activity and sensory stimulus, if provided, which were provided in a previous iteration.

12. The method according to any preceding claim, wherein the model is a decoder or wherein the model is an encoder or wherein the model is a joint encoder and decoder.

13. The method according to any preceding claim, wherein step ii) comprises determining a confidence metric for the prediction and, if the measure of confidence is above a predetermined threshold value, step iii) comprises recalculating at least one parameter of the initial model and, if the measure of confidence is below the predetermined threshold value, step iii) comprises setting the parameters of the updated model to be the same as the parameters of the initial model.

14. The method according to any preceding claim, wherein said sensory stimulus is an audio stimulus.

15. A device for updating a model for determining a property of a sensory stimulus and/or a measured recording of brain activity, comprising a data storage and a processor, wherein the data storage is configured to store an initial model based on a set of parameters comprising at least one supervised parameter, and at least a measured recording of brain activity, whether or not in response to a sensory stimulus, or at least one feature derived from a measured recording of brain activity, whether or not in response to a sensory stimulus; wherein the processor is configured to: - retrieve the initial model and at least a measured recording of brain activity, whether or not in response to a sensory stimulus, or at least one feature derived from such a measured recording, from the data storage;

- predict at least one label or quantity related to the sensory stimulus and/or the measured recording of brain activity using the initial model;

- generate an updated model by recalculating at least one supervised parameter of the initial model, wherein the recalculation is based on at least the at least one predicted label or quantity;

- predict at least one label or quantity related to the sensory stimulus and/or the measured recording of brain activity using the updated model, to generate a second updated model by recalculating at least one parameter of the updated model, wherein the recalculation is based on at least the at least one predicted label or quantity.

16. A computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out the steps of the method of any of claims 1 to 14.

Description:
METHOD AND DEVICE FOR UPDATING A MODEL

Field of the invention

[0001] The invention relates to a method and device for updating a model for determining a property of a sensory stimulus and/or a measured recording of brain activity, based on a measured recording of brain activity whether or not in response to an audio stimulus.

Background of the invention

[0002] People with hearing impairment can have major difficulties in understanding speech in noisy environments, which is why current hearing aids and cochlear implants are generally equipped with speech enhancement algorithms to suppress background noise. However, these algorithms generally have no information about the speech source the hearing aid user intends to attend to. In so-called 'cocktail party' scenarios (i.e., listening scenarios with multiple speech sources), this leads to malfunctioning of these speech enhancement algorithms. One way to solve this is by analysing the brain activity of the listener to detect which speaker the user is attending to, which is often referred to as the 'auditory attention decoding' (AAD) problem.

[0003] Existing AAD algorithms traditionally employ a stimulus reconstruction approach via a decoder applied to neural recordings such as, e.g., electroencephalography (EEG). These AAD algorithms need to be trained off-line on EEG and audio data, using 'ground truth' information about which speakers are attended and unattended. As such, two strategies can be followed: the decoder can be trained on subject-specific data or on subject-independent data (data recorded from other subjects). The accuracy is much greater for subject-specifically trained decoders. However, the higher performance of a subject-specific decoder comes at a cost: it requires subject-specific labelled data obtained via, e.g., a per-patient calibration session. Put differently, ground truth labels are required for the training data in order to train the supervised parameters of the decoder model, i.e. parameters that depend on labelled data to set them to meaningful values that improve the predictive power of the model. In a practical setting, this means that each neuro-steered hearing aid that is provided to a new user must be tuned to that specific user during such a calibration session, which is neither time- nor cost-effective. A subject-independent decoder could be used in a 'plug-and-play' fashion, pre-installed on each neuro-steered hearing aid, leading to a generic hearing aid. In this way, neuro-steered hearing aids can be used on a new patient without any a priori calibration session. These decoders, however, suffer from lower AAD performance.

[0004] A second shortcoming is the fact that AAD decoders are at least partially fixed once they have been trained. In particular, the supervised parameters in the decoder remain unchanged due to the absence of ground truth labels during operation. However, EEG signal statistics may vary in the long term due to changes in electrode impedances, changes in brain state, etc. Due to the supervised training of these decoder parameters and the absence of 'ground truth' labels at run time, the accuracy can decrease over time.

[0005] These disadvantages also apply to other models which can be used in hearing aids, cochlear implant applications, or other devices, for example models for predicting the direction of the auditory attention or whether a particular audio signal corresponds to a particular brain response.

[0006] "Real-Time Tracking of Selective Auditory Attention From M/EEG: A Bayesian Filtering Approach" (S. Miran et al., Frontiers in Neuroscience, May 2018, Vol. 12, Art. 262) describes an algorithmic pipeline for real-time decoding of the attentional state, consisting of three main modules: real-time and robust estimation of encoding or decoding coefficients, achieved by sparse adaptive filtering; extraction of reliable markers of the attentional state; and a near real-time state-space estimator that translates the attention markers into estimates of the attentional state. Miran et al. describe dynamic updating of encoding/decoding models; however, the method requires ground truth labels on the attentional state for a supervised pre-training of some of the model parameters (i.e. the supervised parameters of the model), and Miran et al. do not describe a method to update these pre-trained supervised parameters, i.e. they cannot be changed during the dynamic updating due to the lack of ground truth labels. Although this is described as a "minimal amount of labelled data", a calibration session is still required to obtain ground-truth labelled data to train these parameters and obtain meaningful results. Furthermore, even if such a calibration session is performed to pre-train these supervised parameters, the method lacks the capability to update these supervised parameters at run time (until a new calibration session is set up).

[0007] US2010/257128 describes a method of establishing a hearing ability model which models the frequency-dependent hearing thresholds of a particular person. The method may comprise an initial step of determining an initial model based on a representation of the distribution of hearing ability for a population of individuals, and the first iteration of the method may include determining a hearing ability model representing the hearing ability of the person tested, based on the observation related to the hearing evaluation event and the initial model. Each subsequent iteration may include determining an updated model based on the latest hearing ability model and the latest observation, or set of latest observations. The method thus adapts a model for the hearing abilities of a person and does not determine properties of an audio stimulus or recorded brain activity. The model updating also requires an external source to provide ground truth labels on whether the subject can hear a certain sound or not (e.g. via behavioural responses of the subject).

[0008] "Selective auditory attention detection based on effective connectivity by single-trial EEG" (M. Geravanchizadeh et al., 2020, J. Neural Eng. 17, 026021) describes a method for selective auditory attention detection (SAAD) from single-trial EEG signals using brain effective connectivity and complex network analysis, for two groups of listeners attending to the left or right ear. For each subset of features, the classification accuracy is determined with a model in order to perform feature selection. 100 different parameter settings are tested and the optimal settings are chosen for training of the model. A new model is then recomputed with the selected features. Ground truth labels are required both for the feature selection and for the recomputation of the model.

[0009] There is still a need for methods of accurately updating one or more supervised parameters of a model at run time without requiring ground truth labels.

Summary of the invention

[0010] According to a first aspect of the present invention there is provided a computer-implemented method of updating a model for determining a property of a sensory stimulus and/or a measured recording of brain activity, the method comprising the steps of: i) providing an initial model based on a set of parameters comprising at least one supervised parameter, whether or not trained on data with ground truth labels, and at least a measured recording of brain activity, whether or not in response to a sensory stimulus, or at least one feature derived from such a measured recording; ii) predicting at least one label or quantity related to the sensory stimulus and/or the measured recording of brain activity using the initial model; iii) generating an updated model by recalculating at least one supervised parameter of the initial model, wherein the recalculation is based on at least the at least one predicted label or quantity of step ii); and iv) repeating steps i) to iii) at least once with the updated model in place of the initial model.

[0011] It is an advantage of embodiments of the present invention that an updating method is provided which does not require prior knowledge of the subject. It is a further advantage of embodiments of the present invention that an updating method is provided which is capable of adapting model parameters (in particular, supervised parameters) to changes in the neural recording setup, the environment of the subject or in their behaviour or reaction to auditory stimuli.

[0012] The inventors discovered a surprising self-leveraging effect: when a model is updated according to embodiments of the present invention using labels or quantities that were predicted by the model itself, rather than by training the model on externally provided high-quality labels or quantities, it is still possible to improve the accuracy of the model. This holds even when the initial model is a random model that has not been trained on ground truth labels. A quantity related to the stimulus signal is in this context to be construed as something predicted by the model as an alternative to a label. The quantity may for example be a confidence metric quantifying the model's confidence in its own prediction, without being limited thereto. In some embodiments a quantity may be a prediction of a feature of the sensory stimulus. For example, if the decoder tries to predict the envelope of the attended speaker, the quantity may be a continuous variable describing the predicted location of a speaker.

[0013] It is an advantage of embodiments of the present invention that real-time updating of a model can be implemented, as there is no need to verify predicted labels or quantities. Unlabelled data can be used to iteratively update the model, with no need for ground-truth labels.

[0014] In advantageous embodiments the sensory stimulus is an audio stimulus. The model may be a model for determining a direction of an attended speaker relative to a subject and the at least one label may comprise a directionality label.

[0015] Step i) may further comprise providing a sensory stimulus or at least one feature derived therefrom and step ii) may further comprise predicting at least one label or quantity of the measured recording and/or the sensory stimulus using the initial model.

[0016] Step ii) may comprise using the initial model to determine at least one predicted feature of the sensory stimulus and/or the measured recording and associating at least one label with the sensory stimulus and/or the measured recording based on the at least one predicted feature.

[0017] The model may be a model for determining an attended speaker or at least one property thereof among multiple speakers included in the audio stimulus.

[0018] Step i) may comprise providing the sensory stimulus or at least one feature derived from or related to the sensory stimulus, and, with an audio stimulus as sensory stimulus, step ii) may comprise predicting the attended speaker or features thereof based on the measured recording of brain activity and the audio stimulus or features derived from or related to the audio stimulus, wherein the label denotes the attended speaker.

[0019] The model may be a model for determining whether the measured recording of brain activity is a response to a given sensory stimulus, wherein step i) comprises providing a sensory stimulus or stimulus feature derived therefrom, wherein step ii) comprises predicting whether the measured recording is a response to the provided sensory stimulus, and wherein the label denotes whether the measured recording is a response to the provided sensory stimulus.

[0020] The method may comprise, if the result of step ii) is a prediction that the measured recording is a response to the given sensory stimulus, in step iii), modifying one or more parameters of the initial model, and, if the result of step ii) is a prediction that the measured recording is not a response to the given sensory stimulus, in step iii), setting the updated parameters of the updated model to be the same as the parameters of the initial model.

[0021] The initial model may be a subject-independent model or a random model.

[0022] The initial model may be a subject-dependent model.

[0023] A plurality of iterations of steps i) to iii) may be performed and, for a given iteration in the plurality of iterations, the measured recording of brain activity and the sensory stimulus, if provided, may be different to at least one recording of brain activity and sensory stimulus, if provided, which were provided in previous iteration(s).

[0024] A plurality of iterations of steps i) to iii) may be performed and the measured recording of brain activity and the sensory stimulus, if provided, used in a given iteration may at least partially overlap with a measured recording of brain activity and sensory stimulus, if provided, which were provided in a previous iteration.

[0025] The model may be a decoder, or backward model. The model may be an encoder, or forward model. The model may be a joint encoder and decoder.

[0026] Step ii) may comprise determining a confidence metric for the prediction, and the recalculation in step iii) may leave the parameters of the initial model unchanged in case the confidence metric is below a predetermined threshold value.

[0027] According to a second aspect of the present invention there is provided a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the method as previously described.

[0028] According to a third aspect of the present invention there is provided a computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out the steps of the method as previously described.

[0029] According to a fourth aspect of the present invention there is provided a system for updating a model for determining a property of a sensory stimulus and/or a measured recording of brain activity based on a set of parameters according to the first aspect. The system may comprise a data storage configured to store an initial model and at least a measured recording of brain activity, whether or not in response to a sensory stimulus, or at least one feature derived from such a measured recording, and a processor configured to retrieve the initial model and the measured recording of brain activity, or the at least one feature derived therefrom, from the data storage, wherein communication between the storage and the processor is possible by wired or wireless means. The processor is further configured to predict at least one label or quantity of the sensory stimulus and/or the measured recording of brain activity using the initial model, to generate an updated model by recalculating at least one supervised parameter of the initial model, wherein the recalculation is based on at least the at least one predicted label or quantity, and to repeat steps i) to iii) at least once with the updated model in place of the initial model.

[0030] For purposes of summarizing the invention and the advantages achieved over the prior art, certain objects and advantages of the invention have been described herein above. Of course, it is to be understood that not necessarily all such objects or advantages may be achieved in accordance with any particular embodiment of the invention. Thus, for example, those skilled in the art will recognize that the invention may be embodied or carried out in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other objects or advantages as may be taught or suggested herein.

[0031] The above and other aspects of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.

Brief description of the drawings

[0032] The invention will now be described further, by way of example, with reference to the accompanying drawings, wherein like reference numerals refer to like elements in the various figures.

[0033] Fig.1 is a flow chart of a method according to embodiments of the present invention.

[0034] Fig.2 is a plot of the accuracy Φ(p_i) of a model updated according to embodiments of the present invention, as a function of the probability p_i of correctly predicting the attended speaker with the initial model.

[0035] Fig.3a, Fig.3b and Fig.3c are plots showing the fixed-point iteration paths followed by a first, a second and a third representative subject from Dataset I, respectively, wherein the accuracy Φ(p_i) is plotted as a function of the probability p_i.

[0036] Fig.4a is a plot related to Dataset I showing that the unsupervised subject-specific decoder, with both types of initialization (random: rand-init; subject-independent information: SI-info), clearly outperforms a subject-independent decoder, while approximating the performance of a supervised subject-specific decoder, especially on short decision windows. Fig.4b is a plot similar to Fig.4a but related to Dataset II, showing that the same trend occurs for Dataset II, although the margin by which the unsupervised subject-specific decoder with random initialization outperforms the subject-independent decoder is less apparent. Fig.4c plots the per-subject MESD values (each dot = one subject) of Dataset I. Fig.4d plots the per-subject MESD values (each dot = one subject) of Dataset II.

[0037] Fig.5a shows the AAD accuracy convergence plots for all subjects of Dataset I using a random initialization, on 60 s decision windows. Fig.5b shows the AAD accuracy convergence plots for all subjects of Dataset II using a random initialization, on 60 s decision windows.

[0038] Fig.6a shows the CCA accuracy as a function of the probability of predicting the match or mismatch, for a 10 s window length of the speech envelope. Fig.6b shows the corresponding accuracy for a 60 s window length of the speech envelope.

[0039] Fig.7 shows the accuracy of various CCA models as a function of window length.

[0040] The drawings are only schematic and are non-limiting. In the drawings, the size of some of the elements may be exaggerated and not drawn to scale for illustrative purposes. Any reference signs in the claims shall not be construed as limiting the scope.

Detailed description of illustrative embodiments

[0041] The present invention will be described with respect to particular embodiments and with reference to certain drawings, but the invention is not limited thereto, only by the claims.

[0042] Furthermore, the terms first, second and the like in the description and in the claims are used for distinguishing between similar elements and not necessarily for describing a sequence, whether temporally, spatially, in ranking or in any other manner. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein.

[0043] It is to be noticed that the term "comprising", used in the claims, should not be interpreted as being restricted to the means listed thereafter; it does not exclude other elements or steps. It is thus to be interpreted as specifying the presence of the stated features, integers, steps or components as referred to, but does not preclude the presence or addition of one or more other features, integers, steps or components, or groups thereof. Thus, the scope of the expression "a device comprising means A and B" should not be limited to devices consisting only of components A and B. It means that with respect to the present invention, the only relevant components of the device are A and B.

[0044] Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this disclosure, in one or more embodiments.

[0045] Similarly, it should be appreciated that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

[0046] Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by those in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.

[0047] It should be noted that the use of particular terminology when describing certain features or aspects of the invention should not be taken to imply that the terminology is being re- defined herein to be restricted to include any specific characteristics of the features or aspects of the invention with which that terminology is associated.

[0048] In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

[0049] Referring to Fig.1, a flowchart of a method according to embodiments of the present invention is shown. The method is a method of updating a model for determining a property of a sensory stimulus and/or a measured recording of brain activity.

[0050] The sensory stimulus is in preferred embodiments an audio stimulus. In other embodiments the sensory stimulus may be a visual stimulus. For example, one could design a model to reconstruct the temporal modulation of the pixel intensity, motion, or optical flow of an attended object in a video stimulus based on the brain responses to this video, fully analogous to a model that reconstructs the temporal modulations of the speech envelope of an attended speaker based on the brain responses to this speech. In the case of a video stimulus, the label may relate to whether the subject is paying attention to this moving object or not. In the description below, an audio stimulus is taken as an example stimulus to explain the invention in detail. The skilled person will, however, readily understand that the invention is not limited thereto.

[0051] The method comprises the following steps. In step S1, the following are provided: i) an initial model; and ii) a measured recording of brain activity, whether or not in response to an audio stimulus, or at least one feature derived from the measured recording of brain activity. In some embodiments an audio stimulus or at least one feature derived therefrom is also provided in step S1. The measured recording of brain activity may be in response to the provided audio stimulus or may be in response to an unknown audio stimulus. In some embodiments according to the present invention, the measured recording of brain activity may be in response to an unknown stimulus (e.g. speech, without being limited thereto), or may be unrelated to any stimulus.

[0052] In step S2, at least one label of the audio stimulus and/or the measured recording is predicted using the initial model. The audio stimulus in question may be an audio stimulus provided in step S1. The audio stimulus in question may also be an audio stimulus which is not provided in step S1, in which case the at least one label of the audio stimulus may be predicted based only on the measured brain activity recording (or feature derived therefrom) provided in step S1. Therefore, if at least one label of an audio stimulus is predicted in step S2, the audio stimulus does not necessarily need to be known in advance.

[0053] In step S3, an updated model is generated by recalculating at least one supervised parameter of the initial model, wherein the recalculation is based on at least the at least one self-predicted label of step S2 and/or a confidence metric or other quantity corresponding to the at least one self-predicted label of step S2. By 'supervised parameter' is meant a model parameter which requires training data with ground truth labels to compute a meaningful value for it, thereby allowing the model to generate more predictive labels that better correlate with the ground truth labels. A self-predicted label is a quantity generated by the model which is expected to be (or become) predictive of the ground truth (i.e. a quantity having a correlation with the ground truth labels). Hence, by using the self-predicted label(s) for updating instead of ground-truth labels, there is no need for ground truth labels in the updating procedure. This is advantageous, as it enables the model to be used without pre-training and without the need to label features of the audio stimulus or recording of brain activity, or to perform a calibration session where the brain activity is actively steered to comply with pre-defined labels.

[0054] In step S4, the method returns to step S1 and steps S1 to S3 are repeated at least once with the updated model in place of the initial model.
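By way of illustration only, the overall loop of steps S1 to S4 can be sketched in a few lines of Python. The helper functions `predict_labels` and `retrain_supervised_params` are hypothetical placeholders, standing in for whichever model family is used (linear decoder, CCA model, neural network, etc.), not part of any actual implementation:

```python
# Minimal sketch of the self-supervised update loop (steps S1-S4).
# predict_labels() and retrain_supervised_params() are hypothetical helpers.
def update_model(initial_model, recordings, stimuli=None, n_iterations=10):
    model = initial_model                     # step S1: initial model and data
    for _ in range(n_iterations):             # step S4: repeat steps S1 to S3
        labels = predict_labels(model, recordings, stimuli)                    # step S2
        model = retrain_supervised_params(model, recordings, stimuli, labels)  # step S3
    return model
```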

[0055] The steps of the method are described in more detail in the following.

Step S1

[0056] The measured recording of brain activity is a recording which may or may not be in response to an audio stimulus. In some embodiments the measured recording is a response to the audio stimulus provided in step S1. In some embodiments, for example match-mismatch classification applications, it may not be known in advance whether the measured recording is a response to the audio stimulus provided in step S1. According to at least one embodiment of the invention, the measured recording of brain activity may be in response to an unknown stimulus, in which case the aim is to determine whether or not the measured recording of brain activity is a response to a given audio stimulus. If it is determined, according to at least one embodiment of the present invention, that the measured recording of brain activity is not in response to a given audio stimulus, it could be a response to another audio stimulus or no response to any stimulus at all.

[0057] The measured recording may be an EEG recording of a subject, whether or not in response to the audio stimulus, for example as measured using EEG electrodes on the scalp. The minimum number of EEG electrodes required is one probe electrode and one corresponding reference electrode, which together provide one EEG channel measurement. In this case the EEG response thus consists of one channel measurement. However, multiple pairs of EEG electrodes can be used for the EEG response measurement, in which case the EEG response may include multiple EEG channel measurements. In some embodiments the measured recording may be an average of the EEG channels comprising the EEG response.

[0058] The measured recording generally reflects a specific (attentional) state of the brain, for example whether attention is paid to a particular speaker, or to a specific audio stimulus. The attentional state can be thought of as a property of the measured recording; it can be derived therefrom. Therefore, according to some embodiments of the present invention, the method is a method of updating a model for determining a property of a measured recording of brain activity, wherein the property is an attentional state of a subject.

[0059] The measured recording of brain activity may be a response measured using implanted electrodes placed on the brain (electrocorticography). Electrodes may be surface electrodes attached to the skin, subcutaneous electrodes, or implanted electrodes. The measured recording may also be a recording measured from the brain without using electrodes, for example using magnetoencephalography, near-infrared spectroscopy, or (functional) magnetic resonance imaging.

[0060] In some embodiments the method comprises a measured recording processing step which may occur before step S1 or between steps S1 and S2, and which can comprise filtering, resampling, re-referencing, normalisation, averaging across recording channels, averaging across repetitions, denoising, decomposition into components, or component selection.

[0061] In some embodiments an audio stimulus or feature derived therefrom is also provided in step S1.

[0062] The audio stimulus may be a recording of speech, music, or other sounds, or a combination thereof, either recorded or synthetically generated. The audio stimulus may be a real-time recorded or previously recorded audio signal.

[0063] In some embodiments, a feature derived from an audio stimulus is provided in step S1 instead of the audio stimulus itself. For example, the envelope of the stimulus, a time-domain waveform, a spectrogram, a sequence of phonemes derived from an audio stimulus which comprises speech, a sequence of phonetic features derived from an audio stimulus which comprises speech, or note onsets derived from an audio stimulus which comprises music may be provided as a feature derived from an audio stimulus.
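As an illustration of one such feature, a speech envelope can be extracted by rectification, low-pass filtering and downsampling. The sketch below shows one common recipe, not the specific feature extraction prescribed here; the cut-off frequency and target rate are assumed values:

```python
import numpy as np
from scipy.signal import butter, filtfilt, resample_poly

def speech_envelope(audio, fs_audio, fs_target=64, cutoff_hz=8.0):
    """Rectify, low-pass and downsample to obtain a coarse speech envelope."""
    rectified = np.abs(audio)                           # full-wave rectification
    b, a = butter(4, cutoff_hz / (fs_audio / 2), btype="low")
    smoothed = filtfilt(b, a, rectified)                # zero-phase low-pass filter
    return resample_poly(smoothed, fs_target, int(fs_audio))  # to the EEG processing rate
```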

[0064] In some embodiments, the stimulus may be provided in a raw format and the method may further comprise a stimulus pre-processing step occurring before step S1 or between steps S1 and S2, the stimulus pre-processing step comprising processing the stimulus in the raw format to extract data, for example to obtain one or more vectors of amplitudes and times. For example, in cochlear implant or hearing aid applications, the stimulus may be provided in a sound file format, for example mp3 or wav, which is suitable for outputting in order to obtain an electrical response from the subject, for example by playing through a loudspeaker, by directly streaming from a computer to a sound processor of a cochlear implant or hearing aid through an audio cable, or by directly streaming from a computer to a sound processor of a cochlear implant or hearing aid using a direct interface without audio cable. The audio stimulus may also be an electrical stimulation pattern for the electrode array of a cochlear implant.

[0065] In some embodiments, a set of joint statistics between an audio stimulus (or feature thereof) and a measured recording (or feature thereof) may be provided in step S1. In such embodiments, the at least one feature derived from the audio stimulus and the at least one feature derived from the measured recording are comprised in the set of joint statistics.

[0066] For example, in embodiments wherein the model is a linear model, the joint statistics may comprise the autocorrelation of the EEG (or feature derived therefrom), and the cross-correlation(s) between the EEG (or feature derived therefrom) and the audio stimuli (or features derived therefrom), for each audio stimulus, for example determined as described hereinafter in the example. By computing the autocorrelation and cross-correlation(s) for each new segment of incoming data received in step S1, all the information needed to compute the updated decoders is available and the audio stimulus or EEG files do not need to be stored. To be able to update predictions on previous data windows received during step S1 of previous iterations, an updated decoding model can be described as a linear combination of a predefined basis, or codebook, of decoders. Then only the correlations between these basis decoder outputs and the different speech envelopes have to be computed and stored in each iteration. For a linear model, the predicted labels for data from previous iterations can then be updated via linear combinations of these previously computed correlations related to the basis decoders. In this case it is sufficient to keep only the second-order statistics of the data, avoiding the need to store the original audio stimulus or brain activity recordings from previous iterations.

[0067] Advantageously, by providing the joint statistics, the need to store large audio stimulus or measured recording files is avoided, as the joint statistics require less memory space than the audio stimulus and measured recording from which they are derived.
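A minimal sketch of such a linear decoder computed purely from accumulated second-order statistics is given below, assuming time-lagged EEG features; the lag count, ridge term and variable names are illustrative assumptions, not the prescribed implementation:

```python
import numpy as np

def lagged(eeg, n_lags):
    """Stack time-lagged copies of the EEG (T x C) into a T x (C * n_lags) matrix."""
    T, C = eeg.shape
    X = np.zeros((T, C * n_lags))
    for k in range(n_lags):
        X[k:, k * C:(k + 1) * C] = eeg[:T - k, :]
    return X

def decoder_from_stats(Rxx, rxs, ridge=1e-3):
    """Solve (Rxx + ridge * I) w = rxs; only second-order statistics are needed."""
    return np.linalg.solve(Rxx + ridge * np.eye(Rxx.shape[0]), rxs)

# Per incoming segment, only the statistics are accumulated, not the raw data:
#   Rxx += X.T @ X           (EEG autocorrelation)
#   rxs += X.T @ s_attended  (cross-correlation with the envelope labelled attended)
```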

[0068] The initial model is a model for determining a property of an audio stimulus and/or a measured recording of brain activity. Examples of such models are described in more detail hereinafter.

[0069] The initial model, and updated models as generated using methods according to the present invention, can be thought of as a set of parameters and a rule for transforming an input to the model into an output, based on the set of parameters. As mentioned previously, the set of parameters comprises at least one supervised parameter, i.e., a parameter that allows making the model more predictive in case it can be trained with access to ground-truth labels. In methods according to the present invention, the parameters are updated while the rule remains the same.

[0070] In some embodiments the initial model is a random model which requires no knowledge of the subjects and how they may respond to audio stimuli. The present invention allows such random models to be used as an initial, or starting, version of the model which is then iteratively improved over time to produce a personalized model which takes into account the subject's response to audio stimuli. Such a model with randomly set supervised parameters is still capable of generating labels that correlate with the ground truth. This allows the method according to embodiments of the present invention to be used for many different subjects without requiring any prior measurement of their neural responses or audio processing ability. Thus, an initial model used in methods according to embodiments of the present invention does not need to be a pre-trained model.

[0071] In some embodiments the initial model is a subject-independent model, being a model which has been previously trained on audio stimulus-brain response data of subjects other than the subject for whom the model is to be used. For example, a subject-independent model may be pre-trained using stimulus-response data from a collection of subjects who have characteristics in common with, or differing from, the subject for whom the model is to be used, such as age or hearing impairment characteristics (e.g. type or severity). This allows the method according to embodiments of the present invention to be used with an improved starting model (relative to a random model), which may result in faster convergence to an acceptably accurate model when updated according to embodiments of the present invention, while still not requiring prior knowledge of the subject's auditory response. Thus, an initial model used in methods according to embodiments of the present invention does not need to be a pre-trained model which is trained on data specific to the subject.

[0072] In some embodiments the initial model is a subject-dependent model, being a model which has been previously trained on stimulus-response data of the same subject for whom the model is to be used. This allows the method according to embodiments of the present invention to be used with a further improved starting model (relative to the subject-independent model), which may result in even faster convergence to an acceptably accurate model when updated according to embodiments of the present invention.

[0073] Generally, a subject-independent model before updating according to embodiments of the present invention performs better than a random model and not as well as a subject-dependent model.

[0074] In some embodiments the initial model is a decoder, that is, a model for predicting an audio stimulus or features thereof based on the measured brain response or features thereof. A decoder is also known as a backward model. For example, a decoder may predict a speech envelope based on a measured response. A decoder may predict a direction of an attended speaker relative to the subject based on a measured response. A decoder may predict the magnitude spectrum of a speech or music signal, the onsets of notes of a music signal, the onset of phonemes, or other features of the audio stimulus.

[0075] In some embodiments the initial model is an encoder, that is, a model for predicting a measured brain response or features thereof based on an audio stimulus or features thereof. An encoder is also known as a forward model. The audio stimulus or features thereof may be any of the examples disclosed herein. For example, an encoder may predict a response to a speech envelope, a response to a time-domain waveform, a response to a spectrogram, etc.

[0076] In some embodiments the initial model is a joint encoder and decoder, that is, a model in which an encoder determines a first intermediate signal from the audio stimulus or features thereof, and a decoder determines a second intermediate signal from the brain response or features thereof. The encoder and decoder are constructed in such a way that the two intermediate signals are, for example, maximally correlated. Instead of trying to fully predict the audio stimulus (decoder) or the brain response (encoder), the model attempts to find optimal in-between representations by jointly employing an encoder and decoder. With linear models for the encoder and decoder, such a joint approach is a canonical correlation analysis (CCA).

[0077] In such embodiments the model is a model for determining a property of an audio stimulus and/or measured recording of brain activity, and the property is the in-between representation for which the correlation is maximized.

[0078] It will be understood that, in examples herein wherein a decoder is referred to, it is also within the scope of the present invention to implement such examples with an encoder or joint encoder-decoder, and vice versa.
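To illustrate the joint encoder-decoder idea of paragraphs [0076]-[0077], the sketch below fits a linear CCA between (lagged) EEG features and stimulus features using scikit-learn; the feature dimensions and random data are placeholders, and this is only one possible realisation of such a model:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Placeholder feature matrices: T samples of lagged EEG (Dx dims)
# and lagged stimulus features (Ds dims).
T, Dx, Ds = 1000, 32, 8
rng = np.random.default_rng(0)
eeg_feats = rng.standard_normal((T, Dx))
stim_feats = rng.standard_normal((T, Ds))

cca = CCA(n_components=1)
eeg_proj, stim_proj = cca.fit_transform(eeg_feats, stim_feats)  # the in-between signals
# Property maximized by the model: correlation between the two projections.
r = np.corrcoef(eeg_proj[:, 0], stim_proj[:, 0])[0, 1]
```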

Step S2

[0079] In step S2, at least one label of the audio stimulus (or feature derived therefrom) and/or the measured recording (or feature derived therefrom) is predicted using the initial model. The audio stimulus may be an audio stimulus as provided in step S1 or may be an (unknown) audio stimulus which resulted in the measured recording provided in step S1.

[0080] A label indicates whether the audio stimulus and/or the measured recording has a particular property. For example, the model may be a model for determining whether a given audio stimulus is a match with a given measured brain response (a match/mismatch classification). In such embodiments, the label output by the model indicates whether the measured recording is a response to the given audio stimulus, and this is predicted by the initial model using the inputs provided in step S1. The label may be a numerical label, for example a zero to indicate a mismatch and a one to indicate a match of the audio stimulus and measured response. The model may also be a model for determining whether an attended speaker has a particular location relative to the subject, for example to the left or to the right, in front or behind, or within one of a plurality of angular zones, and may predict, for example, a different label for each possible location.

[0081] In some embodiments step S2 additionally comprises a step of using the initial model to predict a feature of the audio stimulus and/or the measured recording, and then assigning a label to the audio stimulus and/or the measured recording based on the predicted feature. Since the feature is still a predicted feature, the label determined based on the predicted feature is a predicted label which is predicted making use of the initial model.

[0082] For example, in embodiments wherein the initial model is a model for predicting a speech envelope of an attended speaker ('decoding'), step S1 may comprise providing speech envelopes of speech associated with different speakers. Step S2 may comprise using the initial model to predict the attended speech envelope based on the measured recording, correlating the predicted attended speech envelope with the speech envelopes provided in step S1, and assigning a label of one to the speech envelope which has the highest correlation with the predicted attended speech envelope and a label of zero to the other speech envelopes.

[0083] As a further example, in embodiments wherein the initial model is a model for predicting a brain response from an attended speech envelope ('encoding'), step S1 comprises providing a measured recording of brain activity and a plurality of speech envelopes, being features derived from an audio stimulus. Step S2 comprises using the initial model to predict a brain response starting from each of the speech envelopes, correlating each predicted brain response with the measured recording provided in step S1, and assigning a label of one to the speech envelope which provides the predicted brain response with the highest correlation with the measured brain response and a label of zero to the other provided speech envelopes. In some embodiments, a multitude of the aforementioned features or labels can be combined in a joint model. For example, a model can provide a label on the direction of the attended speaker, while also providing a predicted brain response which can be correlated with the different speech envelopes in order to label them as attended or unattended speech.
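The correlate-and-label step of the two examples above can be sketched as follows; `predicted` stands for the decoder (or encoder) output and `candidates` for the competing envelopes (or predicted responses), both illustrative names:

```python
import numpy as np

def assign_attended_label(predicted, candidates):
    """Give label 1 to the candidate best correlated with the model output, 0 to the rest."""
    corrs = [np.corrcoef(predicted, c)[0, 1] for c in candidates]
    labels = np.zeros(len(candidates), dtype=int)
    labels[int(np.argmax(corrs))] = 1
    return labels, corrs
```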

Step S3

[0084] In step S3, an updated model is generated by recalculating at least one supervised parameter of the initial model, wherein the recalculation is based on at least the at least one self-predicted label of step S2 and/or a confidence metric corresponding to the at least one self-predicted label of step S2. It is noted that the recalculation is based on the self-predicted labels generated by the model in the previous step. The model generates its own labels to update itself, resulting in a self-supervised approach.

[0085] Recalculating one or more supervised parameters of the initial model may comprise one or more of: recalculating all or a subset of the supervised parameters of the initial model; adding a small increment to all or a subset of the supervised parameters of the initial model; discarding all or a subset of supervised parameters of the initial model and recalculating those supervised parameters; adding or removing one or more supervised parameters of the initial model.

[0086] In some embodiments the model is a linear model and the updating may take the form of updating the cross-correlation vector and recomputing the decoder. In some embodiments, updating a linear model may comprise the additional step of generating a hybrid model which is a combination of a model from a previous iteration of the method and the model as updated by modifying one or more parameters, at least one of which is a supervised parameter.

[0087] In some embodiments the model may be a deep neural network, which may be pre-trained on a training set which is added to during a method according to embodiments of the present invention. In some embodiments, the training set may be initially empty (i.e. no pre-training is done). When a label is computed for an audio stimulus and brain response, the label and the associated data can be added to the training set. Updating the model may take the form of updating one or more layers of the network using an updated training set which includes the label determined in step S2.

[0088] In some embodiments step S3 comprises determining a confidence metric for the at least one label predicted in step S2 and, if the confidence level is below a predetermined threshold value, setting the parameters of the updated model to be the same as the parameters of the initial model. This allows labels with a low confidence level to be rejected and not used for updating the model. If the confidence level is above the predetermined threshold value, one or more supervised parameters of the updated model can be recalculated based on the at least one label predicted in step S2. A confidence metric can be based on, for example, the difference between the correlation of a predicted envelope with the envelope of an attended speaker and with the envelope of an unattended speaker. The confidence metric could be modelled probabilistically or thresholded.
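A confidence-gated update of this kind can be sketched as below, using the correlation margin between the best and second-best candidate as the confidence metric; the threshold value and the `update_fn` callable are assumptions for illustration:

```python
def gated_update(model, corrs, labels, update_fn, threshold=0.05):
    """Skip the update (keep the initial parameters) when confidence is too low."""
    ranked = sorted(corrs, reverse=True)
    confidence = ranked[0] - ranked[1]   # margin between best and runner-up correlation
    if confidence < threshold:
        return model                     # reject low-confidence self-predicted label
    return update_fn(model, labels)      # recalculate supervised parameter(s)
```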

[0089] In some embodiments the label value in step S3 is such that updating the model comprises setting the parameters of the updated model (comprising at least one supervised parameter) to be the same as the parameters of the initial model. For example, in a match-mismatch application, if the audio stimulus is determined to not be a match for the measured response, the updated model may be set to be the same as the initial model.

Step S4

[0090] In step S4, the method returns to step S1 and steps S1 to S3 are repeated with the updated model in place of the initial model. A new updated model is generated based on the updated model generated in the previous iteration of steps S1 to S3.

[0091] The measured response and optional audio stimulus provided in a subsequent iteration of steps S1 to S3 may be completely different from an audio stimulus and measured response provided in the previous iteration of steps S1 to S3. For example, the measured response and optional audio stimulus provided in a first iteration of steps S1 to S3 may correspond to a first speech fragment, and those provided in a subsequent iteration of steps S1 to S3 may correspond to a second speech fragment which is different from the first speech fragment.

[0092] The measured response and optional audio stimulus provided in a subsequent iteration of steps S1 to S3 may be the same as an audio stimulus and/or measured response provided in the previous iteration of steps S1 to S3.

[0093] The measured response and optional audio stimulus provided in a subsequent iteration of steps S1 to S3 may at least partially overlap with a measured response and optional audio stimulus provided in a previous iteration of steps S1 to S3. For example, in a real-time application, a constant audio stream can be received (for example, through a cochlear implant or hearing aid), and the audio stimulus and measured response for a first iteration can comprise a section of the audio stream and corresponding measured response from time t = 0 to time t = t1; and the audio stimulus and measured response for a second, subsequent iteration can comprise a section of the audio stream and corresponding measured response from time t = 0 to time t = t2, where t2 is greater than t1, or for example from time t = t3 to time t = t2, where t2 is greater than t1 and t3 lies between 0 and t1. Thus, there may be at least partial overlap of the input data for the first and the second iterations.

[0094] Original or compressed versions of the measured recording or audio stimulus, or features thereof, may be stored for use in future iterations. For example, a method according to embodiments of the present invention may be implemented on a computer which includes a memory and a processor and is adapted to receive input data, for example through a wired or wireless connection. The processor may be adapted to carry out steps of a method as described herein and the memory may store original or compressed versions of the measured recording or audio stimulus, or features thereof, and/or sets of parameters for the model as determined in previous iterations.

[0095] In the following, examples of models to which the present invention may be applied are described. However, it will be understood that other models are possible and the present invention is not limited to the presented models or combinations thereof.

Direction of attended speaker

[0096] Determining the position of an attended speaker relative to a subject is important, for example, when a subject is using a cochlear implant or a hearing aid. Knowledge of the position of an attended speaker relative to a subject allows beamforming to be applied in order to improve the signal-to-noise ratio of the received signal.

[0097] The model may thus be a model for determining whether an attended speaker is located in a particular section of space relative to the subject. For example, the model may determine whether an attended speaker is located to the left or to the right of the subject. The model may determine whether an attended speaker is located in a particular quadrant of a space centred on the subject.

[0098] In a situation where two competing speakers are present, the model may be a model for determining whether the left-most or right-most speaker was attended. Using direction-of-arrival estimation techniques, the location of the attended speaker can then be determined. This could also be used in combination with the above-described approach, in case multiple speakers are present within an angular range.

[0099] The label determined in step S2 denotes the direction of the attended speaker, for example by assigning a numerical value to each of a plurality of segments of space centred on the subject, and assigning to the audio stimulus or measured response the numerical value of the space segment which was determined to be the location of the attended speaker.

Attended speaker detection

[0100] Audio attention decoding, or determining which speaker among multiple speakers is attended to by a subject, can help facilitate hearing in hearing aid and cochlear implant users by allowing the attended speaker's speech to be identified and amplified. This can help to increase intelligibility of speech where multiple speakers are present.

[0101] The model may thus be a model for determining an attended speaker by identifying a speech envelope of an attended speaker within an audio stimulus which includes multiple speech envelopes.

[0102] For example, the audio stimulus or feature derived therefrom which is provided in step S1 may comprise a set of speech envelopes comprised in the audio stimulus to which the measured recording of brain activity is a response. In step S2, the model may predict an attended speech envelope based on the measured recording and correlate the predicted attended envelope with each of the speech envelopes provided in step S1. The correlation may be, for example, a Pearson correlation. A label may be assigned to each of the provided speech envelopes depending on the result of the correlation. For example, the provided speech envelope having the highest correlation with the predicted envelope may be assigned a label of one, denoting that this provided envelope is that of the attended speaker, and the remaining provided speech envelopes may be assigned a label of zero, denoting that these provided envelopes are those of other, not-attended speakers.
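A minimal sketch of this labelling step, assuming a linear stimulus reconstruction decoder and a Pearson correlation as described above (all names are illustrative):

```python
import numpy as np

def label_speech_envelopes(eeg_lagged, decoder, envelopes):
    """Assign a label of one to the envelope that correlates best with the
    envelope reconstructed from the EEG, and zero to all others.

    eeg_lagged : (T, C*L) time-lagged EEG matrix X
    decoder    : (C*L,) stimulus reconstruction decoder d
    envelopes  : (T, M) candidate speech envelopes, one column per speaker
    """
    reconstructed = eeg_lagged @ decoder           # predicted attended envelope
    # Pearson correlation of the reconstruction with each candidate envelope
    corrs = [np.corrcoef(reconstructed, envelopes[:, m])[0, 1]
             for m in range(envelopes.shape[1])]
    labels = np.zeros(envelopes.shape[1], dtype=int)
    labels[int(np.argmax(corrs))] = 1              # attended speaker gets label 1
    return labels, np.asarray(corrs)
```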

Match-mismatch

[0103] In some situations it is helpful to know whether a given audio stimulus (or feature derived therefrom) corresponds to a given measured response.

[0104] The model may thus be a model for determining whether the audio stimulus provided in step S1, or the audio stimulus from which the feature of an audio stimulus provided in step S1 is derived, matches the measured recording provided in step S1, or the measured recording from which the feature of a measured recording provided in step S1 is derived. By "match" is meant that the audio stimulus can be thought of as causing the measured brain response, as opposed to a mismatch, where the audio stimulus cannot be thought of as causing the measured brain response.

[0105] In step S2 the model predicts a label for the audio stimulus which specifies whether the audio stimulus corresponds to, or matches, the measured recording. For example, the label may be one for a match and zero for a mismatch.

[0106] In step S3, if the audio stimulus or feature derived therefrom is labelled as a match, the initial model is updated by recalculating at least one parameter of the model.

[0107] If the audio stimulus or feature derived therefrom is labelled as a mismatch, the updating may alternatively take place by setting the parameters of the updated model to be the same as the parameters of the initial model.

First Example implementation

[0108] In a stimulus reconstruction approach for auditory attention decoding (AAD), an attended speech envelope is predicted based on the measured brain response, as described hereinbefore.

[0109] A linear minimal mean-square error filter, or decoder, can be trained to reconstruct the attended speech envelope from a measured EEG response according to equation 1:

$$\hat{d} = \arg\min_{d} \; E\left\{ \left( s_{att}(t) - \sum_{c=1}^{C} \sum_{l=0}^{L-1} d_c(l)\, x_c(t+l) \right)^{2} \right\} \qquad (1)$$

where $\hat{d}$ is the optimized decoder, $x_c(t)$ is a sample of the c-th EEG channel at time t, $d_c(l)$ is the decoding coefficient corresponding to channel c and lag l, and $s_{att}(t)$ is the attended speech envelope. While in this example $s_{att}(t)$ is a speech envelope, it could also represent any other temporal signal related to the sensory stimulus that generates the EEG responses. For example, if the sensory stimulus is a video, $s_{att}(t)$ can represent the optical flow or integrated pixel intensity of an object in that video.

[0110] The first sum in equation 1 is a sum over channels, corresponding to the spatial integration of EEG channels, while the second sum is over time lags, corresponding to the temporal integration of time lags. This therefore represents the spatio-temporal character of the decoder. This corresponds to a linear filter per EEG channel and a summation across all these filter outputs.

[0111] Equation 1 can be reformulated in the least-squares formulation according to equation 2:

$$\hat{d} = \arg\min_{d} \; \left\| s_{att} - X d \right\|_{2}^{2} \qquad (2)$$

In equation 2, the squared error between the attended envelope $s_{att}$ and the reconstructed envelope $\hat{s}_{att} = Xd$ is minimized. The following notation is used: $s_{att} \in \mathbb{R}^{T}$ is a vector containing all training samples of the signal envelope of the attended speaker; $X \in \mathbb{R}^{T \times CL}$ is a Toeplitz matrix, with in each row the L future EEG samples relative to the t-th sample, for each of the C channels of the EEG; $d \in \mathbb{R}^{CL}$ is the unknown set of filter coefficients (or decoder coefficients) which would match the EEG data in X with the attended speech envelope in $s_{att}$. The vector d is thus a stacked vector of all decoder coefficients, per channel and time lag.

[0112] In a conventional approach to AAD, equation 2 is solved by solving the normal equations according to equation 3:

$$\hat{d} = R_{xx}^{-1}\, r_{xs} \qquad (3)$$

where $R_{xx} = X^{T} X$ is the estimated autocorrelation matrix of the EEG and $r_{xs} = X^{T} s_{att}$ is the estimated cross-correlation between the EEG and the attended speech envelope. Thus, the labels of the speech envelopes - which corresponds to the attended speaker and which do not - need to be known in order to solve equation 2 in the conventional approach, since $s_{att}$ is required to compute the cross-correlation vector.
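For illustration, the conventional supervised solution of equation 3 may be computed as in the following sketch; the optional ridge term lam is an assumption added here, corresponding to the regularization parameter discussed further below:

```python
import numpy as np

def train_supervised_decoder(X, s_att, lam=0.0):
    """Supervised least-squares decoder of equations 2-3 (sketch).

    X     : (T, C*L) Toeplitz matrix of time-lagged EEG
    s_att : (T,) attended speech envelope (ground-truth labels required here)
    lam   : optional ridge regularization added to the autocorrelation matrix
    """
    R_xx = X.T @ X                  # estimated EEG autocorrelation matrix
    r_xs = X.T @ s_att              # estimated EEG/envelope cross-correlation
    d = np.linalg.solve(R_xx + lam * np.eye(R_xx.shape[0]), r_xs)
    return d
```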

[0113] The present invention does not require these labels to be known. Assume now the availability of a training set of K segments (for example, corresponding to different trials in an experiment) of EEG data and corresponding speech envelopes, without knowledge of the attended speaker, i.e. the labels $y_k$ are not available. Only the presented competing speech envelopes $(s_{1,k}, s_{2,k})$ are known, of which one corresponds to the attended speaker, while the other corresponds to the unattended one. Note that in practice, these speech envelopes can be extracted from the recorded speech mixtures in a hearing device. This means that training a decoder to reconstruct the attended speech envelope boils down to an unsupervised problem. Embodiments of the present invention thus remove the requirement of subject-specific ground-truth labels.

[0114] The following paragraphs refer to the description below of a preferred embodiment of a method of updating a model for determining a property of an audio stimulus and/or a measured recording of brain activity based on a set of parameters, and in particular an unsupervised training or adaptation of a stimulus reconstruction decoder method according to embodiments of the present invention. The method may use as inputs the aforementioned training set of K segments of EEG data and corresponding speech envelopes, an initial autocorrelation matrix $R_{xx}^{(0)}$ and cross-correlation vector $r_{xs}^{(0)}$, a regularization parameter $\lambda$, updating hyperparameters $\alpha$ and $\beta$, and a maximal number of iterations $i_{max}$. The output of the method according to embodiments of the present invention is a stimulus reconstruction decoder as expressed in equation 3.

[0115] According to embodiments of the present invention, the method comprises a step of computing or updating the autocorrelation matrix and computing an initial decoder according to equation 4:

$$R_{xx} = \alpha R_{xx}^{(0)} + (1 - \alpha)\left( \frac{1}{K} \sum_{k=1}^{K} X_k^{T} X_k + \lambda I \right), \qquad d_0 = R_{xx}^{-1}\, r_{xs}^{(0)} \qquad (4)$$

The method further comprises, while $i \leq i_{max}$, a step of predicting the labels on the training set according to equation 5:

$$\hat{y}_k = \arg\max_{j \in \{1,2\}} \; \rho\!\left(X_k d_i, \, s_{j,k}\right), \quad k = 1, \dots, K \qquad (5)$$

with $\rho(\cdot,\cdot)$ the Pearson correlation coefficient, wherein the cross-correlation vector is updated using the predicted labels, and wherein the decoder is updated using the updated cross-correlation vector, according to equation 6:

$$r_{xs} = \beta\, r_{xs}^{(0)} + (1 - \beta)\, \frac{1}{K} \sum_{k=1}^{K} X_k^{T}\, s_{\hat{y}_k, k}, \qquad d_{i+1} = R_{xx}^{-1}\, r_{xs} \qquad (6)$$

[0116] According to embodiments of the present invention, the autocorrelation matrix is estimated using the subject-specific EEG data. This autocorrelation matrix is independent of the ground-truth labels, which are only required for the cross-correlation vector. It is thus always possible to perform this update. If desired, the estimated and regularized autocorrelation matrix can be linearly combined with an initially provided autocorrelation matrix $R_{xx}^{(0)}$, controlled with the user-defined hyperparameter $0 \leq \alpha \leq 1$ (and $1 - \alpha$). This hyperparameter can be interpreted as the amount of confidence in the a priori available autocorrelation matrix $R_{xx}^{(0)}$. This initial autocorrelation matrix can be estimated on, for example, subject-independent data and can be considered as an extra regularization term (e.g., as used in Tikhonov regularization). If no such a priori autocorrelation matrix is available, $\alpha$ is simply set to 0. Using the updated autocorrelation matrix, the decoder is estimated in combination with an initially provided cross-correlation vector $r_{xs}^{(0)}$. This cross-correlation vector can again be estimated in a subject-independent manner, but could also be generated fully at random. It is recommended to normalize the initial autocorrelation matrix and cross-correlation vector such that they have a Frobenius norm equal to that of the estimated autocorrelation matrix and cross-correlation vector, improving the interpretability of the hyperparameters. In case more than two speech envelopes are provided, equation 5 becomes equation 5a:

$$\hat{y}_k = \arg\max_{j \in \{1,\dots,M\}} \; \rho\!\left(X_k d_i, \, s_{j,k}\right), \quad k = 1, \dots, K \qquad (5a)$$

with M the number of provided speech envelopes.

The initial decoder will then bootstrap the iterative procedure to update the decoder weights. Starting from this initial decoder, the labels of the training segments are predicted based on the maximal Pearson correlation coefficient between the reconstructed envelope and the speech envelopes of the competing speakers. These predicted labels are then used to select the attended speech envelope in each of the K segments, which is afterwards used to update the cross-correlation vector. Note that it is crucial that the updating is performed not using the reconstructed envelope from the EEG, but with the speech envelope of the competing speaker identified as the attended one. Again, some prior knowledge can be introduced in the updating of the cross-correlation vector using an initially provided cross-correlation vector and hyperparameter $0 \leq \beta \leq 1$. The updated cross-correlation vector can then be used to re-estimate the decoder. Multiple iterations of predicting the labels and updating the decoder can be performed until the decoder has converged or a maximal number of iterations has been reached. It is expected that this iterative process initiates a self-leveraging effect, in which the decoder leverages its own predictions to improve.
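A minimal sketch of this iterative procedure, following equations 4 to 6 under the stated assumptions (random or a priori initialization, blending weights alpha and beta); all function and variable names are illustrative:

```python
import numpy as np

def unsupervised_decoder(X_segs, env_segs, R0=None, r0=None,
                         alpha=0.0, beta=0.0, lam=0.0, i_max=10):
    """Sketch of the unsupervised updating of equations 4-6.

    X_segs   : list of K time-lagged EEG matrices X_k, each (T, C*L)
    env_segs : list of K arrays (T, M) with the M competing speech envelopes
    R0, r0   : optional a priori autocorrelation matrix / cross-correlation
               vector (e.g., subject-independent), blended in with weights
               alpha and beta as described in the text
    """
    CL = X_segs[0].shape[1]
    # Equation 4: label-independent autocorrelation estimate (plus prior)
    R_xx = sum(X.T @ X for X in X_segs) / len(X_segs) + lam * np.eye(CL)
    if R0 is not None:
        R_xx = alpha * R0 + (1 - alpha) * R_xx
    # Random (or subject-independent) initial cross-correlation vector
    r_xs = r0 if r0 is not None else np.random.uniform(-1, 1, CL)
    d = np.linalg.solve(R_xx, r_xs)

    for _ in range(i_max):
        r_new = np.zeros(CL)
        for X, S in zip(X_segs, env_segs):
            rec = X @ d
            # Equation 5: label from the maximal Pearson correlation between
            # the reconstruction and each candidate envelope
            corrs = [np.corrcoef(rec, S[:, m])[0, 1] for m in range(S.shape[1])]
            s_att = S[:, int(np.argmax(corrs))]   # envelope labelled as attended
            # Equation 6 (part 1): update the cross-correlation with that
            # envelope, not with the reconstructed envelope itself
            r_new += X.T @ s_att
        r_new /= len(X_segs)
        if r0 is not None:
            r_new = beta * r0 + (1 - beta) * r_new
        # Equation 6 (part 2): re-estimate the decoder
        d = np.linalg.solve(R_xx, r_new)
    return d
```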

[0117] Using a method according to embodiments of the invention, and in particular a method comprising the steps described in relation to equations 4 to 6, a stimulus reconstruction decoder can be trained.

[0118] The presented unsupervised AAD algorithm is validated on two separate datasets.

The first one (Dataset I) consists of EEG recordings of 16 normal-hearing subjects. EEG recordings were made in a soundproof, electromagnetically shielded room at ExpORL, KU Leuven. The BioSemi ActiveTwo system was used to record 64-channel EEG signals at a sample rate of 8192 Hz. The audio signals, low-pass filtered at 4 kHz, were administered to each subject at 60 dBA through a pair of insert phones (Etymotic ER3A). The experiments were conducted using the APEX 3 program developed at ExpORL.

[0119] Four Dutch short stories, narrated by different male speakers, were used as audio stimuli. All silences longer than 500 ms in the audio files were truncated to 500 ms. Each story was divided into two parts of approximately 6 minutes each. During a presentation, the subjects were presented with the six-minute parts of two (out of four) stories played simultaneously. There were two stimulus conditions, i.e., 'HRTF' or 'dry' (dichotic). An experiment here is defined as a sequence of 4 presentations, 2 for each stimulus condition and ear of stimulation, with questions asked to the subject after each presentation. All subjects sat through three experiments within a single recording session. An example of the design of an experiment is shown in Table 1.

The first two experiments included four presentations each. During a presentation, the subjects were instructed to listen to the story in one ear, while ignoring the story in the other ear. After each presentation, the subjects were presented with a set of multiple-choice questions about the story they were listening to, in order to help them stay motivated to focus on the task. In the next presentation, the subjects were presented with the next part of the two stories. This time they were instructed to attend to their other ear. In this manner, one experiment involved four presentations in which the subjects listened to a total of two stories, switching the attended ear between presentations. The second experiment had the same design but with two other stories. Note that Table 1 was different for each subject or recording session, i.e., each of the elements in the table was permuted between different recording sessions to ensure that the different conditions (stimulus condition and the attended ear) were equally distributed over the four presentations. Finally, the third experiment included a set of presentations where the first two minutes of the story parts from the first experiment, i.e. a total of four shorter presentations, were repeated three times, to build a set of recordings of repetitions. Thus, a total of approximately 72 minutes of EEG was recorded per subject.

[0120] The second dataset (Dataset II) consists of EEG recordings of eighteen (18) normal-hearing subjects, attending to one out of two competing speakers, located at ±60° along the azimuth direction. Per subject, 50 min of EEG and audio data are available. Different acoustic room settings are used: anechoic, mildly reverberant, and highly reverberant. Both datasets are recorded using a 64-channel BioSemi ActiveTwo system.

[0121] For the preprocessing of the EEG and audio data, the audio signals are first filtered using a gammatone filterbank. From each subband signal, the envelope is extracted using a power-law operation with exponent 0.6, after which one final envelope is computed by summing the different subband envelopes. Both the EEG data and speech envelopes are filtered between 1 and 9 Hz and downsampled to 20 Hz. Note that it is here assumed that the clean speech envelopes are readily available and need not be extracted from the microphone recordings. For Dataset II, the 50 s segments are normalized such that they have a Frobenius norm equal to one across all channels.

[0122] In the design of the stimulus reconstruction decoder, L = 250 ms is chosen such that the filter spans a range of 0 to 250 ms post-stimulus. Furthermore, the regularization parameter $\lambda$ is analytically determined. A maximum of $i_{max}$ = 10 iterations is used, which in practice proved to be sufficient.

Cross-validation and evaluation

[0123] For the supervised subject-specific decoder, a random ten-fold cross-validation scheme is used to train and test the decoders. The supervised subject-independent decoders are evaluated using a leave-one-subject-out cross-validation scheme, where a decoder is trained on the data of all other subjects and tested on the left-out subject. The ground-truth labels, along with the associated audio stimulus and measured response, were generated by asking the subject to listen to a specific speaker and recording their brain response. The presented unsupervised subject-specific decoder is tested in a random ten-fold cross-validation manner as well, where the updating happens on the training set (without knowledge of the labels) and the testing on the left-out data. The partitioning of the data is performed on segments of 60 seconds (s) for Dataset I and 50 s for Dataset II. Per subject, the continuous recordings are thus first split into these segments and then randomly distributed over a training and a test set. At test time, the left-out 60/50 s segments are split into smaller sub-segments, also referred to as the "decision windows" described before. The accuracy per fold is then defined as the ratio of correctly decoded decision windows, expressed as a number between 0 and 1, where 0 represents no labels correctly classified and 1 represents all labels correctly classified. Finally, the accuracies across all folds are averaged to obtain a single accuracy for a particular subject. These shorter decision windows are only used in the test folds, in order to evaluate the trade-off between the AAD accuracy and the decision window length (longer decision windows provide more accurate correlation coefficients, yielding higher AAD accuracies at the cost of slower decision-making). However, the prediction and updating according to embodiments of the present invention, as for example implemented by equations 4 to 6, are always performed on the longer 60/50 s segments, in order to maximize the accuracy of the unsupervised labels.

[0124] The following paragraphs explain the self-leveraging method by setting up a recursive mathematical model for the updating, which is then also validated on Dataset I.

[0125] Assume that at iteration $i < i_{max}$ in embodiments of the present invention as explained in relation to equations 4 to 6, a decoder may be obtained with an (unknown) AAD test accuracy of $p_i \in [0, 100]\%$. This means that there is a probability of $p_i$ that the reconstructed envelope using this decoder will have a higher correlation with the attended envelope than with the unattended envelope. Correspondingly, there is a $100\% - p_i$ probability that the unattended envelope will show the highest correlation. Assume for simplicity that $\alpha = 0$ and $\beta = 0$. Due to the linearity of the computation of the cross-correlation vector, the updated cross-correlation vector will then be, on average, equal to:

$$r_{xs} = p_i\, r_{xs}^{(a)} + (100\% - p_i)\, r_{xs}^{(u)}$$

with $r_{xs}^{(a)}$ the cross-correlation vector using all attended envelopes and $r_{xs}^{(u)}$ the cross-correlation vector using all unattended envelopes. Similarly, and again due to the linearity in the computations, the corresponding updated decoder may become equation 7:

$$d_{i+1} = p_i\, d_a + (100\% - p_i)\, d_u \qquad (7)$$

with $d_a$ the decoder trained with all attended speech envelopes (which would correspond to the supervised subject-specific decoder with accuracy $p_a$) and $d_u$ the unattended decoder that would be trained with all unattended speech envelopes, having an accuracy equal to $p_u$ on the unattended labels, and thus $100\% - p_u$ on the attended labels. As a result, the reconstructed envelope using this updated decoder is a linear combination of the reconstructed envelope $\hat{s}_a$ obtained using the (supervised) attended decoder and the reconstructed envelope $\hat{s}_u$ obtained using the (supervised) unattended decoder (equation 8):

$$\hat{s}_{i+1} = p_i\, \hat{s}_a + (100\% - p_i)\, \hat{s}_u \qquad (8)$$

The goal is now to find the AAD accuracy $p_{i+1}$ of the updated decoder $d_{i+1}$ in equation 7 in iteration i+1. A mathematical model is proposed for the function $p_{i+1} = \Phi(p_i)$, which determines the accuracy $p_{i+1}$ of the updated decoder as a function of the accuracy $p_i$ of the previous decoder. If $p_{i+1} > p_i$, this implies a self-leveraging effect in which the accuracy improves from one iteration to the next. Given that the speech envelope that exhibits the highest Pearson correlation coefficient with the reconstructed envelope is identified as the attended speaker, this implies equation 9:

$$p_{i+1} = P\left( \rho(\hat{s}_{i+1}, s_a) > \rho(\hat{s}_{i+1}, s_u) \right) \qquad (9)$$

with $s_a$ and $s_u$ the speech envelopes of the attended and unattended speaker. Using equation 9 and the definition of the Pearson correlation coefficient of two random vectors x and y according to equation 10:

$$\rho(x, y) = \frac{E\{(x - \mu_x)(y - \mu_y)\}}{\sigma_x\, \sigma_y} \qquad (10)$$

with the means $\mu_{x/y}$ and standard deviations $\sigma_{x/y}$, equation 9 becomes, after substituting equation 8 (equations 11 and 12):

$$\rho(\hat{s}_{i+1}, s_{a/u}) = \frac{p_i\, \sigma_{\hat{s}_a}\, \rho(\hat{s}_a, s_{a/u}) + (100\% - p_i)\, \sigma_{\hat{s}_u}\, \rho(\hat{s}_u, s_{a/u})}{\sigma_{\hat{s}_{i+1}}} \qquad (11)$$

$$p_{i+1} = P\left( p_i\, \sigma_{\hat{s}_a} \left[ \rho(\hat{s}_a, s_a) - \rho(\hat{s}_a, s_u) \right] > (100\% - p_i)\, \sigma_{\hat{s}_u} \left[ \rho(\hat{s}_u, s_u) - \rho(\hat{s}_u, s_a) \right] \right) \qquad (12)$$

To simplify this expression, and without loss of generality (this can always be obtained by normalizing the (reconstructed) envelopes), both speech envelopes are assumed to have a similar energy content such that it is safe to assume that, on average, $\sigma_{\hat{s}_a} \approx \sigma_{\hat{s}_u}$. Furthermore, $\rho(\hat{s}_a, s_{a/u})$ and $\rho(\hat{s}_u, s_{a/u})$ are independent of $p_i$ and can be considered as random variables $r_{aa}$, $r_{au}$, $r_{ua}$ and $r_{uu}$. These random variables represent the correlation coefficients between the reconstructed envelopes using the attended/unattended decoders and the speech envelopes of the attended/unattended speakers, computed over a pre-defined window length. As such, equation 12 becomes equation 13:

$$p_{i+1} = P\left( p_i\, (r_{aa} - r_{au}) > (100\% - p_i)\, (r_{uu} - r_{ua}) \right) \qquad (13)$$

Define now the new random variables $R_1 = r_{aa} - r_{au}$ and $R_2 = r_{uu} - r_{ua}$. It is assumed that these random variables are normally distributed. For none of the 16 subjects in Dataset I does the Kolmogorov-Smirnov test indicate a deviation from a normal distribution, which provides empirical support for this assumption, in addition to the validation of the embodiments of the present invention. Said normally distributed random variables, having means $\mu_1$ and $\mu_2$ and equal standard deviation, can be derived a priori from the supervised subject-specific decoders and experiments. $R_1$ represents the difference between the correlation coefficients of both competing speakers when using the (supervised) attended decoder, while $R_2$ would be used when making AAD decisions based on the (supervised) unattended decoder. As the standard deviation of $R_1$ and $R_2$ is mostly determined by the noise, which is the same for the attended and unattended decoder, one can assume that they have the same standard deviation $\sigma$. This standard deviation can be estimated across the mean-centred variables.

Finally, one can define $Z = p_i R_1 - (100\% - p_i) R_2$, which is again normally distributed (equation 14):

$$Z \sim \mathcal{N}\left(\mu_Z, \sigma_Z^2\right), \qquad \mu_Z = p_i\, \mu_1 - (100\% - p_i)\, \mu_2, \qquad \sigma_Z^2 = \left( p_i^2 + (100\% - p_i)^2 \right) \sigma^2 \qquad (14)$$

assuming that $R_1$ and $R_2$ are uncorrelated.

Equation 13 then becomes equal to $P(Z > 0)$, or equivalently, equation 15:

$$p_{i+1} = \Phi(p_i) = P(Z > 0) = \frac{1}{2}\left( 1 + \operatorname{erf}\left( \frac{\mu_Z}{\sqrt{2}\, \sigma_Z} \right) \right) \qquad (15)$$

By numerically evaluating equation 15 for $p_i \in [0, 100]\%$, one has modelled the AAD accuracy $p_{i+1}$ in iteration i+1 as a function of the AAD accuracy $p_i$ in iteration i. Note that $p_i$ and $p_{i+1} = \Phi(p_i)$ refer here to the test accuracy, as the model parameters will be computed from the correlation coefficients resulting from applying the subject-specific attended/unattended decoders to left-out test data.
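Equation 15 can be evaluated numerically as in the following sketch, here using the Gaussian cumulative distribution function from scipy; a simple fixed-point iteration is included to illustrate the analysis of the following paragraphs (the parameter values mu1, mu2 and sigma would be estimated from the supervised decoders, as described below):

```python
import numpy as np
from scipy.stats import norm

def phi(p_i, mu1, mu2, sigma):
    """Numerically evaluate equation 15: the modelled accuracy p_{i+1}.

    p_i is expressed as a fraction in [0, 1]; mu1, mu2 and sigma are the
    means and common standard deviation of R_1 and R_2.
    """
    mu_z = p_i * mu1 - (1 - p_i) * mu2
    sigma_z = sigma * np.sqrt(p_i**2 + (1 - p_i)**2)
    return 1 - norm.cdf(0, loc=mu_z, scale=sigma_z)   # P(Z > 0)

def fixed_point(mu1, mu2, sigma, p0=0.5, n_iter=20):
    """Iterate p_{i+1} = Phi(p_i) to approximate the fixed point p*."""
    p = p0
    for _ in range(n_iter):
        p = phi(p, mu1, mu2, sigma)
    return p
```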

[0126] Fig.2 shows the modelled curve $\Phi(p_i)$, where $\mu_1$, $\mu_2$ and $\sigma$ are estimated from Dataset I. The modelling is performed per subject based on the correlation coefficients of the attended and unattended decoders tested on 60 seconds (60 s) decision windows with ten-fold cross-validation. The modelled curves are then averaged across all subjects to obtain one 'universal' updating curve in Fig.2.

[0127] With respect to the verification of the $\Phi(p_i)$ model, the following should be noted: the updating curve in Fig.2 can be verified using simulations. Consider an oracle that can produce any mixture $(p_i, 100\% - p_i)$ of correct and incorrect labels. Using this oracle, one can perform a sweep of $p_i$ values and compute a decoder based on this particular ratio of correct and incorrect labels. For each $p_i$, the corresponding decoder can be applied to the test set to evaluate $p_{i+1}$, which should be approximately equal to $\Phi(p_i)$ if the model is correct. The simulated curve shown in Fig.2 is generated using random ten-fold cross-validation, repeated five times per subject, and averaged over subjects, folds, and runs. As the simulated curve closely resembles the theoretical curve, one can confirm that the assumptions are sensible and that the theoretical updating curve, based on equation 15, is valid and useful for interpretation and analysis.

[0128] The following paragraphs aim to explain the updating method with reference to the indicated points and/or regions in Fig.2.

[0129] Point 51 in Fig.2 corresponds to $p_i = p^*$, i.e., the cross-over point. For initial accuracy $p^*$, the updated accuracy remains the same, i.e., $\Phi(p^*) = p^*$. This cross-over point thus corresponds to the fixed/invariant point of $\Phi(p_i)$.

Point 52 in Fig.2 corresponds to $p_i = 0\%$, i.e., the decoder is trained using only the unattended ground-truth labels and is thus equal to $d_u$. The updated accuracy then corresponds to $100\% - p_u$, as the unattended decoder is used to predict attended labels. The unattended decoder generally performs worse than the attended decoder, obtaining accuracies below 100%, such that $\Phi(0\%) > 0\%$, ergo, an increase in accuracy.

Region 53 in Fig.2 corresponds to $0\% \leq p_i < p^*$. In this region, the accuracy increases after updating, i.e., $\Phi(p_i) > p_i$. Even when using a majority of unattended speech envelopes to train the attended decoder, the accuracy increases. A possible explanation is that the resulting cross-correlation vector still conveys information about which channels and which time lags are best suited to decode speech from the EEG, albeit unattended speech. It seems that there is still information to gain from unattended speech to compensate for the limited amount of attended speech. However, when $p_i$ increases, the increase in accuracy in general decreases (i.e., the distance to the identity line decreases), possibly because there is less and less information to gain from the unattended speech. Furthermore, it is expected that the cross-correlation of the EEG with the attended speech envelopes is on average larger than that of the EEG with the unattended speech envelopes. This reduces the relative weight of the unattended cross-correlation vector (e.g., see equation 7) and could make the attended cross-correlation vector more prominent in the estimated one, even when more unattended labels are used, enabling the self-leveraging effect.

Point 54 in Fig.2 corresponds to $p_i = 100\%$, i.e. the decoder corresponds to the supervised subject-specific decoder from Fig.4a, with accuracy $p_a$. As even the attended decoder is not perfect, $\Phi(100\%) < 100\%$, which results in a decrease in accuracy. This could be due to modelling errors (limited capacity of a linear model), the low signal-to-noise ratio of the stimulus response in the EEG, and a small amount of incorrect ground-truth labels, for example, due to the subject's attention wandering off to the wrong speaker.

Region 55 in Fig.2 corresponds to $p^* < p_i < 100\%$, where the accuracy decreases after updating, i.e., $\Phi(p_i) < p_i$. The presence of unattended labels does not add information as in region 53, suffering from the same limitations as in point 54.

[0130] The following paragraphs are dedicated to a fixed-point iteration algorithm according to embodiments of the present invention.

[0131] Using the theoretical model in Fig.2, it may be shown that the unsupervised AAD algorithm, as explained in relation to equations 4 to 6, performs a so-called "fixed-point iteration" $p_{i+1} = \Phi(p_i)$ on this curve. Before analyzing the uniqueness and convergence properties based on the model as expressed in equation 15, an intuitive explanation is first provided of why there can only be one fixed point $p^*$. First of all, it is safe to assume that $\Phi(0\%) > 0\%$, as the unattended decoder is never perfect. Furthermore, it is very unlikely that regions 53 and 55 in Fig.2 would alternate, as this would mean that, when using more attended labels to train the decoder, there is an increase-decrease-increase of AAD accuracy (or the other way around) with respect to the initial accuracy. This implies that there is a unique fixed point of the theoretical model. It may be demonstrated that, based on the model as presented by equation 15, the existence, uniqueness, and convergence of/to the fixed point are indeed guaranteed when three reasonable conditions on the accuracy $p_a$ of the (supervised) attended decoder and the accuracy $p_u$ of the (supervised) unattended decoder (on the unattended speech) are satisfied. Furthermore, it may be demonstrated that these conditions are satisfied for all subjects in both datasets.

[0132] These properties of the fixed-point iteration are also intuitively apparent from Fig.2 and hold in practical examples. This means that the updating algorithm could be initialized with any decoder, as one would always arrive at the fixed point p * , explaining why the updating procedure is possible starting from a random decoder. Figs.3a, 3b and 3c show how the fixed-point paths (on average over all folds) follow the theoretical model for three representative subjects of Dataset I, starting from a random decoder.

[0133] The fixed point $p^*$ based on the theoretical model (where the means and standard deviation in equation 15 are computed per subject individually) should thus give a good approximation of the unsupervised AAD accuracy. Across all 16 subjects of Dataset I, on 60 s decision windows, the mean absolute error between the predicted and actual unsupervised AAD accuracy is 3.45%. It can thus accurately be predicted how well the unsupervised updating will perform by computing the fixed point of equation 15, where the parameters $\mu_1$, $\mu_2$ and $\sigma$ in equation 15 can easily be computed from the corresponding supervised subject-specific decoders. Furthermore, as mentioned above, the model according to equation 15 also allows showing convergence to this fixed point when three reasonable conditions are satisfied.

[0134] The following paragraphs aim to further validate the unsupervised algorithm according to embodiments of the present invention based on the two datasets and compare it with a supervised subject-independent and supervised subject-specific decoder.

[0135] First, the proposed unsupervised algorithm is evaluated using a random initialization and without using any prior knowledge. As such, referring to equations 4 to 6, one sets $\alpha = 0$ and $\beta = 0$. The cross-correlation vector is initialized at random from a uniform distribution. Figs.4a and 4b show the AAD accuracy (mean over subjects; the shading represents the uncertainty in the data) as a function of decision window length and the MESD (minimal expected switch duration) values per subject for the supervised subject-specific decoder, the subject-independent decoder, and the proposed unsupervised subject-specific decoder (with random initialization), for both datasets.

[0136] It is clear that a supervised subject-specific decoder outperforms a subject-independent decoder on both datasets, see Figs.4a and 4b. A Wilcoxon signed-rank test between the MESD values, with a Bonferroni-Holm correction for multiple comparisons, confirms this (Dataset I: n = 16, p = 0.0022; Dataset II: n = 18, p = 0.0030). On both datasets, the proposed unsupervised subject-specific decoder with random initialization outperforms the subject-independent decoder as well (although less clearly on Dataset II) and approximates the performance of the supervised subject-specific decoder, especially for the shorter decision window lengths, while not requiring ground-truth labels and thus retaining the 'plug-and-play' feature of the subject-independent decoder. A Wilcoxon signed-rank test between the MESD values, again with a Bonferroni-Holm correction, shows a significant difference between the unsupervised subject-specific decoder with random initialization and the supervised subject-independent decoder on Dataset I (n = 16, p = 0.0458), but not on Dataset II (n = 18, p = 0.5862). Lastly, there is a significant difference between the supervised and unsupervised subject-specific decoder with random initialization (Dataset I: n = 16, p = 0.0034; Dataset II: n = 18, p = 0.0010).

[0137] Note that this last result is not per se a negative result: it is not expected that an unsupervised subject-specific decoder, updated starting from a completely random decoder, performs as well as the supervised version. The most important result is that the proposed unsupervised algorithm outperforms a subject-independent decoder, even when starting from a random decoder, while also not requiring subject-specific ground-truth labels. Furthermore, such an unsupervised algorithm could be implemented on a generic hearing device, which trains and adapts itself from scratch to a new user.

[0138] Referring to Fig.5a, there is depicted the AAD accuracy as a function of the iteration index for all subjects of Dataset I. Computing a decoder with the subject-specific autocorrelation matrix, but with a random cross-correlation vector, does not seem to perform better than chance (iteration 0). Surprisingly, even after one iteration of predicting the labels using the decoder after iteration 0, which performs at chance level, and updating the cross-correlation vector, a decoder is obtained that on average performs with approximately 75% accuracy on 60 s decision windows. This implies that even using a random mix of attended and unattended labels results in a decoder that performs much better than chance. In the following iterations, the decoder keeps improving, settling after 4-5 iterations. This matches the aforementioned fixed-point iteration interpretation and Figs.2, 3a, 3b, and 3c, explaining the self-leveraging mechanism.

[0139] To use the information in the subject-independent decoder to our advantage, one can set $\alpha \neq 0$ and $\beta \neq 0$ in equations 4 to 6. By adding subject-independent information to the estimation of both the autocorrelation matrix and the cross-correlation vector, the updating behaviour can be further improved when starting from a random initialization. Especially in the estimation of the cross-correlation vector, the subject-independent cross-correlation vector, which is estimated using ground-truth labels, can compensate for prediction errors.

[0140] The initial autocorrelation matrix and cross-correlation vector are determined using the (supervised) information of all other subjects. The hyperparameters $\alpha$ and $\beta$ are determined empirically. For Dataset I, $\alpha = 0$ is chosen, i.e., no subject-independent information is used in the autocorrelation estimation. Furthermore, $\beta = 1/3$ is chosen, i.e., the subject-independent cross-correlation is half as important as the computed subject-specific one.

[0141] The results on Dataset I of this unsupervised subject-specific decoder using subject-independent information are shown in Figs.4a and 4c. Remarkably, the unsupervised procedure here results in a decoder that very closely approximates the supervised subject-specific decoder, without requiring subject-specific ground-truth labels. Based on the MESD values, there is no significant difference to be found between the supervised and unsupervised subject-specific decoder with subject-independent information (Wilcoxon signed-rank test with Bonferroni-Holm correction: n = 16, p = 0.3259). For 6 subjects, the unsupervised decoder performs even better than the supervised subject-specific one, see also Fig.4c. Furthermore, note that, compared to a random initialization without further information, using the subject-independent information not only fixes poor updating results for some of the outlying subjects but also improves on most other subjects (12 out of 16).

[0142] For Dataset II, it turns out that $\alpha = 0.5$ and $\beta = 0.5$, i.e., an equal weight for the subject-specific and subject-independent information, are good choices. Given that the unsupervised subject-specific decoder with random initialization performs worse compared to Dataset I, it is not unexpected that a larger weight $\beta$ for the subject-independent information is required to improve on the unsupervised procedure.

[0143] Figs.4b and 4d show the results on Dataset II of the unsupervised procedure with subject-independent information and with the aforementioned choices of the hyperparameters. The usage of subject-independent information results here in an even larger improvement over the random initialization (e.g., both in MESD, for 15 out of 18 subjects, and in spread around the median in Fig.4d) and again closely approximates the supervised subject-specific performance, without requiring subject-specific ground-truth labels. However, based on the MESD values in Fig.4d, there is still a significant difference to be found between the supervised and unsupervised subject-specific performance (Wilcoxon signed-rank test with Bonferroni-Holm correction: n = 18, p = 0.0498), albeit very close to the significance level of 0.05. This indicates again that the unsupervised procedure with subject-independent information closely approximates the supervised subject-specific performance without ground-truth labels. Furthermore, the unsupervised decoder has a higher performance for four subjects (out of 18) relative to the supervised subject-specific decoder. Lastly, there now is a clear significant difference between the MESD values of the unsupervised procedure and the subject-independent decoder (Wilcoxon signed-rank test with Bonferroni-Holm correction: n = 18, p = 0.0030).

[0144] Using some information about other subjects, one can thus adapt a stimulus reconstruction decoder such that it performs almost as well as a supervised subject-specific decoder, but without requiring ground-truth information about the attended speaker during the training procedure.

[0145] Fig.5b shows the AAD accuracy as a function of the different steps of embodiments according to the present invention, and in particular in relation to equations 4 to 6, for all subjects of Dataset I. It appears that fully replacing (i.e., $\alpha = 0$) the autocorrelation matrix in the subject-independent decoder with the subject-specific information, which is a fully unsupervised step, already results in a substantial increase in accuracy, despite the resulting mismatch between the autocorrelation matrix and cross-correlation vector ('after autocorrelation update' versus 'subj.-indep.' in Fig.5b). Further updating the cross-correlation vector with the predicted labels, while blending in subject-independent information, results in a self-leveraging effect, leading to a further increase in accuracy, which converges after a few iterations similarly to Fig.5a.

[0146] Hence, the proposed unsupervised self-adaptive algorithm implemented in embodiments according to the present invention paves the way for further extensions and applications. A batch version of the algorithm was presented, i.e., the updating is performed on a large dataset of EEG and audio data. This enables the 'plug-and-play' capabilities of a stimulus reconstruction decoder for a new hearing device user. However, embodiments of the present invention could be extended to an adaptive version, tailored towards the application of neuro-steered hearing devices, where EEG and audio data are continuously recorded. As a result, the stimulus reconstruction decoder could automatically update itself in an unsupervised manner when new data comes in and adapt to changing conditions and situations (e.g., non-stationarities in neural activity, changing electrode-skin contact impedances, etc.).

[0147] The aforementioned adaptive implementation of the unsupervised procedure may also enable and improve the success of neurofeedback effects in a closed-loop implementation, the importance of which has been stressed by preliminary AAD studies. The interplay of the subject and the adaptive updating algorithm as implemented in embodiments of the present invention in a closed-loop system may further improve the AAD performance, as the subject learns to control the updating procedure.

[0148] Hence, it has been shown that embodiments according to the present invention allow training a subject-specific stimulus reconstruction decoder for AAD using an unsupervised procedure, i.e., without requiring information about which speaker is the attended or unattended one. Training such a decoder on the data of a particular subject from scratch, even starting from a random decoder and without any prior knowledge, leads to a decoder that outperforms a subject-independent decoder. Unsupervised adaptation of a subject-independent decoder, trained on other subjects, to a new subject even leads to a decoder that closely approximates the performance of a supervised subject-specific decoder. The proposed updating algorithm, implemented in embodiments according to the present invention, thus combines the two main advantages of supervised subject-specific and subject-independent decoders: (i) it substantially outperforms a subject-independent decoder, approximating the performance of a supervised subject-specific decoder, and (ii) it can be used in a 'plug-and-play' fashion, without requiring ground-truth labels and potentially automatically adapting to changing conditions without external intervention.

[0149] The unsupervised algorithm as implemented in embodiments according to the present invention can be seen as a fixed-point algorithm, which, without wishing to be bound by theory, may explain why there is a self-leveraging effect, even when starting from a random decoder. Furthermore, using these embodiments, accurate predictions could be made of the accuracy of the unsupervised decoder starting from the results of the supervised subject-specific decoder.

[0150] The proposed unsupervised self-adaptive algorithm as implemented in embodiments according to the present invention, can be used in an online and adaptive manner in a practical neuro-steered hearing device, allowing the decoder to automatically adapt to the non-stationary brain and changing environments and conditions. Furthermore, it avoids having a cumbersome a priori training stage for each new hearing device user, as it automatically adapts to the new user. The developed methods according to embodiments of the present invention, may enable stronger neurofeedback effects when using a closed-loop system, which is useful for the successful application of AAD.

Second Example implementation (CSP)

[0151] A model for determining the directional focus of attention of a subject can be updated according to embodiments of the present invention. Such a model allows locating the attended speaker relative to the subject and thus determining the direction of auditory attention.

[0152] One way of performing such a localization is via common spatial pattern (CSP) filtering, as described in "Fast EEG-based decoding of the directional focus of auditory attention using common spatial patterns" (Geirnaert S et al., bioRxiv 2020.06.16.154450). In this case the measured recording of brain activity is required as input, but the audio stimulus itself is not needed in the decoding process.

[0153] The model to be updated functions as follows. Consider two speaker locations (e.g., 90 degrees to the left or the right of the subject). Using the C-channel EEG signal x(t) at time point t, this EEG signal can be classified into one of two different classes: $C_1$ (e.g., representing the case where the attended speaker is located at the left) and $C_2$ (e.g., representing the case where the attended speaker is located at the right). A set of K linear CSP filters W then generates K output signals $y(t) = W^{T} x(t)$, half of which maximize the output energy when $t \in C_1$, while minimizing the output energy when $t \in C_2$ (the other way around for the other half). For example, consider the first CSP filter $w_1$ (i.e., the first column of W), which is the result of the following optimization problem of equation 16:

$$w_1 = \arg\max_{w} \; \frac{w^{T} R_{C_1} w}{w^{T} R_{C_2} w} \qquad (16)$$

with $R_{C_1}$ the autocorrelation matrix of the EEG when listening to the speaker e.g. at the left, while $R_{C_2}$ is the autocorrelation matrix of the EEG when listening to the speaker e.g. at the right. The autocorrelation matrices can be constructed in the same way as for stimulus reconstruction, but only taking time samples that are classified into the corresponding class. This optimization problem can be rewritten such that $w_1$ can be retrieved by solving a generalized eigenvalue problem (GEVD) according to equation 17:

$$R_{C_1} w = \lambda\, R_{C_2} w \qquad (17)$$

where $w_1$ corresponds to the generalized eigenvector corresponding to the largest generalized eigenvalue. The same can be done for the other K - 1 filters, while requiring that they are uncorrelated to each other. In the end, all the K CSP filters contained in W can be found from the above generalized eigenvalue problem, by taking the K eigenvectors corresponding to the largest and smallest eigenvalues. Note that, similarly to equation 1, x(t) can be extended with multiple time lags to perform a spatio-temporal filtering with W, instead of the previously described spatial filtering. Another possibility to include temporal filtering, without increasing the number of to-be-trained parameters of the CSP filter, is training a set of CSP filters per frequency band. This is called filter bank CSP filtering (FB-CSP), and leads to KB output signals (assuming B different frequency bands).

[0154] The K output signals y(t) are then segmented into different windows. For each window of length T, the output signals, containing KT time samples, are compressed into a K-dimensional feature vector f, containing the log-energy of each output signal $y_k(t)$, according to equation 18:

$$f = \left[ \log(E_1), \; \log(E_2), \; \dots, \; \log(E_K) \right]^{T} \qquad (18)$$

with the output energy for the k-th output signal according to equation 19:

$$E_k = \frac{1}{T} \sum_{t=1}^{T} y_k(t)^{2} \qquad (19)$$
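The CSP training of equations 16-17 and the feature computation of equations 18-19 may be sketched as follows; this is a minimal illustration assuming symmetric, full-rank class autocorrelation matrices and an even K, with illustrative names throughout:

```python
import numpy as np
from scipy.linalg import eigh

def train_csp(R_c1, R_c2, K):
    """Sketch of CSP training via the GEVD of equation 17.

    R_c1, R_c2 : (C, C) class autocorrelation matrices of the EEG
    K          : total number of CSP filters (K/2 per class)
    Returns W with the eigenvectors of the K/2 largest and K/2 smallest
    generalized eigenvalues as columns.
    """
    eigvals, eigvecs = eigh(R_c1, R_c2)        # generalized eigenvalue problem
    order = np.argsort(eigvals)                # ascending eigenvalues
    idx = np.concatenate([order[:K // 2], order[-K // 2:]])
    return eigvecs[:, idx]

def log_energy_features(X_window, W):
    """Equations 18-19: log-energy of each CSP output over a decision window.

    X_window : (T, C) EEG samples in the window; W : (C, K) CSP filters.
    """
    Y = X_window @ W                           # K output signals y_k(t)
    return np.log(np.mean(Y**2, axis=0))       # K-dimensional feature vector f
```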

[0155] This feature vector can then be classified using any classifier. A typical choice is a linear discriminant analysis (LDA) classifier, in which a K-dimensional linear filter v is optimized to maximize the in-between class scatter, while minimizing the within-class scatter. This leads to the following solution according to equation 20:

$$v = \Sigma_w^{-1} \left( \mu_1 - \mu_2 \right) \qquad (20)$$

with $\Sigma_w$ the covariance matrix over all features f, and $\mu_1$ and $\mu_2$ the class means. f is classified into class $C_1$ when $v^{T} f + b > 0$ and into class $C_2$ when $v^{T} f + b < 0$, using bias term $b = -v^{T} (\mu_1 + \mu_2)/2$.

[0156] Conventionally, the training of the CSP filters and the LDA classifier requires knowledge of the labels. However, by using a method according to the present invention to update the model, these labels are not required to be known. An initial CSP filter and LDA classifier can be used to predict the labels in step S2 as explained hereinbefore. Such an initial pair of CSP filter and LDA classifier can, e.g., be trained subject-independently, i.e., on data of other subjects. The LDA classifier can already be updated in an unsupervised manner (i.e., without knowing the labels) by only updating the bias term as $b = -v^{T} \mu$, where $\mu$ is the mean over all data, which is independent of the labels.
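The LDA training of equation 20 and the label-free bias update of paragraph [0156] may be sketched as follows; the pooled covariance estimate and all names are assumptions for illustration:

```python
import numpy as np

def train_lda(F1, F2):
    """Equation 20 (sketch): LDA filter from labelled feature sets.

    F1, F2 : (N1, K) and (N2, K) feature vectors of classes C1 and C2.
    """
    mu1, mu2 = F1.mean(axis=0), F2.mean(axis=0)
    Sigma_w = np.cov(np.vstack([F1 - mu1, F2 - mu2]).T)  # pooled covariance
    v = np.linalg.solve(Sigma_w, mu1 - mu2)
    b = -v @ (mu1 + mu2) / 2          # bias from the class means
    return v, b

def update_bias_unsupervised(v, F_new):
    """Label-free bias update: only the overall mean over all new data is
    needed, so no labels are required (paragraph [0156])."""
    return -v @ F_new.mean(axis=0)
```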

[0157] In step S3, the CSP filters and/or the LDA classifier can be updated using the predicted labels: the class autocorrelation matrices and the GEVD are recomputed (for the CSP filters) and/or the LDA filter is recomputed by updating the class means. Alternatively, the initial LDA classifier of step S2 gives rise to a confidence metric by using the posterior probability of the class label given the feature vector. This posterior probability can be modelled using a multivariate normal density as the likelihood function (using the class-dependent means $\mu_1$, $\mu_2$ and the feature covariance matrix) and a pre-set prior probability (e.g., uniform). This essentially means that the distance to the LDA decision boundary is used as confidence metric. This confidence metric can then be used to weight the individual windows in the computation of the class autocorrelation matrices and/or feature means, or it can be thresholded, such that only windows with a high confidence are included in the updating procedure.

[0158] It is noted that this procedure can easily be extended to multiple classes (i.e., multiple directions), by training CSP filters and an LDA classifier to discriminate each direction from all other directions (i.e., one-versus-all coding) and assigning the label with the highest confidence.

Third Example Implementation

[0159] In a third example implementation of a method according to embodiments of the present invention, the model to be updated is a model for performing a match-mismatch (MM) classification, i.e. a model for determining whether an auditory stimulus temporally matches a recorded neural response. Contrary to an AAD task, where participants need to listen to one of two or more competing speakers, participants only hear one single speaker in an MM task. A (mis)matched envelope is expected to be (un)correlated with the neural response. Although in the following an EEG signal is referred to, it will be understood that this method is also applicable to measured recordings of brain activity that are not EEG, as described elsewhere herein. Similarly, a speech envelope is mentioned as the auditory stimulus, but other possibilities for the auditory stimulus are encompassed by the present invention, as described elsewhere herein.

[0160] In this example, an unsupervised canonical correlation analysis (CCA) combined with a linear discriminant analysis (LDA) model is used. First, a supervised CCA+LDA model (not part of the present invention) is described, where the supervision consists of providing the CCA model only with 'matched' audio and EEG segments, and providing the LDA model with 'match' as well as 'mismatch' examples (including labels on whether it is a match or mismatch example) so that it learns to discriminate between both. Then, an unsupervised CCA+LDA model is described, along with the method of updating the model according to embodiments of the present invention. In this case, the CCA+LDA model has to determine on its own which segments are 'matched' or 'mismatched' examples, such that these self-generated labels can be used for updating itself.

[0161] The supervised CCA+LDA model has two parts. First, correlation features are computed in a latent space in which the EEG signal and the auditory envelope are maximally correlated, and then these features are classified with Fisher's linear discriminant analysis (LDA).

[0162] Feature extraction: Suppose an EEG signal X(t, c) with C channels is recorded while a subject is listening to speech with an envelope s(t). CCA looks for a (linear) transformation of the EEG signal and the envelope such that they are maximally correlated. The speech envelope is transformed with a temporal encoder e. The EEG signal is transformed with a spatio-temporal decoder d. The EEG signal and envelope are lagged with $L_x$ and $L_s$ samples, respectively, to construct the temporal filters. The envelope is also delayed with S samples to correct for the lag between an auditory stimulus and its neural response. The resulting transformed signals are set out in equation 21:

$$\hat{x}(t) = \sum_{c=1}^{C} \sum_{l=0}^{L_x - 1} d(c, l)\, X(t + l, c), \qquad \hat{s}(t) = \sum_{l=0}^{L_s - 1} e(l)\, s(t - S - l) \qquad (21)$$

The time-lagged EEG signal can be rewritten as a time-dependent vector $x(t) \in \mathbb{R}^{C L_x}$ according to equation 22:

$$x(t) = \left[ X(t, 1), \dots, X(t + L_x - 1, 1), \dots, X(t, C), \dots, X(t + L_x - 1, C) \right]^{T} \qquad (22)$$

Similarly, the lagged speech envelope can be written as $s(t) \in \mathbb{R}^{L_s}$ according to equation 23:

$$s(t) = \left[ s(t - S), \, s(t - S - 1), \dots, s(t - S - L_s + 1) \right]^{T} \qquad (23)$$

which allows the equations in (21) to be simplified according to equation 24:

$$\hat{x}(t) = d^{T} x(t), \qquad \hat{s}(t) = e^{T} s(t) \qquad (24)$$

d and e are chosen such that $\hat{x}(t)$ and $\hat{s}(t)$ have a maximal expected Pearson correlation, as expressed in equation 25:

$$(d, e) = \arg\max_{d, e} \; \frac{d^{T} R_{xs}\, e}{\sqrt{d^{T} R_{xx}\, d}\, \sqrt{e^{T} R_{ss}\, e}} \qquad (25)$$

$R_{xs}$ is the cross-correlation between the EEG signal and the speech envelope; $R_{xx}$ and $R_{ss}$ are the autocorrelation matrices of the EEG signal and the envelope, respectively.

[0163] This optimization function has as solution the joint generalized eigenvalue problem of equation 26:

$$\begin{bmatrix} 0 & R_{xs} \\ R_{xs}^{T} & 0 \end{bmatrix} \begin{bmatrix} d \\ e \end{bmatrix} = \lambda \begin{bmatrix} R_{xx} & 0 \\ 0 & R_{ss} \end{bmatrix} \begin{bmatrix} d \\ e \end{bmatrix} \qquad (26)$$

Equation 26 has at most $L_s$ independent solutions, assuming without loss of generality that $R_{xx}$ and $R_{ss}$ are of full rank and $L_s \leq C L_x$. When the CCA model is applied to new test data, it can thus compute $L_s$ independent Pearson correlations, which are then provided to Fisher's LDA classifier. These correlations are expected to be large for matched segments, and close to zero for mismatched segments.
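A minimal sketch of solving equation 26, assuming full-rank, symmetric correlation matrices (all names are illustrative):

```python
import numpy as np
from scipy.linalg import eigh

def train_cca(R_xx, R_ss, R_xs, n_comp):
    """Sketch of CCA via the joint GEVD of equation 26.

    R_xx : (CLx, CLx) lagged-EEG autocorrelation matrix
    R_ss : (Ls, Ls) lagged-envelope autocorrelation matrix
    R_xs : (CLx, Ls) EEG/envelope cross-correlation matrix
    Returns the decoders D and encoders E for the n_comp strongest
    canonical components.
    """
    CLx, Ls = R_xs.shape
    A = np.block([[np.zeros((CLx, CLx)), R_xs],
                  [R_xs.T, np.zeros((Ls, Ls))]])
    B = np.block([[R_xx, np.zeros((CLx, Ls))],
                  [np.zeros((Ls, CLx)), R_ss]])
    eigvals, eigvecs = eigh(A, B)              # joint generalized EVD
    top = np.argsort(eigvals)[::-1][:n_comp]   # largest canonical correlations
    D = eigvecs[:CLx, top]                     # EEG decoders d
    E = eigvecs[CLx:, top]                     # envelope encoders e
    return D, E
```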

[0164] Note that the decoder d can have a large dimension, which can make it prone to overfitting. This can be avoided by reducing the number of channels C of the EEG signal with principal component analysis.

[0165] Classification: The CCA model has as output a feature vector $\rho \in \mathbb{R}^{L_s}$ of correlation coefficients. This feature vector either belongs to the matched class, with class average $\mu_+$ and covariance $\Sigma_+$, or the mismatched class, with class average $\mu_-$ and covariance $\Sigma_-$.

[0166] The feature vector is then classified with Fisher's LDA, with a classification vector w that maximizes the scatter between classes and minimizes the scatter within classes, as set out in equation 27:

$$w = \arg\max_{w} \; \frac{\left( w^{T} (\mu_+ - \mu_-) \right)^{2}}{w^{T} \Sigma_w\, w}, \qquad \Sigma_w = \frac{\Sigma_+ + \Sigma_-}{2} \qquad (27)$$

It can be shown that equation 28 maximises equation 27:

$$w = \Sigma_w^{-1} \left( \mu_+ - \mu_- \right) \qquad (28)$$

$\rho$ belongs to the matched class if $w^{T} \rho > t$, with t being a threshold.

Unsupervised CCA+LDA

[0167] Suppose that N segments containing both an EEG signal and an auditory stimulus are recorded. For simplicity, it is assumed that half of the segments are matched and half of the segments are mismatched. However, it is not known which of the segments are matched and which are not. According to embodiments of the present invention, the unsupervised CCA model iteratively predicts the labels of these segments and then updates itself using its own self-predicted labels. In this manner the CCA model can train itself on new data without supervision, i.e. without requiring labelled data.

[0168] Feature extraction: in order to optimize the decoder d and the encoder e, three correlation matrices must be estimated. $R_{xx}$ is the autocorrelation of the EEG signal, and can be updated irrespective of the label of the new segment. $R_{ss}$ is the autocorrelation of the speech envelope. In general, it is not expected that matched speech follows different statistics than mismatched speech. This matrix can therefore also be updated irrespective of the label of the new segment. $R_{xs}$ is the cross-correlation between the EEG signal and the speech envelope. If the segment labels are classified with an accuracy p, $R_{xs}$ can be written as

$$R_{xs} = p\, R_{xs}^{+} + (1 - p)\, R_{xs}^{-}$$

with $R_{xs}^{+}$ and $R_{xs}^{-}$ the cross-correlation of matched and mismatched segments, respectively.

[0169] Since EEG signals and mismatched speech envelopes are inherently unrelated, all elements of $R_{xs}^{-}$ are nearly random and expected to be zero on average. The term $(1-p)\,R_{xs}^{-}$ thus adds random, unwanted noise to the estimation of $R_{xs}$, yet is expected to be significantly smaller than $p\,R_{xs}^{+}$ as long as p is not too close to zero.

[0170] Therefore, it is expected that the feature extraction improves iteratively up to a certain convergence point, as described hereinbefore.
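The label-dependence described in paragraphs [0168] and [0169] could be implemented as below; the list-of-segments interface and function name are illustrative. Only $R_{xs}$ uses the (self-)predicted labels, while $R_{xx}$ and $R_{ss}$ accumulate every segment.

```python
import numpy as np

def update_correlations(segments, predicted_labels):
    """segments: list of (X_lagged, S_lagged); labels: 1 = predicted matched.

    R_xx and R_ss are label-independent (paragraph [0168]); R_xs is built
    only from segments that the current model predicts to be matched.
    """
    R_xx = sum(X.T @ X for X, _ in segments)
    R_ss = sum(S.T @ S for _, S in segments)
    R_xs = sum(X.T @ S for (X, S), z in zip(segments, predicted_labels) if z == 1)
    return R_xx, R_ss, R_xs
```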

[0171] Classification: if the classifier in equation 28 were trained with random labels to estimate $\boldsymbol{\mu}_+$ and $\boldsymbol{\mu}_-$, it would also give a random classifier in return. The classification method of the supervised CCA+LDA model is therefore not suitable for unsupervised classification. The classifier is instead estimated from the total mean $\boldsymbol{\mu}$ and total covariance $\Sigma$ of the N feature vectors $\boldsymbol{\rho}_n$. It is further assumed that there are as many matched as mismatched segments (this assumption is made for the sake of an easy exposition). Furthermore, it is pointed out that $\boldsymbol{\mu}_- \approx \mathbf{0}$, since mismatched envelopes and EEG measurements are expected to be uncorrelated. From this, equation 29 follows:

$$\boldsymbol{\mu}_+ - \boldsymbol{\mu}_- \approx 2\boldsymbol{\mu} \qquad (29)$$
It can also be shown that the covariance can be written according to equation 30:

$$\Sigma_+ + \Sigma_- \approx 2\left(\Sigma - \boldsymbol{\mu}\boldsymbol{\mu}^T\right) \qquad (30)$$
Substituting 29 and 30 into 28 results in equation 31:

$$\mathbf{w} = \left(\Sigma - \boldsymbol{\mu}\boldsymbol{\mu}^T\right)^{-1}\boldsymbol{\mu} \qquad (31)$$
It is noted that equation 31 is only valid when there are as many matched as mismatched segments in the dataset. However, if it is assumed that $\Sigma_+ \approx \Sigma_-$, equation 31 can be expanded to a more general case in which an arbitrary fraction q > 0 of the segments is matched, as set out in equation 32:

$$\mathbf{w} = \left(\Sigma - \frac{1-q}{q}\,\boldsymbol{\mu}\boldsymbol{\mu}^T\right)^{-1}\boldsymbol{\mu} \qquad (32)$$
Either of these assumptions can be further relaxed with more advanced techniques.
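A sketch of the label-free classifier of equations 29 to 32, assuming the feature vectors of all N segments are stacked row-wise in an array rho; q = 0.5 recovers equation 31.

```python
import numpy as np

def unsupervised_lda(rho, q=0.5):
    """Eq. 31/32: w estimated from the total mean and covariance only."""
    mu = rho.mean(0)                                    # total mean
    Sigma = np.cov(rho.T)                               # total covariance
    within = Sigma - ((1 - q) / q) * np.outer(mu, mu)   # eq. 32; eq. 31 if q = 0.5
    w = np.linalg.solve(within, mu)
    t = w @ mu                                          # midway threshold of [0172]
    return w, t
```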

[0172] A feature vector $\boldsymbol{\rho}$ is classified as matched if $\mathbf{w}^T\boldsymbol{\rho} > t$, with t being a threshold value. One possible threshold is the point midway between the projections of the two class averages, $t = \tfrac{1}{2}\mathbf{w}^T(\boldsymbol{\mu}_+ + \boldsymbol{\mu}_-) \approx \mathbf{w}^T\boldsymbol{\mu}$.

[0173] The CCA algorithm to label a batch of N independent segments is thus as follows:
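The original listing is not reproduced in this text. The following is a minimal sketch of one plausible realization, consistent with the steps described hereinbefore (random initial labels, pooled correlation updates, the GEVD of equation 26, the unsupervised LDA of equations 31/32, iteration until the labels are invariant), reusing the illustrative helpers sketched earlier.

```python
import numpy as np

def unsupervised_cca_lda(segments, K=5, max_iter=10, q=0.5, rng=None):
    """Iteratively self-label a batch of (X_lagged, S_lagged) segments."""
    if rng is None:
        rng = np.random.default_rng()
    labels = rng.integers(0, 2, len(segments))          # random init: p0 = 0.5
    for _ in range(max_iter):
        R_xx, R_ss, R_xs = update_correlations(segments, labels)
        D, E = cca_gevd(R_xx, R_ss, R_xs, K)            # eq. 26 on pooled statistics
        rho = np.array([cca_correlations(X, S, D, E) for X, S in segments])
        w, t = unsupervised_lda(rho, q)                 # eq. 31/32
        new_labels = (rho @ w > t).astype(int)          # decision rule of [0172]
        if np.array_equal(new_labels, labels):
            break                                       # fixed point reached
        labels = new_labels
    return labels, (D, E, w, t)
```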

It is straightforward to modify the algorithm for applications with a continuous stream of new segments at the input. In such cases, each new segment is first classified by the model and then used to update the unsupervised CCA+LDA model according to the rules of the algorithm above.

Experimental implementation

[0174] 48 Flemish-speaking participants listened to a selection from ten different Flemish fairy tales of approximately 14 minutes each. Two of the fairy tales have a female speaker. For each subject but one, between 80 and 120 minutes of data are available; for one subject only 28 minutes are available, and this subject is therefore removed from further analysis. More details about the data used are available in "An LSTM Based Architecture to Relate Speech Stimulus to EEG" (M. J. Monesi et al., ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2020-May, IEEE Inc., 2020, pp. 941-945). The data is preprocessed according to the methods suggested in "Auditory-inspired speech envelope extraction methods for improved EEG-based auditory attention detection in a cocktail party scenario" (W. Biesmans et al., IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 25, no. 5, pp. 402-412, 2017).

[0175] The auditory stimulus is first split into subbands with a gammatone filterbank. The envelope of each subband is then extracted with the powerlaw method, using an exponent of 0.6, after which the subband envelopes are all added together. Artefacts are removed from the EEG with a Wiener filter. Both the envelope and the EEG signal are then filtered with a bandpass filter between 0.5 and 32 Hz and downsampled to 64 Hz.
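A condensed sketch of this preprocessing chain for the stimulus, using scipy's gammatone and Butterworth filters; the subband centre frequencies, filter orders and sampling rate are assumptions, and the Wiener artefact-removal step for the EEG is omitted here.

```python
import numpy as np
from scipy.signal import gammatone, lfilter, butter, sosfiltfilt, resample_poly

def speech_envelope(audio, fs, fs_out=64, power=0.6):
    """Gammatone subbands -> powerlaw subband envelopes -> sum -> bandpass -> resample."""
    centre_freqs = np.linspace(150, 3500, 28)          # assumed subband spacing
    env = np.zeros_like(audio)
    for fc in centre_freqs:
        b, a = gammatone(fc, 'iir', fs=fs)
        env += np.abs(lfilter(b, a, audio)) ** power   # powerlaw, exponent 0.6
    sos = butter(4, [0.5, 32], btype='bandpass', fs=fs, output='sos')
    return resample_poly(sosfiltfilt(sos, env), fs_out, fs)   # down to 64 Hz

fs = 8000
audio = np.random.default_rng(1).standard_normal(10 * fs)    # stand-in for speech
env64 = speech_envelope(audio, fs)   # the EEG follows the same bandpass/resampling
```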

[0176] The EEG and stimulus are subsequently cut into segments of a certain window length. The mismatched segments are created by delaying the speech envelope by 10 s relative to the EEG signal before segmenting the data. This way, each EEG and audio signal is present in two segments: one matched and one mismatched. To avoid data leakage, the matched and mismatched segments sharing the same EEG signal are always kept in the same cross-validation fold during training.
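The pairing described in this paragraph could be realized as follows; the circular shift used to implement the 10 s delay is an implementation assumption.

```python
import numpy as np

def make_segments(eeg, env, fs=64, win_s=10, shift_s=10):
    """(eeg, env, label) triples: each EEG window appears once matched (label 1)
    and once with the envelope delayed by shift_s seconds (label 0).
    In cross-validation, the two segments sharing an EEG window must stay
    in the same fold to avoid data leakage."""
    win = win_s * fs
    env_mm = np.roll(env, shift_s * fs)        # delayed (mismatched) envelope
    segs = []
    for start in range(0, len(env) - win + 1, win):
        sl = slice(start, start + win)
        segs.append((eeg[sl], env[sl], 1))     # matched
        segs.append((eeg[sl], env_mm[sl], 0))  # mismatched
    return segs
```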

[0177] The iterative procedure according to embodiments of the present invention converges to some fixed point p*, as described hereinbefore. A first experiment investigates where this fixed point is located and how fast the model converges to it. This experiment is split into two parts. In the first part, the accuracy $p_{i+1}$ of the predictions of an unsupervised CCA model is computed when it is trained with segments of which a fraction $p_i \in [0,1]$ is matched. This curve is called the updating curve. The fixed point is located where $p_i = p_{i+1}$. If $p_i < p_{i+1}$, the model improves with each iteration; if $p_i > p_{i+1}$, the model worsens. In the second part, it is observed how the unsupervised CCA model iteratively moves along the updating curve after being initialised with random labels ($p_0$ = 0.5). The predicted labels are used as the new training labels in each subsequent iteration, and this process is repeated until the model converges.

[0178] It is noted that the final performance of the unsupervised model is not necessarily equal to the expected performance of the model on completely new data. There are two potential locations of data leakage. First, the classifier of segment n is trained with the feature vectors $\boldsymbol{\rho}_m$, m ≠ n, while each feature vector $\boldsymbol{\rho}_m$, m ≠ n, is extracted with a model that was trained on segment n. Second, segment n is used to train a classifier that decides whether a segment m ≠ n is a match or not, and this decision determines whether segment m will be used in the estimation of the cross-correlations in the next iteration. Segment n can thus directly influence which segments are used to train the unsupervised CCA model according to embodiments of the present invention.

[0179] The true performance of the model is measured in a second experiment. In this experiment, the dataset is initialised with random labels (thus $p_0$ = 0.5) and split into a training and a test set. The model first predicts the labels of the training set in an inner 10-fold cross-validation, similar to the first experiment. The training set with its predicted labels is then used to construct the CCA model that classifies the completely new test set. Only the performance of the CCA model on the test set is recorded. This process is repeated with changing training and test sets in an outer 10-fold cross-validation loop.
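The nested cross-validation might be organized as in the sketch below, reusing the illustrative batch routine from earlier; the fold bookkeeping is simplified and, unlike the real experiment, does not force paired segments into the same fold.

```python
import numpy as np

def outer_cv(segments, n_folds=10, seed=2):
    """segments: (X_lagged, S_lagged, true_label) triples; labels used only to score."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(segments)), n_folds)
    accuracies = []
    for k in range(n_folds):
        train = [segments[i][:2] for j, f in enumerate(folds) if j != k for i in f]
        # one self-labelling pass, then a model built on the predicted labels
        labels, (D, E, w, t) = unsupervised_cca_lda(train, max_iter=2)
        hits = [float((cca_correlations(X, S, D, E) @ w > t) == bool(z))
                for X, S, z in (segments[i] for i in folds[k])]
        accuracies.append(np.mean(hits))
    return float(np.mean(accuracies))
```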

[0180] The number of channels in the EEG signal is reduced from 64 to $N_{PC}$ = 32 with PCA, and K = 5 linearly independent decoders and encoders are used to construct the feature vector. Both the EEG signal and the envelope use lags up to 250 ms, which corresponds to $L_x$ = $L_s$ = 17 lags. The EEG signal is also delayed by 200 ms with respect to the speech envelope, which corresponds to S = 13. The maximal number of iterations $i_{max}$ was limited to 10 in the first experiment, and to 1 in the second experiment.
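The conversion from milliseconds to lags in this paragraph is a simple rounding at the 64 Hz sampling rate:

```python
fs = 64                           # Hz, after downsampling
L = int(round(0.250 * fs)) + 1    # lags 0..16  ->  L_x = L_s = 17
S = int(round(0.200 * fs))        # 0.200 s * 64 Hz = 12.8  ->  S = 13
print(L, S)                       # 17 13
```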

[0181] In the first experiment, the accuracy $p_{i+1}$ of the unsupervised CCA model was recorded when trained using labels of which only a fraction $p_i$ were correct. Fig. 6 shows how the algorithm quickly converges to p* for random label initialisation ($p_0$ = 0.5). In the first iteration, the model already achieves an accuracy $p_1$ which is only about 2% worse than the fixed point p*. The model is then retrained with its own predictions, and reaches an accuracy $p_2$ which is less than 0.5% worse than p* for both 10 s (Fig. 6a) and 60 s (Fig. 6b) window lengths.

[0182] The model still performs well when 60% of the segments are mislabelled. This is possible due to the small influence of mismatched segments on the cross-correlation matrix. The performance only degrades strongly when $p_i$ < 0.4. The cross-correlation matrix becomes more and more random as $p_i \to 0$, which causes the model to perform around chance level at $p_i$ = 0.

[0183] In the second experiment, the unsupervised CCA model was used to classify a completely new test set. Based on the findings of the first experiment, 60 s window lengths were used to label each training segment, and the labels were updated only once. The model was then applied to a completely new test set with a varying window length, as shown in Fig. 7. The unsupervised CCA model (uCCA, uLDA) is compared to three other models: a fully supervised subject-specific CCA model (CCA, LDA), a fully supervised subject-independent model (SI CCA), and a subject-specific model with unsupervised feature extraction and supervised classification (uCCA, LDA). Although not used in practice, the latter allows the influences of unsupervised feature extraction and unsupervised classification to be studied separately.

[0184] Comparing (uCCA, uLDA) to (uCCA, LDA) shows that unsupervised classification is only significantly worse than supervised classification for window lengths of 20 s and longer (p = 0.03, p = 1.6e-4 and p = 3.8e-4 for window lengths of 20, 30 and 60 s, respectively, and p > 0.14 for all other window lengths, paired Wilcoxon signed rank test). The difference in accuracy increases with the window length, and reaches 1.1% for 60 s window lengths. This is because the classifier is trained with fewer segments as the window length increases, which makes the estimation of the class means and covariances more prone to noise. With its unsupervised feature extraction, (uCCA, LDA) performs between 0.05% and 0.43% worse than (CCA, LDA), which requires supervised training. The difference is maximal for 20 s window lengths, and significant for all window lengths but three: 1, 2 and 60 s (p < 0.05, Wilcoxon signed rank test).

[0185] Overall, the average difference in performance between the fully unsupervised model (uCCA, uLDA) and the fully supervised model (CCA, LDA) increases with increasing window length. The loss is limited to 0.6% (p = 1.2e-4) for 10 s window lengths and reaches 1.2% (p = 2.4e-5) for 60 s windows.

Fourth Example Implementation

[0186] In this example it is shown how an instantaneous realization of the iterative process as presented in Claim 1 can be used to cast the least-squares based attended stimulus decoding and classification problem (see the first example) into a binary maximization problem. Again, the goal is to reconstruct the segment $\mathbf{y}^{(i)} \in \mathbb{R}^T$ of an attended stimulus based on a time-lagged EEG segment $X^{(i)} \in \mathbb{R}^{LC \times T}$. To this end, a least squares (LS) decoder is trained on N different training segments according to equation 33:

$$\mathbf{d} = \left(\sum_{i=1}^{N} X^{(i)} X^{(i)T}\right)^{-1} \sum_{i=1}^{N} X^{(i)}\mathbf{y}^{(i)} = R_{xx}^{-1}\,\mathbf{r}_{xy} \qquad (33)$$

with N the total number of segments in the training data, L the number of time lags, C the number of channels and T the length of a segment.
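Equation 33 translates directly into numpy; X_list and y_list are assumed to hold the time-lagged EEG segments and the attended envelopes, one entry per segment.

```python
import numpy as np

def ls_decoder(X_list, y_list):
    """Eq. 33: d = (sum_i X_i X_i^T)^-1 sum_i X_i y_i = R_xx^-1 r_xy."""
    R_xx = sum(X @ X.T for X in X_list)                  # label-independent
    r_xy = sum(X @ y for X, y in zip(X_list, y_list))    # needs attention labels
    return np.linalg.solve(R_xx, r_xy)
```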

[0187] Assume now that two stimuli $\mathbf{y}_1$ and $\mathbf{y}_2$ are given, with corresponding EEG data X which was recorded while the two stimuli were playing simultaneously. During supervised training of the decoder, $\mathbf{y}^{(i)}$ has to be set to either $\mathbf{y}_1^{(i)}$ or $\mathbf{y}_2^{(i)}$, depending on which of both stimuli the subject was attending to during segment i. This requires ground-truth attention labels during the training phase, which makes the training supervised (note that only $\mathbf{r}_{xy}$ contains supervised model parameters, as all other parameters do not require these labels).

[0188] During the testing phase, a common way to determine to which of both stimuli the subject was attending is to apply the pre-trained decoder to the EEG segment $X^{(i)}$ and to correlate the output with both stimuli. The stimulus that results in the highest correlation is then labelled as the attended stimulus (see also the first example). The decoder output is given by $\hat{\mathbf{y}}^{(i)} = X^{(i)T}\mathbf{d}$. The correlation between the decoder output and stimulus $\mathbf{y}_k^{(i)}$ is then given by $\mathbf{y}_k^{(i)T} X^{(i)T}\mathbf{d}$ (assuming normalized stimuli). Therefore, if equation 34 holds:

$$\mathbf{y}_1^{(i)T} X^{(i)T}\mathbf{d} > \mathbf{y}_2^{(i)T} X^{(i)T}\mathbf{d} \qquad (34)$$

it is concluded that stimulus $\mathbf{y}_1$ was attended.

[0189] Equation (34) is a way to decide whether $\mathbf{y}_1$ or $\mathbf{y}_2$ is the attended stimulus in segment i. To this end, we define the binary label $z_i$, which is equal to 1 if $\mathbf{y}_1$ is the attended stimulus, and $z_i$ = 0 otherwise. Based on (34), one should set $z_i$ such that $z_i\,\mathbf{y}_1^{(i)T} X^{(i)T}\mathbf{d} + (1-z_i)\,\mathbf{y}_2^{(i)T} X^{(i)T}\mathbf{d}$ is maximal.

[0190] If $\mathbf{z}^*$ is now defined as the vector containing the binary labels of all the segments, then the labelling procedure based on (34) is equivalent to maximizing the following sum over all possible binary vectors $\mathbf{z}$, as set out in equation 35:

$$\mathbf{z}^* = \arg\max_{\mathbf{z}} \sum_{i=1}^{N} \left[z_i\,\mathbf{y}_1^{(i)T} + (1-z_i)\,\mathbf{y}_2^{(i)T}\right] X^{(i)T}\mathbf{d} \qquad (35)$$

where $\mathbf{z}$ is a binary vector with elements $z_i \in \{0, 1\}$.
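For a fixed decoder d, the maximization of equation 35 decouples per segment into the decision rule of equation 34; iterating this label selection with a decoder retrained on the selected stimuli realizes the feedback loop discussed next. A sketch under these conventions, reusing ls_decoder from above:

```python
import numpy as np

def select_labels(X_list, y1_list, y2_list, d):
    """Eq. 34/35: per segment, pick the stimulus best correlated with X_i^T d."""
    return np.array([int(y1 @ (X.T @ d) > y2 @ (X.T @ d))
                     for X, y1, y2 in zip(X_list, y1_list, y2_list)])

def fixed_point_labels(X_list, y1_list, y2_list, max_iter=10, rng=None):
    """Alternate label selection (eq. 35) and decoder training (eq. 33)
    until z no longer changes, i.e. a fixed point of the feedback loop."""
    if rng is None:
        rng = np.random.default_rng(3)
    z = rng.integers(0, 2, len(X_list))
    for _ in range(max_iter):
        y_att = [y1 if zi else y2 for zi, y1, y2 in zip(z, y1_list, y2_list)]
        d = ls_decoder(X_list, y_att)         # r_xy now depends on z
        z_new = select_labels(X_list, y1_list, y2_list, d)
        if np.array_equal(z_new, z):
            break                             # labels invariant under the update
        z = z_new
    return z, d
```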

[0191] In an unsupervised setting where there are no ground-truth labels, the supervised parameters $\mathbf{r}_{xy}$ as shown in (33) cannot be computed. Instead, one encodes an instantaneous version of the iterative feedback loop from Claim 1: the supervised parameters $\mathbf{r}_{xy}$ are now computed based on the predicted labels from (35). This means that $\mathbf{r}_{xy}$ itself becomes a function of the predicted labels $\mathbf{z}$ (instead of ground-truth labels), i.e. $\mathbf{r}_{xy}(\mathbf{z}) = \sum_{i=1}^{N} X^{(i)}\left[z_i\,\mathbf{y}_1^{(i)} + (1-z_i)\,\mathbf{y}_2^{(i)}\right]$. Plugging this into (35) implies that an instantaneous realization of the feedback loop of Claim 1 is encoded within (35), which then becomes equation 36:

$$\mathbf{z}^* = \arg\max_{\mathbf{z}} \sum_{i=1}^{N} \left[z_i\,\mathbf{y}_1^{(i)T} + (1-z_i)\,\mathbf{y}_2^{(i)T}\right] X^{(i)T} R_{xx}^{-1}\,\mathbf{r}_{xy}(\mathbf{z}) \qquad (36)$$

This casts the method of updating a model into an equivalent optimization problem, of which the solution will correspond to a convergence point of the iterative process described in Claim 1. Solving (36) would lead to a set of labels that are invariant under the iterative process, which is also referred to as a fixed point of the iterative process in Claim 1.

[0192] In summary, according to embodiments of the present invention, a fully unsupervised subject-specific model is presented, which is iteratively updated without requiring ground-truth labels for the data used in the updating. Embodiments of the present invention update themselves based on their own predictions in the previous iteration, resulting in a self-leveraging effect. As such, they should automatically adapt to a new subject, integrating the two major advantages of a subject-specific and a subject-independent decoder: a higher performance than a subject-independent decoder, while retaining the unsupervised 'plug-and-play' feature of a subject-independent decoder, thus without requiring knowledge about the labels during training. Furthermore, such a self-adaptive algorithm could be applied adaptively in time, when EEG and audio data are continuously recorded, adapting to changing conditions and situations.

[0193] Embodiments of the present invention include a system for updating a model for determining a property of an audio stimulus and/or a measured recording of brain activity based on a set of parameters, as described hereinbefore. The system comprises a data storage configured to store an initial model and at least a measured recording of brain activity either or not in response to an audio stimulus, or at least one feature derived from such a recording, and a processor configured to retrieve the initial model and the recording or feature from the data storage. The processor is further configured to predict at least one label of the audio stimulus and/or the measured recording of brain activity using the initial model, and to generate an updated model by recalculating at least one parameter of the initial model, wherein the recalculation is dependent on at least the at least one predicted label and/or a confidence metric corresponding to the at least one predicted label. The processor is further configured to predict at least one label of the audio stimulus and/or the measured recording of brain activity using the updated model, and to generate a second updated model by recalculating at least one parameter of the updated model, wherein the recalculation is dependent on at least the at least one predicted label and/or a confidence metric corresponding to the at least one predicted label.

[0194] The processor may provide the recalculated at least one parameter of the initial or updated model to the data storage for storage therein. The system may comprise, or be comprised in, for example, a computer, a laptop, a mobile computing device such as a smartphone or tablet, a microcontroller, etc. The processor may be a microprocessor comprised in a microcontroller. The processor and the storage may be comprised in the same device or in separate devices, provided that communication is possible between the storage and the processor by wired or wireless means, for example Ethernet, Bluetooth or a wireless internet connection. For example, the processor may be comprised in a cloud computing system and the storage may be comprised in a mobile computing device which can connect to the cloud computing system via a wireless or Ethernet connection. The processor and the storage may be comprised in a hearing assistance device such as a hearing aid or a cochlear implant. Alternatively, the processor and/or the storage may be configured to communicate with a hearing assistance device such as a hearing aid or cochlear implant, for example by Bluetooth or other wireless communication, without being comprised in the hearing assistance device. The processor and the storage may be configured to perform any method according to embodiments of the present invention.

[0195] Methods according to embodiments of the present invention may be used in neuro-steered headphones or earphones. The method can be used to update a model for determining auditory attention in a situation where a subject is listening to a source played through the head-/earphones (e.g., via a playback device streaming audio to the head-/earphones) and to ambient sound coming from outside the head-/earphones (e.g., a voice). The ambient sound can be captured by built-in microphones of the head-/earphones and can be amplified when the attention is shifted towards these sources, as determined by the model. The source that was played through the head-/earphones can then be automatically paused or reduced in volume depending on the direction of attention. Similarly, if it is determined by the model that attention is directed to the source in the head-/earphones, the outside sources can be at least partially filtered out.

[0196] While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive. The foregoing description details certain embodiments of the invention. It will be appreciated, however, that no matter how detailed the foregoing appears in text, the invention may be practiced in many ways. The invention is not limited to the disclosed embodiments.

[0197] Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems. Any reference signs in the claims should not be construed as limiting the scope.