

Title:
A METHOD AND SYSTEM FOR CLASSIFYING SLEEP RELATED BRAIN ACTIVITY
Document Type and Number:
WIPO Patent Application WO/2020/248008
Kind Code:
A1
Abstract:
A machine learning method and system for classifying sleep related brain electrical activity is disclosed. The method comprises receiving brain electrical activity data defining a time dependent variation of brain electrical activity as a function of time and processing the brain electrical activity data to determine one or more time dependent features each characterising an aspect of the brain electrical activity as a function of time. The method further includes classifying the brain electrical activity at the selected time based on the one or more time dependent features, wherein the machine learning system is trained on a training set of brain electrical activity data to apply a classification of brain electrical activity for a given time based on values of the one or more time dependent features as determined for an extended time period relative to the given time.

Inventors:
HARTMANN SIMON (AU)
BAUMERT MATHIAS (AU)
Application Number:
PCT/AU2020/000053
Publication Date:
December 17, 2020
Filing Date:
June 15, 2020
Assignee:
UNIV ADELAIDE (AU)
International Classes:
G06N3/08; A61B5/0476; G06N3/02; G06N3/10
Foreign References:
CN105559777A 2016-05-11
Other References:
HUY PHAN ET AL.: "Automatic Sleep Stage Classification Using Single-Channel EEG: Learning Sequential Features with Attention-Based Recurrent Neural Networks", 40TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 18 July 2018 (2018-07-18), Honolulu, HI, USA, pages 1452-1455, XP033431880, DOI: 10.1109/EMBC.2018.8512480
KUSUMIKA KRORI DUTTA: "Multi-class time series classification of EEG signals with Recurrent Neural Networks", 9TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (CONFLUENCE 2019), 10 January 2019 (2019-01-10), Noida, India, pages 337-341, XP033584832, DOI: 10.1109/CONFLUENCE.2019.8776889
YILDIRIM OZAL, BALOGLU ULAS, ACHARYA U: "A Deep Learning Model for Automated Sleep Stages Classification Using PSG Signals", INT. J. ENVIRON. RES. PUBLIC HEALTH, vol. 16, no. 599, 19 February 2019 (2019-02-19), pages 1-21, XP055771813, DOI: 10.3390/ijerph16040599
ALI ABDOLLAHI GHARBALI: "Sleep Stage Classification: A Deep Learning Approach", DOCTORAL DISSERTATION, November 2018 (2018-11-01), pages 1-187, XP055771815, Retrieved from the Internet [retrieved on 2020-06-08]
Attorney, Agent or Firm:
MADDERNS PTY LTD (AU)
Claims:
CLAIMS

1. A computer-implemented machine learning method for classifying sleep related brain electrical activity at a selected time, the method comprising:

receiving brain electrical activity data defining a time dependent variation of brain electrical activity as a function of time;

processing the brain electrical activity data to determine one or more time dependent features each characterising an aspect of the brain electrical activity as a function of time;

classifying by a machine learning system the brain electrical activity at the selected time based on the one or more time dependent features, wherein the machine learning system is trained on a training set of brain electrical activity data to apply a classification of brain electrical activity for a given time based on values of the one or more time dependent features as determined for an extended time period relative to the given time.

2. The computer-implemented machine learning method of claim 1, wherein the machine learning system is based on a recurrent neural network (RNN).

3. The computer-implemented machine learning method of claim 2, wherein the RNN is a long short-term memory (LSTM) implementation.

4. The computer-implemented machine learning method of claim 2, wherein the RNN is a bidirectional LSTM (Bi-LSTM) implementation.

5. The computer-implemented machine learning method of any one of claims 1 to 4, wherein the one or more time dependent features comprises a power measure characterising a variation in power of brain electrical activity as a function of time.

6. The computer-implemented machine learning method of claim 5, wherein the power measure comprises a Hjorth activity.

7. The computer-implemented machine learning method of claim 5 or 6, wherein the power measure comprises a Band Power Descriptor (BPD).

8. The computer-implemented machine learning method of any one of claims 1 to 7 wherein the one or more time dependent features comprises an energy measure characterising a variation in energy of brain electrical activity as a function of time.

9. The computer-implemented machine learning method of claim 8, wherein the energy measure comprises a Teager Energy Operator (TEO).

10. The computer-implemented machine learning method of any one of claims 1 to 9, wherein the one or more time dependent features comprises an entropy measure characterising a variation in entropy of brain electrical activity as a function of time.

11. The computer-implemented machine learning method of claim 10, wherein the entropy measure comprises a Shannon entropy.

12. The computer-implemented machine learning method of any one of claims 1 to 11, wherein the one or more time dependent features comprises a frequency shift measure characterising a shift in frequency of the brain electrical activity as a function of time.

13. The computer-implemented machine learning method of claim 12, wherein the frequency shift measure comprises a Differential Variance (DV).

14. The computer-implemented machine learning method as claimed in any one of the preceding claims, wherein the training set of brain electrical activity data is imbalanced and the method further comprises compensating for the imbalanced training set of brain electrical activity data for training the machine learning system.

15. The computer-implemented machine learning method of claim 14, wherein compensating for the imbalanced training set of brain electrical activity data comprises adopting an error function for training the machine learning system that penalises false classification of an underrepresented class in the brain electrical activity data.

16. The computer-implemented machine learning method of claim 15, wherein the error function is an Fβ-score.

17. The computer-implemented machine learning method as claimed in any one of the preceding claims, further comprising pre-processing the brain electrical activity data and/or the training set of brain electrical activity data to reduce subject related variability.

18. The computer-implemented machine learning method of claim 17, wherein pre-processing the brain electrical activity data and/or the training set of brain electrical activity data comprises removing artefacts present in the brain electrical activity data associated with a secondary physiological electrical field.

19. The computer-implemented machine learning method of claim 18, wherein removing artefacts present in the brain electrical activity data associated with a secondary physiological electrical field comprises processing the brain electrical activity data with respect to supplementary data associated with the secondary physiological electric field.

20. The computer-implemented machine learning method of claim 19, wherein the artefacts are cardiac field artefacts and the supplementary data is an electrocardiogram (ECG) signal.

21. The computer-implemented machine learning method of claim 19, wherein the artefacts are eye movement artefacts and the supplementary data is an electrooculogram (EOG) signal.

22. The computer-implemented machine learning method of any one of claims 19 to 21, wherein processing the brain electrical activity data with respect to supplementary data includes conducting a discrete wavelet transform process on the brain electrical activity data to generate decomposed brain electrical activity data followed by an independent component analysis on the decomposed brain electrical activity data and the supplementary data to find statistically independent sources.

23. The computer-implemented machine learning method of any one of claims 17 to 22, wherein pre-processing further comprises:

resampling the brain electrical activity data to a predetermined sample rate; and/or

applying a band pass filter to divide the brain electrical activity data into a plurality of frequency bands.

24. The computer-implemented machine learning method of any one of claims 1 to 23, wherein the brain electrical activity data is received and classified substantially in real time.

25. A classification system for classifying sleep related brain electrical activity, the system comprising:

a sensor for receiving brain electrical activity data defining a time dependent variation of brain electrical activity as a function of time;

a classifier comprising one or more processors configured to carry out the method of any one of claims 1 to 24; and

a display for displaying brain electrical activity data and the determined classification.

26. The classification system of claim 25, wherein the sensor is an electroencephalography (EEG) recording system.

27. The classification system of claim 25, wherein the sensor is configured as a headband arrangement comprising one or more electrodes.

28. The classification system of claim 25, wherein the sensor is configured as a flexible patch arrangement comprising a single electrode.

29. The classification system of claim 25, wherein the sensor is configured as an in-ear wearable arrangement comprising one or more electrodes.

30. A classification system for classifying sleep related brain electrical activity, comprising:

one or more processors;

memory in electronic communication with the one or more processors; and

instructions stored in the memory and operable, when executed by the processor, to cause the system to carry out the method of any one of claims 1 to 24.

Description:
A METHOD AND SYSTEM FOR CLASSIFYING SLEEP RELATED BRAIN ACTIVITY

PRIORITY DOCUMENTS

[0001] The present application claims priority from Australian Provisional Patent Application No.

2019902074 titled "A METHOD AND SYSTEM FOR CLASSIFYING SLEEP RELATED BRAIN ACTIVITY" and filed on 14 June 2019, the content of which is incorporated by reference in its entirety.

INCORPORATION BY REFERENCE

[0002] The following publications are referred to in the present application and their contents are incorporated by reference in their entirety:

S. Hartmann and M. Baumert, "Automatic A-Phase Detection of Cyclic Alternating Patterns in Sleep Using Dynamic Temporal Information," IEEE Trans. Neural Syst. Rehabil. Eng., vol. 27, no. 9, pp. 1695-1703, 2019; and

S. Hartmann, O. Bruni, R. Ferri, S. Redline, and M. Baumert, "Characterisation of cyclic alternating pattern during sleep in older men and women using large population studies," Sleep, Feb. 2020.

TECHNICAL FIELD

[0003] The present disclosure relates to determining brain activity. In a particular form, the present disclosure relates to determining and characterising brain electrical activity during sleep.

BACKGROUND

[0004] Sleep is an essential part of life for many species including humans. Sleep appears to serve several vital functions including cellular restoration, memory consolidation and clearance of metabolites from the brain; however, the process in its entirety is incompletely understood. Sleep comprises recurring alternating patterns of quiescence in brain activity followed by periods of high activity characterised by rapid eye movement (REM). Sleep plays a critical role in human health and, as a consequence, the determination and treatment of sleep disorders is a continually growing field, especially with the pressures of modern living. Some example sleep disorders include sleep disordered breathing, which has been associated with high blood pressure and cardiovascular disease, insomnia, and periodic limb movement disorder involving the involuntary movement of limbs during sleep.

[0005] Diagnosis of sleep disorders typically requires recording of a set of physiological parameters indicative of different body functions using overnight polysomnography, where the brain electrical activity is measured by an electroencephalography (EEG) recording system along with other physiological parameters such as breathing (eg, airflow and oxygen levels), eye movements, muscle activation and heart activity.

[0006] Traditionally, the variation of brain electrical activity, or the macrostructure of a period of sleep, has been classified in accordance with four different stages based on the American Academy of Sleep Medicine (AASM) consensus guidelines. These stages include a REM stage plus three non-REM stages. The classification is typically performed on thirty-second time windows of the brain electrical activity recorded by an EEG recording system. Every 30 second time window is then scored by a trained expert into one of the four stages based on the prominent EEG pattern for the given window, even though there may be microstructure associated with different stages within each 30 second time window or epoch.
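By way of illustration only (not part of the disclosed method), the thirty-second epoching described above can be sketched as follows; the helper name `epoch_signal` is hypothetical:

```python
import numpy as np

def epoch_signal(eeg, fs, epoch_len_s=30.0):
    """Split a 1-D EEG recording into fixed-length scoring epochs.

    eeg: 1-D array of samples; fs: sampling rate in Hz.
    Returns an array of shape (n_epochs, epoch_len_s * fs); any
    trailing partial epoch is discarded, as in conventional scoring.
    """
    samples_per_epoch = int(round(epoch_len_s * fs))
    n_epochs = len(eeg) // samples_per_epoch
    return eeg[:n_epochs * samples_per_epoch].reshape(n_epochs, samples_per_epoch)

# Example: 95 s of a 128 Hz recording yields three complete 30 s epochs.
eeg = np.random.randn(95 * 128)
epochs = epoch_signal(eeg, fs=128)
print(epochs.shape)  # (3, 3840)
```

Each row would then be scored into one of the four stages based on its prominent EEG pattern.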

[0007] It follows that this epoch-based sleep staging approach neglects short term events which can be significant to understanding the sleep process of a subject. Importantly, conventional sleep staging rules were originally based on expert consensus and aimed at providing simple descriptors that could be obtained manually. They were not designed to capture the dynamic processes that govern sleep. Hence, the physiological insights that can be gained with this approach are limited.

[0008] More recently, the concept of the cyclic alternating pattern (CAP) has been introduced as an alternative way to characterise non-REM sleep. The identification of these CAP sequences is of significant interest as they have been linked to neural pathologies. In contrast to the standard sleep stage scoring methodology, the CAP model focuses on the microstructure of sleep. CAPs are recurring short term events correlating with the activation of the cardiorespiratory system. They can be seen as arousal-like events, although they are not covered by the AASM definition for arousals, which characterises arousals as sudden shifts in EEG frequency. Moreover, they represent a marker for cerebral activity in a reduced vigilance state. In summary, a CAP is a marker for cerebral and physiological activity during sleep that potentially interrupts the restorative process of sleep. Typically, the more CAP events that occur during sleep, the less regenerative and stable it is.

[0009] Referring now to Figure 1, which shows an example plot 100 of multiple synchronous EEG measurements or channels 110, a CAP sequence 130 is defined as consisting of more than two CAP cycles 120, where each CAP cycle or event is composed of an activation phase (A-phase) 121 and a background phase (B-phase) 125. The A-phase is commonly restricted to sleep stages without rapid eye movement (REM) and characterised by slower high-voltage rhythms, faster lower voltage rhythms, or both. In this case, the A-phase in the highlighted time period indicated by the dashed box displays a slow high-voltage rhythm compared to the B-phase, which is typical for a certain type of CAP cycle.

[0010] Currently, the determination and classification of CAP sequences are performed semi-manually by a trained expert. This task can be exhausting and time-consuming, and subjective variations may occur depending on the person carrying out the classification task. This has been seen to be an impediment to the implementation of these methods in clinical applications and/or implementation in electronic devices that could carry out analysis of CAP sequences. These in turn could provide an indicator of sleep fragmentation generally and in particular of sleep disorders such as periodic leg movement disorder or sleep disordered breathing that are known to correlate with the frequency of CAP sequences.
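As a hedged illustration of the "more than two CAP cycles" rule, detected A-phase onsets can be grouped into candidate CAP sequences; the function name and the `max_gap_s` spacing threshold are hypothetical choices, not values taken from the disclosure:

```python
def group_cap_sequences(a_phase_onsets, max_gap_s=60.0):
    """Group A-phase onsets (in seconds) into candidate CAP sequences.

    Consecutive A-phases separated by no more than max_gap_s are taken
    to belong to the same run of CAP cycles; runs of more than two
    cycles qualify as CAP sequences, per the definition above.
    """
    runs, current = [], [a_phase_onsets[0]] if a_phase_onsets else []
    for onset in a_phase_onsets[1:]:
        if onset - current[-1] <= max_gap_s:
            current.append(onset)
        else:
            runs.append(current)
            current = [onset]
    if current:
        runs.append(current)
    return [run for run in runs if len(run) > 2]

# Three closely spaced A-phases form one sequence; the isolated one does not.
print(group_cap_sequences([10.0, 40.0, 75.0, 300.0]))  # [[10.0, 40.0, 75.0]]
```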

[0011] As a consequence, there have been some attempts to develop automated approaches to detect and characterise brain electrical activity by recognising and classifying CAP events or sequences. These approaches have included analysing the alteration in signal amplitude averages between short and long time periods in recorded brain electrical activity data. However, this approach is extremely subject dependent because of the requirement to set thresholds depending on signal values. EEG signals for different subjects can vary strongly in terms of amplitude and time behaviour meaning that fixed thresholds do not work accurately across subjects and have to be readjusted manually for each subject.

[0012] In another approach, statistical or spectral features are extracted from the recorded brain electrical activity data, followed by the application of thresholding classification algorithms. Again, this approach cannot adequately account for the subject dependent variation in parameters such as signal amplitude, and the need to dynamically adjust these thresholds on a per-subject basis greatly detracts from the convenience of this approach and its ability to be automated.

[0013] Against this background, it would be desirable to provide a method and system for classifying brain electrical activity that can be applied to different subjects and is capable of being automated.

SUMMARY

[0014] In a first aspect, the present disclosure provides a computer-implemented machine learning method for classifying sleep related brain electrical activity at a selected time, the method comprising:

receiving brain electrical activity data defining a time dependent variation of brain electrical activity as a function of time;

processing the brain electrical activity data to determine one or more time dependent features each characterising an aspect of the brain electrical activity as a function of time;

classifying by a machine learning system the brain electrical activity at the selected time based on the one or more time dependent features, wherein the machine learning system is trained on a training set of brain electrical activity data to apply a classification of brain electrical activity for a given time based on values of the one or more time dependent features as determined for an extended time period relative to the given time.

[0015] In another form, the machine learning system is based on a recurrent neural network (RNN).

[0016] In another form, the RNN is a long short-term memory (LSTM) implementation.

[0017] In another form, the RNN is a bidirectional LSTM (Bi-LSTM) implementation.
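As an illustrative sketch only (the disclosure does not specify an implementation), the gating recurrence of an LSTM cell can be written in numpy; all names, weight shapes and dimensions below are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    x: input feature vector; h_prev, c_prev: previous hidden and cell
    states. W, U, b hold the stacked input, forget, candidate and
    output gate parameters (4 * n rows each).
    """
    n = h_prev.size
    z = W @ x + U @ h_prev + b            # all four gate pre-activations
    i = sigmoid(z[0 * n:1 * n])           # input gate
    f = sigmoid(z[1 * n:2 * n])           # forget gate
    g = np.tanh(z[2 * n:3 * n])           # candidate cell state
    o = sigmoid(z[3 * n:4 * n])           # output gate
    c = f * c_prev + i * g                # new cell state
    h = o * np.tanh(c)                    # new hidden state
    return h, c

# A bidirectional LSTM (Bi-LSTM) simply runs one such recurrence forward
# in time and a second one backward, concatenating the two hidden states.
rng = np.random.default_rng(0)
n_in, n_hid = 6, 4
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(10):                       # iterate over 10 feature vectors
    h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
print(h.shape)  # (4,)
```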

[0018] In another form, the one or more time dependent features comprise a power measure characterising a variation in power of brain electrical activity as a function of time.

[0019] In another form, the power measure comprises a Hjorth activity.

[0020] In another form, the power measure comprises a Band Power Descriptor (BPD).

[0021] In another form, the one or more time dependent features comprises an energy measure characterising a variation in energy of brain electrical activity as a function of time.

[0022] In another form, the energy measure comprises a Teager Energy Operator (TEO).

[0023] In another form, the one or more time dependent features comprises an entropy measure characterising a variation in entropy of brain electrical activity as a function of time.

[0024] In another form, the entropy measure comprises a Shannon entropy.

[0025] In another form, the one or more time dependent features comprises a frequency shift measure characterising a shift in frequency of the brain electrical activity as a function of time.

[0026] In another form, the frequency shift measure comprises a Differential Variance (DV).
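Several of the time dependent features named above can be sketched as sliding-window computations; this is an assumption-laden illustration (window length, bin count and helper names are hypothetical, and the Band Power Descriptor and Differential Variance are omitted):

```python
import numpy as np

def hjorth_activity(window):
    """Hjorth activity: the variance (power) of the signal window."""
    return np.var(window)

def teager_energy(window):
    """Mean Teager Energy Operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    psi = window[1:-1] ** 2 - window[:-2] * window[2:]
    return psi.mean()

def shannon_entropy(window, bins=16):
    """Shannon entropy of the amplitude distribution of the window."""
    hist, _ = np.histogram(window, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def sliding_features(eeg, fs, win_s=2.0):
    """Evaluate each feature over consecutive non-overlapping windows,
    giving one time series of values per feature."""
    step = int(win_s * fs)
    wins = [eeg[i:i + step] for i in range(0, len(eeg) - step + 1, step)]
    return np.array([[hjorth_activity(w), teager_energy(w), shannon_entropy(w)]
                     for w in wins])

feats = sliding_features(np.random.randn(128 * 10), fs=128)
print(feats.shape)  # (5, 3)
```

Each column is then a time dependent feature of the kind the machine learning system consumes.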

[0027] In another form, the training set of brain electrical activity data is imbalanced and the method further comprises compensating for the imbalanced training set of brain electrical activity data for training the machine learning system.

[0028] In another form, compensating for the imbalanced training set of brain electrical activity data comprises adopting an error function for training the machine learning system that penalises false classification of an underrepresented class in the brain electrical activity data.

[0029] In another form, the error function is an Fβ-score.

[0030] In another form, the method further comprises pre-processing the brain electrical activity data and/or the training set of brain electrical activity data to reduce subject related variability.
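As a non-limiting illustration of such an error function, the Fβ-score can be computed from confusion counts; β greater than one weights recall of the underrepresented class more heavily (in training, the score or a differentiable surrogate of it would guide the error computation):

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score from confusion counts (true/false positives,
    false negatives); beta > 1 emphasises recall of the positive,
    underrepresented class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# With tp=60, fp=20, fn=40: precision is 0.75 and recall is 0.6.
print(round(f_beta(60, 20, 40, beta=1.0), 4))  # 0.6667
print(round(f_beta(60, 20, 40, beta=2.0), 4))  # 0.625
```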

[0031] In another form, pre-processing the brain electrical activity data and/or the training set of brain electrical activity data comprises removing artefacts present in the brain electrical activity data associated with a secondary physiological electrical field.

[0032] In another form, removing artefacts present in the brain electrical activity data associated with a secondary physiological electrical field comprises processing the brain electrical activity data with respect to supplementary data associated with the secondary physiological electric field.

[0033] In another form, the artefacts are cardiac field artefacts and the supplementary data is an electrocardiogram (ECG) signal.

[0034] In another form, the artefacts are eye movement artefacts and the supplementary data is an electrooculogram (EOG) signal.

[0035] In another form, processing the brain electrical activity data with respect to supplementary data includes conducting a discrete wavelet transform process on the brain electrical activity data to generate decomposed brain electrical activity data, followed by an independent component analysis on the decomposed brain electrical activity data and the supplementary data to find statistically independent sources.
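The wavelet decomposition step can be illustrated with a single Haar level; this is a sketch under stated assumptions (the embodiment may use a different mother wavelet and more levels, and the subsequent independent component analysis is not shown):

```python
import numpy as np

def haar_dwt_level(x):
    """One level of a discrete wavelet transform with the Haar wavelet:
    the signal is split into approximation (low-pass) and detail
    (high-pass) coefficients at half the sampling rate."""
    x = x[:len(x) // 2 * 2]               # truncate to even length
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return approx, detail

def haar_idwt_level(approx, detail):
    """Inverse of one Haar level, reconstructing the original samples
    (cf. the inverse transform used to rebuild the de-noised signal)."""
    even = (approx + detail) / np.sqrt(2)
    odd = (approx - detail) / np.sqrt(2)
    out = np.empty(2 * approx.size)
    out[0::2], out[1::2] = even, odd
    return out

x = np.arange(8.0)
a, d = haar_dwt_level(x)
print(a.shape, d.shape)  # (4,) (4,)
```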

[0036] In another form, pre-processing further comprises:

resampling the brain electrical activity data to a predetermined sample rate; and/or

applying a band pass filter to divide the brain electrical activity data into a plurality of frequency bands.

[0037] In another form, the brain electrical activity data is received and classified substantially in real time.

[0038] In a second aspect, the present disclosure provides a classification system for classifying sleep related brain electrical activity, the system comprising:

a sensor for receiving brain electrical activity data defining a time dependent variation of brain electrical activity as a function of time;

a classifier comprising one or more processors configured to carry out the method in accordance with the first aspect of the disclosure; and

a display for displaying brain electrical activity data and the determined classification.

[0039] In another form, the sensor is an electroencephalography (EEG) recording system.

[0040] In another form, the sensor is configured as a headband arrangement comprising one or more electrodes.

[0041] In another form, the sensor is configured as a flexible patch arrangement comprising a single electrode.

[0042] In another form, the sensor is configured as an in-ear wearable arrangement comprising one or more electrodes.

[0043] In a third aspect, the present disclosure provides a classification system for classifying sleep related brain electrical activity, comprising:

one or more processors;

memory in electronic communication with the one or more processors; and

instructions stored in the memory and operable, when executed by the processor, to cause the system to carry out the method in accordance with the first aspect of the disclosure.

BRIEF DESCRIPTION OF DRAWINGS

[0044] Embodiments of the present disclosure will be discussed with reference to the accompanying drawings wherein:

[0045] Figure 1 is an example plot of multiple synchronous EEG measurements or channels depicting a CAP sequence;

[0046] Figure 2 is a figurative view of the electrode placements for an EEG recording system in accordance with an illustrative embodiment;

[0047] Figure 3 is a flowchart of a method for classifying brain electrical activity in accordance with an illustrative embodiment;

[0048] Figure 4 is a system overview diagram of a classification system for classifying brain electrical activity in accordance with an illustrative embodiment;

[0049] Figure 5 is a flowchart of a method for pre-processing brain electrical activity data in accordance with an illustrative embodiment;

[0050] Figure 6 is a flowchart of a method for removing artefacts from the brain electrical activity data in accordance with an illustrative embodiment;

[0051] Figure 7 depicts plots of a measured EEG signal and a corresponding measurement of an ECG signal taken over the same period in accordance with an illustrative embodiment;

[0052] Figure 8 is an overview diagram depicting the first stage of the discrete wavelet transform process operating on the sampled ECG signal in accordance with an illustrative embodiment;

[0053] Figure 9 is an overview diagram depicting the subsequent stages of the discrete wavelet transform process illustrated in Figure 8;

[0054] Figure 10 is an overview diagram depicting the inverse discrete wavelet transform required to reconstruct the original signal in accordance with an illustrative embodiment;

[0055] Figure 11 is a series of plots depicting the independent sources determined by the independent component analysis applied to the decomposed EEG signal in accordance with an illustrative embodiment;

[0056] Figure 12 is a series of plots depicting the original EEG signal and de-noised EEG signal following pre-processing to remove artefacts arising from the cardiac electric field in accordance with an illustrative embodiment;

[0057] Figure 13 is an overview diagram of an LSTM cell in accordance with an illustrative embodiment;

[0058] Figure 14 is a simplified overview diagram of the LSTM cell illustrated in Figure 13;

[0059] Figure 15 is an overview diagram of an LSTM unit comprising the LSTM cell illustrated in Figures 13 and 14;

[0060] Figure 16 is an information flow diagram of a machine learning system trained to apply a classification of brain electrical activity at a given time in accordance with an illustrative embodiment;

[0061] Figure 17 is an example comparison plot of imbalanced and balanced data sets for training a machine learning system; and

[0062] Figure 18 is an information flow diagram of a machine learning system trained to apply a classification of brain electrical activity at a given time in accordance with another illustrative embodiment.

[0063] In the following description, like reference characters designate like or corresponding parts throughout the figures.

DESCRIPTION OF EMBODIMENTS

[0064] Referring now to Figure 2, there is shown a figurative top plan view 1000 of a cranium showing the electrode positions 1020 for an EEG recording system according to an illustrative embodiment. As depicted, the NASION reference 1010 indicates the nose side of the cranium while the INION reference 1030 indicates the rear of the cranium. An EEG recording system records brain electrical activity in a non-intracranial manner, ie, the electrodes are not required to be placed on the exposed brain. As would be appreciated, the methods and systems for classifying sleep related brain electrical activity data in accordance with the present disclosure are not necessarily limited to brain electrical activity data that has been recorded by an EEG recording system.

[0065] In an EEG system, each electrode measures the electrical field created by the membrane potential of neurons inside the brain which stimulates the ions in the scalp skin to move. As the electrical field of a single neuron is too small to be measured, an individual electrode will display the activity of multiple pyramidal cells which share the same spatial orientation. As such, a higher measured voltage by an electrode is an indication of the synchronous activity of thousands or millions of such cells.

[0066] If an EEG is recorded over a long period it will show some significant oscillations and recurring patterns which define the state of the brain (eg, see Figure 1). Especially in sleep research, these oscillations and events in EEG recorded brain electrical activity data indicate the state of the brain and the body. Since the origination of EEG recording systems, recurring activity in the frequency range of 1-30 Hz found in the recorded brain electrical activity data has been used to describe the state of sleep.

[0067] These oscillations and waveforms have been subdivided into four frequency bands or ranges each of them having a separate physiological significance as follows:

• Delta Waves (< 4 Hz). These waves contain slow waves and high amplitudes and are normally found in adults during slow-wave sleep.

• Theta Waves (4 - 8 Hz). These waves are usually seen in young children and during drowsiness or meditation in adults.

• Alpha Waves (8 - 16 Hz). These waves are predominantly seen in the period after closing the eyes.

• Beta Waves (> 16 Hz). These waves are commonly found during active phases.
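The power of an EEG segment in each of these bands can be estimated from its one-sided FFT spectrum; this is a minimal sketch using the band edges given above (taking 0.5 Hz as a nominal lower delta edge and 30 Hz as the upper beta edge, both assumptions not stated in the list):

```python
import numpy as np

# The four conventional bands described above (Hz); edges follow the text.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 16), "beta": (16, 30)}

def band_powers(eeg, fs):
    """Mean spectral power of an EEG segment in each frequency band,
    computed from the one-sided FFT power spectrum."""
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(eeg)) ** 2 / len(eeg)
    return {name: psd[(freqs >= lo) & (freqs < hi)].mean()
            for name, (lo, hi) in BANDS.items()}

fs = 128
t = np.arange(fs * 4) / fs
eeg = np.sin(2 * np.pi * 6.0 * t)          # a pure 6 Hz (theta) rhythm
powers = band_powers(eeg, fs)
print(max(powers, key=powers.get))  # theta
```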

[0068] As an electrode in an EEG recording system measures the time-dependent electric potential difference, in volts, at a location point on the scalp, there needs to be a reference electric potential against which the potential difference is measured. In one example, the electric potential difference is defined by reference to another electrode. This is termed a bipolar recording because two locations on the scalp are compared to each other. Referring again to Figure 2, examples of bipolar measurements would be F3-C3, C3-P3, P3-O1 and Fp2-F4, corresponding to electric potential difference measurements made by electrodes between the two locations on the scalp as indicated.

[0069] In another example the electric potential difference is measured with respect to a reference electrode or with respect to ground. These last two cases are commonly referred to as unipolar recordings as all electrodes share the same reference. In most cases the reference electrode is placed close to one of the ears (A1 and A2 for each hemisphere) and typically the ear of the opposite hemisphere is used as a reference to diminish the influence of artefacts. Alternatively, ground is used as a reference electrode as referred to above.
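The montage arithmetic described above can be sketched directly: when two unipolar channels share the same reference, their difference yields the corresponding bipolar channel because the common reference cancels, (A - ref) - (B - ref) = A - B. The channel data below is, of course, synthetic:

```python
import numpy as np

# Hypothetical unipolar recordings, each referenced to the same electrode.
n = 5
channels = {name: np.random.randn(n) for name in ("F3", "C3", "P3", "O1")}

def bipolar(channels, a, b):
    """Derive a bipolar channel as the difference of two unipolar
    recordings that share a common reference."""
    return channels[a] - channels[b]

f3_c3 = bipolar(channels, "F3", "C3")   # the F3-C3 derivation of Figure 2
print(f3_c3.shape)  # (5,)
```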

[0070] Referring now to Figure 3, there is shown a flowchart of a method 200 for classifying brain activity according to an illustrative embodiment.

[0071] At step 300 brain electrical activity data is received which defines a time dependent variation of brain electrical activity as a function of time. Referring also to Figure 4 which illustrates a classification system 500 according to an illustrative embodiment, the brain electrical activity data may be received from a sensor 510 for transmission to classification processor 520. One example of a sensor 510 is an EEG device comprising one or more electrodes that are configured to be attached to various positions on the scalp of a subject such as illustrated in Figure 2. Each electrode measures an EEG channel corresponding to a time varying electric potential signal which is then amplified and digitised to provide the brain electrical activity data. An EEG device of this type is typically used when determining brain electrical activity in a sleeping subject where multiple electrodes may be attached to the subject and monitored.

[0072] As would be appreciated brain electrical activity data may be provided by any suitable sensor arrangement or system including, but not limited to:

• Headband arrangements which may be worn by the subject and which allow the subject to move. These systems will typically only involve a limited number of recording channels and can be vulnerable to movement of the headband, which may cause the sensors to shift position with respect to the subject's head. The electrodes are embedded in the headband, so dry electrodes are selected to make contact with the skin without the application of electrolytes. As adhesive patches are not used, as in a traditional EEG system, the position of the headband is instead fixed by increasing its tightness, which can potentially impact the wearability of the sensor for the subject. The advantages of headband arrangements are their positioning and the opportunity to measure multiple channels: as headbands can be worn around the centre of the scalp, they can be positioned to acquire important information.

• Flexible patch arrangements comprising a single sensor which may be attached to a location on the head, such as the forehead, and providing a single channel of brain electrical activity data. The major drawback of these arrangements is their positioning, as they cannot be placed on hair because they lose contact with the skin. As a result, they cannot be placed around the centre of the scalp. In addition, using the forehead as a measurement position can reduce the information received from the sensor. On the plus side, these arrangements are more comfortable to wear during sleeping.

• In-ear wearable arrangements comprising an in-ear insert that includes a number of EEG electrodes on its surface which, when worn, make contact with the ear canal to measure brain electrical activity. These arrangements are easy to wear and comfortable for the subject. However, positioning variability may occur from subject to subject due to potential variations in the shape of the inner ear.

[0073] As would also be appreciated, the sensor may transmit detected brain electrical activity data to classification processor 520 by any suitable wired or wireless means.

[0074] Referring back to Figure 3, at step 400 the brain electrical activity data is processed to determine one or more time dependent features that each characterise an aspect of the brain electrical activity as a function of time.

[0075] In one example, the measured brain electrical activity data is pre-processed to assist analysis and/or reduce or remove subject related variations which are not related to the brain electrical activity.

[0076] Referring now to Figure 5, there is shown a flow chart 600 depicting exemplary pre-processing steps according to one illustrative embodiment where in this example the brain electrical activity data relates to a standard EEG recording relating to either the C4-A1 or the C3-A2 channel. While the present method and system is not restricted to a particular bipolar or unipolar recording, these unipolar recording channels are suitable because of the central location of the C4 and C3 electrode positions. As would be appreciated, the brain electrical activity data may arise from different sensing regimes that employ different measurement sampling frequencies and in this case, to facilitate both the training and application of the machine learning system, the brain electrical activity data may be resampled to a standard frequency. As an example, the received brain electrical activity data corresponding to different subjects could be measured at differing sampling frequencies between 100 Hz and 512 Hz and these sampling frequencies, in one embodiment, would be resampled to a common predetermined frequency of 128 Hz at step 610.

[0077] In one example, the re-sampling frequency is of the form 2^n where n is an integer as this will facilitate the application of signal processing algorithms such as the fast Fourier transform (FFT) or the discrete Fourier transform (DFT) and wavelet transformation. Typically, subsampling makes the algorithm more efficient as less data has to be stored. In one example, where there is data corresponding to a range of sampling frequencies, the data would be resampled to the lowest frequency in the range that is a power of two, taking into account that down sampling may result in some loss of information in the signal.
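As a minimal sketch of the re-sampling at step 610 (assuming `scipy.signal.resample_poly`; the helper name `resample_to_common` is hypothetical and not part of the disclosure), recordings at 100 Hz and 512 Hz can be mapped to the common 128 Hz rate:

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly


def resample_to_common(signal, fs_in, fs_out=128):
    """Re-sample a 1-D recording from fs_in to a common rate fs_out
    using polyphase resampling with a built-in anti-aliasing filter."""
    frac = Fraction(fs_out, fs_in)  # e.g. 128/512 reduces to 1/4
    return resample_poly(signal, frac.numerator, frac.denominator)


# Ten-second recordings measured at 100 Hz and 512 Hz, both mapped to 128 Hz.
ten_seconds_100 = np.sin(2 * np.pi * 5 * np.arange(1000) / 100)
ten_seconds_512 = np.sin(2 * np.pi * 5 * np.arange(5120) / 512)

common_a = resample_to_common(ten_seconds_100, 100)
common_b = resample_to_common(ten_seconds_512, 512)
```

Both outputs then share the same 128 Hz time base (1280 samples for ten seconds), so features computed downstream are comparable across subjects.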

[0078] In some instances, a secondary physiological electric field such as the cardiac electric field may be superimposed on measured brain electrical activity data as the cardiac electric field is distributed on the body surface and in some cases even to the scalp. As such, electrodes placed on the scalp surface will then measure a linearly mixed signal of the cardiac field and the brain electrical activity resulting in visible peaks present in the EEG recording corresponding to the activity of the heart. At step 620, and to facilitate both the training and application of the machine learning system of the present disclosure, the effects of the secondary physiological electric field may be removed from the brain electrical activity data. In one example, the cardiac electrical field may be removed where there is also access to corresponding supplementary data associated with the cardiac electrical field such as a recorded electrocardiogram (ECG) signal.

[0079] In one example, blind source separation (BSS) methods such as independent component analysis (ICA) may be adopted to remove the effects of a secondary physiological electric field. In one embodiment, and as described below, a modified Wavelet-ICA method is employed. The Wavelet-ICA method combines an initial discrete wavelet transform applied to the brain electrical activity data, generating decomposed brain electrical activity data in the form of an array of frequency sub-bands from the EEG signal, followed by an independent component analysis (ICA) on the decomposed brain electrical activity data and the supplementary data to find statistically independent sources. This is based on the understanding that while the recording of electrical brain activity for each location can contain a superimposed signal, the signals related to the superimposed ECG signal (as an example) and the brain electrical activity are nevertheless independent of each other.

[0080] Referring now to Figure 6, there is shown a flowchart of a method 620 for removing artefacts associated with a secondary physiological electrical field from brain electrical activity data corresponding to EEG signal 621 according to an illustrative embodiment. In this example, the artefact is in the form of the cardiac electric field which is superimposed on the EEG signal and a corresponding measurement of the ECG signal 622 provides supplementary data associated with the cardiac electrical field which is employed to remove its presence from the brain electrical activity data 621. These signals can be seen in Figure 7 which depicts plots of a measured EEG signal 621 and a corresponding measurement of an ECG signal 622 taken over the same period. In one example, the ECG signal 622 is re-sampled to match the re-sampled EEG signal, resulting in a sampled EEG signal m[k] and a sampled ECG signal n[k], where k indicates the index of the signal value in each case.

[0081] At step 623, a wavelet decomposition is applied to the sampled EEG signal m[k] to generate a decomposed EEG signal.

[0082] In one example, the wavelet decomposition involves a combination of discrete wavelet transforms followed by reconstruction of the wavelet coefficients. The discrete wavelet transform (DWT) is the discrete version of the wavelet transform with discretely sampled wavelets. The key advantage of the wavelet transform over the Fourier transform is its ability to capture both frequency information and location-in-time information.

[0083] Each step in the DWT process generates so-called approximation coefficients a[k] and detail coefficients d[k]. The approximation coefficients are the output of low pass filtering followed by dyadic subsampling while the detail coefficients are determined identically but with a high pass filter. The filters are discretely sampled wavelets and referred to as analysis filters. Examples for wavelets include, but are not limited to, Haar wavelets, Daubechies wavelets or Shannon wavelets. In one example applicable to ECG signals, a Coiflet wavelet is adopted due to its resemblance to the form of heartbeats in ECG signals (ie, QRS complexes).

[0084] Referring now to Figure 8, there is shown an overview diagram depicting the first stage of the DWT process operating on the sampled EEG signal m[k]. The coefficients of the first level a_1 and d_1 are calculated by passing the signal through the filters and subsampling by 2 as follows:

a_1[k] = (m * g)[l] | l=2k

d_1[k] = (m * h)[l] | l=2k

[0085] where * represents the convolution operation, | l=2k indicates the subsampling and g[k] and h[k] describe the impulse response of the low pass and high pass filter, respectively. Each output has half of the frequency band which doubles the frequency resolution but only half of the time resolution due to subsampling.
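The first-stage filtering and dyadic subsampling can be sketched in a few lines of NumPy (a minimal illustration using the Haar analysis pair as an example wavelet; the embodiment may equally use other wavelets such as a Coiflet):

```python
import numpy as np

# Haar analysis filters: g (low pass) and h (high pass).
g = np.array([1.0, 1.0]) / np.sqrt(2.0)
h = np.array([1.0, -1.0]) / np.sqrt(2.0)


def dwt_level(m):
    """One DWT stage: convolve with the analysis filters, then keep
    every second sample (the dyadic subsampling | l=2k)."""
    a1 = np.convolve(m, g)[1::2]  # approximation coefficients
    d1 = np.convolve(m, h)[1::2]  # detail coefficients
    return a1, d1


m = np.array([1.0, 1.0, 2.0, 2.0])
a1, d1 = dwt_level(m)
# A piecewise-constant pair signal carries no high-frequency content,
# so its detail coefficients vanish.
```

Each stage halves the number of samples in both outputs, which is why the frequency resolution doubles while the time resolution halves.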

[0086] Referring now to Figure 9, there is shown an overview diagram showing the subsequent stages of the DWT process where the respective approximation coefficients are then repeatedly decomposed to increase the frequency resolution. This then results in a filter bank which resembles a binary tree as depicted in Figure 9.

[0087] Referring now to Figure 10, in order to reconstruct the original signal, the inverse discrete wavelet transform (IDWT) is applied. IDWT follows the same procedure as DWT but with so-called synthesis filters and up-sampling. In this example, the synthesis filters g_s[k] and h_s[k] are calculated as the time-reversed versions of the analysis filters as follows:

g_s[k] = g[-k]

h_s[k] = h[-k]

[0088] The IDWT is then calculated as:

a_i[k] = (↑2 a_i+1 * g_s)[k] + (↑2 d_i+1 * h_s)[k]

[0089] where ↑2 denotes up-sampling by a factor of two and a_i+1 and d_i+1 represent the approximation and detail coefficients at level i + 1.

[0090] In this example, the signal component in a specific frequency band is sought to be reconstructed, ie, the coefficients at each level are reconstructed separately. Therefore, the same procedure is applied but without adding information from preceding or succeeding levels.

[0091] In this example, the wavelet decomposition results in seven reconstructed sub-band signals covering the frequency bands 32-64 Hz (d_1), 16-32 Hz (d_2), 8-16 Hz (d_3), 4-8 Hz (d_4), 2-4 Hz (d_5), 1-2 Hz (d_6) and 0-1 Hz (a_6).

[0092] In this example, because of the common sampling frequency of 128 Hz for signals m[k] and n[k], a 6-level decomposition was selected as can be seen in Figure 9 due to the dyadic nature of wavelet decomposition, ie, at each level of decomposition half of the bandwidth is filtered and the remaining signal is down sampled to half of the sample rate. In this case, six levels of decomposition are chosen to obtain seven frequency bands with a minimum band frequency of 1 Hz (2^7 = 128). As would be appreciated, the number of levels of decomposition may be varied depending on the re-sampling frequency and the nature of the artefact that is being removed from the EEG signal.
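The dyadic band-splitting that yields seven bands from a 128 Hz signal after six levels can be sketched as simple bookkeeping (`band_edges` is a hypothetical helper illustrating the arithmetic only, not the filtering itself):

```python
import numpy as np


def band_edges(fs, levels):
    """Frequency bands produced by a dyadic wavelet decomposition:
    at each level the remaining low band [0, f] splits into a detail
    band [f/2, f] and an approximation band [0, f/2]."""
    bands = []
    high = fs / 2.0  # start from the Nyquist frequency
    for _ in range(levels):
        bands.append((high / 2.0, high))  # detail band of this level
        high /= 2.0
    bands.append((0.0, high))  # final approximation band
    return bands


bands = band_edges(128, 6)
# Seven bands; the lowest detail band starts at 1 Hz, as stated above.
```

Choosing a different re-sampling frequency or level count simply shifts these edges, which is why both are treated as adjustable parameters.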

[0093] Referring back to Figure 6, at step 624 an independent component analysis (ICA) is applied to the decomposed sampled EEG signal and to the sampled ECG signal to determine the independent sources. ICA can be described with the following formula:

x = A s

[0094] where x are the mixed signals, A is the mixing matrix and s are the independent sources. In this embodiment, Equation 7 may be written with the mixed-signal vector x comprising the decomposed EEG sub-band signals together with the sampled ECG signal n[k].

[0095] In this example, the aim of the ICA is to find sources with minimized mutual information I.

[0096] Mutual information of two signals or variables describes the information obtained about a signal or variable by observing the other signal or variable. If the mutual information is minimized, the signals will be less dependent because no information can be gained from one signal about the other signal. Independence can be achieved by forcing each of them to be as far from the normal distribution as possible which maximizes the non-Gaussianity of the sources. Non-Gaussianity is determined with negentropy using differential entropy.

[0097] Negentropy is defined as:

J(s) = H(s_G) - H(s)

[0098] where H(s_G) is the differential entropy of s_G which is a Gaussian variable with the same mean and variance as s, and H(s) is the differential entropy of s:

H(s) = - ∫ p(s) log p(s) ds

[0099] with p(s) as the density function of s.
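A worked example of these definitions (not taken from the disclosure) uses a uniform variable, for which both entropies are known in closed form; its negentropy is strictly positive because the uniform distribution is non-Gaussian:

```python
import math


def gaussian_entropy(var):
    """Differential entropy of a Gaussian: H(s_G) = 0.5*ln(2*pi*e*var)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)


# Uniform on [0, 1]: density p(s) = 1, so H(s) = -∫ 1*ln(1) ds = 0,
# and its variance is 1/12.
h_uniform = 0.0
var_uniform = 1.0 / 12.0

# Negentropy J(s) = H(s_G) - H(s), with s_G matched in mean and variance.
negentropy = gaussian_entropy(var_uniform) - h_uniform
```

Since the Gaussian maximises differential entropy for a given variance, J(s) ≥ 0 always, with equality only when s itself is Gaussian; maximising it therefore maximises non-Gaussianity.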

[00100] In this embodiment, as a pre-processing strategy, a whitening process was adopted which transformed the components of the observed vector to be uncorrelated and their variances to equal unity. Whitening can be performed by multiplying the observed vector by Cov^(-1/2), where Cov is the covariance matrix of the vector.

[00101] Referring now to Figure 11, there is shown a series of plots depicting the independent sources determined by the independent component analysis applied to the decomposed EEG and ECG signals according to an illustrative embodiment. As expected, one of the independent sources illustrates the ECG signal (ie, ICA 7). As would be appreciated, as a result of the ICA process the ECG signal components in the EEG signal are "added" to the ECG signal resulting in a clean reconstructed or de-noised EEG signal.
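The whitening step of paragraph [00100] can be sketched via the eigendecomposition of the covariance matrix (one common construction; the `whiten` helper is hypothetical):

```python
import numpy as np


def whiten(x):
    """Whiten observations x (channels x samples): remove the mean and
    multiply by Cov^(-1/2) so the components become uncorrelated with
    unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    cov = np.cov(x)
    eigvals, eigvecs = np.linalg.eigh(cov)
    cov_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return cov_inv_sqrt @ x


rng = np.random.default_rng(0)
mixed = rng.normal(size=(3, 5000))
mixed[1] += 0.8 * mixed[0]  # introduce correlation between channels
white = whiten(mixed)
# The covariance of the whitened data is the identity matrix.
```

Whitening halves the work left to the ICA step, since the remaining un-mixing matrix can then be sought among orthogonal matrices only.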

[00102] At step 625, the EEG and ECG signals are reconstructed. Initially, the source representing the ECG signal is selected by computing the correlation of each of the sources (ie, ICA 1, ICA 2, ICA 3, ICA 4, ICA 5, ICA 6 and ICA 7) with the ECG signal and then selecting the source with the highest correlation and determining if it is above a certain threshold which in this example was set to be 0.75. Following this process, the ECG signal is reconstructed using the selected source and the mixing matrix. In addition, the EEG signal is reconstructed using the remaining sources.
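The source-selection rule of step 625 can be sketched as follows (synthetic stand-in signals, not the Figure 11 sources; `select_ecg_source` is a hypothetical helper applying the 0.75 threshold):

```python
import numpy as np


def select_ecg_source(sources, ecg, threshold=0.75):
    """Return the index of the source most correlated with the ECG,
    or None if the best absolute correlation is below the threshold."""
    corrs = [abs(np.corrcoef(s, ecg)[0, 1]) for s in sources]
    best = int(np.argmax(corrs))
    return best if corrs[best] >= threshold else None


rng = np.random.default_rng(1)
ecg = np.sin(2 * np.pi * 1.2 * np.arange(1000) / 128.0)  # ECG-like stand-in
sources = np.stack([
    rng.normal(size=1000),              # brain-activity-like source
    ecg + 0.1 * rng.normal(size=1000),  # source carrying the cardiac field
])
picked = select_ecg_source(sources, ecg)
```

The threshold guards against removing a source that only weakly resembles the ECG, which would discard genuine brain activity.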

[00103] Referring now to Figure 12, there is shown a series of plots showing the original "contaminated" EEG signal 710 and the reconstructed (ie, de-noised) EEG signal 720 as well as the original ECG signal 730 and the reconstructed ECG signal 740 according to an illustrative embodiment.

[00104] At step 626, the removal of the cardiac field artefact was completed by determining the Pearson correlation coefficient between the original ECG signal 730 and the reconstructed ECG signal 740. If the correlation value is above a certain threshold, say 0.75 in one example, it confirms that ECG artefacts have been removed from the original EEG signal 710 and the reconstructed EEG signal 720 is then used for further processing. If not, the original EEG signal 710 is used for further processing.

[00105] In another example, step 620 includes the removal of eye movement artefacts arising from the movement of the eyeballs which can create an additional secondary physiological electrical field whose effects are present in brain electrical activity data that is recorded while the subject is asleep. In this example, the electrooculogram (EOG) signal 622, representing eye movement, is recorded as part of the polysomnographic recording setup where the EOG measures the electrical potential of each eye with electrodes close to the eyes. Due to proximity, the electrical potential of the eyes (muscle activity and the eye considered as a dipole) distributes to the top of the scalp and can also be seen in recordings of EEG channels especially in channels located close to the forehead.

[00106] Similar to the cardiac electric field, the electrical potential of the eyes is linearly mixed with the electrical brain activity, ie, it is an independent electric field source in the EEG signal. To decouple both electrical components to remove the eye movement artefacts, the EOG recording is used in the same manner as the ECG following the process illustrated in Figure 6. In this manner, the method for cardiac field artefact (CFA) removal described above may be applied similarly to deal with eye movement artefacts where in this case it is the EOG signal, as opposed to the ECG signal, that is added to the decomposed EEG signal before the ICA processing at step 624.

[00107] Accordingly, in these examples, where there exists a separate signal or recorded data corresponding to the artefact or secondary physiological electrical field that is being sought to be removed from the brain electrical activity data then this supplementary data may be used to remove the artefact from the brain electrical activity data.

[00108] Referring back to Figure 5, at step 630 a bandpass filter is applied to the brain electrical activity data. In this example, the brain electrical activity data is bandpass filtered with a finite impulse response (FIR) filter (0.5 Hz - 30 Hz) and subsequently divided into five frequency bands using a least squares linear phase FIR filter bank to separate the measured brain electrical activity data into the following frequency bands, ie, Delta (0.5 - 4 Hz), Theta (4 - 8 Hz), Alpha (8 - 12 Hz), Sigma (12 - 16 Hz) and Beta (16 - 30 Hz). As would be appreciated, the number and extent of the frequency bands may be modified in accordance with requirements of the classification system. In one example, the alpha and sigma bands may be merged into a single alpha band.
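The five-band split at step 630 can be sketched with a windowed FIR design (a stand-in for the least squares linear phase design named above; `filter_bank` is a hypothetical helper), where a 6 Hz test tone lands almost entirely in the theta band:

```python
import numpy as np
from scipy.signal import filtfilt, firwin

FS = 128  # Hz, the common re-sampling frequency
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 12),
         "sigma": (12, 16), "beta": (16, 30)}


def filter_bank(x, fs=FS, numtaps=257):
    """Split a signal into the five EEG bands with linear phase FIR
    band-pass filters, applied forward and backward (zero phase)."""
    out = {}
    for name, (lo, hi) in BANDS.items():
        taps = firwin(numtaps, [lo, hi], pass_zero=False, fs=fs)
        out[name] = filtfilt(taps, [1.0], x)
    return out


t = np.arange(10 * FS) / FS
bands = filter_bank(np.sin(2 * np.pi * 6 * t))  # 6 Hz test tone
```

Merging the alpha and sigma bands, as mentioned above, amounts to replacing their two entries in `BANDS` with a single (8, 16) entry.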

[00109] Referring back to Figure 3, at step 400 the brain electrical activity data, which may have been optionally pre-processed, is processed to determine one or more time dependent features that each characterise an aspect of the brain electrical activity as a function of time.

[00110] In one example, the features are calculated on a defined window length with partial overlapping and all features are centred on the current second resulting in an effective sample rate of 1 Hz. In one embodiment, where the resulting classification of the brain electrical activity relates to whether a feature is being determined as a CAP event or sequence, an effective sample rate of 1 Hz for the determination of the time dependent features that each characterise an aspect of brain electrical activity implies that each second of brain electrical activity data will be classified as part of an A-phase or not.

[00111] In other examples, a sample rate greater than 1 Hz could be adopted implying that time intervals of less than a second would be classified as A-phases. As would be appreciated, high sample rates are generally not required for CAP detection and would come with an increased computational burden. Alternatively, an effective sample rate of less than 1 Hz means time periods longer than one second are classified which could result in information loss due to averaging over longer periods of time. As would be appreciated, other types of brain electrical activity classification schemes could be carried out at different frequencies depending on the expected time scales of variation in the brain electrical activity.

[00112] In one embodiment, the time dependent feature is a power measure that characterises the variation in power of brain electrical activity as a function of time based on the measured brain electrical activity data. In one embodiment, the power measure of brain electrical activity is determined for a given frequency band. In another embodiment, the power measure of brain electrical activity is determined for a range of frequency bands.

[00113] As would be appreciated, some of the typical brain electrical activity or EEG patterns only occur in certain frequency bands (eg, delta bursts, intermittent alpha, K-alpha, arousals). In terms of the classification of CAP events or sequences, short term bursts of power of brain electrical activity in certain frequency bands may be associated with possible A-phases forming part of a CAP event.

[00114] In one example, the time dependent feature that characterises the variation of power of brain electrical activity or "power measure" is the Hjorth activity, m_0, which in this example is determined for 3-second overlapped windows. The Hjorth activity is defined as the variance of the signal amplitude based on the discrete signal values s_i, where M is the window length and s̄ is the window mean:

m_0 = (1/M) Σ_(i=1..M) (s_i - s̄)^2

[00115] In one example, m_0 is only determined in the delta band as this frequency band is significant for the detection of slow waves representing one of the potential A-phase patterns forming a CAP event. As would be appreciated, the power feature m_0 may also be determined in other frequency bands. As would also be appreciated, the use of overlapping windows of increasing size will create a smoothing effect and the size of the windows may be varied as required.
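The Hjorth activity computation can be sketched as follows (assuming 3-s windows centred on each whole second to give the 1 Hz effective feature rate; the exact window alignment in the embodiment may differ):

```python
import numpy as np


def hjorth_activity(x, fs=128, win_s=3):
    """Hjorth activity m_0 (signal variance) on win_s-second windows
    centred on successive seconds, giving one value per second (1 Hz)."""
    half = (win_s * fs) // 2
    centres = np.arange(half, len(x) - half, fs)
    return np.array([np.var(x[c - half:c + half]) for c in centres])


rng = np.random.default_rng(2)
x = np.concatenate([np.zeros(128 * 5), rng.normal(size=128 * 5)])
m0 = hjorth_activity(x)
# Activity is zero on the flat segment and near one on the noise segment.
```

The per-second variance trace is exactly the kind of short-term power burst indicator associated with candidate A-phases.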

[00116] In another example, the power measure characterising the variation in power of brain electrical activity is a Band Power Descriptor (BPD) which evaluates the variation of power in the different frequency bands to highlight the transient spectral variations in a temporal range of 2-60 seconds. Initially, the signal in each band is squared and normalized with respect to the maximum power of the band. The mean power on windows of 2 seconds and 64 seconds is then determined. The BPD formula is shown below:

d_b(t) = (e_s(t) - e_l(t)) / e_l(t)

[00117] where e_s(t) is the mean power of the short 2-s window, e_l(t) the mean power of the long 64-s window and d_b(t) the power descriptor for a specific frequency sub-band. Essentially, the BPD is a normalised version of the instantaneous power e_s with respect to the mean power over a longer time interval e_l with the timings chosen in this example to be consistent with the transient frequency and amplitude variations of the order of 2-60 s that are typical of CAP. As would be appreciated, these time windows may also be varied as required.
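The BPD computation can be sketched as follows (assuming the descriptor normalises the short-window power against the long-window power; `band_power_descriptor` is a hypothetical helper operating on a single sub-band signal):

```python
import numpy as np


def band_power_descriptor(x, fs=128, short_s=2, long_s=64):
    """Band power descriptor for one sub-band signal: square and
    normalise to the maximum power, then compare the short-window mean
    power e_s against the long-window mean power e_l so that transient
    bursts stand out from the background level."""
    x2 = x ** 2 / np.max(x ** 2)
    e_s = np.convolve(x2, np.ones(short_s * fs) / (short_s * fs), mode="same")
    e_l = np.convolve(x2, np.ones(long_s * fs) / (long_s * fs), mode="same")
    return (e_s - e_l) / e_l


rng = np.random.default_rng(3)
x = 0.1 * rng.normal(size=128 * 120)
x[128 * 60:128 * 64] *= 10.0  # 4-s power burst at t = 60 s
d = band_power_descriptor(x)
# The descriptor peaks sharply over the burst.
```

Because the long window tracks the slowly varying background, the descriptor stays near zero during stationary activity and spikes only for transients on the 2-60 s scale typical of CAP.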

[00118] In other examples, the power measure characterising the variation in power of brain electrical activity includes, but is not limited to: the spectral power (eg, DFT, Welch spectral density) or power estimates based on the discrete Wavelet transform.

[00119] In another embodiment, the time dependent feature is an energy measure that characterises the variation in energy of brain electrical activity as a function of time based on the measured brain electrical activity data. Similar to power, short term changes in energy of brain electrical activity in certain frequency bands may be associated with possible A-phases forming part of a CAP event or sequence.

[00120] In one example, the energy measure that characterises the variation in energy of brain electrical activity is the Teager Energy Operator (TEO) which in this example is determined for all frequency bands in 2-s overlapping windows. The discrete version of the TEO, Ψ[s[n]], is defined as follows, where s[n] corresponds to the discrete sample of a time series:

Ψ[s[n]] = s[n]^2 - s[n-1] s[n+1]

[00121] In this example, the Teager Energy Operator (TEO) may be adopted to provide essentially instantaneous estimation of frequency and amplitude components making it applicable for real time classification.
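The discrete TEO and its instantaneous amplitude-frequency property can be sketched in a few lines (for a pure sinusoid s[n] = A·cos(ωn) the operator returns the constant A²·sin²(ω) exactly):

```python
import numpy as np


def teager(s):
    """Discrete Teager Energy Operator:
    psi[n] = s[n]^2 - s[n-1] * s[n+1], valid for 1 <= n <= N-2."""
    return s[1:-1] ** 2 - s[:-2] * s[2:]


# For a sinusoid the TEO tracks both amplitude and frequency without
# needing a window, which is what makes it suitable for real time use.
A, w = 2.0, 0.3
n = np.arange(200)
psi = teager(A * np.cos(w * n))
```

Only three consecutive samples are needed per output value, so the operator adds essentially no latency to a streaming classifier.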

[00122] In another embodiment, the time dependent feature is an entropy measure that characterises the variation in entropy of the brain electrical activity as a function of time based on the measured brain electrical activity data. In general, the concept of entropy describes the information content of a stochastic process. In the field of signal processing, the entropy describes how much the signal alters as a function of time or, put another way, the signal entropy may be seen as a measure of how chaotic the signal is, making this feature useful for the detection and characterisation of activation periods in the signal. Short-time events like the occurrence of A-phases in brain electrical activity are such activation periods and can generate high entropy values.

[00123] In one example, the entropy measure is the Shannon entropy, H, which is determined for all five frequency bands in 2-s signal windows with one second overlap. Given the probability p_i, with i representing all amplitude values, the Shannon entropy is defined as:

H = - Σ_i p_i log(p_i)

[00124] In other examples, the entropy measure characterising the entropy of brain electrical activity includes, but is not limited to: sample entropy, spectral entropy, Tsallis entropy or Kolmogorov entropy.
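The windowed Shannon entropy can be sketched as follows (assuming the probabilities p_i are estimated from an amplitude histogram; the bin count is an arbitrary choice for illustration):

```python
import numpy as np


def shannon_entropy(x, bins=16):
    """Shannon entropy of a signal window: estimate p_i from an
    amplitude histogram, then H = -sum(p_i * log2(p_i))."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]  # by convention 0*log(0) is taken as 0
    return -np.sum(p * np.log2(p))


rng = np.random.default_rng(4)
h_noise = shannon_entropy(rng.normal(size=256))  # chaotic window
h_flat = shannon_entropy(np.ones(256))           # constant window
```

A flat window yields zero entropy while an activation-like chaotic window yields a high value, which is exactly the contrast exploited for A-phase detection.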

[00125] In another embodiment, the time dependent feature is a frequency shift measure that characterises the shifts in frequency of the brain electrical activity as a function of time based on the measured brain electrical activity data. Short-time events like the occurrence of A-phases in brain electrical activity associated with CAP events are typically characterised by abrupt frequency shifts in brain electrical activity.

[00126] In one example, the frequency shift feature is the Differential Variance (DV) of the brain electrical activity data. The DV is defined as the variance difference of consecutive 1-s windows based on the discrete signal values s_i, where M is the window length and σ²(w) the variance of window w:

DV = σ²(w) - σ²(w-1), with σ²(w) = (1/M) Σ_(i=1..M) (s_i - s̄)²
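The Differential Variance can be sketched as follows (assuming the variance difference of consecutive one-second windows; `differential_variance` is a hypothetical helper):

```python
import numpy as np


def differential_variance(x, fs=128):
    """Differential Variance: difference between the variances of
    consecutive one-second windows, highlighting abrupt changes."""
    wins = x[: len(x) // fs * fs].reshape(-1, fs)  # one row per second
    return np.diff(wins.var(axis=1))


# A sudden amplitude jump produces a large DV at the transition second.
rng = np.random.default_rng(5)
x = np.concatenate([0.1 * rng.normal(size=128 * 4),
                    2.0 * rng.normal(size=128 * 4)])
dv = differential_variance(x)
```

Stationary activity keeps DV near zero; the abrupt shifts typical of A-phase onsets show up as isolated spikes.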

[00127] Referring back to Figure 3, at step 500 the brain electrical activity at a selected time is classified by a machine learning system that is trained on a training data set to apply a classification of brain electrical activity at a given time based on values of the one or more time dependent features (eg, power measure, entropy measure, etc) determined over an extended time period relative to the given time in order to exploit the dynamical temporal information in the brain electrical activity data. In one embodiment, the classification relates to determining the presence of CAP events or A-phases of CAP cycles in the brain electrical activity.

[00128] In one embodiment, the machine learning system is trained and evaluated on brain electrical activity data that is available in the public domain. In one example, the training set of brain electrical activity data is sourced from the publicly available CAP Sleep Database on PhysioNet which is an open-source repository for physiological signal recordings targeting various biomedical research fields.

[00129] In this example, the polysomnographic measurements to provide the brain electrical activity data were conducted by the Sleep Disorders Center of the Ospedale Maggiore of Parma, Italy and normal healthy subjects (n1 - n15) were selected. In this example, the brain electrical activity data includes at least one EEG channel (C3 or C4), multiple bipolar EEG channels and other parameters such as ECG or eye movement signals which, as discussed above, may be used to remove artefacts associated with secondary physiological electric fields during any optional pre-processing of the training set of brain electrical activity data.

[00130] For each of the subject's brain electrical activity data there is a corresponding annotation file providing manual scoring performed by expert neurologists. The scoring comprises sleep stages and CAP events (ie, A1, A2 or A3 subtypes) according to the Rechtschaffen & Kales rules and the atlas of CAP scoring respectively (see Table 1 below).

TABLE 1

DEFINITION OF CAP SUBTYPE EVENTS

[00131] The manual scoring serves as the "ground truth" for the supervised learning of the machine learning system. In this example, the brain electrical activity data contains a total of 7519.5 minutes of scored sleeping time. Since CAP events only occur in NREM stages, the data set comprises 5040.5 minutes of scoring-relevant data of which 15.4% are A-phases and 84.6% pertain to background periods. Summary statistics of the sleep macrostructure and CAP occurrence for each subject and in total are listed in Table 2.

TABLE 2

STATISTICS OF SLEEP MACROSTRUCTURE AND CAP OCCURRENCE FOR SUBJECTS N1 - N15 IN SECONDS

[00132] In one embodiment, the machine learning system that is trained to apply a classification of brain electrical activity at a given time based on values of the one or more time dependent features over an extended time period relative to the given time is based on a recurrent neural network (RNN). In one example, the RNN is implemented as a Long Short-term Memory (LSTM) RNN.

[00133] Referring now to Figure 13, there is shown an overview of a LSTM cell 800 according to an illustrative embodiment configured to control and store information. In this example, memory cell 800 is composed of the input gate (i_t) 810, forget gate (f_t) 820, output gate (o_t) 830, cell candidate (g_t) 840, hidden/output state (h_t) 850 and the cell state (c_t) 860.

[00134] In this example, and in contrast to standard neural networks, the cell state 860 contains all the information from previous time steps. Initially, the forget gate 820 decides which information from the past is relevant and which information can be removed. This is carried out by multiplying each number in the cell state 860 with either 0 (remove) or 1 (keep). The cell candidate 840 and input gate 810 update the cell state 860 with the information of the current time step. The input gate 810 decides which values will be updated and the cell candidate 840 creates the new values which will be added. Finally, the output gate 830 controls which information is added to the output state 850. As such, LSTM cell 800 has the ability to remove or add information to the cell state 860 as regulated by the gate structures that control information flow and as a result manipulate the current hidden state 850.

[00135] In one example, the gates are calculated by the following numerical calculations:

i_t = σ(W_xi x_t + W_hi h_t-1 + b_i)   (17a)

f_t = σ(W_xf x_t + W_hf h_t-1 + b_f)   (17b)

g_t = tanh(W_xg x_t + W_hg h_t-1 + b_g)   (17c)

c_t = f_t ⊙ c_t-1 + i_t ⊙ g_t   (17d)

o_t = σ(W_xo x_t + W_ho h_t-1 + b_o)   (17e)

h_t = o_t ⊙ tanh(c_t)   (17f)

[00136] where σ is the logistic sigmoid function, tanh is defined to be the hyperbolic tangent, ⊙ denotes element-wise multiplication, x_t is the cell input, W_yz is the weight of gate z corresponding to gate input y and b_z is the bias of gate z.
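A single LSTM step implementing gate calculations of this form can be sketched in NumPy (a minimal illustration with the gate weights applied to the concatenated input [x_t, h_t-1]; not the trained system):

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step: input (i), forget (f), candidate (g) and
    output (o) gates computed from the concatenated [x_t, h_prev],
    followed by the cell state update and the new hidden state."""
    z = np.concatenate([x_t, h_prev])
    i_t = sigmoid(W["i"] @ z + b["i"])
    f_t = sigmoid(W["f"] @ z + b["f"])
    g_t = np.tanh(W["g"] @ z + b["g"])
    o_t = sigmoid(W["o"] @ z + b["o"])
    c_t = f_t * c_prev + i_t * g_t  # forget old state, add new candidate
    h_t = o_t * np.tanh(c_t)        # gate the exposed hidden state
    return h_t, c_t


# With all-zero weights: i = f = o = 0.5 and g = 0, so the cell state
# simply halves at each step and h_t = 0.5 * tanh(c_t).
n_in, n_hidden = 5, 3
W = {k: np.zeros((n_hidden, n_in + n_hidden)) for k in "ifgo"}
b = {k: np.zeros(n_hidden) for k in "ifgo"}
h, c = lstm_step(np.ones(n_in), np.zeros(n_hidden), np.ones(n_hidden), W, b)
```

The zero-weight case makes the gate arithmetic easy to verify by hand before trained weights are substituted.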

[00137] In another example embodiment, the gate definitions are defined in accordance with a peephole LSTM where instead of using h_t-1 in equations 17a, 17b, 17c, and 17e, the cell state c_t-1 is used. In another example, gated recurrent units (GRU) are adopted which are LSTMs without the output gate and which accordingly have fewer parameters to train.

[00138] Referring now to Figure 14, there is shown a simplified overview diagram of the LSTM cell 800 illustrated in Figure 13 which will assist in the following discussion.

[00139] Referring now to Figure 15, there is shown an overview diagram of a LSTM unit 900 comprising the LSTM cells 800 for one time step for a given layer. In this example, the number of nodes for each LSTM unit 900 is defined as the number of hidden neurons.

[00140] The LSTM RNN design or configuration is defined by the number of layers and the number of hidden neurons in each layer. In accordance with this definition, a LSTM RNN configuration of [128, 64, 32] would consist of three layers with 128 hidden neurons per LSTM unit 900 in the first layer, 64 hidden neurons per LSTM unit 900 in the second layer, and 32 hidden neurons per LSTM unit 900 in the third or last layer.

[00141] Referring now to Figure 16, there is shown an information flow diagram of a machine learning system 1000 according to an illustrative embodiment. Machine learning system 1000 in this example is trained to apply a classification of brain electrical activity at a given time t based on values of the one or more time dependent features that characterise aspects of the brain electrical activity which have been determined over an extended time period relative to the given time. In this example, the extended time period comprises a period of time prior to the given time.

[00142] In one embodiment, machine learning system 1000 is based on a LSTM RNN configuration consisting of k stacked layers 1030 each comprising LSTM units 900 where machine learning system 1000 has been trained based on the time variation of the features for the time period before the given time. In this example, the dimension or size of the input vector x 1010 for each time step will correspond to the number of time dependent features that have been chosen that characterise aspects of brain electrical activity as referred to above.

[00143] As a non-limiting example, the input vector dimension or input size may be 5 and correspond to an input vector x[t] consisting of the:

• Hjorth activity m 0 ;

• Band Power Descriptor (BPD);

• Teager Energy Operator (TEO);

• Shannon Entropy; and

• Differential Variance (DV),

as determined for each time step.

[00144] The input sequence length N 1020 is the number of time steps that are considered and in machine learning system 1000 for the selected time t, the input sequence will consist of the sequence of vectors x[t-N+1], ..., x[t], ie, the N time steps up to and including the selected time t. In this example, the number of time steps, ie, the sequence length, defines how much of the previous information is entered into the calculation of the current state.
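The construction of the input sequence x[t-N+1], ..., x[t] from a per-second feature matrix can be sketched as follows (`make_sequences` is a hypothetical helper):

```python
import numpy as np


def make_sequences(features, n_steps):
    """From a (T, n_features) matrix of per-second feature vectors,
    build the LSTM input for each time t: the N vectors
    x[t-N+1], ..., x[t]. Output shape: (T-N+1, N, n_features)."""
    T = len(features)
    return np.stack([features[t - n_steps + 1:t + 1]
                     for t in range(n_steps - 1, T)])


features = np.arange(10 * 5).reshape(10, 5)  # 10 seconds, 5 features
seqs = make_sequences(features, n_steps=4)
```

Each row of the output is one classification instance: N consecutive feature vectors ending at the second being classified.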

[00145] In this example, the output of LSTM unit 900 of the current time step in the last layer is further processed to create a classification output by, in this example, a feed-forward neural network layer 1040 which is connected to this output, ie, each node of the feed-forward neural network layer 1040 receives the output of the last LSTM unit 900 in the last layer of the k stacked layers 1030. In this example, layer 1040 of machine learning system 1000 is referred to as a fully connected layer because it connects all neurons in the fully connected layer to all the neurons in the previous LSTM unit 900.

[00146] As would be appreciated, a general neural network layer may be regarded as a mapping architecture which assigns an input of dimension n to an output of dimension m by using processing units. Such a processing unit is called a neuron and initially creates a linear combination of the input variables combined with weights and a bias. In this case, each neuron receives the same aforementioned output as input. In this example, a non-linear activation function is subsequently wrapped around the linear combination to exploit non-linearities in the data.

[00147] Depending on the choice of the activation function, the output may be mapped into a desired range such as 0 to 1 or −1 to 1. If an activation function is not applied, the output would be a simple linear function with limited complexity, which would reduce the neural network to a linear regression model. To learn and model non-linear dependencies in the data, an activation function is typically adopted for neural networks. In this case, a rectified linear unit (ReLU), which only takes the positive part of its argument, was used as the activation function for layer 1040.

[00148] Machine learning system 1000 further includes a Softmax layer 1050 configured to map the output of the fully connected layer 1040 to the output classes and functions to further normalise the output into a probability distribution consisting of two probabilities. Each probability will be in the range of (0, 1) and they will add up to 1.
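The fully connected layer with ReLU activation followed by the Softmax layer described in paragraphs [00147] and [00148] may be sketched as follows (the hidden size and random weights being assumptions for illustration only):

```python
import numpy as np

def relu(z):
    # Rectified linear unit: keeps only the positive part of its argument
    return np.maximum(0.0, z)

def softmax(z):
    # Numerically stable softmax: maps scores to a probability distribution
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

rng = np.random.default_rng(2)
H, M = 8, 2                     # hidden size of the last LSTM unit, two output classes
h = rng.standard_normal(H)      # stand-in for the last LSTM hidden state
W_fc = rng.standard_normal((M, H)) * 0.1
b_fc = np.zeros(M)
scores = relu(W_fc @ h + b_fc)  # fully connected layer with ReLU activation
probs = softmax(scores)         # two probabilities in (0, 1) that sum to 1
print(probs.sum())              # approximately 1.0
```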

[00149] Machine learning system 1000 is trained on the training set of brain electrical activity data by updating the parameters of the neural network to decrease the distance between the output values and the target values of the training set. For each input sample, the expected target output is provided in the training set. To update the parameters of the network, an error or loss function is determined which characterises the “distance” between the output and the target. The goal of the training phase is then to minimise this error so that the output is similar to the target. In one example, the gradient of the error function is calculated to find the minimum. In a further example, the gradient descent optimisation technique is adopted to find the minimum of the error function.

[00150] In this case, where the gradient descent is seeking to find the minimum of a function, the method will iteratively determine the direction of steepest descent on the multi-parameter surface defined by the error function, whose slope is characterised by the derivative of the loss function, and proceed in this direction until it finds the minimum. As calculating the derivative comes at a computational cost, it is generally preferable to reduce the number of times this computation is carried out. The size of the step taken in the direction of steepest descent at each iteration is defined as the learning rate of the method. Further, the learning rate drop period defines the number of iterations after which the learning rate is reduced by the learning rate drop factor, in order to obtain more precise determinations of the gradient when closer to the minimum.
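A minimal sketch of gradient descent with a piecewise learning-rate drop schedule is set out below, applied to a simple one-dimensional quadratic (the function, step size and schedule values being assumptions for illustration only):

```python
def gradient_descent(grad, x0, lr=0.3, drop_factor=0.5, drop_period=20, steps=100):
    """Minimise a 1-D function via gradient descent.

    Every `drop_period` iterations the learning rate is multiplied by
    `drop_factor`, giving finer steps as the minimum is approached.
    """
    x = x0
    for k in range(1, steps + 1):
        x -= lr * grad(x)                # step in the direction of steepest descent
        if k % drop_period == 0:
            lr *= drop_factor            # learning rate drop
    return x

# f(x) = (x - 3)^2 has its minimum at x = 3; its derivative is f'(x) = 2(x - 3)
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(round(x_min, 4))  # 3.0
```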

[00151] Referring now to Figure 17, there is shown an example comparison plot 1200 of imbalanced 1210 and balanced 1220 data sets for training of a machine learning system. Typical error functions for classification tasks, such as cross entropy loss, logistic loss or mean square error, focus on increasing the accuracy of the classification output. If an imbalanced data set 1210 is used, the classifier will classify the predominant class all the time to achieve high accuracy.

Considering the example in Figure 17, a classifier would continuously choose class 0, if trained with the imbalanced data set 1210, because the effect of the false classifications for class 1 on the accuracy is negligible.
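This accuracy trap can be demonstrated numerically with a small sketch (the class counts being assumptions for illustration, mimicking the imbalance of Figure 17):

```python
# 990 background samples (class 0) and 10 CAP samples (class 1)
labels = [0] * 990 + [1] * 10

# A classifier that always predicts the majority class...
predictions = [0] * len(labels)

# ...still scores 99% accuracy while detecting no CAP events at all
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
true_positives = sum(p == t == 1 for p, t in zip(predictions, labels))
print(accuracy, true_positives)  # 0.99 0
```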

[00152] Referring again to Table 2, which summarises the training set of brain electrical activity data, it can be seen that the number of A-phases in the brain electrical activity data comprising each sleep recording (n1 - n15) is significantly lower than the number of background phases (ie, NREM) and as such this training set of brain electrical activity data would be characterised as imbalanced. Training on this data set would likely result in a classifier that chooses the class of the most prominent label to achieve a high accuracy. In this case, the classifier would always choose the background class, ie, NREM sleep, when seeking to classify actual brain electrical activity data as the A-phases or CAP events are only ever a small percentage of the NREM sleep.

[00153] In one example, compensating for an imbalanced training set of brain electrical activity data includes generating artificial samples of the underrepresented class, ie CAP events, based on the available data. However, while this process may assist, generating ‘artificial’ samples for one class based on the available samples may result in a biased classifier due to the input samples resembling each other.

[00154] In another example, compensating for an imbalanced training set of brain electrical activity data includes removing samples of the more prominent class, ie NREM sleep, to generate a balanced data set. However, while this process may assist, removing samples will reduce the size of the data set and in machine learning systems, especially those based on deep learning, a data set of maximum possible size is typically necessary to train a precise classifier.

[00155] In another example, compensating for an imbalanced training set of brain electrical activity data includes adopting a loss or error function configured to strongly penalise false classifications for the underrepresented classes, ie CAP events, as compared to the standard error functions referred to above.

[00156] In one example, the error function adopted is in the form of the F_β-score which is defined as follows:

F_β = ((1 + β²) · Σ_{i=1}^{L} o_i·t_i) / (β² · Σ_{i=1}^{L} t_i + Σ_{i=1}^{L} o_i)

[00157] where o_i represents the output of the classifier, t_i describes the target label, and L is the size of the data set.

[00158] The F_β-score relies solely on the precision and sensitivity of the prediction. Both values may be used as objective quality measures for binary classification problems along with accuracy and specificity. Sensitivity describes the proportion of actual events that are correctly detected and precision describes the proportion of detected event periods that are correctly classified. High sensitivity and precision will result in an accurate detection of even underrepresented classes.

[00159] The positive β-value defines the prioritisation of precision or sensitivity. A β-value equal to 1 gives what is known as the F_1-score or F-score, the harmonic average of the precision and sensitivity.
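A minimal sketch of this soft F_β computation, using the notation o_i and t_i above (illustrative only, not the exact implementation of the embodiment), is set out below:

```python
def soft_f_beta(outputs, targets, beta=1.0):
    """Differentiable F-beta score from soft outputs o_i and binary targets t_i.

    precision = sum(o*t)/sum(o), sensitivity = sum(o*t)/sum(t);
    F_beta is their beta-weighted harmonic combination.
    """
    tp = sum(o * t for o, t in zip(outputs, targets))
    b2 = beta ** 2
    return (1 + b2) * tp / (b2 * sum(targets) + sum(outputs))

# Perfect hard predictions give F_beta = 1 for any beta
assert soft_f_beta([1, 0, 1, 0], [1, 0, 1, 0], beta=2.0) == 1.0

# With beta = 1 the score reduces to the harmonic mean of precision and
# sensitivity: here precision = 1/2, sensitivity = 1, so F1 = 2/3
print(soft_f_beta([1, 1, 0, 0], [1, 0, 0, 0], beta=1.0))  # 0.6666666666666666
```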

[00160] As discussed above, not only the error function but also its gradient must be calculated to train the machine learning system. Equation 19 below sets out the derivative of the F_β-score, where o_i represents the output of the classifier, t_i describes the target label, and L the size of the data set:

∂F_β/∂o_j = ((1 + β²) · (t_j · (β² · Σ_{i=1}^{L} t_i + Σ_{i=1}^{L} o_i) − Σ_{i=1}^{L} o_i·t_i)) / (β² · Σ_{i=1}^{L} t_i + Σ_{i=1}^{L} o_i)²    (19)

[00161] In another example, the loss or error function which is configured to strongly penalise false classifications for the underrepresented classes is based on a weighted cross entropy where a higher weight is assigned to the minority class and a lower weight is assigned to the majority class.

[00162] In one embodiment, the weighted cross entropy is defined as follows:

WCE = −(w_1 · y · log(p) + w_0 · (1 − y) · log(1 − p))

[00163] where y is the class binary indicator (0 or 1), p is the predicted probability of the instance belonging to class 1, and w_1 and w_0 are the weights for class 1 and class 0, respectively.
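The weighted cross entropy for a single sample, and its effect of penalising misses on the minority class more heavily, may be sketched as follows (the weight values and probability being assumptions for illustration only):

```python
import math

def weighted_cross_entropy(y, p, w1, w0, eps=1e-12):
    """Weighted binary cross entropy for one sample.

    y is the class indicator (0 or 1), p the predicted probability of class 1;
    w1 weights errors on class 1 (minority), w0 errors on class 0 (majority).
    """
    p = min(max(p, eps), 1 - eps)        # clamp for numerical safety
    return -(w1 * y * math.log(p) + w0 * (1 - y) * math.log(1 - p))

# Equal weights recover the standard cross entropy
std = weighted_cross_entropy(1, 0.25, w1=1.0, w0=1.0)
# Up-weighting the minority class makes the same miss cost five times more
weighted = weighted_cross_entropy(1, 0.25, w1=5.0, w0=1.0)
print(weighted / std)  # ~5.0
```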

[00164] In one example embodiment, directed to the classification of CAP cycles, the classification from the machine learning system was post-processed in accordance with the atlas for CAP scoring to determine the following classifications including, but not limited to:

• CAP rate (total CAP seconds/total NREM seconds);

• A1 ratio (total subtype A1 seconds/total CAP seconds);

• A2 ratio (total subtype A2 seconds/total CAP seconds); and

• A3 ratio (total subtype A3 seconds/total CAP seconds).
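The computation of these post-processed summary measures may be sketched as follows (the durations being hypothetical values for illustration only):

```python
def cap_metrics(cap_s, a1_s, a2_s, a3_s, nrem_s):
    """CAP rate and subtype ratios from per-class durations in seconds.

    All arguments are total durations; here cap_s = a1_s + a2_s + a3_s.
    """
    return {
        "CAP rate": cap_s / nrem_s,
        "A1 ratio": a1_s / cap_s,
        "A2 ratio": a2_s / cap_s,
        "A3 ratio": a3_s / cap_s,
    }

# Hypothetical night: 600 s of CAP within 2000 s of NREM sleep
m = cap_metrics(cap_s=600, a1_s=300, a2_s=180, a3_s=120, nrem_s=2000)
print(m["CAP rate"])  # 0.3
```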

[00165] In one example, since A or B phases can only occur in time periods longer than a second, isolated one-second classifications from the machine learning system were replaced by their neighbouring values.

[00166] Where the machine learning system classified a portion of the brain electrical activity data as having an A-phase longer than 60 seconds, then this was reclassified again due to the high probability of this time window containing multiple A-phases. In one example, a self-organising neural network with 500 epochs was used to cluster the particular time window again. These two steps were performed three times in a row to check for possible changes after the preceding iteration, should one-second classifications appear again.
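The replacement of isolated one-second classifications by their neighbouring values, as described in paragraph [00165], may be sketched as follows:

```python
def smooth_isolated(labels):
    """Replace isolated one-second classifications by their neighbours.

    A label differing from both neighbours (which agree with each other)
    is treated as spurious, since A or B phases last longer than a second.
    """
    out = list(labels)
    for i in range(1, len(out) - 1):
        if out[i - 1] == out[i + 1] != out[i]:
            out[i] = out[i - 1]
    return out

# A single one-second "A-phase" inside background is removed
print(smooth_isolated([0, 0, 1, 0, 0]))  # [0, 0, 0, 0, 0]
# A genuine multi-second A-phase is left untouched
print(smooth_isolated([0, 1, 1, 1, 0]))  # [0, 1, 1, 1, 0]
```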

[00167] In this example, based on the data set of recorded brain electrical data of n1 - n15, this data set was divided into a test set and a train and validation set with the test set comprising the subjects n3, n6, n9, n12, n15 and corresponding to approximately 33% of the total recorded brain electrical activity data. The remaining subjects were merged into the training and validation set of brain electrical activity data which was used for tuning the parameters of the LSTM algorithms and validating the effect of the artefact removal in the pre-processing step.

[00168] As would be appreciated, the LSTM-RNN is configured to classify the brain electrical activity for a given time based on values of the one or more time dependent features that have been evaluated at times previous to the given time. This type of machine learning system is applicable where the brain electrical activity data is being recorded and processed to be classified in real-time or in substantially real-time. In another embodiment, applicable where the brain electrical activity data is in the form of an extended recording over a time period, a bidirectional LSTM (Bi-LSTM) may be adopted by the machine learning system to make use of “future” information relative to the given time at which the classification is applied.

[00169] Referring now to Figure 18, there is shown an information flow diagram of a machine learning system 1300 progressing from the measured brain electrical activity data in the form of a single EEG signal trace to the classification outcome using a Bi-LSTM RNN according to an illustrative embodiment.

[00170] In this example, the machine learning system in the form of a Bi-LSTM RNN is configured to harvest information in both directions by merging the hidden states of the forward and backward sequence relative to the current time step to the same output layer. In this example, the Bi-LSTM RNN contains one forward 1330 and one backward 1340 LSTM layer which simultaneously receive the same current input. Both directional layers 1330, 1340 are built on LSTM units and can themselves consist of multiple layers, creating a deep bidirectional LSTM RNN network.

[00171] As shown in Figure 18, the time dependent features that characterise aspects of the brain electrical activity are calculated based on a time window 1311 of the brain electrical activity data 1310, creating an input array with a feature vector 1320 for every second (ie, x[t] with t ∈ {1, 2, ··· , L}, where L is the length of time corresponding to the recorded brain electrical activity data).

[00172] For every time step, the Bi-LSTM RNN classifier takes a sequence of 2 · (N − 1) + 1 feature vectors (ie, x[t − N + 1], ··· , x[t + N − 1]) centred on the current time step, where N − 1 is the length of the desired past and future information.

[00173] At the end, the combined hidden states, h[t], of the forward and backward layer of the central time step are linked to a fully connected layer followed by a Softmax layer as described previously. Thus, for each time step the information from N − 1 values of the past and the future plus the current features is employed for the classification task.

[00174] Prior to the supervised training of the classifiers, the parameters for both the LSTM and Bi-LSTM machine learning systems were set. The parameters of the LSTM layers (sequence length, number of layers and number of neurons per layer) may be tuned by applying the leave-one-out (LOO) method to the training and validation set. In this approach, the LSTM parameters that perform well, eg, the number of layers, the number of hidden neurons and the β-value of the loss function, may be determined and optimised. The LOO method is a k-fold cross-validation algorithm in which, for each fold, one subject is designated as the test set and all remaining subjects are merged into the training set. Thus, the classifier is trained k times on dataset D\D_t and tested on D_t, where D represents the entire dataset, k is the total number of subjects in the set (here: 10) and t ∈ {1, 2, ... , k}. Set out below in Table 3 are parameters determined for a machine learning system based on LSTM and Bi-LSTM RNN implementations for classifying CAP events in measured brain electrical activity data.

TABLE 3

LIST OF LSTM & BI-LSTM RNN MACHINE LEARNING SYSTEM PARAMETERS
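The leave-one-out fold construction described in paragraph [00174] may be sketched as follows (the subject identifiers being those remaining after removal of the test subjects n3, n6, n9, n12 and n15):

```python
def leave_one_out_folds(subjects):
    """Yield (train, test) splits: each subject serves as the test set once."""
    for t in subjects:
        train = [s for s in subjects if s != t]  # D without D_t
        yield train, [t]

subjects = [f"n{i}" for i in (1, 2, 4, 5, 7, 8, 10, 11, 13, 14)]  # k = 10
folds = list(leave_one_out_folds(subjects))
print(len(folds))  # 10 folds, one per subject
```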

[00175] In this example, in order to speed up the training process, the training samples are divided into batches (see “Batch size” in Table 3) and the error/loss is then determined for the entire batch instead of for each sample individually. The number of epochs (see “Epoch” in Table 3) describes how often the network has seen the entire data set during training, as the network is not fully optimised after seeing the samples only once.

[00176] To compare the performance of the various A-phase or CAP event classifiers, a set of performance measures for binary classification problems was calculated. The efficacy was evaluated based on the number of correctly identified events (true positives, t_p), the number of correctly recognised background phases (true negatives, t_n), and the number of seconds which were misidentified either as A-phase (false positives, f_p) or as background phase (false negatives, f_n). Based on these, the accuracy (ACC), true positive rate or sensitivity (TPR), specificity or true negative rate (SPC) and the F_1-score are quantified as follows:

ACC = (t_p + t_n) / (t_p + t_n + f_p + f_n); TPR = t_p / (t_p + f_n);
SPC = t_n / (t_n + f_p); F_1 = 2 · t_p / (2 · t_p + f_p + f_n)
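These standard measures may be computed as in the following sketch (the confusion counts being hypothetical values for illustration, not results from Table 4):

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (TPR), specificity (SPC) and F1-score."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn)
    spc = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return acc, tpr, spc, f1

# Hypothetical confusion counts in seconds
acc, tpr, spc, f1 = binary_metrics(tp=80, tn=880, fp=20, fn=20)
print(acc, tpr, spc, f1)  # 0.96 0.8 0.977... 0.8
```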

TABLE 4

COMPARISON OF DIFFERENT CLASSIFIERS ON TEST DATA SET

[00177] Referring now to Table 4, there is shown the performance measures in accordance with Equations 21 and 22 above for the LSTM and Bi-LSTM embodiments referred to above as applied to the test set. The values represent the means plus or minus the standard deviation over the five subjects that the machine learning system was applied to. As would be appreciated, these performance measures indicate a high degree of accuracy and precision obtained by a machine learning system implemented in accordance with the present disclosure.

TABLE 5

COMPARISON OF PERFORMANCE MEASURES ON TEST SET USING IMBALANCED AND

BALANCED DATA SET

[00178] Referring now to Table 5, there is shown a comparison between the results of the imbalanced training data set and the results of a modified and supplemented balanced training data set. For the machine learning system trained with the balanced data set, the standard cross entropy function was adopted as the error or cost function. The numbers show an increase for both RNN algorithms in sensitivity and precision where the original imbalanced data set combined with the F_β-score as the loss function was applied in training (TPR: +3.5–7%, F_1: +5.5–…%), indicating that in this example a classifier trained on the unmodified imbalanced data and F_β-score performs significantly better than a classifier trained with a modified balanced data set and a standard loss function.

[00179] As would be appreciated, methods and systems for classifying sleep related brain electrical activity in accordance with the present disclosure provide an enhanced capability to automate the classification of this brain electrical activity without requiring human intervention. In one example, directed to the classification of CAP events or cycles, the classification of these events may be subsequently used as indicators of human health.

[00180] As referred to previously, a high number of CAP events indicates a less restorative, unstable sleep. To determine the amount of CAP in sleep, the CAP rate is calculated, which defines the percentage of NREM sleep covered by CAP. A higher CAP rate than usual indicates that the sleep process was interrupted multiple times.

[00181] Pathologies such as obstructive sleep apnoea syndrome (OSAS), periodic leg movement (PLM), or insomnia show higher CAP rates than usual, ie, subjects with one of the mentioned diseases experience many more arousals or arousal-like events than other persons. As such, the body must work harder during sleep, resulting in potential cardiovascular problems or mental health issues. Subtype A1 is mostly found in the descending branch of the sleep cycle, ie in the transition from light sleep to deep sleep. It is related to the build-up and maintenance of deep sleep. Subtypes A2 and A3 are very similar to arousals due to their high EEG desynchrony. A higher rate of A2 and A3 subtypes generally implies that more arousals occurred during sleep, leading to repeated activations of the cardiorespiratory system.

[00182] The machine learning system in accordance with the present disclosure is configured to classify the brain electrical activity based on the dynamical temporal characteristics which improves the classification accuracy as compared to known classification techniques. In one embodiment, the classifiers were trained employing an error function that penalised the false classification of CAP events. In one example, adoption of the F b -score as the error or loss function was found to further increase the sensitivity and precision of the machine learning system.

[00183] In addition, the machine learning method disclosed in embodiments of the present disclosure may be adopted in sensor arrangements that both measure and classify the brain electrical activity data substantially in real time to provide a real time indication of parameters that may relate to the health of a subject. While the above disclosed embodiments have been described in relation to the classification of CAP events, it would be appreciated that classification methods and systems in accordance with the present disclosure may also be trained to classify other types of sleep related brain activity including, but not limited to, arousals, sleep spindles, K-complexes or REM periods.

[00184] Those of skill in the art would further appreciate that the various illustrative logical blocks, modules, circuits, and method steps described in connection with the embodiments disclosed above may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Accordingly, embodiments may be implemented to achieve the described functionality in varying ways for each particular application.

[00185] For a hardware implementation, processing by classification processor 520 may be implemented within one or more devices or systems, including but not limited to, application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described herein, or any combination as appropriate. Software modules, also known as computer programs, computer codes, or instructions, may contain a number of source code or object code segments or instructions, and may reside in any computer readable medium such as a RAM memory, flash memory, ROM memory, EPROM memory, registers, hard disk, a removable disk, a CD-ROM, a DVD-ROM or any other form of computer readable medium. In the alternative, the computer readable medium may be integral to the processor. The processor and the computer readable medium may reside in an ASIC or related device. The software codes may be stored in a memory unit and executed by a processor. The memory unit may be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.

[00186] An example classification system as illustrated in Figure 4 may comprise a display device 530 and classification processor 520 further comprising a memory. The memory may comprise instructions to cause the processor to execute methods in accordance with the present disclosure. The processor, memory and display 530 may be included in a standard computing device, such as a desktop computer, a portable computing device such as a laptop computer or tablet, or they may be included in a customised device or system such as a wearable sensor as discussed above. The classification processor 520 may be a unitary computing or programmable device, or a distributed device comprising several components operatively (or functionally) connected via wired or wireless connections.

[00187] Classification processor 520 may comprise a single CPU (core) or multiple CPUs (multiple cores). The computing device may use a parallel processor, a vector processor, or be a distributed computing device. The memory is operatively coupled to the processor(s) and may comprise RAM and ROM components, and may be provided within or external to the classification processor 520. The memory may be used to store the operating system and additional software modules that can be loaded and executed by the processor(s).

[00188] One example of a method and system for classifying sleep related brain electrical activity at a selected time in accordance with an embodiment of the present disclosure is disclosed in the publication, S. Hartmann and M. Baumert, “Automatic A-Phase Detection of Cyclic Alternating Patterns in Sleep Using Dynamic Temporal Information,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 27, no. 9, pp. 1695-1703, 2019, whose entire content is incorporated by reference in the present disclosure.

[00189] An example application of a method and system for classifying sleep related brain electrical activity at a selected time in accordance with an embodiment of the present disclosure is disclosed in the publication, S. Hartmann, O. Bruni, R. Ferri, S. Redline, and M. Baumert, “Characterisation of cyclic alternating pattern during sleep in older men and women using large population studies,” Sleep, Feb. 2020, whose entire content is incorporated by reference in the present disclosure.

[00190] Throughout the specification and the claims that follow, unless the context requires otherwise, the words “comprise” and “include” and variations such as “comprising” and “including” will be understood to imply the inclusion of a stated integer or group of integers, but not the exclusion of any other integer or group of integers.

[00191] The reference to any prior art in this specification is not, and should not be taken as, an acknowledgement of any form of suggestion that such prior art forms part of the common general knowledge.

[00192] It will be appreciated by those skilled in the art that the invention is not restricted in its use to the particular application described. Neither is the present invention restricted in its preferred embodiment with regard to the particular elements and/or features described or depicted herein. It will be appreciated that the invention is not limited to the embodiment or embodiments disclosed, but is capable of numerous rearrangements, modifications and substitutions without departing from the scope of the invention as set forth and defined by the following claims.




 