


Title:
ACTIVITY DETECTION
Document Type and Number:
WIPO Patent Application WO/2010/083562
Kind Code:
A1
Abstract:
The specification describes activity detection, such as training of an activity detector to detect activities represented in input time series data. One example of activities are swimming strokes, such as freestyle and breaststroke. Time series data representing an ordered sequence of activities (30) is provided as input. The method comprises repeatedly (60) generating a set of clusters of the time series data by iteratively (40) performing semi-Markov clustering on the data (30). A set of clusters that satisfy a criteria that is based on at least a list of activity types (32) is selected and, for each cluster of the selected set of clusters, an activity type is associated based on the order of the activity types in the list. It is an advantage that the activity detector can be trained using minimal input, that is, only the time series data and the (received or known) list of ordered activity types. No annotated data is required. Aspects include methods, computer systems, activity detectors and software.

Inventors:
SUNEHAG PETER (AU)
ROBARDS MATTHEW (AU)
Application Number:
PCT/AU2010/000056
Publication Date:
July 29, 2010
Filing Date:
January 21, 2010
Assignee:
NAT ICT AUSTRALIA LTD (AU)
AUSTRALIAN SPORTS COMMISSION (AU)
SUNEHAG PETER (AU)
ROBARDS MATTHEW (AU)
International Classes:
A63B69/00; G06F19/00
Foreign References:
US6441846B12002-08-27
US6993462B12006-01-31
US20050256817A12005-11-17
US20080030460A12008-02-07
US20090016610A12009-01-15
Attorney, Agent or Firm:
F B RICE & CO (44 Market StreetSydney, NSW 2000, AU)
Claims:
CLAIMS:

1. A computer implemented method of training an activity detector to detect activities represented in time series data, the method comprises the steps of:

(a) receiving time series data representing an ordered sequence of activities;

(b) repeatedly generating a set of clusters of the time series data by performing semi-Markov clustering on the data by segmenting the time series data into segments and clustering segments into clusters, where each segment represents a candidate activity and each cluster represents a candidate activity type; and

(c) selecting a set of clusters that satisfy a criteria that is based on at least a list of activity types, wherein the activity types in the list are in the order they are represented in the time series data; and

(d) for each cluster of the selected set of clusters, associating an activity type based on the order of the activity types in the list.

2. The method of claim 1, wherein the activities are based on human movement.

3. The method of claim 2, wherein the activity is a swimming stroke and the activity types are any two or more of freestyle, breaststroke, butterfly and backstroke.

4. The method of any one of the preceding claims, wherein the method further comprises the step of receiving the list of activity types.

5. The method of any one of the preceding claims, wherein the semi-Markov clustering iteratively attempts to improve a similarity measure of segments within each cluster.

6. The method of any one of the preceding claims, wherein the similarity measure between segments of a cluster comprises determining a feature vector of each segment which is independent of segment duration.

7. The method of claim 5, wherein the similarity measure is based on a measure of the distance between the feature vector of a segment of a candidate activity and a centroid measure of the corresponding cluster of a candidate activity type.

8. The method of any one of the preceding claims, wherein the clustering of step (b) is K-means clustering, where K is equal to, or greater than, the number of different activity types in the list.

9. The method of any one of the preceding claims, wherein the criteria of step (c) comprises, for a set of clusters, the number of transitions of activity types in the ordered list is the same as the number of transitions of clusters of that set of clusters, where the clusters are in time order based on the underlying time order of sets of consecutively timed segments that comprise each cluster.

10. The method of any one of the preceding claims, wherein the criteria of step (c) comprises, for a set of clusters, a number of consecutive timed segments that are in each cluster satisfies a minimum number criteria.

11. The method of any one of the preceding claims, wherein the method comprises determining a score for each set of clusters, and the criteria of step (c) comprises that the set of clusters have a score representative of being the best solution compared to the other sets of clusters.

12. The method of any one of the preceding claims, wherein the criteria of step (c) comprises the set of clusters and segments within each cluster conforming to a grammar.

13. The method of any one of the preceding claims, wherein for associating an activity type based on the order of the activity types in the list comprises associating in time order the activity types in the list to clusters that are in time order based on the underlying time order of sets of consecutively timed segments that comprise each cluster.

14. The method of any one of the preceding claims, wherein associating an activity type to a cluster comprises associating the activity type with each segment that comprised that cluster.

15. The method of any one of the preceding claims, wherein the method comprises determining a pattern characteristic that can be used to describe each activity type based on the characteristics of segments within clusters associated with that activity type.

16. Software, that is computer readable instructions stored on a computer readable medium, that when installed on a computer system causes it to perform the method according to any one of the preceding claims.

17. An automated activity detector for detecting activities represented in time series data, wherein the detector is trained according to the method of any one of the preceding claims.

18. A computer system for training an activity detector to detect activities represented in time series data, the computer system comprising: an input port to receive time series data representing an ordered sequence of activities; a processor to repeatedly generate a set of clusters of the time series data by performing semi-Markov clustering on the data by segmenting the time series data into segments and clustering segments into clusters, where each segment represents a candidate activity and each cluster represents a candidate activity type; to select a set of clusters that satisfy a criteria that is based on at least a list of activity types, wherein the activity types in the list are in the order they are represented in the time series data; and for each cluster of the selected set of clusters, associating an activity type based on the order of the activity types in the list.

Description:
ACTIVITY DETECTION

Technical Field

The disclosure concerns activity detection, such as but not limited to training of an activity detector and use of an activity detector to detect activities represented in input time series data. One example of activities are swimming strokes, such as freestyle and breaststroke. Aspects of the invention include methods, computer systems and software.

Background Art

Clustering analysis is a tool used in the data mining community and beyond. In essence, clustering allows the information of a large data set X to be summarised by creating a much smaller set C of classes and a membership map relating each point in X to its representative in C .

A time series data set is a special type of data set as it has a temporal ordering on its data points. That is,

X = {x_t | t = 1, ..., n}, where t reflects the temporal order.

Two types of clustering of time series have historically been performed:

(1) whole series clustering - generally the data comprises a number of time series of equal length (say n ). A vector space is formed of dimension n so that each time series is represented by a single point in the space. Clustering then takes place in the usual way and groupings of similar time series are returned.

(2) subsequence clustering - generally the data comprises a single long time series data set X. The aim is to find repeating sequences of features (i.e. a pattern). Usually, subsequences of a certain length (say p) are detected by moving a window of that size over the data in steps of a certain number of points (say k). With p = 100 and k = 10 the windows considered would be [0, 100], [10, 110], [20, 120], ..., [n-100, n]. Every subsequence is a vector of dimension 100 in this case.

The description here concerns (2) subsequence clustering. Every detected subsequence (i.e. segment) is associated with a label identifying that segment as having a pattern that is similar to other segments assigned the same label. One example of time series data could be data captured from sensors (e.g. accelerometers) worn by a moving body. Clustering and pattern identification of this data can enable the identification of distinct movements of the body. For example, the time series data of three-dimensional observations of acceleration could be identified as a series of primitive actions (activity atoms) undertaken by the wearer of the sensors. In this example, detailed monitoring of athletes is possible, which is an important part of their training.
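The conventional windowing in (2) can be sketched briefly as follows; this is a generic illustration of sliding-window subsequence extraction, not code from the specification:

```python
def sliding_windows(series, p=100, k=10):
    """Extract subsequences of length p, moving the window in steps of k points."""
    return [series[start:start + p] for start in range(0, len(series) - p + 1, k)]

# with p = 100 and k = 10 the windows cover [0, 100], [10, 110], [20, 120], ...
windows = sliding_windows(list(range(250)), p=100, k=10)
```

Each window is a vector of dimension p, so the extracted subsequences can be clustered as points in a p-dimensional space.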

Throughout this specification the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.

Summary

A first aspect provides a computer implemented method of training an activity detector to detect activities represented in time series data, the method comprising the steps of:

(a) receiving time series data representing an ordered sequence of activities;

(b) repeatedly generating a set of clusters of the time series data by performing semi-Markov clustering on the data by segmenting the time series data into segments and clustering segments into clusters, where each segment represents a candidate activity and each cluster represents a candidate activity type; and

(c) selecting a set of clusters that satisfy a criteria that is based on at least a list of activity types, wherein the activity types in the list are in the order they are represented in the time series data; and

(d) for each cluster of the selected set of clusters, associating an activity type based on the order of the activity types in the list.

It is an advantage that the activity detector can be trained using minimal input, that is, only the time series data and the (received or known) list of ordered activity types. No annotated data is required. Using semi-Markov clustering, repeating patterns of segments can be identified in the data that can be associated with activity types. The detector, having been trained on how an activity (i.e. segment) of an activity type is represented in the data, can now use the features of these segments for detection of activity types in future data. The activities may be based on human movement. The activity may be a swimming stroke and the activity types may be any two or more of freestyle, breaststroke, butterfly and backstroke.

The method may further comprise the step of receiving the list of activity types.

The semi-Markov clustering may iteratively attempt to improve a similarity measure of segments within each cluster, such as by extremising a cost function.

The segments may be of varying length. The similarity measure between segments of a cluster may comprise determining a feature vector of each segment which may be independent of segment duration. This feature vector represents the segment and may, for example, contain the segment's mean, variance and/or other summarising statistics. The cost function may be based on a measure of the distance between the feature vector of a segment of a candidate activity and a centroid measure of the corresponding cluster of a candidate activity type.

Each repeat of step (b) may rely on the set of clusters of a previous repetition of step (b).

Step (b) may be repeated until a further criteria is met, such as one based on a number of repeats. Alternatively, or in addition, step (b) may comprise for each repeat determining a score, such as an accuracy score, for the set of clusters, and step (b) may be repeated until a change in the score over one or more repeats is less than an amount, such as an amount representative of little improvement in the accuracy score.

The clustering of step (b) may be K-means clustering, where K is equal to, or greater than, the number of different activity types in the list.

The criteria of step (c) may be, for a set of clusters, that the number of transitions of activity types in the ordered list is the same as the number of transitions of clusters of that set of clusters, where the clusters are in time order based on the underlying time order of the sets of consecutively timed segments that comprise each cluster. That is, the timing of a cluster is the time of the first segment of each set of time-consecutive segments within that cluster. The criteria of step (c) may also be, for a set of clusters, that the number of consecutive segments in time order that are in the same cluster satisfies a minimum number criteria.

The criteria of step (c) may comprise the set of clusters and segments within each cluster conforming to a given grammar.

The method may further comprise determining a score for each set of clusters, and the criteria of step (c) comprises that the set of clusters have a score representative of being the best solution compared to the other sets of clusters.

Associating an activity type with each cluster based on the order of the activity types in the list may comprise associating in time order the activity types in the list to clusters that are in time order based on the underlying time order of the sets of consecutively timed segments that comprise each cluster. That is, pairing time-consecutive segments in the same cluster to activity types in the list in time order. More specifically, the first cluster, comprised of the first segment in the time series data and any consecutive segments also belonging to the first cluster, is associated with the first activity type on the list. The next segment in the time series data, associated with a second cluster, and any consecutive segments also belonging to the second cluster, are associated with the second activity type on the list.
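The pairing just described can be sketched as follows, by collapsing the time-ordered cluster labels of consecutive segments into runs and matching the runs against the ordered activity-type list; the function and data names are illustrative only:

```python
from itertools import groupby

def associate_activity_types(segment_labels, activity_list):
    """Pair each run of time-consecutive segments in the same cluster with the
    next activity type in the ordered list."""
    runs = [label for label, _ in groupby(segment_labels)]  # collapse consecutive repeats
    if len(runs) != len(activity_list):
        raise ValueError("cluster transitions do not match the activity list")
    # the i-th run of segments takes the i-th activity type in the list
    return list(zip(runs, activity_list))

pairs = associate_activity_types(
    [0, 0, 0, 1, 1, 0, 0, 2, 2],
    ["backstroke", "freestyle", "backstroke", "breaststroke"])
```

Note that a cluster may appear more than once in time order (cluster 0 above) and is then paired with the matching repeated entry in the list.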

Associating an activity type to a cluster may comprise associating the activity type with each segment that comprised that cluster.

The method may further comprise determining a pattern characteristic that can be used to describe each activity type based on the characteristics of segments within clusters associated with that activity type.

In a second aspect, a computer system, such as an activity detector trainer, is provided for training an activity detector to detect activities represented in time series data, the computer system comprising: an input port to receive time series data representing an ordered sequence of activities; a processor to repeatedly generate a set of clusters of the time series data by performing semi-Markov clustering on the data by segmenting the time series data into segments and clustering segments into clusters, where each segment represents a candidate activity and each cluster represents a candidate activity type; to select a set of clusters that satisfy a criteria that is based on at least a list of activity types, wherein the activity types in the list are in the order they are represented in the time series data; and for each cluster of the selected set of clusters, to associate an activity type based on the order of the activity types in the list.

In yet a further aspect, software is provided, that is computer readable instructions stored on a computer readable medium, that when installed on a computer system causes it to perform the method described above.

In another aspect an automated activity detector is provided for detecting activities represented in time series data, wherein the detector is trained according to the method described above.

Advantages of at least one embodiment include:

By repeating step (b) the time series data is repeatedly segmented resulting in dynamic segmentation of the time series data.

The method is unsupervised.

The ordered list of activity types makes it possible to connect the segments with names provided by the user. This list also helps choose among the results of the restarts of the semi-Markov clustering. It does this by narrowing the choice down to results that have the property that the activity types that have persisted for at least a specified number of consecutive instances all have a unique corresponding type in the provided list of activity types.

It is an advantage that minimal annotated input is required, unlike existing learning/training systems that require the input to be annotated with segment information, such as beginning and end times, and the corresponding activity type. By avoiding the need for highly annotated input the method has the advantage of avoiding the time consumption and expertise demanded to generate it.

The method is suited to learning complex tasks as it avoids the creation of a hard coded rule system for each activity type. The training of the activity detection system in this way has the advantage of only requiring a list of ordered activity types and will automatically detect activities within the input data from which many details about the activities in the data can be extracted. The method is also able to learn to detect multiple activities and to deal with increasingly complex inputs of activity type lists. Further, the method can be applied repeatedly to learn to detect activities that belong to say, different users. In this way the method is able to easily accommodate for differences between users.

The method does not rely on sliding windows to perform time series clustering, instead semi-Markov clustering is used.

Brief Description of the Drawings

A non-limiting example will now be described with reference to the accompanying drawings, in which:

Fig. 1 is a flow chart of the semi-Markov kMeans clustering training technique described;

Fig. 2 is a sample flow diagram of this technique;

Figs. 3 to 9 are pseudo code explaining in further detail steps of the flow diagram of Fig. 2;

Fig. 10 is a schematic diagram of a computer system used in one example;

Fig. 11(a) schematically shows a sample grammar and Fig. 11(b) shows that grammar applied to the swimming example of Fig. 10;

Fig. 12 schematically shows a special case of the grammar in Fig. 11;

Fig. 13 schematically shows the application of the additional grammar and heuristics to the segmenting of the time series data; and

Figs. 14 and 16 show graphically the results of various implementations.

Best Modes

Subsequence clustering aims to find patterns that appear repeatedly in time series data. We describe here a novel subsequence clustering technique that is called here "semi-Markov kMeans clustering".

As will be described in further detail below, the novel technique is performed by a training module. The aim of the training module is to train a computerised activity detector to detect activities represented in time series data. The training module produces the time series data segmented into primitive activities (i.e. activity atoms). That is, each segment represents a primitive activity (such as a swimming stroke, step or push up). Each segment of the same activity type is given the same label, and each label is associated with an activity type in an ordered list of activity types that are represented in the time series data. This training method is unsupervised and does not require segments of fixed length - instead it works with all possible segmentations to find one way of segmenting the sequence into consistent patterns, preferably an optimal way.

The output from the training module can then be used by an activity detector to automatically identify the activities in further time series data.

The method performed by the training module in this example will now be described in more detail with reference to Fig. 1.

Input

We assume that we are given a sequence of observations x, i.e. a multi-dimensional time series 20(a), and that there exists an unknown associated sequence of labeled segments y 20(b) describing the sequence of events which has generated x. We say that the pairs (x, y) have been drawn from an unknown probability distribution.

The data x, of length N, consists of a sequence of observations x_i, i = 1, ..., N. Associated with x are labels l_i, where {1, ..., L} denotes the set of possible labels. The labels, together with segment start positions, form what we call a labeled segmentation y.

Since we want to design models which have a long-range persistence of labels, we typically express y in a compressed representation as a list of pairs y = {(n_0, l_0), ..., (n_m, l_m)} of segment boundaries n_i and associated labels l_i. In other words, the segment [n_{i-1}+1, n_i] bears the label l_i. Moreover, n_0 = 0 and l_0 = start to indicate the left boundary, n_m = N to denote the end of the whole sequence, and n_{i+1} > n_i for all i to denote consecutive labels. Note that the number of segments m itself is variable. We will restrain the durations of a segment by giving minimum and maximum allowed durations, and in some special situations we will also consider incorporating prior knowledge about it through other restrictions, which we here summarise by (n_i, l_i) ∈ S(n_{i-1}, l_{i-1}), where S(n, l) is the set of labeled segments that can immediately follow the segment that ends at n and has label l. That the restriction is only based on adjacent segments is the definition of the semi-Markov property, which is desirable since it is compatible with dynamic programming. It can, however, be loosened to a higher order condition which allows dependence for a fixed number of consecutive segments. The set S(n, l) is defined by a combination of a grammar 20(d) which says which label transitions are allowed; e.g. in a gym a person who is working out on the training bicycle cannot directly transition to working out on the rowing machine without having some state like standing or walking in between. Grammars will be described in further detail below. We also restrict the possible durations of a segment by giving minimum and maximum allowed lengths, and we can also, for computational reasons, introduce what we call a step size which regulates which segment lengths between the minimum and the maximum lengths we will consider.

A stopping criteria is also provided, such as a convergence threshold or a number of iterations.

Inputs such as the number of clusters, the stopping criteria and the grammar may be predetermined and accessed from memory by the training module.
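One possible encoding of these inputs can be sketched as follows, under the assumption that the grammar is a simple map of allowed label transitions and the durations are bounded by a minimum, maximum and step size; all names and values here are illustrative, not from the specification:

```python
# Hypothetical encoding of the grammar and duration constraints that define S(n, l):
# which labels may follow which, plus minimum/maximum segment lengths and a step size.
GRAMMAR = {
    "start":   {"walking"},
    "walking": {"bicycle", "rowing", "walking"},
    "bicycle": {"walking"},   # must pass through walking before rowing
    "rowing":  {"walking"},
}
MIN_LEN, MAX_LEN, STEP = 20, 200, 5

def successors(n, label, total_length):
    """Enumerate the labeled segments (n_i, l_i) that may immediately follow a
    segment ending at position n with the given label."""
    out = []
    for next_label in GRAMMAR.get(label, ()):
        for length in range(MIN_LEN, MAX_LEN + 1, STEP):
            if n + length <= total_length:
                out.append((n + length, next_label))
    return out
```

The step size controls how many candidate segment lengths between the minimum and maximum are actually considered, trading accuracy for computation.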

Formulation of the Semi-Markov kMeans Clustering Method

Semi-Markov means dependencies are local, but based on segments rather than points.

Semi-Markov kMeans attempts to find a labeled segmentation y* that minimizes a loss function. This loss function is designed to obtain a segmentation where all segments which are given the same label are found to be "similar", according to some notion of similarity with cluster centroids. The similarity is measured based on the squared Euclidean distance between a feature vector of a proposed segment and the centroid corresponding to the proposed cluster. These centroids are found in an unsupervised fashion, and are analogous to the cluster centroids in the traditional kMeans algorithm. We define a feature vector φ(x, n_{i-1}, n_i) for every segment such that the feature vector's dimension does not depend on the length of the segment. This enables us to compare segments of different lengths. These calculations can be distributed over the sensors, as long as the features for each sensor are calculated based only on the data from that sensor. In other words, features and similarities can be calculated for each individual sensor, in parallel, and later combined.

We define our feature vector in the following manner: given a segment starting at index p of length l, we first take, e.g., ten means for each channel in x, starting at indices p, p + l/10, ..., p + 9l/10, each over a length of l/10. This both resolves the problem of comparing segments of different lengths and performs a time-warping that enables us to recognize patterns even when dilated.

The number of bins can be varied, and in our ECG implementation below we also modify the bins by subtracting the mean of the whole segment. This is important if one wants to be able to recognize the same pattern at different heights. Secondly, we use derivative features, which are defined as the difference between a given mean bin and the preceding mean bin. Ten mean bins therefore result in nine derivative features. The third kind of feature we use is the variance over the whole segment. Variance can be binned like mean features; however, it has been found unnecessary for this study. For every implementation described below we will describe the exact choice of features used.

In our semi-Markov kMeans algorithm, we find a labeled segmentation y* of a given data sequence x by performing the following minimization:

y* := argmin_y F(x, y; μ)   (1)

where we would ideally like to use

F(x, y; μ) = (1/m) Σ_{i=1..m} ||φ(x, n_{i-1}, n_i) - μ(l_i)||²   (2)

which is the average squared distance of the feature vectors for the segments from y = {(n_0, l_0), ..., (n_m, l_m)} to their class centroids μ(l_i). This objective is useful for picking class centroids given training data; however, (2) is ill suited for the optimization in (1). The problem here is that the essential dynamic programming requires the objective to be of a different form, into which (2) can not be rewritten due to the variability in the number of predicted segments. Instead we take advantage of the fact that if all segments have the same length η and the total length of the sequence is M, then 1/m = η/M and we can move η into the summation, resulting in the objective that we use, namely

F(x, y; μ) = Σ_{i=1..m} η_i ||φ(x, n_{i-1}, n_i) - μ(l_i)||²   (3)

where η_i = n_i - n_{i-1}, i.e. the segment length of the i:th segment, and we have removed the irrelevant factor 1/M. If all η_i are equal, this formulation is equivalent to the ideal (2). Note that if we did not use either an average distance or the factors η_i then we would have a bias towards having as few and long segments as possible, since that would yield fewer terms in the sum. Given a segmentation y, then for every label l we let

μ(l) = (1/m_l) Σ_{i: l_i = l} φ(x, n_{i-1}, n_i)   (4)

where m_l is the number of segments labeled l. This minimizes (2).

Given centroids and a data sequence, the predicted segmentation is calculated by minimizing (3). This prediction step is actually in itself a novel supervised method that we call the semi-Markov nearest centroid method. It can be trained from labeled segmentations by simply calculating the centroid of each label through (4). Dealing with the problem of how to perform the minimization in a tractable manner through dynamic programming is described further below.
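As a concrete sketch of the duration-independent features and the centroid update (4), assuming ten mean bins, nine derivative features and a single variance feature for a one-channel signal; the function names are illustrative:

```python
import numpy as np

def segment_features(x, start, end, num_bins=10):
    """Duration-independent features for segment x[start:end]: num_bins mean
    bins, (num_bins - 1) derivative features, and the segment variance."""
    seg = np.asarray(x[start:end], dtype=float)
    means = np.array([b.mean() for b in np.array_split(seg, num_bins)])
    derivs = np.diff(means)  # difference between adjacent mean bins
    return np.concatenate([means, derivs, [seg.var()]])

def centroid(features_of_label):
    """Equation (4): mean of the feature vectors of all segments with a label."""
    return np.mean(features_of_label, axis=0)

x = np.sin(np.linspace(0.0, 12.0, 400))
f_short, f_long = segment_features(x, 0, 80), segment_features(x, 80, 400)
# both segments map to 20-dimensional feature vectors despite different lengths
```

Because the feature dimension is fixed, segments of very different durations can be compared against the same centroid.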

Gaussian Model

We will introduce a modified and generalized formulation of the objective function by interpreting the segment scores in the previous section as the logarithm of the probability density function of a Gaussian distribution. The previous section's method corresponds to using Gaussians with the same covariance matrix for all classes. The covariance matrices were a positive constant times the identity matrix. We now generalize this model by using a not necessarily constant diagonal covariance matrix. Moreover, taking a probabilistic approach gives a better theoretical foundation, and allows us to model classes with variable widths and shapes. When performing activity recognition we will have data where, for example, running has more variability than working out on a training bicycle. We note here that we take the case where each class has a diagonal covariance matrix, but also acknowledge the further generalization to any covariance matrix. We now, using the multivariate Gaussian probability density function, formulate the new objective function as

y* := argmax_y F(x, y; μ, Σ)   (5)

where

F(x, y; μ, Σ) = Σ_{i=1..m} η_i log P(x, y_i | μ, Σ)   (6)

for P(x, y_i | μ, Σ) given by the multivariate Gaussian density

P(x, y_i | μ, Σ) = N(φ(x, n_{i-1}, n_i); μ(l_i), Σ(l_i))   (7)

where Σ(l) is the covariance matrix for our model of class l.

Given training data, the centroids and standard deviations are chosen through maximum likelihood estimation for each class using the Gaussian density function.

More formally,

(μ(l), Σ(l)) := argmax_{μ,Σ} (1/m_l) Σ_{i: l_i = l} log N(φ(x, n_{i-1}, n_i); μ, Σ)   (8)

where m_l is the number of segments with label l in the training data. This optimization is achieved by setting:

μ(l) = (1/m_l) Σ_{i: l_i = l} φ(x, n_{i-1}, n_i)   (9)

and Σ(l) is set to

Σ(l) = (1/m_l) Σ_{i: l_i = l} (φ(x, n_{i-1}, n_i) - μ(l)) (φ(x, n_{i-1}, n_i) - μ(l))^T   (10)

where the centroid μ(l) in (10) corresponds to that found in (9). We have decided to replace Σ(l) with a matrix that agrees with it for the diagonal elements but whose non-diagonal elements are zero.

Dynamic Programming

Our problem of optimizing (3) and (6), with the appropriate minimization or maximization, is a so-called max-sum problem of the form

y* := argmax_y Σ_{i=1..m} f(n_{i-1}, n_i, l_i)   (11)

For this kind of problem, dynamic programming can be carried out in the following manner.

Denote by V(n, l) a score function and let U(n, l) be a tuple of a position and label. In this case we may carry out dynamic programming via

U(n, l) := argmax_{(n', l') with (n, l) ∈ S(n', l')} V(n', l') + f(n', n, l)   (12)

V(n, l) := max_{(n', l') with (n, l) ∈ S(n', l')} V(n', l') + f(n', n, l)   (13)

Once the recursion reaches (N, -) (here '-' denotes the final null label) we may traverse U(n, l) backwards to obtain a full segmentation of the sequence.
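A minimal sketch of the recursion (12)-(13); for brevity it allows every label transition (i.e. the grammar restriction S is omitted) and uses a single duration range, so it illustrates the dynamic programming rather than the full method. All names are illustrative:

```python
def semi_markov_dp(N, labels, f, min_len, max_len):
    """Best-scoring labeled segmentation via the recursion in (12)-(13).
    f(n_prev, n, l) scores the segment (n_prev, n] with label l."""
    V = {(0, "start"): 0.0}  # best score of a partial segmentation ending at (n, l)
    U = {}                   # backpointers used to recover the segmentation
    for n in range(1, N + 1):
        for l in labels:
            candidates = []
            for length in range(min_len, max_len + 1):
                n_prev = n - length
                for (np_, lp) in list(V):
                    if np_ == n_prev:
                        candidates.append((V[(np_, lp)] + f(n_prev, n, l), (np_, lp)))
            if candidates:
                V[(n, l)], U[(n, l)] = max(candidates)
    # traverse the backpointers from the best state at position N
    state = max((V[s], s) for s in V if s[0] == N)[1]
    path = []
    while state in U:
        path.append(state)
        state = U[state]
    return list(reversed(path))
```

The returned path is the compressed representation of the segmentation: a list of (segment boundary, label) pairs in time order.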

Prior On Variance

With the preference of finding "tight" clusters, we now put a prior on the variance of the Gaussian for each cluster. Our choice of prior, although not the only choice, is the inverse gamma distribution, as it is a conjugate prior of the Gaussian distribution.

Now, with such a prior in our armoury, we reformulate our score function as

F(x, y; μ, σ) = Σ_{i=1..m} log( p(x, y_i | μ, σ) p(σ) )   (15)

where p(σ) is drawn from the inverse gamma distribution. This has the effect that it changes our update of the variance accordingly.

The output is the labeled segmentation y.

This training technique is summarised by the flow diagram of Fig. 2 and in the pseudo code UNSUPERVISEDLEARNER of Fig. 3. Here we can see that the time series data 30 and a list of activity types 32 are provided as input to the training module.

Initially the matrix Θ is initialised 34, and this is described in further detail in the pseudo code of Fig. 4. The matrix Θ is initialised from a random sampling of the raw data. As shown in Fig. 5(a), Θ is comprised of KCLASS columns 36 and NUMBINS rows 38. Each column 36 (a single class) corresponds to a particular type of segment/atom and is arranged by bin type. Each bin corresponds to a particular shape in the underlying data. The segment/atom is a non-linear function of the data.

Alternatively, INITIALISEMAP can rely on the results of any previous segmentations rather than generating Θ at random.

Line 7 of Fig. 4 calculates the "bin statistic" for the bin. In this example, the bins are:

1. mean bins: the statistic is the mean of the data segment 42;

2. difference bins: the statistic is the difference between two adjacent mean bins 44;

3. variance bins: the statistic is the variance of the data segment 46.

The enumeration above defines the values for t in line 4.

Next the labelled segmentation is built via an iterative loop 40. The labels are stored in a list Y = {y x ,y 2 ,..] where each label y t corresponds to a segment and Y is the labelling. GENERATESEGMENTS of Fig. 5(b) finds the optimal labelling Y', which is a solution to

Y' = arg minY Σᵢ ‖GENERATEMAP(rawData, yᵢ) − Θ[:, yᵢ.label]‖² × yᵢ.length

The VITERBI algorithm (shown in Fig. 6) of GENERATESEGMENTS effectively tries every possible labelling. Viterbi is, in this case, enabled by the Markov restriction on the atoms. Further, the recursive step forces the next segment to start at the end of the current segment, while allowing the segments to be of variable length (see lines 5 and 11 of Fig. 6). Since the method is Markov with respect to atoms, it is semi-Markov with respect to the underlying data.

The lines shown in Fig. 7 are added between lines 4 and 5 of the VITERBI algorithm of Fig. 6 to make the recursion use a dynamic programming approach. This prevents the algorithm from re-calculating label scores which have previously been processed.
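A simplified Python sketch of this semi-Markov Viterbi recursion is given below. It is not the pseudo code of Figs. 6 and 7: the segment cost is reduced to a squared distance between the segment mean and a per-label centroid, and the dynamic programming cache is provided by `functools.lru_cache`. All names are illustrative.

```python
from functools import lru_cache

def segment(data, centroids, min_len, max_len):
    # Each segment starts where the previous one ended; segments have
    # variable length between min_len and max_len; the cache avoids
    # re-scoring start positions that have already been processed.
    def cost(start, end, label):
        window = data[start:end]
        mean = sum(window) / len(window)
        return (mean - centroids[label]) ** 2 * len(window)

    @lru_cache(maxsize=None)
    def best(start):
        # Returns (total cost, labelled segments) from `start` onward.
        if start == len(data):
            return 0.0, ()
        candidates = []
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(data):
                break
            for label in range(len(centroids)):
                tail_cost, tail = best(end)
                candidates.append(
                    (cost(start, end, label) + tail_cost,
                     ((start, end, label),) + tail))
        if not candidates:
            return float("inf"), ()
        return min(candidates)

    return best(0)[1]
```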

The determined Y is then used to update the matrix map, see Fig. 8 for UPDATEMAP.

Then, referring back to Fig. 3, if the score of the labelling Y is larger than the score for the matrix map (i.e. a criterion) and Y represents a valid solution, then the score for Y of the most recent iteration 40 is made the new score to improve on 48 for future repeats 60, and the updated map Θ is stored as representing the best solution found so far.

Further criteria are applied by checking for a valid solution, which is performed by REALITYCHECK shown in Fig. 9. The number of atoms in time series order that share the same label is counted and this count is checked 70 to determine whether it meets the minimum labels per group criterion previously set. Also, the number of transitions between the labels, when considered in time series order, is checked 72 to confirm that it is equal to the number of transitions in the user's input of the ordered list of activity types. That is, the number of times the label of consecutive segments changes in Y must be equal to the number of changes in the activity types in the user's inputted list (i.e. the number of activities in the user's inputted list minus 1). For example, when considering the labels of Y in time order, Y may represent the following list (where each label in this list belongs to multiple consecutive segments): backstroke, freestyle, backstroke, breaststroke. There are three transitions in this list.

If the user's inputted activityList has three activities listed, then it will not satisfy this check. If the user's inputted activityList has four activities listed, then it will satisfy this check.
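Both checks can be sketched in Python as follows (argument names are assumed; `labels` is the per-segment labelling Y in time order and `activity_list` is the user's ordered list of activity types):

```python
def reality_check(labels, activity_list, min_per_group):
    # Check 70: every run of consecutive identical labels must contain
    # at least `min_per_group` segments.
    runs = []
    for label in labels:
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1
        else:
            runs.append([label, 1])
    if any(count < min_per_group for _, count in runs):
        return False
    # Check 72: the number of label transitions must equal the number
    # of transitions in the activity list (its length minus one).
    return len(runs) - 1 == len(activity_list) - 1
```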

In this way the selected set of clusters that will be used as the basis for the output of the method meets certain criteria. The criteria based on the ordering of the activity list is shown in Fig. 3 as being tested by REALITYCHECK within the repeating loop of 60.

Alternatively, these criteria can be applied outside the loop 60, for example by testing all the generated sets of clusters outside repeating loop 60 and selecting the one set that passes REALITYCHECK with the best score, in order to select the optimal set of segments.

Using the output of the training described here, the activity detector can now be used to detect activities in further time series data received, such as input from the swimmer's further training. A max-margin semi-Markov model can be used. One among many alternative detection algorithms is the semi-Markov nearest centroid method which appears in the training above. Using the semi-Markov nearest centroid method is equivalent to using the GENERATESEGMENTS function with the patterns stored during training in the matrix Θ.
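For a single, already-delimited segment, the nearest centroid step can be sketched as below. This is an illustration with assumed names; the full detector would combine it with the semi-Markov segmentation described earlier:

```python
def nearest_centroid(features, theta):
    # Compares the segment's feature vector against each column (class)
    # of the trained matrix theta and returns the index of the closest
    # column as the detected activity.
    num_bins, k_class = len(theta), len(theta[0])
    best_label, best_dist = None, float("inf")
    for c in range(k_class):
        dist = sum((features[b] - theta[b][c]) ** 2 for b in range(num_bins))
        if dist < best_dist:
            best_label, best_dist = c, dist
    return best_label
```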

Implementation 1

In this example time series data taken from sensors worn by a swimmer during training is provided. Initially an activity detector is trained, which then can identify the strokes in further received data. This in turn allows for a comprehensive break-down of training sessions, including lap times and detailed statistics of strokes and turns.

As schematically shown in Fig. 10, the swimmer 10 is wearing sensors 12 while performing an activity. The sensors 12 may be three dimensional accelerometers that measure the swimmer's movement in three directions. In this example, the multiple sensors 12 are attached to key swimming movement areas of the swimmer and are each wireless transmitters that cause the sensed data to be transmitted in real time to the computer 14. Alternatively, the time series data may be provided to a computer after the activity is completed, such as through a wired connection.

The time series data representing an ordered sequence of activities, in this case swimming strokes, is received at the input/output port 16 of the computer 14, such as a Bluetooth receiver. The processor 18 causes the received sensed data to be stored in the internal memory 20 of the computer 14.

Interacting with an interface, such as an interface driven by the processor and displayed on a monitor (not shown) connected to the port 16, a user such as the swimmer or the swimmer's coach is able to provide further input of a list of activity types in an order they are represented in the time series data. This may be done by selecting the swimming strokes in order from a pick list displayed on the interface. This activity list is also received at the port 16 and stored in internal memory 20 of the computer 14.

There are two main modules operated by the processor 18: a training module 18(a) and an activity detection module 18(b). The technique described here concerns the training module 18(a). Using installed software, the processor 18 accesses the internal memory 20 to obtain the required input and operates in the method described in detail above.

The aim of this training is to process the received data as described above to identify segments in the data of primitive activities, so that each segment represents a certain type of swimming stroke, and all segments representing the same swimming stroke share similarities. The output of the segmented data is provided at port 22 and again is displayed to the user using the monitor.

The output may be stored in internal memory 20 as an invertible map from clusters to activity types in the provided list which can translate the cluster list to the given activity type list and back again. This map is ready for use by the activity detection module 18(b) on further data received to guide the segmentation process. The timing of the start and end of each segment, and segment groups, can be used by the processor 18 to provide further output on various statistics on timing that are sent to the output port.

In this example the training module 18(a) and the activity detection module 18(b) are contained within the same computer system 14. In other embodiments, these modules 18(a) and 18(b) may be provided on separate computer systems.

Example Grammars

As explained above grammars can be provided as input to guide the clustering technique. Grammars provide flexibility by avoiding hard coding of these rules. These predefined grammars define allowable state transitions. In this example, a grammar with one or more repeating states is used.

Fig. 11(a) shows a sample grammar that can be applied to repeatable activities, such as gymnasium exercises or swimming. The grammar displayed includes a simple repeating pattern 50, that is, one state that repeats. This simple repeating pattern 50 is contained within a complex repeating pattern having multiple states in a specified order. In this case the complex repeating pattern includes an activity 52 before the simple repeating pattern 50 and a further activity 53 after 50. In this grammar, 52, 50 and 53 are repeated in this order. The grammar also takes account of activities that may occur at the very start 56 and at the very end 58 that are not part of the complex repeating pattern.
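One simple way to represent such a grammar is as an allowed-state-transition table, sketched here in Python. The state names are illustrative, not from the specification; the numbers in the comments refer to the reference numerals above:

```python
# Allowed state transitions for a repeating grammar in the style of
# Fig. 11(a): a pre-activity, a simple repeating state and a
# post-activity repeat in order, bracketed by start and end activities.
GRAMMAR = {
    "start_activity": {"pre_activity"},                   # state 56
    "pre_activity":   {"repeat"},                         # state 52
    "repeat":         {"repeat", "post_activity"},        # pattern 50
    "post_activity":  {"pre_activity", "end_activity"},   # state 53
    "end_activity":   set(),                              # state 58
}

def is_allowed(path):
    # Checks that every consecutive pair of states follows the grammar.
    return all(b in GRAMMAR[a] for a, b in zip(path, path[1:]))
```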

Fig. 11(b) shows the grammar of Fig. 11(a) applied to swimming. The simple repeating pattern 50 comprises the swimming strokes. The push 52 and a turn 53 can bound every lap of a particular stroke. At the start an activity 56 such as a dive or splash that is not part of a repeating pattern can be provided for, in the same way that climbing out of the pool 58 is provided for outside the complex repeating pattern. Fig. 12 schematically shows a four stroke system, with repeating stroke types. This grammar can be considered a special case of the grammar of Fig. 11(a).

Fig. 13 shows the application of the grammar of Fig. 11(b) and additional heuristics to the detection and understanding of the received data 100. The raw data 100 is a time series of unlabeled measurements from the sensors described for Fig. 10. Labels are also received, such as freestyle laps and then a butterfly lap, but no additional timing information is provided.

The first pass 108 of the algorithm shown in Fig. 3 detects simple repeating patterns. The predetermined grammar is then accessed 110 and the simple repeating pattern of the grammar is used to condense each set of repeating states into one meta-state group. A meta-state is variable in length in that it may contain a different number of strokes. The new "meta data" now contains unknown, unclassified data 104 and classified data 102.

A second pass 112 of the algorithm of Fig. 3 is applied to detect complex patterns in the combined group and unknown data. For example, the complex grammar may be used to evaluate the presence of turns or terminals in the data. This step may also use heuristics; for example, a lap of one type of stroke, followed by a lap of another stroke with a short intervening section of unclassified data, has a high probability of being a lap-turn-lap sequence. A heuristic may be applied to the unknown data to estimate the likelihood of a "turn" within the data, given prior knowledge of the surrounding stroke types.

Next the grammar 114 is applied again to condense each set of complex repeating states into higher-level descriptions.

This process may be continued for successively more complex activity grammars.

In non-swimming instantiations, the simple repeating activity may be a "step" on a treadmill or a "lift" of a weight, and the grammar can be applied to the gymnasium setting.

Cylinder-Bell-Funnel Implementation

The original CBF task was formulated by Saito (Saito, 1994) and consists of classifying a time series as one of three classes: Cylinder, Bell or Funnel. This publicly known problem is made difficult by high noise levels and by the variable onset and duration of states. Time series 128 samples long are generated by first drawing h from a normal distribution with mean 0 and variance 1. For every time sample t we draw ε(t) from the same distribution N(0,1). Furthermore, an integer a is drawn uniformly from the interval [16, 32] and then we let b = a + l where l is drawn uniformly from the interval [32, 96]. The time series are defined from these numbers by letting

c(t) = (6 + h)·1[a,b](t) + ε(t)    (14)

b(t) = (6 + h)·1[a,b](t)·(t − a)/(b − a) + ε(t),  f(t) = (6 + h)·1[a,b](t)·(b − t)/(b − a) + ε(t)    (15)

where in all of these equations 1[a,b](t) is the indicator function which equals one if t ∈ [a,b] and otherwise equals zero.

We create one time series 12800 samples long by randomly, with equal probability, choosing one of the three patterns at a time, one hundred times, and concatenating the one hundred time series after each other. For every one of the one hundred subsequences we label every time sample with the corresponding name, i.e. Cylinder, Bell or Funnel if t is in the drawn interval [a,b], and for the other samples we use the label flat.
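The generation procedure above can be sketched in Python as follows (names are illustrative; `kind` selects one of the three classes, and the drawn interval [a, b] is returned alongside the series so that the per-sample labels can be derived):

```python
import random

def cbf_sample(kind, length=128):
    # h and the noise eps(t) are N(0, 1); a is uniform on [16, 32];
    # b = a + l with l uniform on [32, 96].
    h = random.gauss(0.0, 1.0)
    a = random.randint(16, 32)
    b = a + random.randint(32, 96)
    series = []
    for t in range(length):
        ind = 1.0 if a <= t <= b else 0.0
        if kind == "cylinder":
            shape = ind
        elif kind == "bell":
            shape = ind * (t - a) / (b - a)
        else:  # funnel
            shape = ind * (b - t) / (b - a)
        series.append((6 + h) * shape + random.gauss(0.0, 1.0))
    return series, (a, b)
```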

The features used were 10 mean bins and 9 derivatives, scaled by a factor of 5 to help further emphasize slopes on the shapes. We used the traditional kmeans score function as in (3) with 10 restarts of 30 iterations, and k = 4 .

Fig. 14 shows the original CBF data with segment boundaries and tokens as found by semi-Markov kmeans.

Fig. 15 shows a "reconstruction" of the original CBF data by replacing each segment with the corresponding centroid found by semi-Markov kmeans. More specifically, this reconstruction is found by dilating the centroid such that it is the same length as the corresponding segment.

Activity Recognition during Exercise - Gym implementation 1

We collected a gym dataset in which 3-dimensional acceleration data was sampled at 100Hz via a Catapult Innovations minimaxX device placed in a pocket at the hip. The subject performed cross-training, treadmill running, training bicycle and rowing. The features used were 20 mean bins, 19 derivative bins and the variance of the segment. We demanded a consistency of at least 5 consecutive segments of the same type. This was run with k = 10.

Activity Recognition during Exercise - Gym implementation 2

We collected an office exercise dataset in which an iPod touch, which contains a 3-dimensional accelerometer, was strapped to a person's chest. We collected two separate sessions of the office exercise with the activities undertaken in different orders but by the same person. This shows the technique's ability to easily train a device to recognize a user's chosen activities. We will refer to these datasets as Office1 and Office2.

The activities undertaken were back extensions, side twists, push-ups, sit-ups and squat jumping. The features used were 20 mean bins, 19 derivative bins and the variance of the segment. We demanded a consistency of at least 5 consecutive segments of the same type. This was run with k = 10 .

Here we provide a discussion of the ability to train an activity recognition system given only an ordered list of activity types as annotation. This is easily provided by non-expert users. In this case, for the dataset Office1 which was used for training, the user would give the following list of activity type names:

1. Sit-ups

2. Push-ups

3. Squat-jumps

4. Back extensions

5. Side twists

As a result of running the semi-Markov kmeans clustering algorithm on Office1 with the given activity list, the system was able to identify the cluster numbers with the activity names provided by the user. This can be used to predict the segments in Office2 using the semi-Markov nearest centroid classifier that has been trained by performing the clustering on Office1. The result is a consistent segmentation into the correct activities. In the test set they appear in the order: back extensions, side twists, squat jumps, push-ups, sit-ups. The segmentation enables the system to count the number of repetitions of each activity, and to calculate rates and total time.
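The repetition and timing statistics mentioned here can be sketched as follows (an illustration with assumed inputs: `segments` is a list of (start, end, activity) sample-index triples produced by the segmentation, and `sample_rate` is the sensor rate in Hz):

```python
def session_summary(segments, sample_rate=100):
    # Accumulates, per activity, the number of segments (repetitions)
    # and the total number of samples, then converts samples to seconds.
    summary = {}
    for start, end, activity in segments:
        reps, samples = summary.get(activity, (0, 0))
        summary[activity] = (reps + 1, samples + (end - start))
    return {activity: {"repetitions": reps,
                       "total_seconds": samples / sample_rate}
            for activity, (reps, samples) in summary.items()}
```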

Some Alternatives

It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the method, computer system, activity detector and software as shown in the specific embodiments without departing from the scope of the invention.

The examples can be applied to other sports, such as skiing, as well as numerous domains, such as patient care, chronic disease management and promotion of lifelong health and well-being for the aging population. For example, monitoring the rehabilitation of patients wearing sensors in post-ambulatory conditions, detecting abnormalities in ECG data, gait detection for children suffering cerebral palsy and monitoring Parkinson's disease patients.

The examples can be applied to data that is captured in time series from which repeating atoms can be identified, such as financial and physiological data. For example, with financial data the example can be used for rule discovery.

Alternatively, the example can be used in the task of making mobile devices aware of the context of the user. In that application the sensor data may again be interpreted as consisting of primitive activities whose frequency can indicate higher level context like driving a car.

The example here has been described with K-means used as the clustering technique. Other clustering methods can be extended to the semi-Markov framework, such as Gaussian mixture models trained with the Expectation Maximization (EM) method and hierarchical clustering.

The supervised classification and segmentation part can be another method that can learn from fully segmented and labelled time series training data, such as semi-Markov conditional random fields (semi-CRF) and hidden Markov models (HMM).

The function of the computer 14 shown in Fig. 10 may be contained in a housing with a sensor, such as a device worn on the wrist. The processing may be distributed, for example between two or more of the sensor device, a remote PC or a computer device permanently at the site of the activity itself, such as the pool.

It should be understood that the techniques of the present invention might be implemented using a variety of technologies. For example, the methods described herein may be implemented by a series of computer executable instructions residing on a suitable computer readable medium. Suitable computer readable media may include volatile (e.g. RAM) and/or non-volatile (e.g. ROM, disk) memory, carrier waves and transmission media (e.g. copper wire, coaxial cable, fibre optic media). Exemplary carrier waves may take the form of electrical, electromagnetic or optical signals conveying digital data streams along a local network or a publicly accessible network such as the internet.

It should also be understood that, unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that processes and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.