
Title:
IDENTIFYING ACTIVITY STATES OF A CAMERA UNIT
Document Type and Number:
WIPO Patent Application WO/2016/038385
Kind Code:
A2
Abstract:
Activity states of a camera unit that comprises an image sensor and at least one motion sensor are identified in dependence on changes in a feature vector comprising plural features, each representing a characteristic of the motion detected by the motion sensor. Images captured by the image sensor are divided into plural groups of images in dependence on the identified activity states, which facilitates curation of the images.

Inventors:
LEIGH JAMES ALEXANDER (GB)
Application Number:
PCT/GB2015/052635
Publication Date:
March 17, 2016
Filing Date:
September 11, 2015
Assignee:
OMG PLC (GB)
International Classes:
H04N5/232
Attorney, Agent or Firm:
MERRYWEATHER, Colin Henry (Gray's Inn, London Greater London WC1R 5JJ, GB)
Claims:
Claims

1. A method of identifying activity states of a camera unit that comprises an image sensor arranged to capture images and at least one motion sensor arranged to detect motion of the camera unit,

the method comprising identifying activity states of the camera unit in dependence on characteristics of the motion detected by the motion sensor.

2. A method according to claim 1, further comprising assigning images captured by the image sensor to plural groups of images in dependence on the identified activity states of the camera unit when the images were captured.

3. A method according to claim 2, further comprising selecting one or more representative images from each group and displaying the representative images as representations of the groups.

4. A method according to claim 2 or 3, further comprising:

displaying a representation of each group;

accepting user-input selecting a representation of a group; and

displaying the images in a group when its representation is selected.

5. A method according to any one of claims 2 to 4, further comprising selecting one or more representative images from each group and generating a summary video comprising all the representative images displayed successively.

6. A method according to any one of the preceding claims, wherein the activity states of the camera unit are identified in dependence on a feature vector comprising plural features each representing a characteristic of the motion detected by the motion sensor.

7. A method according to claim 6, wherein the features include at least one feature dependent on the size of the motion detected by the motion sensor.

8. A method according to claim 6 or 7, wherein the features include at least one feature dependent on variation in the motion detected by the motion sensor.

9. A method according to any one of claims 6 to 8, wherein the feature vector is derived from the motion detected by the motion sensor in a rolling window of time.

10. A method according to any one of claims 6 to 9, wherein the features are derived from a scalar quantity that represents the motion detected by the or each motion sensor and is not dependent on the orientation of the camera unit.

11. A method according to any one of claims 6 to 10, further comprising deriving the feature vector.

12. A method according to any one of claims 6 to 11, wherein the activity states are identified in dependence on changes in the feature vector.

13. A method according to claim 12, wherein the step of identifying activity states of the camera unit comprises deriving a measure of difference between successive feature vectors, the activity states being identified from the derived measure of difference.

14. A method according to claim 13, further comprising:

deriving the derivative of the measure of difference;

identifying significant peaks and troughs in the derivative of the measure of difference; and

defining boundaries between activity states based on the identified peaks and troughs, whereby the activity states are identified as occurring between the boundaries.

15. A method according to claim 14, wherein the step of identifying significant peaks and troughs in the derivative of the measure of difference comprises identifying peaks and troughs in the derivative of the measure of difference that have a magnitude exceeding a derivative threshold.

16. A method according to claim 15, wherein the derivative threshold is derived dynamically from the derivative of the measure of difference.

17. A method according to any one of claims 12 to 16, further comprising

determining, in respect of adjacent activity states, a merge score on the basis of at least one of: similarity between the feature vectors derived during the adjacent activity states; the number of images captured during the adjacent activity states; and the change in the feature vectors occurring at the identified boundary; and

selectively rejecting identified boundaries between adjacent activity states on the basis of the merge score, whereby the adjacent activity states are merged.

18. A method according to any one of the preceding claims, wherein the at least one motion sensor includes at least one of a motion sensor that detects translational motion and a motion sensor that detects rotational motion.

19. A method according to any one of the preceding claims, wherein the camera unit is arranged to cause the image sensor to capture images intermittently without triggering by a user.

20. A method according to claim 19, wherein the camera unit comprises plural sensors arranged to sense physical parameters of the camera unit or its surroundings, and the method is performed intermittently in response to the outputs of the sensors.

21. A camera unit comprising an image sensor arranged to capture images and at least one motion sensor arranged to detect motion of the camera unit, the camera unit being arranged to perform a method according to any one of the preceding claims.

22. A computer program executable on a computer apparatus and arranged on execution to cause the computer apparatus to perform a method according to any one of claims 1 to 20.

23. A computer apparatus arranged to perform a method according to any one of claims 1 to 20.

Description:
Identifying Activity States Of A Camera Unit

The present invention relates to a camera unit that comprises an image sensor arranged to capture images.

As camera units are miniaturised and incorporated into a wide range of portable electronic equipment, and as data storage becomes cheaper and more capacious, users tend to capture large and increasing numbers of images.

Large numbers of images are captured, for example, by a wearable camera, that is, a camera unit incorporated into wearable equipment that may be worn by a user. When so worn, the camera is continuously available to capture images, and large numbers of images tend to be captured as a result. The camera unit may be arranged to cause the image sensor to capture images intermittently without triggering by a user, for example in response to the output of sensors that sense physical parameters of the camera unit or its surroundings. Such automatic triggering again has the result of increasing the number of images captured.

Although the high volume of images provides rich and interesting imagery, it creates a management issue for the user. For example, reviewing and curating large sets of images manually is time consuming for the user. Typically, users do not want to invest large amounts of time to curate image data. To improve the user experience, it would be desirable to reduce the time required for the user to manage the images.

Some solutions for image grouping and automatic curation which may assist the user are known as follows.

A first approach uses image processing algorithms. Image similarity and automatic curation is an active area of research within the image processing and computer vision community. Suitable algorithms typically focus on analysing the content in the images in some manner. There are many variants of such algorithms. However, they have a relatively high processing burden, and generally there is a trade-off between quality of results and the amount of processing power required, which has a knock-on effect on the time and/or hardware required to process the images. Whilst this approach is potentially powerful, the quality and processing constraints limit its applicability to a consumer product that generates large data sets, such as a wearable camera.

A second type of approach is to perform manual tagging. This is currently in widespread use and involves a user associating one or more textual tags with the images, which effectively group the images. This approach allows the generation of meaningful tags and groupings. However, it places a high requirement on the user's time, making it undesirable for a large number of images.

A third approach is to use geo-location data such as GPS (Global Positioning System). The geo-location data identifies the location at which an image was captured and may be obtained automatically by the camera unit at the time of image capture. The geo-location data allows the images to be grouped by location and displayed on a map.

However, the application of this approach is restricted by the availability of the geo-location data. Such geo-location data might be unobtainable in some environments such as indoors. Such geo-location data also typically has a relatively high power requirement, meaning that it is incompatible with provision of a long battery life in a portable device such as a wearable camera.

According to the present invention, there is provided a method of identifying activity states of a camera unit that comprises an image sensor arranged to capture images and at least one motion sensor arranged to detect motion of the camera unit,

the method comprising identifying activity states of the camera unit in dependence on characteristics of the motion detected by the motion sensor.

This method makes use of a motion sensor that is provided in the camera unit. Such a motion sensor is commonly provided in a portable electronic device. For example, such a motion sensor may be provided as one of the sensors in the case of a wearable camera that is arranged to cause the image sensor to capture images intermittently, without triggering by a user, in response to the output of sensors that sense physical parameters of the camera unit or its surroundings. It has been appreciated that such a motion sensor allows the identification of activity states in dependence on characteristics of the motion detected thereby. Different activities of the camera unit, and hence the user carrying or wearing the camera unit, cause the motion to occur with different characteristics. As a very simple example, the motion will be different when the user is seated, walking or running.

The identified activity states may be used to facilitate the curation of images captured by the camera unit. For example, images captured by the image sensor may be assigned to plural groups of images in dependence on the identified activity states of the camera unit when the images were captured. Such assignment of the images into groups is therefore based on user activity, as indicated by the detected motion of the camera unit, rather than being based on image content or geo-location data (although such other techniques could be used in combination).

These advantages are particularly relevant to wearable cameras with which images are typically captured when the user is undertaking a range of different activities, in contrast to a conventional camera for which the user will typically cease other activity to take a photograph. This assists in identifying different activity states, and conversely makes the grouping in dependence on the activity state of significance to the user.

According to further aspects of the present invention, there are provided a camera unit, a computer program and a computer apparatus that are capable of performing such a method.

An embodiment of the present invention will now be described by way of non-limitative example with reference to the accompanying drawings, in which:

Fig. 1 is a schematic block diagram of a camera;

Fig. 2 is a schematic view of the camera showing the alignment of rotational axes with the image sensor;

Fig. 3 is a flow chart of a method of deriving feature vectors from detected motion;

Fig. 4 is a flow chart of a method of identifying activity states from the feature vectors; and

Fig. 5 is a flow chart of a method of grouping and curating the images based on the identified activity states.

Fig. 1 is a schematic block diagram of a camera 1 comprising a camera unit 2 mounted in a housing 3. The camera 1 is wearable. To achieve this, the housing 3 has a fitment 4 to which is attached a lanyard 5 that may be placed around a user's neck. Other means for wearing the camera 1 could alternatively be provided, for example a clip to allow attachment to a user's clothing.

The camera unit 2 comprises an image sensor 10 and a camera lens assembly 11 in the front face of the housing 3. The camera lens assembly 11 focuses an image of a scene onto the image sensor 10, which captures the image and may be of any suitable type, for example a CMOS (complementary metal-oxide-semiconductor) device. The camera lens assembly 11 may include any number of lenses and may provide a fixed focus that preferably has a wide field of view.

The size of the image sensor 10 has a consequential effect on the size of the other components and hence the camera unit 2 as a whole. In general, the image sensor 10 may be of any size, but since the camera 1 is to be worn, the image sensor 10 is typically relatively small. For example, the image sensor 10 may typically have a diagonal of 6.00 mm (corresponding to a 1/3" format image sensor) or less, or more preferably 5.68 mm (corresponding to a 1/3.2" format image sensor) or less. In one implementation, the image sensor has 5 megapixels in a 2592-by-1944 array in a standard 1/3.2" format with 1.75 μm square pixels, producing an 8-bit raw RGB Bayer output, having an exposure of the order of milliseconds and an analogue gain multiplier

In normal use, the camera unit 2 will be directed generally in the same direction as the user, but might not be directed at a scene that has a natural point of interest since the user does not know when image capture will occur. For this reason, it is desirable that the camera lens assembly 11 has a relatively wide field of view ("wide angle"). For example, the camera lens assembly 11 may typically have a diagonal field of view of 85 degrees or more, or more preferably 100 degrees or more.

The camera unit 2 includes a control circuit 12 that controls the entire camera unit 2. The control circuit 12 controls the image sensor 10 to capture still images that may be stored in a memory 13. The control circuit 12 is implemented by a processor running an appropriate program that may be stored in the memory 13. The control circuit 12 controls conventional elements to change the parameters of operation of the image sensor 10, such as exposure time. Similarly, the memory 13 may take any suitable form, a non-limitative example being a flash memory that may be integrated or provided in a removable card.

A buffer 14 is included to buffer captured images prior to permanent storage in the memory 13. The buffer 14 may be an integrated element separate from the memory 13, or may be a region of the memory 13 selected by the control circuit 12.

The camera unit 2 further includes plural sensors that sense different physical parameters of the camera unit 2 or its surroundings, as follows.

One of the sensors is an accelerometer 15, being a motion sensor that detects translational motion, in particular translational acceleration. Thus, the output signal of the accelerometer 15 represents the translational acceleration of the camera unit 2. The method described below uses the output signal of the accelerometer 15, but could alternatively use the integral of the output signal, which represents the translational velocity of the camera unit 2. Another of the sensors is a gyroscope sensor 16, being a motion sensor that detects rotational motion, in particular rotational velocity. Thus, the output signal of the gyroscope sensor 16 represents the rotational velocity of the camera unit 2. The method described below uses the output signal of the gyroscope sensor 16, but could alternatively use the derivative of the output signal, which represents the rotational acceleration of the camera unit 2.

The accelerometer 15 and the gyroscope sensor 16 each detect motion relative to three orthogonal axes X, Y and Z as illustrated in Fig. 2. In Fig. 2, the axes X and Y are in the plane of the image sensor 10 and the axis Z is perpendicular to the plane of the image sensor 10, but in general they could have any orientation relative to the image sensor 10.

The accelerometer 15 and the gyroscope sensor 16 may be implemented by a MEMS (Micro-Electro-Mechanical System), and may be integrated in a common component or implemented in different components.

Thus, in this example the sensors include two motion sensors, that is the accelerometer 15 and the gyroscope sensor 16, but the camera unit 2 could just comprise one of these motion sensors, or they could in general be replaced by any other type of motion sensor. The gyroscope sensor 16 has a higher power consumption than the accelerometer 15. Accordingly, power consumption is reduced by using only the accelerometer 15. This may be done in a low-power mode of operation.

The sensors also include: a GPS (global positioning system) receiver 17 that senses the location of the camera unit 2; a light sensor 18 that senses ambient light; and a magnetometer 19 that senses magnetic fields. This particular selection of sensors is not limitative. Some or all of the GPS receiver 17, light sensor 18, and magnetometer 19 may be omitted, and/or other sensors may be included. Possible other sensors include: an external motion sensor that senses motion of external objects, for example an infra-red motion sensor; a thermometer that senses temperature; and an audio sensor that senses sound.

The control circuit 12 performs the image capture operation intermittently without being triggered by the user. The control circuit 12 may perform the image capture operation based on various criteria, for example in response to the outputs of the sensors 15, or based on the time elapsed since the previous image capture operation, or on a combination of these and/or other criteria. Without limitation, a typical period between capture of images may be in the range from 5 seconds to several minutes, with an average of 30 seconds.

In the case of triggering in response to the outputs of the sensors 15, capture of images may be triggered when the outputs of the sensors 15 indicate a change or a high level on the basis that this suggests occurrence of an event that might be of significance to the user. Capture may be triggered based on a single sensor or a combination of sensors 15. That allows for intelligent decisions on the timing of image captures, in a way that increases the chances of the images being of scenes that are in fact significant to the user. Thresholds on the outputs of the sensors used for triggering may be reduced over time to ensure capture of an image in due course.
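By way of illustration only, such sensor-based triggering with thresholds that relax over time might be sketched as follows in Python; the class name, the decay schedule and the threshold floor are assumptions made for the sketch rather than details taken from this disclosure.

```python
import time

class CaptureTrigger:
    """Illustrative trigger: fire when any sensor output exceeds its threshold,
    and relax the thresholds over time so that an image is captured eventually."""

    def __init__(self, initial_thresholds, decay_per_second=0.01, floor=0.1):
        self.initial = dict(initial_thresholds)   # e.g. {"accel": 1.5, "light": 0.8}
        self.decay = decay_per_second
        self.floor = floor
        self.last_capture = time.monotonic()

    def should_capture(self, sensor_outputs):
        """sensor_outputs maps the same sensor names to their current readings."""
        elapsed = time.monotonic() - self.last_capture
        for name, value in sensor_outputs.items():
            # Threshold shrinks with time since the last capture, but never below a floor.
            threshold = max(self.initial[name] - self.decay * elapsed, self.floor)
            if abs(value) > threshold:
                self.last_capture = time.monotonic()
                return True
        return False
```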

The camera 1 may be connected by a user to a computer apparatus 30 by a data connection 31 for transfer of captured images to the computer apparatus 30 for storage and/or processing of the images. The data connection 31 may be of any type, including a wired data connection such as a USB (Universal Serial Bus) connection or a wireless data connection such as a Bluetooth connection. The data connection 31 may be an internet connection, in which case the computer apparatus 30 may be local to the camera 1, or distant from the camera 1, for example being a server which may be in the "cloud". The camera 1 and the computer apparatus 30 include respective interfaces 32 and 33 for the data connection 31.

The computer apparatus 30 includes a processor 34 and a memory 35, and also has a display 36. The processor 34 executes computer programs stored in the memory 35. The computer apparatus 30 may be any type of computer system, but is typically of conventional construction, for example being a PC (personal computer) or a mobile computing device (such as a smart phone or tablet). The computer program may be written in any suitable programming language. The computer program may be stored on a computer-readable storage medium, which may be of any type, for example: a recording medium which is insertable into a drive of the computing system and which may store information magnetically, optically or opto-magnetically; or a fixed recording medium of the computer system such as a hard drive or the memory 35.

There will now be described a method that may be implemented in the camera unit 2 and the computer apparatus 30 for identifying activity states of the camera unit in dependence on characteristics of the motion detected by the accelerometer 15 and the gyroscope sensor 16. This method makes use of the fact that the motion detected by the accelerometer 15 and the gyroscope sensor 16 has characteristics that change with changes in the activity of the camera unit 2, and hence of the user wearing or carrying the camera unit 2. Furthermore, the accelerometer 15 and the gyroscope sensor 16 have a low power consumption, allowing them to be used repeatedly, or even continuously, without significantly impacting the battery life of the camera unit 2. Indeed, the accelerometer 15 and the gyroscope sensor 16 may already be used as a basis for triggering capture of images.

A feature vector derivation method of deriving a feature vector comprising plural features, each representing a characteristic of the motion detected by the accelerometer 15 and the gyroscope sensor 16, is shown in Fig. 3. The feature vector derivation method is performed in the camera unit 2, for example by the control circuit 12 executing an appropriate computer program, because it involves analysis of the output signals of the accelerometer 15 and the gyroscope sensor 16. The feature vector derivation may be performed intermittently at the time of image capture by the image sensor 10 (for example after triggering of image capture) or on a continual basis, depending on power usage requirements. Deriving feature vectors continually provides higher resolution information on motion and activity state, but is essentially the same method with some dynamic thresholding.

In principle, the feature vector derivation method could be performed in the computer apparatus 30, but that would involve storing the output signals of the accelerometer 15 and the gyroscope sensor 16 for subsequent transfer to the computer apparatus 30, which would be inconvenient.

The feature vector derivation method is performed as follows.

In step S1-1, the output signals of the accelerometer 15 and the gyroscope sensor 16 are sampled. This sampling occurs at the sample frequency, which may typically be in the range from 25 Hz to 100 Hz. These output signals represent the detected motion of the camera unit 2. As the motion is detected in respect of the three axes X, Y and Z, a total of six samples are taken when using both the accelerometer 15 and the gyroscope sensor 16.

In step S1-2, the raw samples taken in step S1-1 are subject to low-pass filtering to remove high-frequency noise and smooth the signal.

In step S1-3, the vector Euclidean norm squared is derived from the smoothed samples output from step S1-2 in respect of the translational and rotational motion.

In respect of the translational acceleration detected by the accelerometer 15, the vector Euclidean norm squared accel_norm is derived from the accelerations accel_x, accel_y and accel_z along the axes X, Y and Z in accordance with the equation:

accel_norm = accel_x^2 + accel_y^2 + accel_z^2

In respect of the rotational velocity detected by the gyroscope sensor 16, the vector Euclidean norm squared gyro_norm is derived from the angular velocities gyro_x, gyro_y and gyro_z around the axes X, Y and Z in accordance with the equation:

gyro_norm = gyro_x^2 + gyro_y^2 + gyro_z^2

The vector Euclidean norm squared in respect of the rotational and translational motion may also be relatively scaled in step S1-3 such that both readings have the same range.

The vector Euclidean norm squared is a scalar quantity that represents the detected motion but avoids any dependency on the orientation of the camera unit 2. This is advantageous because it avoids dependency on the orientation in which the camera unit 2 is worn or carried. As an alternative, other scalar quantities that are not dependent on the orientation of the camera unit 2 may be used, for example the vector Euclidean norm or the sum of the motions detected in respect of each axis.
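As a non-limitative sketch of steps S1-1 to S1-3, the filtering and the orientation-independent norm squared might be computed along the following lines; the exponential filter and its coefficient are assumptions, since the disclosure requires only some form of low-pass filtering.

```python
def low_pass(previous_filtered, new_sample, alpha=0.2):
    """Simple exponential low-pass filter to remove high-frequency noise (step S1-2)."""
    return alpha * new_sample + (1.0 - alpha) * previous_filtered

def norm_squared(x, y, z):
    """Vector Euclidean norm squared of a three-axis reading (step S1-3).
    A scalar that does not depend on the orientation of the camera unit."""
    return x * x + y * y + z * z

# Example for one sample period (illustrative readings):
accel_filtered = [low_pass(prev, new) for prev, new in zip((0.0, 0.0, 9.6), (0.1, -0.2, 9.8))]
gyro_filtered = [low_pass(prev, new) for prev, new in zip((0.0, 0.0, 0.0), (0.05, 0.01, -0.02))]
accel_norm = norm_squared(*accel_filtered)
gyro_norm = norm_squared(*gyro_filtered)
```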

In step S1-4, the values of the vector Euclidean norm squared accel_norm and gyro_norm are each fed into separate buffers that store values from a rolling window of time. For example, the window may be the last N seconds' worth of readings. By way of non-limitative example, the length of the window is typically 5 seconds.

In step S1-5, the feature vector 40 is derived. Whereas steps S1-1 to S1-4 are performed at the sampling frequency, step S1-5 is performed at the feature vector update rate, which is slower than the sampling frequency. For example, the feature vector update period may be half the length of the window, that is, in the example where the length of the window is 5 seconds, a feature vector is derived every 2.5 seconds.

The feature vector 40 comprises plural features. Each feature is calculated from the values stored in the buffer in respect of the rolling window of time that have been derived from and represent the detected motion. Each feature represents a characteristic of the detected motion. Any feature that represents a characteristic of the detected motion may be used. The features may be scaled relative to each other to have the same dynamic range, for example so that they each take a value in the range from 0 to 1.

Typically, the features include at least one feature dependent on the size of the motion detected by one or both of the accelerometer 15 and the gyroscope sensor 16. Non-limitative examples of such features include:

• the mean of all values in the window

• the minimum value in the window

• the maximum value in the window

• the DC component of the frequency spectrum of the window

Typically, the features include at least one feature dependent on variation in the motion detected by one or both of the accelerometer 15 and the gyroscope sensor 16. Non-limitative examples of such features include:

• the variance of all the values in the window

• the total difference signal of the values in the window, for example calculated from the following equation, where r_t is the t-th value and the total number of values is window size:

Σ_{t=0}^{window size − 1} | r_{t+1} − r_t |

• the peak location in the auto correlation of the values in the window, with a lag, for example of 0.1 to 1.5 seconds.

• the ratio of the two largest peaks in the auto correlation of the values in the window, with a lag, for example of 0.1 to 1.5 seconds

• the number of peaks and troughs in the values in the window

• the difference between the first and third quartile of the values in the window

• the spectral energy in a predetermined frequency range, for example from 0.3 to 3 Hz

• the spectral entropy

• the values of a predetermined number of primary peaks in the frequency spectrum

Any combination of the above or other features may be used. Typically, not all of the above features will be used in combination. The choice of features used depends on the processing capabilities of the camera unit 2.
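A minimal sketch of step S1-5, computing a feature vector from one buffered window using a small subset of the features listed above, might look as follows; the particular subset and the fixed scaling maxima are assumptions made for the sketch.

```python
import numpy as np

def feature_vector(window_values):
    """Derive a feature vector from one rolling window of norm-squared values (step S1-5).
    Uses only a small, illustrative subset of the candidate features listed above."""
    r = np.asarray(window_values, dtype=float)
    diffs = np.abs(np.diff(r))
    q1, q3 = np.percentile(r, [25, 75])
    features = np.array([
        r.mean(),      # mean of all values in the window
        r.min(),       # minimum value in the window
        r.max(),       # maximum value in the window
        r.var(),       # variance of the values in the window
        diffs.sum(),   # total difference signal of the values in the window
        q3 - q1,       # difference between the first and third quartile
    ])
    # Scale each feature into a comparable 0..1 range (here by fixed, assumed maxima).
    assumed_maxima = np.array([10.0, 10.0, 10.0, 5.0, 50.0, 5.0])
    return np.clip(features / assumed_maxima, 0.0, 1.0)
```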

In step S1-6, the feature vector 40 derived in step S1-5 is stored in the memory 13 for later use. The feature vector 40 is stored with an associated timestamp indicating the time of the feature vector. Then, the method returns to step S1-1.

An activity state identification method of identifying activity states is shown in Fig. 4. The activity state identification method identifies activity states from the feature vectors 40, including the associated timestamps, derived in the feature vector derivation method shown in Fig. 3.

The activity state identification method is performed in the computer apparatus 30. When the camera 1 is connected to the computer apparatus 30, the captured images and the feature vectors 40, including the associated timestamps, derived by the feature vector derivation method are transferred to the computer apparatus 30, thereby allowing performance of the activity state identification method. Alternatively, the activity state identification method could be performed in the camera unit 2, but the relatively high processing requirement makes it preferable to use the computer apparatus 30 which will have greater processing power than the camera unit 2.

In general, activity states are detected as being periods during which the detected motion remains broadly similar, albeit with a relatively small degree of fluctuation, with a relatively large degree of change in the detected motion between the activity states.

Therefore, the activity states may be viewed as "events" and the activity state identification method is a type of event detection method in which the activity states are identified in dependence on changes in the feature vector 40. As discussed further below, the method also takes account of periods of random activity where the distance measure is relatively large.

In particular, the activity state identification method is performed as follows.

In step S2-1, a smoothing window is derived from the feature vectors 40. The smoothing window may be derived based on the average time between the feature vectors 40, which may be calculated from the timestamps. The length of the smoothing window is based on absolute time in order to smooth out periods of activity change that occur only for a short period of time, so as to prevent excessive numbers of activity states being identified. Essentially, the length corresponds to the minimum time for which an activity state must occur in order to be identified as such. Without limitation, a typical value may be in the range from 30 seconds to 5 minutes. As an alternative, the length of the smoothing window could have a preset value, which may be stored or set by user input.

In step S2-2, the feature vectors are averaged using the window size wsize derived in step S2-1. The smoothed feature vectors smoothed_i may be derived from the feature vectors feature_vector_k in accordance with the equation:

smoothed_i = (1 / wsize) · Σ_{k = i·wsize}^{(i+1)·wsize − 1} feature_vector_k

The index k labels the input feature vectors 40 and the index i labels the smoothed feature vectors, and so the index i has a total number of values equal to the number of feature vectors, length, divided by the window size wsize, that is, the index i takes values from 0 to (length / wsize) − 1.

In step S2-3, there is derived a measure of difference between successive smoothed feature vectors derived in step S2-2. The measure of distance dist_i between the i-th and the (i+1)-th smoothed feature vectors may be derived by the equation:

dist_i = Distance(smoothed_{i+1}, smoothed_i)

where:

Distance(a, b) = sqrt( Σ_{n=0}^{length(x) − 1} weighting[n] · (a[n] − b[n])^2 )

Herein, length(x) represents the number of features in the feature vector and the weighting vector weighting[n] is used to weight each feature by respective scalars that may be different. The weighting vector weighting[n] is optional but may be used to provide different features with different discriminating effects.

Any other measure of distance could alternatively be used, for example a sum of the differences between the features.
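Steps S2-2 and S2-3 might be sketched as follows, assuming the weighted Euclidean distance reconstructed above; the uniform default weighting is an assumption.

```python
import numpy as np

def smooth_feature_vectors(feature_vectors, wsize):
    """Average consecutive feature vectors in blocks of wsize (step S2-2)."""
    fv = np.asarray(feature_vectors, dtype=float)
    n_blocks = len(fv) // wsize
    return np.array([fv[i * wsize:(i + 1) * wsize].mean(axis=0) for i in range(n_blocks)])

def weighted_distance(a, b, weighting=None):
    """Weighted Euclidean distance between two smoothed feature vectors (step S2-3)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    w = np.ones_like(a) if weighting is None else np.asarray(weighting, dtype=float)
    return float(np.sqrt(np.sum(w * (a - b) ** 2)))

def distance_series(smoothed):
    """Measure of difference between successive smoothed feature vectors."""
    return np.array([weighted_distance(smoothed[i + 1], smoothed[i])
                     for i in range(len(smoothed) - 1)])
```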

During periods of similar activity, i.e. activity states, the measure of distance will be small. Hence, activity states are identified by detecting changes such as impulses and steps in the measure of distance that represent boundaries between activity states, in particular as follows.

In step S2-4, there is derived the derivative of the measure of distance derived in step S2-3. The derivative dist_derivative_i may be derived from the measure of distance dist_i by the equation:

dist_derivative_i = dist_{i+1} − dist_i

In step S2-5, the derivative of the measure of distance is offset to ensure the signal fluctuates about zero. This is done by calculating the mean and variance of the derivative and subtracting the mean from each value. As described below, peaks and troughs in the offset derivative are to be detected as boundaries between activity states by comparison with a derivative threshold. Rather than using a static derivative threshold, a dynamic derivative threshold is derived as follows.

In step S2-6, a derivative threshold is derived dynamically from the offset derivative of the measure of distance output from step S2-5. The derivative threshold min_height is calculated from a predetermined threshold value threshold and the standard deviation std(dist_derivative) of the offset derivative of the measure of distance using the equation:

min_height = threshold * std(dist_derivative)

The use of such a dynamic derivative threshold makes the peak finding more robust to the variety of users and the variety of activities they are performing. It also allows for control over whether the groups represent major changes in activity or more subtle changes in activity.

In this step, there may also be applied a global minimum level of the derivative threshold. The global minimum level is selected on the basis of the noise floor of the offset derivative of the measure of distance. The global minimum level may have a predetermined level based on typical usage of the camera unit 2, or may be derived dynamically from the offset derivative. The purpose of the global minimum level is to prevent false identification of activity changes whilst the camera unit 2 is stationary.

In step S2-7, significant peaks and troughs of the offset derivative of the measure of distance are identified. This uses a two-stage process. In a first stage, all peaks and troughs are identified using a standard peak-finding algorithm. In a second stage, the magnitudes of the peaks and troughs are compared to the derivative threshold to identify the peaks and troughs whose magnitude exceeds the derivative threshold as being significant peaks and troughs. These are potential boundaries because they occur where the characteristics of the detected motion represented by the feature vectors 40 have changed by a significant amount.
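A sketch of steps S2-4 to S2-7 using a standard peak-finding routine might look as follows; the threshold factor and the global minimum level are assumed values.

```python
import numpy as np
from scipy.signal import find_peaks  # a standard peak-finding routine

def significant_extrema(dist, threshold_factor=2.0, global_min=0.05):
    """Steps S2-4 to S2-7: derivative of the distance measure, zero-offset,
    dynamic derivative threshold, and identification of significant peaks and troughs."""
    derivative = np.diff(dist)                    # dist_derivative_i = dist_{i+1} - dist_i
    derivative = derivative - derivative.mean()   # offset so the signal fluctuates about zero
    # Dynamic threshold from the standard deviation, with a global minimum floor.
    min_height = max(threshold_factor * derivative.std(), global_min)
    peaks, _ = find_peaks(derivative, height=min_height)
    troughs, _ = find_peaks(-derivative, height=min_height)
    return derivative, sorted(peaks.tolist() + troughs.tolist())
```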

In step S2-8, the significant peaks and troughs identified in step S2-7 are analysed to identify some of the peaks and troughs as being boundaries between activity states. This implicitly identifies the activity states as occurring between those boundaries. This analysis of the peaks and troughs is performed using the following rules.

A first rule is to identify a boundary when a pair of a significant peak and trough (in either order) occur within a predetermined period, which may for example be in a range from 1 to 10 seconds. This occurrence indicates a sudden change in constant activity, for example the user walking for a period of time then suddenly standing still. This occurrence of a pair of a peak and a trough is identified as a single boundary. The time of the boundary may be taken as the mid-point between the peak and the trough.

A second rule is to identify a boundary when a pair of a significant peak and trough (in either order) occur with constant activity outside the pair and random activity between the pair. In this context, a period of constant activity is taken to occur when the average distance measure across that period is relatively small (as defined for example by comparison with a distance threshold) and a period of random activity is taken to occur when the average distance measure across that period is relatively large (as defined for example by comparison with the same distance threshold). Such a change from constant to random activity and back again to constant activity is identified as a single boundary. The time of the boundary may be taken as the mid-point between the peak and the trough.

A third rule is that where boundaries identified by the first and second rules are separated by less than a minimum period of time, one of those boundaries is rejected. The minimum period of time may be predefined, or may be dynamically selected based on the average rate at which images are captured, so that an activity state extends across the capture of a minimum number of images assuming capture at the minimum rate.
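A simplified sketch of the first and third rules might look as follows; the second rule, which requires classifying constant and random activity, is omitted, and the pairing window and minimum separation are assumed values expressed in smoothed-sample indices rather than seconds.

```python
def boundaries_from_extrema(extrema_indices, pair_window=4, min_separation=8):
    """Simplified sketch of the first and third rules: a significant peak and trough
    (in either order) within pair_window samples of each other are collapsed into a
    single boundary at their mid-point; boundaries closer together than min_separation
    are then thinned. The second rule (random-activity periods) is not shown here."""
    boundaries = []
    i = 0
    while i + 1 < len(extrema_indices):
        if extrema_indices[i + 1] - extrema_indices[i] <= pair_window:
            # First rule: a close peak/trough pair becomes one boundary at the mid-point.
            boundaries.append((extrema_indices[i] + extrema_indices[i + 1]) // 2)
            i += 2
        else:
            i += 1
    thinned = []
    for b in boundaries:
        # Third rule: reject boundaries that are too close to the previous one.
        if not thinned or b - thinned[-1] >= min_separation:
            thinned.append(b)
    return thinned
```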

In step S2-9, which is optional, an analysis is performed on adjacent activity states, being activity states on each side of a boundary, to decide whether to merge those activity states by rejecting the identified boundary. This is done according to three criteria. A score in respect of each of the criteria is derived as follows.

The first criterion is similarity between the feature vectors derived during the adjacent activity states. A measure of this similarity is derived. The measure may be an earth mover's distance. Such a distance indicates how much work would need to be done to convert the feature vectors of one of the adjacent activity states into the feature vectors of the other of the adjacent activity states, and is therefore a measure of similarity between the two groups of feature vectors. Other measures of similarity between the feature vectors derived during the adjacent activity states may alternatively be used.

A score distance_score is calculated from the measure of similarity dist using the equation:

distance_score = 1 / (1 + e^(dist / 0.075))
This score favours merging activity states whose feature vectors have a distance between them that is relatively small, i.e. favours merging activity states whose feature vectors are similar.

The second criterion is the number of images captured during the adjacent activity states. A score size_score is derived from the number of images size using the equation:

size_score = 1 / (1 + e^(size / k))

where k is a scaling constant.
This score favours not merging activity states which already have a relatively large number of images in them.

The third criterion is the change in the feature vectors occurring at the identified boundary. A measure of this change is derived. Although conceptually there is overlap with the first criterion, this criterion considers only the change in the feature vectors occurring at the boundary.

A score boundary_score is calculated from the measure of the change occurring at the boundary, boundary, using the equation:

boundary_score = 1 / (1 + e^((boundary − 0.5) / 0.03))

This score favours not merging activity states if the change in the feature vectors occurring at the identified boundary is large, i.e. if there was a single large event creating the boundary between them. For example, it would favour keeping separate, rather than merging, activity states that are parts of the same car journey having similar motion but with an emergency stop in the middle identified as a boundary.

A merge score merge_score is determined by combining the three scores, which take into account the three criteria, using the equation:

merge_score = boundary_score * size_score * distance_score

Boundaries are selectively rejected on the basis of the merge score merge_score, in particular by rejecting boundaries whose merge score merge_score exceeds a merge threshold.
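A sketch of the merge decision of step S2-9, using the logistic score forms reconstructed above, might be as follows; the size scaling constant and the merge threshold are assumptions where the source values are not stated.

```python
import math

def merge_decision(dist, size, boundary, merge_threshold=0.2):
    """Sketch of step S2-9: combine the three criteria into a merge score and decide
    whether to reject the boundary (i.e. merge the adjacent activity states)."""
    distance_score = 1.0 / (1.0 + math.exp(dist / 0.075))            # favours merging similar states
    size_score = 1.0 / (1.0 + math.exp(size / 20.0))                 # 20.0 is an assumed scale constant
    boundary_score = 1.0 / (1.0 + math.exp((boundary - 0.5) / 0.03)) # penalises large boundary events
    merge_score = boundary_score * size_score * distance_score
    return merge_score, merge_score > merge_threshold                # True => reject boundary, merge states
```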

The particular criteria and the particular scores are not essential. Other scores could be used to consider these criteria. The criteria could be changed, for example by omitting any of the criteria described herein, or by including further criteria.

In step S2-10, activity state data 50 representing the activity states is output and stored in the memory 35 of the computer apparatus 30. The activity state data 50 comprises timestamps identifying the times at which the boundaries occur, and may optionally also include additional data, for example data concerning the detected motion calculated during the methods described above.

A curation method of grouping and curating the images based on the identified activity states is shown in Fig. 5. The curation method is performed in the computer apparatus 30 and curates the images 60 transferred from the camera unit 2 using the activity state data 50 derived by the activity state identification method shown in Fig. 4. In the case that the activity state identification method was performed in the camera unit 2, then the images 60 and activity state data 50 are transferred from the camera unit 2 to the computer apparatus 30 when the camera 1 is connected to the computer apparatus 30.

In step S3-1, the images 60 are grouped in dependence on the identified activity states. The images 60 are each assigned to one of plural groups of images in dependence on the identified activity states, in particular by assigning each image 60 to the activity state corresponding to the time of image capture. The images 60 have associated timestamps indicating the times at which the images 60 were captured, so this may be done by comparing the timestamps of the images with the timestamps of the activity state data 50 that identify the times at which the boundaries between activity states occur.

The group of images to which each image 60 is assigned may be indicated by group identification data that is associated with each image 60. The group identification data may be for example the index of the boundary at the start of the activity state during which an individual image 60 was captured.
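Step S3-1 amounts to a search of the boundary timestamps for each image timestamp, which might be sketched as follows; the data layout is an assumption.

```python
from bisect import bisect_right

def group_images(image_timestamps, boundary_timestamps):
    """Step S3-1 sketched: assign each image to the activity state in force at its
    capture time. The group identifier is the index of the boundary at the start of
    that state (0 for images captured before the first boundary)."""
    boundaries = sorted(boundary_timestamps)
    groups = {}
    for ts in image_timestamps:
        group_id = bisect_right(boundaries, ts)
        groups.setdefault(group_id, []).append(ts)
    return groups
```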

In step S3-2, one or more representative images from each group are selected. The or each representative image may be selected by any criteria. Some non-limitative examples are as follows. An image at a predetermined position within the group (for example, the first image or the centre image) may be selected. An image may be selected on the basis of image analysis, for example selecting an image having a high degree of contrast and/or a high dynamic range. An image may even be selected randomly. The number of images selected from each group can be dynamically chosen based on the number of images in the group. This allows activity states of longer duration to be represented by more images. Steps S3-3 to S3-5 facilitate display of the images on the basis of the groups.
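Before turning to those display steps, the representative-image selection of step S3-2 might be sketched as follows, here selecting randomly with a count that grows with group size; the tuning values are assumptions.

```python
import random

def select_representatives(group_images, per_n_images=25, max_representatives=5):
    """Step S3-2 sketched: pick a number of representatives that grows with the size
    of the group, selected here at random; per_n_images and max_representatives are
    assumed tuning values, not taken from the disclosure."""
    count = min(max(1, len(group_images) // per_n_images + 1), max_representatives)
    return random.sample(group_images, min(count, len(group_images)))
```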

In step S3-3, a representation of each group of images is displayed on the display 36. This representation may be a single representative image selected in step S3-2, or if plural representative images are selected may be one or all of them. This representation may also include a textual label identifying the group, which textual label may be generated automatically and edited by the user. Thus, the grouping on the basis of the activity states produces a summary subset of images for display to the user.

In step S3-4, user-input is accepted selecting any of the representations of the groups of images. This may be done using a conventional user-input system of the computer apparatus 30.

In step S3-5, in response to selection in step S3-4, all the images of the group that is selected are displayed on the display 36.

Thus, steps S3-3 to S3-5 allow the user to navigate the images using the group structure. The method may also permit the selection of individual images for display or further processing in a conventional manner.

Step S3-6 may be performed as an alternative to steps S3-3 to S3-5, according to a selection made by user-input. In step S3-6, there is generated a summary video 70 comprising all the representative images selected in step S3-2 displayed successively. The summary video 70 is output and stored in the memory 35.

Thus, the curation method facilitates curation by grouping the overall set of images 60, which may be large in number, into individual groups. The specific steps S3-2 to S3-6 are useful but are merely illustrative, and other curation techniques using the grouping may be applied.

The methods described above are not limitative and may be modified. In the above methods grouping is performed in dependence solely on characteristics of the motion detected by the motion sensor, but other factors may additionally be incorporated. For example, the activity states may be identified in dependence on other factors in addition to the characteristics of the motion detected by the motion sensor. Similarly, the images may be assigned to groups in dependence on other factors in addition to the identified activity states.