NETWORK CONNECTED SENSORS

Title:

NETWORK CONNECTED SENSORS

Document Type and Number:

WIPO Patent Application WO/2017/032986

Abstract:

A method of identifying at least one sensor (10) which is determined to be characteristically similar to a first sensor, and the sensors arranged to communicate sensed data with the internet (4), wherein the method of identifying similar sensors comprises using at least one of the following criteria, (i) a similarity in a physical property or environment measured by a sensor, (ii) a similarity in the statistical characteristics of measured physical properties/environments, and (iii) a dependency between physical properties/environments measured, and the method comprising storing an association or mapping between sensors determined to be similar, and thereby making data from the identified sensors available for use subsequently.

Inventors:

KAMPA SIMON (GB)
RUSSELL ROBERT (GB)
HILL ALEXANDER (GB)
REID DANIEL (GB)

Application Number:

PCT/GB2016/052576

Publication Date:

March 02, 2017

Filing Date:

August 19, 2016

Assignee:

SENSEYE LTD (GB)

International Classes:

G01D21/00; H04W4/70; H04W84/18

Domestic Patent References:

WO2015041577A1

2015-03-26

Foreign References:

US7937167B1	2011-05-03
US20100125437A1	2010-05-20

Attorney, Agent or Firm:

BARKER BRETTELL LLP (GB)

Claims:

CLAIMS

1. A method of identifying at least one sensor which is determined to be characteristically similar to a first sensor, and the sensors arranged to communicate sensed data with the internet, wherein the method of identifying similar sensors comprises using at least one of the following criteria,

(i) a similarity in a physical property or environment measured by a sensor,

(ii) a similarity in the statistical characteristics of measured physical properties/environments, and

(iii) a dependency between physical properties/environments measured,

and the method comprising storing an association or mapping between sensors determined to be similar, and thereby making data from the identified sensors available for use subsequently.

2. The method as claimed in claim 1 which comprises conducting a search of sensors which are included within a searchable data resource, for example by way of registration with a software application.

3. The method as claimed in claim 1 or claim 2 which comprises conducting a search for publicly accessible sensor data over the internet or a communications network.

4. A method as claimed in any preceding claim which comprises performing processing in respect of each of (i), (ii) and (iii), to determine whether a sensor similarity exists.

5. A method as claimed in any preceding claim which comprises use of statistical similarity to generate proposed semantic identifiers for sensors.

6. A method as claimed in any preceding claim which comprises determining similarity if at least one of the criteria (i), (ii) or (iii) is met, and at least one additional similarity criterion.

7. A method as claimed in any preceding claim which comprises determining categories of sensor similarity in relation to whether one, some or all of criteria (i), (ii) and (iii) are satisfied.

8. A method as claimed in any preceding claim which comprises calculating a similarity metric.

9. A method as claimed in claim 8 in which the similarity metric is inversely proportional to extent of similarity.

10. A method as claimed in any preceding claim which comprises weighting one or more similarity criteria so as to prioritise or downgrade one or more similarity criteria, or an aspect of similarity criteria.

11. A method as claimed in any preceding claim which comprises calculating a multi-dimensional similarity metric.

12. A method as claimed in any preceding claim which comprises comparing a calculated similarity metric to a threshold similarity value so as to determine whether a sensor can be classified as similar.

13. A method as claimed in claim 12 which comprises reaching a determination of similarity if the calculated similarity metric surpasses the threshold.

14. A method as claimed in any preceding claim in which calculation of criterion (ii) comprises determining a similarity metric between two sensors' statistical characteristics.

15. A method as claimed in any preceding claim in which the measured physical properties/environments comprise at least two different physical properties, and/or two different environments, respectively.

16. A method as claimed in any preceding claim which comprises normalising a similarity metric.

17. A method as claimed in any preceding claim which comprises ranking identified sensors in relation to the extent of similarity with the first sensor.

18. A method as claimed in any preceding claim which comprises use of historical and/or prevailing data associated with an identified sensor in determining extent of similarity with the first sensor.

19. A method as claimed in any preceding claim which comprises comparing at least one of: sensors' metadata, sensors' manufacturer(s), sensors' model(s), sensors' type(s) and sensors' unit(s) of measurement.

20. A method as claimed in any preceding claim which comprises determining a similarity of statistical properties comprising at least one of: average values, minimal values, maximum values, modal values, quartiles, kurtosis and skewness of sensors' measurements.

21. A method as claimed in any preceding claim which comprises use of stored data which is indicative of known characteristics of at least one of a particular type of sensor and a particular type of node to which a sensor is attached.

22. A method as claimed in claim 21 in which the characteristics include historical data.

23. A method as claimed in any preceding claim which comprises determining a statistical dependence between measurements of respective sensors.

24. Computer-readable instructions which, when executed by a data processor, are arranged to implement the method of any of claims 1 to 23.

25. Data processor apparatus which is provided with the computer-readable instructions of claim 24.

Description:

NETWORK CONNECTED SENSORS

Technical field

The present invention relates to discovery of similar sensors within a communications network.

Background

The Internet of Things (IoT) can be viewed as the construct of objects or environments informing and being informed about other objects systematically, via connection and exchange of information over the internet, by use of sensors. Sensors can capture motion and direction, magnetic fields, sound and light, and many other things including chemicals and pollutants. Broadly, the IoT system structure or architecture requirements include appropriate sensors to provide their location and status, a (wireless) connection to the internet, appropriate software between the object, the internet and other objects, and operator interfaces.

We have realised that it would be beneficial to identify and associate sensors which are sufficiently similar, and to improve exploitation of the data from those sensors.

Summary

According to a first aspect of the invention there is provided a method of identifying at least one sensor which is determined to be characteristically similar to a first sensor, and the sensors arranged to communicate sensed data with a communications network, wherein the method of identifying similar sensors comprises using the following criteria,

(i) a similarity in a physical property or environment measured by a sensor,

(ii) a similarity in the statistical characteristics of measured physical properties/environments, and

(iii) a dependency between physical properties/environments measured,

and the method comprising storing an association between sensors determined to be similar, and thereby facilitating the use of data from the identified sensors.

The communications network may include the Internet. By the term 'Internet' we include a global system of interconnected computer networks that use a protocol suite (such as TCP/IP) to link devices worldwide. It may be viewed as a network of networks that consists of millions of private and public networks which are linked by a broad array of electronic, wireless, and optical networking technologies. We also include communications networks more generally.

The invention may be viewed as identifying sensors connected to a communications network, obtaining current or historical sensed data from the sensors, and/or attributes of the sensor(s), and determining whether any of the located sensors are sufficiently similar to a first sensor or to each other based on sensed sensor data and/or sensor attributes.

In determining similarity between sensors, use may be made of actual or (substantially) raw data which is sensed/measured/recorded by the sensors.

Another aspect of the invention may include a computer software application, comprising machine-readable instructions, which when executed by a data processor of a computer, implement the method of the first aspect of the invention. A further aspect of the invention may include a data processor device which executes the instructions of the computer software application.

The invention may include one or more features described in the detailed description and/or shown in the drawings.

Brief description of the drawings

Various embodiments of the invention will now be described, by way of example only, with reference to the following drawing in which:

Figure 1 is a schematic of a network comprising multiple sensors.

Detailed description

There is now described a method for discovering network or internet connected sensors which measure the same, similar or related phenomenon or environment. Identified similar sensors can be grouped, and their respective data then beneficially exploited, in particular by making use of publicly available data. This use may include increasing discoverability, providing additional information for use in data analytic algorithms, aiding in tuning algorithms and allowing automatic configuration of graphical user interfaces. The sensors are (pre-)registered with a software application, which then allows their data and characteristics to be analysed so that similar sensors can be discovered therefrom and associations created. The group of sensors which is registered can be increased (or decreased) over time. The registered group of sensors could be supplemented by way of manual input and/or internet/external database searching for other sensors.

Figure 1 shows a network which comprises a plurality of nodes 9, the internet or other communications network 4, a server 3 and a user computer terminal 2. Each of the nodes may comprise an object or device which is provided with one or more sensors, or the node may itself be a sensor. The server 3 is loaded with machine-readable instructions in the form of a software application, which, when executed by the processor of the server, are configured to identify one or more sensors which are determined to be characteristically similar to a first sensor.

In broad terms, the characteristics of the first sensor of the sensors 10 are known, as are the data and metadata of the entire group of sensors registered. However, the links or associations between them, as to which ones are similar, are not. The characterising data of the first sensor is then processed and used to determine whether the registered sensors have characteristics which are sufficiently similar to those of the first sensor to qualify as similar sensors. A link or association between the located similar sensor or sensors and the first sensor can then be stored in a memory (for example, a hard drive), and data from the former can be beneficially exploited in relation to the latter (or vice versa). Whilst it may be that the similarity between all (possible) pairs is tested, in many cases a subset of the group of sensors is likely to be considered at any one time.

The user computer/client machine 2 will benefit from the outputs of identifying similar sensors, which may result in improved GUI configuration(s) or improved results analysis. It will be appreciated that the links/associations between sensors may not necessarily be displayed on the computer 2.

The user computer 2 may communicate with the server 3 by way of a cloud-based solution (requiring the user computer to have internet browser software) or may do so by way of being loaded with a bespoke software application arranged to communicate with the server 3.

Similarity criteria

Three criteria or characteristics of similarity are used, which identify similarity between sensors, and in particular the phenomena or environments being measured by sensors:

1. Physical property similarity: A pair of sensors measuring the same physical quantity. For example, two accelerometers would be considered similar as they both measure acceleration.

2. Statistical similarity: A pair of sensors measuring physical quantities with similar statistical properties. For example, an accelerometer measuring a machine's vibrations and an ambient noise sensor near the machine, would be considered similar as they have similar distribution characteristics of the data which is sensed.

3. Dependency: A pair of sensors measuring physical quantities with a level of dependence between the measurements. For example, a sensor measuring the amount of rain may be considered similar to a sensor measuring soil moisture as the measurements would be correlated.

One, some or all of the above criteria may be used in determining similarity. It will be appreciated that additional criteria may be used in determining whether similarity exists between two sensors.

The machine-readable instructions include rules which allow a determination of whether a further sensor can be considered similar to the first sensor, by applying the rules to characteristics of the first sensor in relation to characteristics of the further sensor. The characteristic sensor data of the sensors form inputs, which are acted on by the rules which determine similarity. The rules may include limits, equations, thresholds, algorithms etc.

In addition to these three forms of similarity, combinations of the above can be used to determine numerous bands of similarity. For example, two temperature sensors measuring room temperature could be considered more similar than a temperature sensor measuring a furnace and another measuring freezer temperature, as the room temperature sensors would have both statistical and physical property similarities.

Although reference herein is made principally to a first sensor and discovering sensors which are sufficiently similar to that sensor, the first sensor could be the average, or given attributes/data, of a group of sensors, and in that case the first sensor is essentially a meaningful measure indicative of a group of similar sensors.

Furthermore, the strength of both the statistical similarity (i.e. a measure of how close they are statistically) and the dependency is used to convey stronger and weaker similarities between sensors. For example, two humidity sensors in the same house will likely have higher similarity than two humidity sensors in neighbouring houses.

The detection of the above three forms of similarity can be based on the following sensor characterising information:

1. Measurement units: Internet connected sensors often have associated metadata. This metadata typically describes a sensor's attributes and location. The unit of measurement is a typical attribute associated with a sensor; it indicates a definite magnitude of the physical quantity being measured. For example, a temperature sensor which records temperature in Celsius may have a unit attribute of '°C'.

2. Sensor Make/Model: The make and model of internet connected sensors are typically included in the sensor's metadata. This information provides information about the physical properties being measured. For example, a temperature sensor could have a model of 'DHT11'.

3. Statistical properties of a sensor's measurements: Current and historical measurements recorded by connected sensors are typically stored in a database, allowing calculation of their statistical properties. Statistics including, but not limited to, averages, minimums, maximums, modes, quartiles, kurtosis and skewness can be determined from the sensor measurements. Statistics can be calculated for all the recorded sensor measurements or windows within the recorded sensor measurements.

4. Statistical dependence between pairs of sensor measurements: Statistical dependence between pairs of measures from different sensors can be calculated from the sensor measurements or the statistical properties of the sensor measurements. Statistical properties include, but are not limited to, correlation and cross-correlation.

In addition to the above, metadata that allows the association of measures/sensors from a common sensor node could be used. This could involve analysing for similarity in an Internet Protocol (IP) address or for similarity from a collection of measures from a single sensor stream.

Physical property similarity can be assessed using the sensor's unit of measure. The units in which the sensor records measurements directly indicate the physical property being measured. For example, a sensor measuring values in Celsius units measures temperature. A lookup table of units and physical properties could resolve standard units originating from the International System of Units (SI), supplemented with any non-standard units encountered.

In addition to measurement units, a sensor's model number can be indicative of the physical property being measured. For example, a 'DS18B20' sensor is known to measure temperature. A stored/accessible look-up table of sensor models to physical properties would allow physical property resolution. This sensor look-up table could conveniently be automatically populated using information from the internet. In addition, sensors which expose both model number and measurement units can be exploited to determine a mapping from model number to physical property; this mapping would have an associated confidence rating based on how many sensors were observed confirming the determined mapping/association. This form of similarity is a straightforward Boolean assertion, ensuring both sensors are measuring the same physical property.
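The look-up approach described above might be sketched as follows. This is a minimal illustration only: the table contents, the metadata keys 'unit' and 'model', and the function names `physical_property` and `same_physical_property` are assumptions made for the example and do not appear in the application.

```python
from typing import Optional

# Hypothetical look-up tables; a real system would hold many more entries
# and could populate them automatically, as suggested in the text.
UNIT_TO_PROPERTY = {
    "°C": "temperature",
    "°F": "temperature",
    "K": "temperature",
    "m/s^2": "acceleration",
    "cbar": "moisture",
}
MODEL_TO_PROPERTY = {
    "DS18B20": "temperature",
}


def physical_property(unit: Optional[str] = None,
                      model: Optional[str] = None) -> Optional[str]:
    """Resolve the measured property from the unit, falling back to the model."""
    if unit is not None and unit in UNIT_TO_PROPERTY:
        return UNIT_TO_PROPERTY[unit]
    if model is not None and model in MODEL_TO_PROPERTY:
        return MODEL_TO_PROPERTY[model]
    return None  # unknown: no assertion can be made


def same_physical_property(a: dict, b: dict) -> bool:
    """Boolean physical property similarity between two sensors' metadata."""
    prop_a = physical_property(a.get("unit"), a.get("model"))
    prop_b = physical_property(b.get("unit"), b.get("model"))
    return prop_a is not None and prop_a == prop_b
```

In this sketch, two sensors whose property cannot be resolved are treated as not similar rather than as a match, which is one possible design choice.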

The statistical similarity can be based on standard statistical formulas (but may include variants thereof). If the sensor's unit of measurement is defined and is non-standard, it can be converted to SI units. This is useful in detecting meaningful statistical similarities.
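As a small illustration of the unit conversion step, assuming temperature readings and the unit strings '°C', '°F' and 'K' (the function name `to_kelvin` and the unit strings are chosen for this example only):

```python
def to_kelvin(value: float, unit: str) -> float:
    """Convert a temperature reading to kelvin, the SI unit of temperature."""
    if unit == "K":
        return value
    if unit == "°C":
        return value + 273.15
    if unit == "°F":
        return (value - 32.0) * 5.0 / 9.0 + 273.15
    raise ValueError(f"unsupported temperature unit: {unit!r}")
```

With both sensors expressed in the same unit, a reading of 68 °F and a reading of 20 °C map to the same value, so their summary statistics become directly comparable.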

Determining the statistical similarity could be achieved by calculating a similarity metric between two sensors' summary statistics, which serve to give a simplified, but meaningful, description of the sensor's data. The summary statistics to utilize would be selected based on the most important statistical similarities to discover for the application. For example, a typical selection of summary statistics could capture central tendency, dispersion and distribution characteristics, such as mean, median, mode, minimum value, maximum value and range. It may be the case that multiple statistical similarity metrics are calculated, each covering different sets of statistical properties; for example, two statistical similarity metrics may be used, one to convey central tendency similarity, the other to explain distribution similarity. A similarity metric would be utilized to quantify the similarity of the sensors' summary statistics. As an example, the distance between sensors' summary statistics could be calculated using a multidimensional distance metric, such as Euclidean or Manhattan. A weighted similarity metric could be utilized if some statistical characteristics are more important than others. The distance between sensors within the multidimensional summary statistics space conveys their similarity, where distance is inversely proportional to similarity. The resulting distance could be inverted and normalized to a standard range if required.
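The calculation described above could be sketched as follows. The particular choice of summary statistics, the function names, and the mapping 1 / (1 + distance) used to invert and normalize the distance are assumptions made for illustration; the application does not prescribe them.

```python
import math
import statistics
from typing import List, Optional, Sequence


def summary_vector(values: Sequence[float]) -> List[float]:
    """Mean, median, standard deviation, minimum, maximum and range."""
    lo, hi = min(values), max(values)
    return [
        statistics.fmean(values),
        statistics.median(values),
        statistics.pstdev(values),
        lo,
        hi,
        hi - lo,
    ]


def weighted_euclidean(u: Sequence[float], v: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Euclidean distance, optionally weighting each summary statistic."""
    if weights is None:
        weights = [1.0] * len(u)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))


def distance_to_similarity(distance: float) -> float:
    """Map a distance in [0, inf) to a similarity in (0, 1] (assumed mapping)."""
    return 1.0 / (1.0 + distance)
```

Under this sketch, two room temperature sensors produce a small distance and a similarity close to 1, while a room sensor and a furnace sensor produce a large distance and a similarity close to 0.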

Dependency between sensors can be calculated using standard statistical formulas. All forms of correlation result in scaled coefficients which indicate dependency between data sets. The absolute value of the coefficient is used as a similarity metric where coefficient value is proportional to similarity.
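One way to obtain such a coefficient is the Pearson correlation, sketched below using only the standard library. The function names are illustrative, and returning 0.0 for a constant series is an assumption made here so that the function always returns a number.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # assumption: a constant series carries no dependency
    return cov / math.sqrt(var_x * var_y)


def dependency_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute correlation, so strong negative dependence also counts."""
    return abs(pearson(x, y))
```

Taking the absolute value means that a strongly negative relationship, such as heating demand against outside temperature, scores as highly as a strongly positive one.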

Statistical similarity and dependency are continuous similarity metrics and as such can be utilized by any of the following:

1. Obtaining the top X most similar sensors, where X is application specific. This is achieved by ranking other sensors based on their similarity metric and returning the X most similar sensors. This approach ensures that 'similar' sensors are found.

2. Specifying a threshold which dictates the boundary between similar and not similar sensors; this could be per sensor or system wide. The threshold could be tuned based on user feedback and expert knowledge.

3. Determining clusters of similar sensors using clustering algorithms. This approach does not enforce rigid limits and instead attempts to detect natural groupings of sensors in the similarity space. Clustering techniques can be based on centroids, density, connectivity and distributions; the selected technique would depend on the desired cluster formations.
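The ranking and threshold options could be sketched as below, assuming the similarity of each candidate sensor to the first sensor has already been computed and is held in a dictionary keyed by a sensor identifier (the function names and the data layout are assumptions for illustration):

```python
from typing import Dict, List


def top_x(similarities: Dict[str, float], x: int) -> List[str]:
    """Return the identifiers of the x most similar sensors, best first."""
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    return ranked[:x]


def above_threshold(similarities: Dict[str, float],
                    threshold: float) -> List[str]:
    """Return the sensors whose similarity meets or exceeds the threshold."""
    return sorted(sid for sid, score in similarities.items()
                  if score >= threshold)
```

The ranking option always returns some sensors, however weak the match, whereas the threshold option may return none.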

If required, the three similarity metrics described could be merged into a single similarity metric. This could be performed using a sum or average of the individual similarities. Furthermore, the metrics could be weighted to prioritise certain forms of similarity. In addition to the three similarity metrics discussed, the combined similarity metric could take into account other forms of similarity, such as geospatial similarity.
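A weighted average is one way of merging the metrics. The sketch below assumes each individual similarity has already been scaled to the range 0 to 1; the key names and the function name `combined_similarity` are illustrative only.

```python
from typing import Dict


def combined_similarity(scores: Dict[str, float],
                        weights: Dict[str, float]) -> float:
    """Weighted average of the individual similarity scores."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight
```

For example, doubling the weight given to physical property similarity raises the combined score of a pair that measures the same property but is otherwise only moderately similar.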

Aiding users in finding similar sensors allows them to augment their own sensors with publicly accessible sensors registered within the system, and the data therefrom, on the basis that the discovered sensors have been meaningfully determined as being sufficiently similar. This can advantageously, for example, increase the accuracy of analytic techniques and allow manual comparison to sensors with similar applications.

Tagging

One method of improving discoverability is the use of 'tags' assigned to sensors. Tags typically comprise single words which semantically describe the sensor. For example, a sensor measuring the height of a river may have the tags 'river' and 'height'. Tags only aid discoverability if the chosen tags are relevant and if standard tags are used. Tags are typically manually suggested by users on creation of sensors within the system. Discovering pairs of similar sensors allows tags to be automatically suggested based on the tags assigned to similar sensors.

As an example, consider two temperature sensors, both monitoring soil temperature at different farms. The first temperature sensor is registered with the system and its owner enters the tags 'field', 'soil', '30cm', 'temperature'. The second temperature sensor is registered and its owner enters the tags 'agriculture', 'crop', 'temp'. Both sets of tags describe the sensor's application in slightly different ways. The sensors are likely to be detected as highly similar as they measure the same physical property, have similar statistical properties and may be correlated (if nearby). When a user logs in, the system will attempt to determine similar sensors and suggest hints for new tags to apply to the user's sensors. In this case, the system could ask for the ten most similar sensors. For the example, assume that the owner of the first temperature sensor logs in and the system identifies the second sensor as the most similar sensor. The tags suggested to the user will then be 'agriculture', 'crop', 'temp'.

Clearly these tags are suitable to the first sensor and may be applied by the user.

Suggesting tags to the user removes the effort associated with tag creation, resulting in more tags being assigned to sensors. In addition, the tags used will be more consistent across users. Both contribute to making sensors more discoverable.
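The tag suggestion step could be sketched as follows, assuming the tags of the similar sensors are supplied in order of decreasing similarity. The function name `suggest_tags` and the `limit` parameter are illustrative and not taken from the application.

```python
from typing import Iterable, List, Sequence


def suggest_tags(own_tags: Iterable[str],
                 similar_sensors_tags: Sequence[Sequence[str]],
                 limit: int = 10) -> List[str]:
    """Collect tags from similar sensors that the user's sensor lacks."""
    seen = set(own_tags)
    suggestions: List[str] = []
    for tags in similar_sensors_tags:  # most similar sensor first
        for tag in tags:
            if tag not in seen:
                seen.add(tag)
                suggestions.append(tag)
    return suggestions[:limit]
```

Applied to the soil temperature example, the owner of the first sensor (tags 'field', 'soil', '30cm', 'temperature') would be offered 'agriculture', 'crop' and 'temp' from the second sensor.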

Similarity Table

In addition to automatic tagging, the ranked list of similar sensors can be presented to the user. Explicitly stating a ranked list of sensors based on the similarity to the user's own sensors aids in discovering sensors with similar applications.

Complementary Data

Identifying correlated or dependent sensors can aid in prediction. Other sensors can provide additional data which may help predicting the future values of the sensor. This is especially true when little historic information is available for a particular sensor.

As an example, if a sensor which measured electricity use of storage heaters was added to the system with little history, it would be incredibly hard to predict future usage based on the small amount of history available, given seasonal variations in storage heater use. If however, similar storage heater sensors were detected, their historical data could be exploited to improve the prediction of the new sensor. Sensors measuring storage heater electricity usage would obviously measure the same physical property and would likely have similar statistical properties. In addition, their daily pattern of usage and seasonal variations would lead to high correlations between sensors. Similarly, it is also possible to exploit the similar sensors linked to the discovered similar storage heaters. For example, if all the discovered similar storage heaters were dependent on outside temperature sensors, these temperature sensors could also aid in prediction. This concept can be taken further, by using things that correlate with the 'similar' sensor to aid the prediction of the sensor with less history. Consider the case where there is a historic correlation of outside air temperature to the storage heater data. This could be used to understand the forecast for the storage heater with no historic data.

This type of approach can also be applied to different types of algorithms used with sensors. As an example, anomaly detection could exploit similar sensors to build a global model of 'normal' values. A particular example of anomaly detection could be an accelerometer on a tractor. By detecting dependencies, physical properties and statistical similarities in combination, a layering of evidence can be developed to provide high levels of confidence in applying, to new sensors, anomaly detection that is proven to work for existing historic sensors.

Consider the example of an accelerometer on a tractor. What is known is that (i) it is related to a particular node (i.e. the tractor) with other sensors, (ii) it has a name of 'accel' and (iii) there is only a small amount of data available. At an automated level, the probability of the 'new' accelerometer being on a vehicle is increased through the aggregation of similarity in data measured by the sensors, collectively. In other words, this is a dependency on the grouping of the node's sensors, which is indicative of a type for the sensor (namely vehicle), since the data generated by the sensors (of the node) together represent physical property measurements that are mostly found on vehicles. This could be deduced with reference to stored data which is indicative of characteristics of known types of sensors and/or known types of nodes. This may involve access to a database or look-up table of characteristic historical data. The determined physical property similarity provides confidence that the 'accel' sensor is an accelerometer and this can be added to the evidence of the node type being 'vehicle'. Using a statistical similarity indicator adds even more confidence in that the small amount of data from the new sensor provides statistical indicators that are very similar to the historic statistical indicators for accelerometers on vehicles. Therefore, for example, advantageously it is possible to apply an existing anomaly detection algorithm and/or models for vehicle accelerometers, to the new 'accel' sensor with a high degree of confidence.
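As a purely illustrative sketch of a global model of 'normal' values, the code below pools the history of similar sensors and flags readings from a new sensor by their z-score. The z-score test, the threshold of 3.0 and the function names are assumptions introduced here; the application does not specify a particular anomaly detection algorithm.

```python
import statistics
from typing import Sequence, Tuple


def pooled_normal_model(
        histories: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Mean and standard deviation of the pooled history of similar sensors."""
    pooled = [value for history in histories for value in history]
    return statistics.fmean(pooled), statistics.pstdev(pooled)


def is_anomalous(value: float, model: Tuple[float, float],
                 z_threshold: float = 3.0) -> bool:
    """Flag a reading lying more than z_threshold deviations from the mean."""
    mean, std = model
    if std == 0.0:
        return value != mean
    return abs(value - mean) / std > z_threshold
```

A new 'accel' sensor with little history of its own could then be checked against the model built from established vehicle accelerometers.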

Selection and Tuning of Analytical Algorithms

Applying machine learning models to data is often a difficult process of selecting the best algorithm for the data and tuning the algorithm to achieve optimal generalisation and results. This process is complex and time intensive. By identifying similar sensors the process may be shortened by selecting a starting point which has worked with similar sensors. This reduces the optimization process associated with tuning algorithms.

For example, when training a regression model for a humidity sensor, a series of different techniques and parameters would be evaluated. The most successful technique and associated parameters would be stored. When training a regression model for another humidity sensor, the most successful techniques discovered previously for similar sensors would be taken as a starting point, reducing the required optimization and exploration.

Configuration of Graphic User Interface

Automatic configuration of graphic user interfaces is possible when information is known about the data being displayed. This includes limiting the types of visualization based on data types, modifying the visualizations based on the measured physical property and displaying data from similar sensors to provide context.

Gathering user feedback on automatically configured visualization provides information as to the success of the visualization. By exploiting the known similarities between sensors, the user feedback on a single sensor visualization can also be applied to other similar sensors in the system. This acts as a form of crowdsourcing.

End-to-end example

Four internet connected sensors are gathering the following data:

Sensor A : Soil moisture of a wheat field, recorded in centibar units

Sensor B : Soil temperature of a wheat field, recorded in Celsius

Sensor C : Room temperature of a house in the UK, recorded in Fahrenheit

Sensor D : Room temperature of another house in the UK, recorded in Celsius

The sensors send data via the internet to a database. This database stores all raw sensor measurements and the metadata associated with the sensors. The database could be any of the IoT data platforms or a typical relational or NoSQL database. When a sensor is first registered, the owner would input to the software application (stored on the user computer 2) the required credentials to access the sensor's data from the database. All four sensors would be registered with the application, giving it access to all of their recorded measurements and metadata.

The process of determining similar sensors would occur either periodically, when new data is received or as a result of user interaction. The process could operate on a subset of registered sensors or over all registered sensors. Subsets could be determined based on geographic location, filtering by the physical property being measured, the sample rate of the sensor or the variability of sensor measures, as examples. Alternatively, different levels of similarity calculations could be performed sequentially, at each stage disregarding any sensors which are deemed dissimilar. So for example, physical property matching may occur first. All sensors with the same physical property are then checked for statistical similarity. If statistically similar, their dependency is calculated.

In this example, we will consider that all four registered sensors will be checked for similarity as a result of a periodic timer. The continuous similarity measures will be converted to binary similarity based on a predefined threshold. Physical property similarity is determined by obtaining the physical property from the sensor's units and models, and then matching between sensor pairs. In this case sensor A would be moisture, and sensors B, C and D would be temperature. The similarity between sensors would be recorded as a link between sensor pairs.
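The sequential alternative, in which each stage disregards sensors deemed dissimilar before the next stage runs, could be sketched as below. Sensors are represented as plain dictionaries and each stage as a function returning True or False; these representations and the name `staged_similar` are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List

Sensor = Dict[str, object]
Check = Callable[[Sensor, Sensor], bool]


def staged_similar(first: Sensor, others: Iterable[Sensor],
                   stages: List[Check]) -> List[Sensor]:
    """Apply each similarity check in turn, discarding sensors that fail."""
    remaining = list(others)
    for is_similar in stages:
        remaining = [s for s in remaining if is_similar(first, s)]
        if not remaining:
            break  # nothing left to test in later stages
    return remaining
```

Ordering the stages from cheapest to most expensive, for instance physical property matching before pairwise correlation, keeps the costlier calculations to the smallest possible set of candidates.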

The statistical properties of each sensor would be calculated to determine statistical similarity. Once determined, the distance between each pair of sensors would be determined using a suitable distance metric. If the metric was less than the specified threshold, the sensors would be deemed similar and the statistical similarity would be recorded as a link between the sensors. The required calculations could be effected more rapidly using a suitable space partitioning data structure. In this example, sensors C and D would be deemed similar; depending on the climate of the country, sensor B may also be deemed similar.

Dependency between sensors would be calculated in a pairwise manner. The correlation between two sensors' measurements would be determined and, if above a specified threshold, they would be deemed similar. This similarity would be stored as a link between the sensors. In this example, sensors C and D would be deemed similar as they have similar daily and seasonal periodicity.

Once complete, each sensor will have links to similar sensors, and for each link the type of similarity can be ascertained. For instance, if we requested the sensors dependent on sensor C, we would receive sensor D.
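A minimal in-memory sketch of such typed links is given below. The class name `SimilarityLinks`, its methods and the kind labels are hypothetical; the application leaves the storage mechanism open.

```python
from collections import defaultdict
from typing import DefaultDict, FrozenSet, List, Set


class SimilarityLinks:
    """Stores undirected links between sensor pairs, grouped by kind."""

    def __init__(self) -> None:
        self._links: DefaultDict[str, Set[FrozenSet[str]]] = defaultdict(set)

    def add(self, a: str, b: str, kind: str) -> None:
        """Record that sensors a and b are similar in the given way."""
        self._links[kind].add(frozenset((a, b)))

    def similar_to(self, sensor: str, kind: str) -> List[str]:
        """Return the sensors linked to `sensor` by the given kind of link."""
        found: Set[str] = set()
        for pair in self._links.get(kind, ()):
            if sensor in pair:
                found |= pair - {sensor}
        return sorted(found)


links = SimilarityLinks()
for a, b in (("B", "C"), ("B", "D"), ("C", "D")):
    links.add(a, b, "physical")
links.add("C", "D", "statistical")
links.add("C", "D", "dependency")
dependent_on_c = links.similar_to("C", "dependency")
```

Here `dependent_on_c` holds only sensor D, matching the four-sensor example, and sensor A has no links of any kind.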

In the application of tag suggestion, the suggested tags would be generated by gathering tags from similar sensors based on the recorded links. These would be displayed to the user to select from.

In the application of algorithm selection and tuning, a database would be available in the application which recorded the best set of parameters found previously for a particular algorithm and sensor. When running an algorithm on a sensor's data, a random sampling of similar sensors would be obtained based on the similarity links. Each similar sensor's best parameters would be tried with the sensor's data. From the selection of similar sensor parameters, the best would be selected as the algorithm's starting parameters for the sensor. Metaheuristic optimization would be performed from this starting point.
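The selection of starting parameters could be sketched as follows. The sketch assumes the stored best parameters are available as a dictionary keyed by sensor identifier and that `evaluate` is a caller-supplied function returning an error score for a parameter set on the new sensor's data (lower is better); these names, the default sample size and the scoring convention are assumptions, and the subsequent metaheuristic optimization is not shown.

```python
import random
from typing import Callable, Dict, Optional

Params = Dict[str, float]


def starting_parameters(similar_best: Dict[str, Params],
                        evaluate: Callable[[Params], float],
                        sample_size: int = 5,
                        rng: Optional[random.Random] = None) -> Optional[Params]:
    """Try the best parameters of a random sample of similar sensors."""
    rng = rng or random.Random()
    sensor_ids = sorted(similar_best)
    if not sensor_ids:
        return None  # no similar sensors: the caller falls back to defaults
    sampled = rng.sample(sensor_ids, min(sample_size, len(sensor_ids)))
    # The parameter set with the lowest error becomes the starting point.
    return min((similar_best[sid] for sid in sampled), key=evaluate)
```

Sampling bounds the number of evaluations when a sensor has many similar neighbours, at the cost of possibly missing the best available starting point.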
