

Title:
A COMPUTER-IMPLEMENTED METHOD FOR CONTROLLING AN OPERATION OF ONE OR MORE FUNCTIONAL DEVICES IN A DEFINED SURROUNDING AND A CORRESPONDING SYSTEM
Document Type and Number:
WIPO Patent Application WO/2024/068055
Kind Code:
A1
Abstract:
To provide an efficient method for controlling an operation of one or more functional devices in a defined surrounding, a computer-implemented method for controlling an operation of one or more functional devices in a defined surrounding is provided, the method comprising the steps of: providing and/or collecting sensor data from at least one environment sensor of the surrounding for providing data points out of the provided and/or collected sensor data; applying at least one labeling function on at least some of the data points for labeling data points as a non-anomaly or an anomaly in the defined surrounding; clustering of data points into one or more definable clusters under consideration of at least one defined criterion; recommending an unlabeled data point for active learning, wherein the recommending is based on the clustering; labeling at least the recommended unlabeled data point based on the active learning; and controlling the operation of the one or more functional devices based on information resulting from at least the labeled data points. Further, a corresponding system for controlling one or more functional devices in a defined surrounding is provided.

Inventors:
SOLMAZ GURKAN (DE)
MARESCA FABIO (DE)
CIRILLO FLAVIO (DE)
Application Number:
PCT/EP2023/062677
Publication Date:
April 04, 2024
Filing Date:
May 11, 2023
Assignee:
NEC LABORATORIES EUROPE GMBH (DE)
International Classes:
G05B15/02; G05B23/02
Foreign References:
GB2465861A, 2010-06-09
US20180096261A1, 2018-04-05
US20200336397A1, 2020-10-22
US20210325072A1, 2021-10-21
US20210072718A1, 2021-03-11
Other References:
NASHAAT MONA ET AL: "Hybridization of Active Learning and Data Programming for Labeling Large Industrial Datasets", 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), IEEE, 10 December 2018 (2018-12-10), pages 46 - 55, XP033508426, DOI: 10.1109/BIGDATA.2018.8622459
MAITREYEE DEY, SOUMYA PRAKASH RANA, SANDRA DUDLEY: "SEMI-SUPERVISED LEARNING TECHNIQUES FOR AUTOMATED FAULT DETECTION AND DIAGNOSIS OF HVAC SYSTEMS", August 2018 (2018-08-01)
ZAHID HASAN, NIRMALYA ROY: "TRENDING MACHINE LEARNING MODELS IN CYBER-PHYSICAL BUILDING ENVIRONMENT: A SURVEY", September 2021 (2021-09-01)
Attorney, Agent or Firm:
ULLRICH & NAUMANN (DE)
Claims:

1. A computer-implemented method for controlling an operation of one or more functional devices in a defined surrounding, the method comprising the steps of:

- providing and/or collecting sensor data from at least one environment sensor of the surrounding for providing data points out of the provided and/or collected sensor data;

- applying at least one labeling function on at least some of the data points for labeling data points as a non-anomaly or an anomaly in the defined surrounding;

- clustering of data points into one or more definable clusters under consideration of at least one defined criterion;

- recommending an unlabeled data point for active learning, wherein the recommending is based on the clustering;

- labeling at least the recommended unlabeled data point based on the active learning; and

- controlling the operation of the one or more functional devices based on information resulting from at least the labeled data points.

2. A method according to claim 1, wherein the operation of the one or more functional devices is controlled to achieve a defined operation purpose, wherein the operation purpose can be efficient energy management.

3. A method according to claim 1 or 2, wherein the data points are provided in a data representation.

4. A method according to any of claims 1 to 3, wherein the defined surrounding comprises or is realized by a building and/or wherein the labeling of data points as a non-anomaly or an anomaly refers to a building occupancy and/or wherein the one or more functional devices comprises or comprise one or more HVAC devices.

5. A method according to any of claims 1 to 4, wherein prior to the clustering step a label augmentation is performed with regard to at least one labeling function, wherein an initial label augmentation can be applied on at least one unlabeled data point that is or are nearest to one or more existing labeled data points.

6. A method according to any of claims 1 to 5, wherein the at least one criterion comprises at least one data feature and/or at least one existing anomaly label.

7. A method according to any of claims 1 to 6, wherein the clustering step, the recommending step and the step of labeling at least the recommended unlabeled data point based on the active learning are iterated for the at least one labeling function until a defined number of labeled data points or a defined proportion of labeled data points out of the provided data points is reached or exceeded.

8. A method according to any of claims 1 to 7, wherein the step of labeling at least the recommended unlabeled data point based on the active learning comprises reinforcement labeling.

9. A method according to any of claims 1 to 8, wherein a machine learning model is trained for anomaly detection after the step of labeling at least the recommended unlabeled data point based on the active learning, so that the controlling step can be performed on the basis of a prediction using the machine learning model.

10. A method according to any of claims 1 to 9, wherein a situation, failure and/or anomaly prediction is performed after the step of labeling at least the recommended unlabeled data point based on the active learning or after the machine learning model training.

11. A method according to any of claims 1 to 10, wherein the active learning uses at least one knowledge base, external data source, knowledge graph or oracle.

12. A method according to any of claims 1 to 11, wherein at least one step of the computer-implemented method and/or an anomaly detection is performed in a digital twin of the surrounding.

13. A method according to claim 12, wherein the at least one environment sensor provides a real-time monitoring of behaviors and/or situations in the surrounding for feeding the digital twin with sensor data.

14. A method according to claim 12 or 13, wherein the digital twin comprises virtual entities representing real-world functional devices and/or real-world functional services, wherein the devices and services are able to consume energy.

15. A system for controlling an operation of one or more functional devices in a defined surrounding, preferably for carrying out the computer-implemented method according to any one of claims 1 to 14, wherein an operation of one or more functional devices in a defined surrounding is controlled, comprising:

- providing and/or collecting means for providing and/or collecting sensor data from at least one environment sensor of the surrounding for providing data points out of the provided and/or collected sensor data;

- applying means for applying at least one labeling function on at least some of the data points for labeling data points as a non-anomaly or an anomaly in the defined surrounding;

- clustering means for clustering of data points into one or more definable clusters under consideration of at least one defined criterion;

- recommending means for recommending an unlabeled data point for active learning, wherein the recommending is based on the clustering;

- labeling means for labeling at least the recommended unlabeled data point based on the active learning; and

- controlling means for controlling the operation of the one or more functional devices based on information resulting from at least the labeled data points.

Description:
A COMPUTER-IMPLEMENTED METHOD FOR CONTROLLING AN OPERATION OF ONE OR MORE FUNCTIONAL DEVICES IN A DEFINED SURROUNDING AND A CORRESPONDING SYSTEM

The project leading to this application has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871249.

The present invention relates to a computer-implemented method for controlling an operation of one or more functional devices in a defined surrounding.

Further, the present invention relates to a system for controlling an operation of one or more functional devices in a defined surrounding, wherein an operation of one or more functional devices in a defined surrounding is controlled.

Corresponding prior art documents are listed as follows:

[1] WO 2022/207131 A1

[2] US 2018/0096261 A1

[3] Maitreyee Dey, Soumya Prakash Rana, Sandra Dudley, “Semi-Supervised Learning Techniques for Automated Fault Detection and Diagnosis of HVAC Systems”, August 2018

[4] Zahid Hasan, Nirmalya Roy, “Trending machine learning models in cyberphysical building environment: A survey”, September 2021

Further prior art documents are US 2020/0336397 A1 , US 2021/0325072 A1 and US 2021/0072718 A1.

Buildings, for example, may have anomalies that lead to increased energy consumption, and certain actions can be taken either to avoid these anomalies in the future or to mitigate the resulting energy loss. A necessary action can be controlling the heating, ventilation and air conditioning (HVAC) system based on the detected anomalies.

A previous development according to document [1] was a machine learning system based on generative weak supervision through reinforced labeling, in which a set of labeling functions, LFs, and unlabeled data points are used in a “generative process” to create labels. The created labels are then used to train an end machine learning model, which is a supervised model such as a deep neural network.

Reinforced Labeling can add only limited improvements if Data Programming labels only a very restricted part of the dataset: there will be too few labeled data points and, as a consequence, too few reinforcement effects to successfully augment new labels close to the labeled data points. Another problem arises when labeled data points are distant from all the unlabeled data points. For instance, a data point that is labeled by an LF might be so isolated in a multidimensional space that no other point is close to it. Hence, the generative model may not be able to generalize to other data points in the training or testing dataset.

Existing data programming systems such as Snorkel, or other known knowledge infusion systems, are not able to cover the above-mentioned scenarios. As a result, lower accuracy would be obtained, which might require further labeling effort in the form of additional programmatic labeling through labeling functions.

Document [2] discloses an anomaly detection model generator for use in home automation systems, which accesses sensor data generated by a plurality of sensors, determines a plurality of feature vectors from the sensor data, and executes a plurality of unsupervised anomaly detection machine learning algorithms in an ensemble using the plurality of feature vectors to generate a set of predictions. Respective entropy-based weightings are determined for each of the plurality of unsupervised anomaly detection machine learning algorithms from the set of predictions. A set of pseudo labels is generated based on the predictions and weightings, and a supervised machine learning algorithm uses the set of pseudo labels as training data to generate an anomaly detection model corresponding to the plurality of sensors.

Document [3] demonstrates and evaluates semi-supervised learning, SSL, techniques for automatically discovering and identifying faults in HVAC data from a real building. Real HVAC sensor data are usually unstructured and unlabelled; automated machine learning methods therefore require the raw data to be preprocessed to perform well, which increases the overall operational costs of the system and makes real-time application difficult. Due to the data complexity and the limited availability of labelled information, a robust automatic fault detection and diagnosis, AFDD, tool based on semi-supervised learning is proposed. This method has been tested and compared on more than 50 thousand terminal units, TUs, which are small sub-units within an HVAC system. Established statistical performance metrics and a paired t-test have been applied to validate the proposed work.

Document [4] addresses the electricity usage of buildings. The electricity usage of buildings, including offices, malls and residential apartments, represents a significant portion of a nation's energy expenditure and carbon footprint. In the United States, building appliances consume approximately 72% of the total electricity produced. In this regard, cyber-physical system, CPS, researchers have put forth associated research questions on reducing the energy consumption of the cyber-physical building environment by minimizing energy dissipation while securing occupants' comfort. Some of the questions in CPS buildings include finding the optimal HVAC control, monitoring appliances' energy usage, detecting insulation problems, estimating the number and activities of occupants, managing thermal comfort and interacting intelligently with the smart grid. Various machine learning, ML, applications have been studied in recent CPS research to improve building energy efficiency by addressing these questions. The document comprehensively reviews and reports on contemporary applications of ML algorithms such as deep learning, transfer learning, active learning, reinforcement learning and other emerging techniques that are proposed or envisioned to address the above challenges in the CPS building environment. Finally, it discusses diverse open questions and prospective future directions in CPS building environment research.

It is an object of the present invention to improve and further develop a computer-implemented method for controlling an operation of one or more functional devices in a defined surrounding and a corresponding system, so that an efficient computer-implemented method and corresponding system are provided by simple means.

In accordance with the invention, the aforementioned object is accomplished by a computer-implemented method for controlling an operation of one or more functional devices in a defined surrounding, the method comprising the steps of:

- providing and/or collecting sensor data from at least one environment sensor of the surrounding for providing data points out of the provided and/or collected sensor data;

- applying at least one labeling function on at least some of the data points for labeling data points as a non-anomaly or an anomaly in the defined surrounding;

- clustering of data points into one or more definable clusters under consideration of at least one defined criterion;

- recommending an unlabeled data point for active learning, wherein the recommending is based on the clustering;

- labeling at least the recommended unlabeled data point based on the active learning; and

- controlling the operation of the one or more functional devices based on information resulting from at least the labeled data points.

Further, the aforementioned object is accomplished by a system for controlling an operation of one or more functional devices in a defined surrounding, wherein an operation of one or more functional devices in a defined surrounding is controlled, comprising:

- providing and/or collecting means for providing and/or collecting sensor data from at least one environment sensor of the surrounding for providing data points out of the provided and/or collected sensor data;

- applying means for applying at least one labeling function on at least some of the data points for labeling data points as a non-anomaly or an anomaly in the defined surrounding;

- clustering means for clustering of data points into one or more definable clusters under consideration of at least one defined criterion;

- recommending means for recommending an unlabeled data point for active learning, wherein the recommending is based on the clustering;

- labeling means for labeling at least the recommended unlabeled data point based on the active learning; and

- controlling means for controlling the operation of the one or more functional devices based on information resulting from at least the labeled data points.

According to the invention, it has been recognized that a very efficient method and system can be provided by means of a sophisticated labeling process that is based on a particular clustering. It has further been recognized that recommending an unlabeled data point for active learning enables a suitable augmentation of labeled data points, wherein the recommendation is based on the particular kind of clustering. The way in which the data points are clustered guides the active learning recommendation.

Clustering can be performed on a set of data points, each with a set of features. Some of the data points can be unlabeled, while the rest are labeled by the at least one labeling function. The clustering can take into account the intermediate status of the data points, i.e. labeled or unlabeled.

The recommendation should be based on the clustering, which can mean that the active learning covers data points that are in proximity to each other, so that reinforced labeling can help to label more unlabeled data points automatically. For instance, labeling a data point in the center of a cluster can lead to more data points being labeled automatically by the reinforced labeling. The clustering can help to identify unlabeled points that are close to other unlabeled data points. As a result, an efficient method and system for controlling an operation of one or more functional devices in a defined surrounding can be provided.

According to an embodiment of the invention the operation of the one or more functional devices can be controlled to achieve a defined operation purpose, wherein the operation purpose can be an efficient energy management. Other operation purposes are possible according to individual situations and requirements.

Within a further embodiment the data points can be provided in a data representation, which simplifies further method steps. Such a data representation can be a suitable matrix comprising the data points.
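As an illustration only, such a matrix can be pictured as a label matrix with one row per data point and one column per labeling function. The following Python sketch assumes the encoding -1 for abstain, 0 for non-anomaly and 1 for anomaly; the field names `hour`, `occupancy` and `hvac_kw` and both heuristic labeling functions are hypothetical and not taken from the description.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical label encoding (the description does not fix one).
ABSTAIN, NON_ANOMALY, ANOMALY = -1, 0, 1

DataPoint = Dict[str, float]
LabelingFunction = Callable[[DataPoint], int]


def lf_hvac_in_empty_room(p: DataPoint) -> int:
    """Hypothetical heuristic: high HVAC power while nobody is present."""
    if p["occupancy"] > 0:
        return NON_ANOMALY
    if p["hvac_kw"] > 2.0:
        return ANOMALY
    return ABSTAIN


def lf_night_occupancy(p: DataPoint) -> int:
    """Hypothetical heuristic: presence between 00:00 and 05:00 is anomalous."""
    if p["hour"] >= 5:
        return ABSTAIN
    return ANOMALY if p["occupancy"] > 0 else NON_ANOMALY


def build_label_matrix(points: Sequence[DataPoint],
                       lfs: Sequence[LabelingFunction]) -> List[List[int]]:
    """Entry [i][j] is the vote of labeling function j on data point i."""
    return [[lf(p) for lf in lfs] for p in points]


points = [
    {"hour": 2, "occupancy": 0, "hvac_kw": 3.1},
    {"hour": 10, "occupancy": 12, "hvac_kw": 4.0},
    {"hour": 14, "occupancy": 0, "hvac_kw": 0.5},
]
matrix = build_label_matrix(points, [lf_hvac_in_empty_room, lf_night_occupancy])
print(matrix)  # [[1, 0], [0, -1], [-1, -1]]
```

The first row shows the two labeling functions disagreeing, and the last row shows a data point on which every labeling function abstains; both situations are what the later clustering and active learning steps have to deal with.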

According to a further embodiment the defined surrounding can comprise or can be realized by a building. Different types and sizes of buildings are possible. According to a further embodiment the labeling of data points as a non-anomaly or an anomaly can refer to a building occupancy. Such an occupancy of buildings can vary in different regions or rooms of a building due to the usage of the building by persons. Within a further embodiment the one or more functional devices can comprise one or more HVAC devices, for example heating, ventilation or air conditioning apparatuses.

Within a further embodiment, and to provide a highly efficient method or system, a label augmentation can be performed with regard to at least one labeling function prior to the clustering step, wherein an initial label augmentation can be applied on at least one unlabeled data point that is or are nearest to one or more existing labeled data points. Such a label augmentation can be performed for several labeling functions one after the other. Further, a label augmentation can be applied to more than one unlabeled data point.
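A minimal Python sketch of such an initial label augmentation for a single labeling function follows. It assumes Euclidean distance in feature space, -1 as the abstain value, and a rule (chosen here for illustration, not stated in the description) that each labeled point passes its label to its single nearest abstained point, with the closer labeled point winning when two claim the same point.

```python
import math
from typing import Dict, List, Sequence, Tuple

ABSTAIN = -1  # hypothetical encoding of "labeling function abstained"

Point = Tuple[float, ...]


def initial_label_augmentation(features: Sequence[Point],
                               labels: Sequence[int]) -> List[int]:
    """For one labeling function, copy each labeled point's label to its nearest
    abstained point; if two labeled points claim the same point, the closer wins."""
    labeled = [i for i, y in enumerate(labels) if y != ABSTAIN]
    unlabeled = [i for i, y in enumerate(labels) if y == ABSTAIN]
    if not unlabeled:
        return list(labels)
    claims: Dict[int, Tuple[float, int]] = {}  # unlabeled index -> (distance, label)
    for i in labeled:
        j = min(unlabeled, key=lambda u: math.dist(features[i], features[u]))
        d = math.dist(features[i], features[j])
        if j not in claims or d < claims[j][0]:
            claims[j] = (d, labels[i])
    augmented = list(labels)
    for j, (_, y) in claims.items():
        augmented[j] = y
    return augmented


features = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.4, 5.0), (9.0, 9.0)]
labels = [1, -1, 0, -1, -1]
print(initial_label_augmentation(features, labels))  # [1, 1, 0, 0, -1]
```

The isolated point at (9, 9) stays unlabeled, which mirrors the limitation of purely distance-based augmentation that motivates the subsequent clustering and active learning steps.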

According to a further embodiment the at least one criterion regarding the clustering step can comprise at least one data feature and/or at least one existing anomaly label. Clustering can take into account labeled data points and unlabeled data points. Within a further embodiment the clustering step, the recommending step and the step of labeling at least the recommended unlabeled data point based on the active learning can be iterated for the at least one labeling function until a defined number of labeled data points or a defined proportion of labeled data points out of the provided data points is reached or exceeded. Such an iteration can provide a very efficient and accurate method or system.

Within a further embodiment, the step of labeling at least the recommended unlabeled data point based on the active learning can comprise reinforcement labeling. As a result, a highly efficient method or system can be provided.

According to a further embodiment, a machine learning model can be trained for anomaly detection after the step of labeling at least the recommended unlabeled data point based on the active learning, so that the controlling step can be performed on the basis of a prediction using the machine learning model. Different machine learning models can be used in this context, for example supervised models. As a result, a highly efficient controlling step can be performed.

Within a further embodiment, a situation, failure and/or anomaly prediction can be performed after the step of labeling at least the recommended unlabeled data point based on the active learning or after the machine learning model training. This can result in a highly efficient control of the operation of one or more functional devices in the defined surrounding.

According to a further embodiment, the active learning can use at least one knowledge base, external data source, knowledge graph or oracle. This can result in a highly efficient active learning process.

Within a further embodiment at least one step of the computer-implemented method and/or an anomaly detection can be performed in a digital twin of the surrounding. Based on such a digital twin anomalies can be avoided before they occur. According to a further embodiment the at least one environment sensor can provide a real-time monitoring of behaviors and/or situations in the surrounding for feeding the digital twin with sensor data. Such sensors can include different types of sensors for measuring different environment parameters. This can result in a very efficient system or method.

Within a further embodiment, the digital twin can comprise virtual entities representing real-world functional devices and/or real-world functional services, wherein the devices and services are able to consume energy. On the basis of such virtual entities, the digital twin can provide a very realistic image of the real world. This can result in a very efficient method or system.

Advantages and aspects of embodiments of the present invention are summarized as follows:

Embodiments can comprise an efficient LF-based clustering that uses additional annotations of the data points, for example anomaly, non-anomaly and unestimated, for guiding the Active Learning Recommendation.

Further embodiments can comprise an iterative augmentation of labels from LFs by active learning over the clusters of (for example weakly) supervised data points.

According to further embodiments a computer-implemented method for controlling HVAC based on building occupancy anomaly prediction can be provided, comprising the steps of

1) Building energy management data collection without ground-truth;

2) Applying of labeling functions, e.g. heuristics, for building anomalies;

3) Label Augmentation;

4) Data clustering: Clustering annotated data like anomalies, non-anomalies and unestimated;

5) Active learning recommendation: Active learning recommendation based on the clustering;

6) Knowledge-base, e.g. oracle, for active learning;

7) Iterating steps 3-6 until a minimum dataset coverage threshold is reached;

8) Supervised ML training;

9) Situation/failure/anomaly prediction; and

10) Controlling of building/campus/district operations, such as HVAC configuration, mobility/transportation scheduling, 5G network configuration or lighting control, based on the ML model prediction.
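Step 10 can be illustrated with a deliberately simple rule-based Python sketch. It assumes that the trained model outputs one prediction per zone, with 1 meaning anomaly; the zone names, the setpoint values and the setback policy (for example, conditioning a zone that is predicted to be unexpectedly empty) are hypothetical, and a real deployment would forward the commands to a building management system instead of returning them.

```python
from dataclasses import dataclass
from typing import Dict

ANOMALY = 1  # hypothetical encoding of the model's anomaly output


@dataclass
class HvacCommand:
    zone: str
    setpoint_c: float
    fan_on: bool


def control_hvac(predictions: Dict[str, int],
                 comfort_setpoint_c: float = 21.0,
                 setback_setpoint_c: float = 17.0) -> Dict[str, HvacCommand]:
    """Translate per-zone anomaly predictions into HVAC commands.

    Assumed policy: a predicted occupancy anomaly puts the zone into setback;
    otherwise the zone stays in comfort mode.
    """
    commands: Dict[str, HvacCommand] = {}
    for zone, label in predictions.items():
        if label == ANOMALY:
            commands[zone] = HvacCommand(zone, setback_setpoint_c, fan_on=False)
        else:
            commands[zone] = HvacCommand(zone, comfort_setpoint_c, fan_on=True)
    return commands


print(control_hvac({"room_101": 1, "room_102": 0}))
```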

Further embodiments propose an anomaly prediction for buildings for which no actual anomalies are available as ground truth.

Within further embodiments the effort of domain experts for labeling data points can be reduced.

Further embodiments comprise automating configuration and maintenance effort for the building facility manager.

Further embodiments provide energy consumption reduction thanks to better anomaly predictions.

Within further embodiments an anomaly detection through weak supervision can be provided.

According to further embodiments a system and method of energy management through building digital twins can be realized.

Further embodiments can propose a system and method for energy management in smart buildings through anomaly detection in building digital twins. The building digital twin can contain virtual entities representing real-world objects and services, e.g., detecting anomalies, that might lead to energy consumption. Embodiments can propose a method of anomaly detection through weak supervision as an alternative to having real building anomalies as ground truth, so that the embodiments can allow anomalies to be avoided before they occur.

Further, the disclosed system for controlling an operation of one or more functional devices in a defined surrounding, wherein an operation of one or more functional devices in a defined surrounding is controlled, can be realized in an apparatus or device comprising:

- providing and/or collecting means for providing and/or collecting sensor data from at least one environment sensor of the surrounding for providing data points out of the provided and/or collected sensor data;

- applying means for applying at least one labeling function on the data points for labeling data points as a non-anomaly or an anomaly in the defined surrounding;

- clustering means for clustering of data points into one or more definable clusters under consideration of at least one defined criterion;

- recommending means for recommending an unlabeled data point for active learning, wherein the recommending is based on the clustering;

- labeling means for labeling at least the recommended unlabeled data point based on the active learning; and

- controlling means for controlling the operation of the one or more functional devices based on information resulting from at least the labeled data points.

There are several ways to design and further develop the teaching of the present invention in an advantageous way. To this end, reference is made to the following explanation of examples of embodiments of the invention, illustrated by the drawing. In the drawing,

Fig. 1 shows in a diagram steps of active labeling and label augmentation for a given labeling function according to an embodiment of the disclosure,

Fig. 2 shows a diagram with ML in a loop for energy management according to an embodiment of the disclosure,

Fig. 3 shows a diagram with ML in a loop for smart campus building occupancy anomaly detection according to an embodiment of the disclosure, and

Fig. 4 shows in a diagram steps of active labeling and label augmentation for a given labeling function as an equivalent of Fig. 1 according to an embodiment of the disclosure.

Active Learning and Label Augmentation

According to an embodiment, active learning can be a valid solution for labeling new anomalies in a building digital twin by helping Reinforced Labeling, i.e. label augmentation, to classify new and unlabeled anomalies iteratively. For the active learning, existing knowledge bases or oracles, e.g., knowledge from domain experts, can be considered. These sources provide valuable knowledge that can be utilized in the machine learning models, although they can be costly and time-consuming, especially in cases such as the manual annotation of complete datasets.

In particular, with enough newly classified events, there would be a satisfactory indication for labeling anomalies that are close to each other, taking into consideration thresholds such as a Euclidean distance threshold and an aggregated_effect_threshold, which respectively manage the effects and the value of the effects that has to be reached in order to classify a data point for the specific labeling function.
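The role of the two thresholds can be sketched in Python as follows. The effect model is an assumption made for illustration and is not necessarily the one used in [1]: every labeled point closer than `euclidean_distance_threshold` contributes an effect of 1 - d / threshold to its own class, and an abstained point receives the class with the largest aggregated effect only if that effect reaches `aggregated_effect_threshold`.

```python
import math
from typing import Dict, List, Sequence, Tuple

ABSTAIN = -1  # hypothetical encoding of "labeling function abstained"


def reinforced_labeling(features: Sequence[Tuple[float, ...]],
                        labels: Sequence[int],
                        euclidean_distance_threshold: float,
                        aggregated_effect_threshold: float) -> List[int]:
    """One pass of distance-based label augmentation for a single labeling function.

    Only labels present in the input act as sources, so newly assigned labels
    do not propagate further within the same pass.
    """
    new_labels = list(labels)
    for u, y_u in enumerate(labels):
        if y_u != ABSTAIN:
            continue
        effects: Dict[int, float] = {}  # class -> aggregated effect on point u
        for i, y_i in enumerate(labels):
            if y_i == ABSTAIN:
                continue
            d = math.dist(features[u], features[i])
            if d < euclidean_distance_threshold:
                effects[y_i] = effects.get(y_i, 0.0) + 1.0 - d / euclidean_distance_threshold
        if effects:
            best_label, best_effect = max(effects.items(), key=lambda kv: kv[1])
            if best_effect >= aggregated_effect_threshold:
                new_labels[u] = best_label
    return new_labels


features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
labels = [1, 1, 1, -1, -1]
print(reinforced_labeling(features, labels, 2.0, 1.0))  # [1, 1, 1, 1, -1]
print(reinforced_labeling(features, labels, 2.0, 2.0))  # [1, 1, 1, -1, -1]
```

The point at (0.5, 0.5) collects an aggregated effect of about 1.94 from its three labeled neighbours, so it is labeled under the first threshold but not under the stricter one, while the distant point at (10, 10) receives no effect at all.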

According to an embodiment, a solution is described both graphically and in writing as follows. It should be noted that the “label augmentation” [1] is applied to all labeling functions, but for simplicity only LF1 is considered in the following explanation.

The initial label augmentation can be applied to mark unlabeled points that are nearest to the existing labeled points before a clustering step (Step 0 and Step 1), see Fig. 1.

Component for data representation and clustering

This component of the embodiment creates a data representation given the initial (augmented) labeling matrix and the iteratively updated matrices, where the labeling matrix is updated at each iteration as illustrated in Fig. 1 and 2. The component first decides how to represent the data, considering all possible scenarios for a data point. Then the component applies a special clustering technique whose input comes from the data representation and whose output is used for the active learning recommendation.

Step 0 represents the anomaly described above: data points have two features, x1 and x2, e.g., defining virtual or physical coordinates of a place, and have been labeled as non-anomalies (circles) and anomalies (triangles) by a certain set of labeling functions. For simplicity, only one labeling function, LF1, is included in Fig. 1. At the same time, LF1 could abstain (empty circles/triangles). Data points could be labeled differently by other labeling functions, but for the moment only LF1 is considered.

For this step of representing the data points, the labeling done by all or any LFs can be considered as the labels. For instance, a labeled point may mean the point labeled by all LFs, the majority of LFs, only one LF, or a particular LF, as in the example shown in Fig. 1. Furthermore, the labeling can consider the probabilistic latent labels that are the outputs of the generative process, or uncertainties of given data points.
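The choice of what counts as a "labeled" point can be made concrete with a small Python helper operating on one row of the label matrix. The policy names `"any"`, `"all"`, `"majority"` and `"particular"` and the abstain value -1 are hypothetical; the probabilistic variant mentioned above, which would use the outputs of the generative process, is not shown.

```python
from typing import Sequence

ABSTAIN = -1  # hypothetical encoding of "labeling function abstained"


def is_labeled(row: Sequence[int], policy: str = "any", lf_index: int = 0) -> bool:
    """Decide whether a row of the label matrix counts as 'labeled'.

    policy: "any" (at least one LF voted), "all" (every LF voted),
    "majority" (more than half of the LFs voted) or "particular"
    (the LF at lf_index voted).
    """
    votes = sum(1 for v in row if v != ABSTAIN)
    if policy == "any":
        return votes > 0
    if policy == "all":
        return votes == len(row)
    if policy == "majority":
        return votes > len(row) / 2
    if policy == "particular":
        return row[lf_index] != ABSTAIN
    raise ValueError(f"unknown policy: {policy!r}")


row = [1, -1, 0]  # LF1 says anomaly, LF2 abstains, LF3 says non-anomaly
print([is_labeled(row, p) for p in ("any", "all", "majority")])  # [True, False, True]
print(is_labeled(row, "particular", lf_index=1))  # False
```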

The labeling can be applied as “anomaly” or “non-anomaly”, regardless of the real class that a data point belongs to - as circle or triangle.

In Step 1, clustering is applied on the dataset with the input features and the existing classification labels from the labeling functions, as well as possible augmented labels which label non-labeled data points that are close to the labeled points. Depending on the choice and implementation of the clustering model, the clustering may include all points or a subset of all points. The clusters may have different sizes. The clustering can be performed in various ways, such as the two examples below:

1) Using data features to decide on the clusters

2) Using existing anomaly labels by Labeling Functions and/or Active Labeling for clustering

Clustering takes into account both data points labeled by the labeling functions and unlabeled data points.

The data representation options of Step 0 described above, such as LF majority, labeling status, probabilities or uncertainties, can be used to implement an applicable clustering algorithm.
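To illustrate the two clustering options (data features only, or features combined with existing anomaly labels), the following Python sketch appends a weighted one-hot annotation (anomaly, non-anomaly, unestimated) to each feature vector and then runs plain k-means. Both k-means and the `weight` parameter are assumptions chosen for illustration; the description does not prescribe a particular clustering algorithm. With `weight=0.0` the annotation has no effect and only the data features are used.

```python
import math
from typing import List, Sequence

ABSTAIN, NON_ANOMALY, ANOMALY = -1, 0, 1  # hypothetical label encoding


def with_annotation(x: Sequence[float], label: int, weight: float) -> List[float]:
    """Append a weighted one-hot annotation: anomaly, non-anomaly, unestimated."""
    one_hot = [label == ANOMALY, label == NON_ANOMALY, label == ABSTAIN]
    return list(x) + [weight * float(flag) for flag in one_hot]


def kmeans(vectors: Sequence[Sequence[float]], k: int, iterations: int = 50) -> List[int]:
    """Plain Lloyd's k-means with a deterministic farthest-point initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    assignment: List[int] = []
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                      for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


features = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
labels = [ANOMALY, ABSTAIN, NON_ANOMALY, ABSTAIN]
vectors = [with_annotation(x, y, weight=1.0) for x, y in zip(features, labels)]
print(kmeans(vectors, k=2))  # [0, 0, 1, 1]
```

Increasing `weight` pulls points with the same annotation together, so the clusters increasingly reflect the labeling status rather than the raw features alone.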

Component for active learning recommendation and active labeling

In Step 2, active learning recommendation according to an embodiment is applied on the clustered data. The results of the clustering such as data sizes of each cluster, their classification labels and the status of classifications, as well as any other available/applicable statistics can be used for guiding the active learning recommendation. For instance, a cluster with many data points that are close to each other may be chosen and a central point in the cluster can be recommended for active learning in order to reduce the labeling effort in the subsequent step and iteration of the active learning.
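The example strategy (choose a large cluster and recommend a central point) can be sketched in Python as below. The concrete rules are assumptions for illustration: the cluster with the most members among those that still contain unlabeled points is chosen, its centroid is computed over all members, and the unlabeled member closest to that centroid is recommended.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

ABSTAIN = -1  # hypothetical encoding of "labeling function abstained"


def recommend_for_active_learning(features: Sequence[Tuple[float, ...]],
                                  labels: Sequence[int],
                                  clusters: Sequence[int]) -> Optional[int]:
    """Pick the largest cluster that still contains unlabeled points and return
    the unlabeled member closest to that cluster's centroid (None if none left)."""
    members: Dict[int, List[int]] = defaultdict(list)
    for i, c in enumerate(clusters):
        members[c].append(i)
    candidates = [c for c, idx in members.items()
                  if any(labels[i] == ABSTAIN for i in idx)]
    if not candidates:
        return None
    target = max(candidates, key=lambda c: len(members[c]))
    idx = members[target]
    centroid = [sum(features[i][d] for i in idx) / len(idx)
                for d in range(len(features[0]))]
    unlabeled = [i for i in idx if labels[i] == ABSTAIN]
    return min(unlabeled, key=lambda i: math.dist(features[i], centroid))


features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (9.0, 9.0), (9.5, 9.0)]
labels = [1, -1, -1, -1, -1, -1, 0]
clusters = [0, 0, 0, 0, 0, 1, 1]
print(recommend_for_active_learning(features, labels, clusters))  # 4
```

Labeling the central point 4 by the oracle places a label within short distance of the other unlabeled members of the cluster, which is what allows the reinforced labeling in the next step to reach them.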

In Fig. 1, data points are illustrated for the Active Learning Recommendation, see Step 2. It is useful to choose high-impact data points, so that Reinforced Labeling in Step 3 can then automatically classify other points. High-impact data points are those that have enough neighboring points to be placed in the same cluster as them after clustering.
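A possible Step 2 recommender can be sketched under stated assumptions: the cluster with the most unlabeled (abstained) points is chosen, echoing the "most abstains" criterion in the short step list of this description, and its unlabeled member closest to the cluster centroid is recommended as the "central point". Cluster id -1 is treated as noise, as in DBSCAN-style algorithms; the function name and tie-breaking are illustrative.

```python
import math
from collections import defaultdict
from typing import Optional, Sequence

ABSTAIN = -1  # assumed encoding of "no label"
NOISE = -1    # assumed cluster id for points outside every cluster


def recommend_point(points: Sequence[Sequence[float]], clusters: Sequence[int],
                    labels: Sequence[int]) -> Optional[int]:
    """Index of one unlabeled point to send to the oracle, or None."""
    members = defaultdict(list)
    for i, c in enumerate(clusters):
        if c != NOISE:
            members[c].append(i)

    # Choose the cluster with the most abstained (unlabeled) points.
    best, best_count = None, 0
    for c, idx in members.items():
        count = sum(labels[i] == ABSTAIN for i in idx)
        if count > best_count:
            best, best_count = c, count
    if best is None:
        return None

    # Recommend its unlabeled member closest to the cluster centroid.
    idx = members[best]
    centroid = [sum(points[i][d] for i in idx) / len(idx) for d in range(len(points[0]))]
    unlabeled = [i for i in idx if labels[i] == ABSTAIN]
    return min(unlabeled, key=lambda i: math.dist(points[i], centroid))


pts = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(recommend_point(pts, [0, 0, 0, 1, 1], [ABSTAIN, ABSTAIN, ABSTAIN, ABSTAIN, 0]))
# prints: 1
```

Picking a central point means that, once the oracle labels it, the augmentation in Step 3 has many close neighbors to propagate the label to.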

This step also includes a decision on “which data points” are to be labeled after the active labeling by the oracle. For instance, in some embodiments, the active labeling recommendation may cover only a particular data point, or a particular cluster, in which case the active learning annotation labels the data points of that cluster, whether or not they were already labeled. Similarly, in other embodiments, a subset of the data points in the cluster, or even data points in nearby or distant clusters, can be labeled. Thus, various active labeling strategies are considered.
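The active labeling strategies differ mainly in how far the oracle's answer is propagated. The sketch below, with hypothetical strategy names, applies the oracle label to the recommended point only, to the still-unlabeled members of its cluster, or to the whole cluster; strategies covering nearby or distant clusters could be added in the same way but are omitted here.

```python
from typing import List, Sequence

ABSTAIN = -1  # assumed encoding of "no label"
NOISE = -1    # assumed cluster id for unclustered points


def apply_oracle_label(labels: Sequence[int], clusters: Sequence[int], point: int,
                       oracle_label: int, strategy: str = "point") -> List[int]:
    """Return updated labels after the oracle annotates `point`.

    "point":     only the recommended point gets the oracle label.
    "unlabeled": every still-unlabeled member of its cluster gets it too.
    "cluster":   every member of its cluster gets it, overwriting LF labels.
    """
    if strategy not in ("point", "unlabeled", "cluster"):
        raise ValueError(f"unknown strategy: {strategy}")
    new = list(labels)
    new[point] = oracle_label
    if strategy == "point" or clusters[point] == NOISE:
        return new  # noise points have no cluster to propagate to
    for i, c in enumerate(clusters):
        if c == clusters[point] and (strategy == "cluster" or new[i] == ABSTAIN):
            new[i] = oracle_label
    return new


labels, clusters = [ABSTAIN, 0, ABSTAIN, ABSTAIN], [0, 0, 0, 1]
for s in ("point", "unlabeled", "cluster"):
    print(s, apply_oracle_label(labels, clusters, 0, 1, s))
# prints:
# point [1, 0, -1, -1]
# unlabeled [1, 0, 1, -1]
# cluster [1, 1, 1, -1]
```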

Reinforced labeling for data augmentation

In Step 3, Reinforced Labeling is applied. This step “augments” the data points that have no labels - data points on which the LFs previously abstained - with the existing labels of the already classified data points, as described in [1].
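To make the augmentation idea concrete, the following is a simplified nearest-neighbor propagation. It is not the Reinforced Labeling procedure of [1]: each abstained point takes the label of its nearest labeled point if that point lies within a hypothetical `radius`. Only labels present at the start of the call are propagated, so a single call does not cascade.

```python
import math
from typing import List, Sequence

ABSTAIN = -1  # assumed encoding of "no label"


def augment_labels(points: Sequence[Sequence[float]], labels: Sequence[int],
                   radius: float) -> List[int]:
    """Copy each abstained point's nearest labeled neighbor's label if within radius."""
    labeled = [i for i, y in enumerate(labels) if y != ABSTAIN]
    out = list(labels)
    if not labeled:
        return out
    for i, y in enumerate(labels):
        if y != ABSTAIN:
            continue
        nearest = min(labeled, key=lambda j: math.dist(points[i], points[j]))
        if math.dist(points[i], points[nearest]) <= radius:
            out[i] = labels[nearest]  # read from the input, so nothing cascades
    return out


pts = [[0.0, 0.0], [0.5, 0.0], [3.0, 0.0], [10.0, 0.0]]
print(augment_labels(pts, [1, ABSTAIN, ABSTAIN, 0], radius=1.0))
# prints: [1, 1, -1, 0]
```

Calling the function again on its own output would let labels spread further, which is one way the iteration over Steps 1 to 3 gradually raises coverage.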

According to an embodiment, Steps 1, 2 and 3 should be applied iteratively, for each labeling function, until the dataset contains enough labeled anomaly/non-anomaly points. To this end, a new threshold is introduced, namely the data coverage, defined as a float in the range [0, 1]. The termination condition is that the proportion of labeled data points in the entire dataset is greater than the data coverage threshold.
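The iteration and its data coverage termination condition can be sketched as a loop. Here `iterate` is a hypothetical callable standing in for one round of Steps 1 to 3, and `max_iter` is an added safeguard (an assumption) because coverage may stall; since a threshold of 1.0 can never be exceeded, such a run stops only at `max_iter`. Running the loop once per labeling function would match the per-LF iteration described above.

```python
from typing import Callable, List, Tuple

ABSTAIN = -1  # assumed encoding of "no label"


def coverage(labels: List[int]) -> float:
    """Proportion of data points that currently carry a label."""
    return sum(y != ABSTAIN for y in labels) / len(labels)


def label_until_covered(labels: List[int], iterate: Callable[[List[int]], List[int]],
                        data_coverage: float, max_iter: int = 100) -> Tuple[List[int], int]:
    """Repeat one round of Steps 1-3 until coverage exceeds data_coverage."""
    if not 0.0 <= data_coverage <= 1.0:
        raise ValueError("data_coverage must lie in [0, 1]")
    rounds = 0
    while coverage(labels) <= data_coverage and rounds < max_iter:
        labels = iterate(labels)
        rounds += 1
    return labels, rounds


def toy_iterate(labels: List[int]) -> List[int]:
    """Stand-in round: labels the first unlabeled point as non-anomaly."""
    new = list(labels)
    if ABSTAIN in new:
        new[new.index(ABSTAIN)] = 0
    return new


print(label_until_covered([1, ABSTAIN, ABSTAIN, ABSTAIN, ABSTAIN], toy_iterate, 0.5))
# prints: ([1, 0, 0, -1, -1], 2)
```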

Lastly, in what can be regarded as Step 4, a machine learning model can be trained for anomaly detection. Any applicable machine learning model, such as a supervised model, can be used to make use of the labels produced by the above steps.
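Any supervised model can serve in Step 4. As a minimal, dependency-free stand-in, and not as a recommendation over richer models such as neural networks, the sketch below trains a nearest-centroid classifier on the labeled points and ignores points still marked ABSTAIN (-1, an assumed encoding).

```python
import math
from typing import Dict, List, Sequence

ABSTAIN = -1  # assumed encoding of "no label"


class NearestCentroid:
    """Minimal supervised classifier standing in for the Step 4 model."""

    def fit(self, points: Sequence[Sequence[float]], labels: Sequence[int]) -> "NearestCentroid":
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for x, y in zip(points, labels):
            if y == ABSTAIN:
                continue  # train only on points that received a label
            acc = sums.setdefault(y, [0.0] * len(x))
            for d, v in enumerate(x):
                acc[d] += v
            counts[y] = counts.get(y, 0) + 1
        if not counts:
            raise ValueError("no labeled points to train on")
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict(self, x: Sequence[float]) -> int:
        # Assign the class whose centroid is nearest in Euclidean distance.
        return min(self.centroids, key=lambda y: math.dist(x, self.centroids[y]))


model = NearestCentroid().fit([[0, 0], [0, 1], [10, 10], [5, 5]], [0, 0, 1, ABSTAIN])
print(model.predict([9, 9]), model.predict([1, 0]))
# prints: 1 0
```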

In short, according to an embodiment, the following steps are proposed:

0) Labeling function application to the dataset;

1) Label augmentation to generate labels automatically without active learning;

2) Clustering application as described above; and

3) For Active Labeling, for the given LF, a data point is recommended to enable more reinforced labeling, e.g., data points in the cluster with the largest number of abstains.

After these steps are applied, the anomaly detection can be trained through a machine learning model, e.g., a supervised machine learning model. The energy management application is applied to the smart campus environment, where the application detects anomalies in the environment and feeds them to the Digital Twin of Buildings, which makes efficient heating, ventilation, and air conditioning, HVAC, decisions. For instance, in the case of a high-occupancy anomaly in a conference room of the university building during summer, ventilation or air conditioning is automatically activated by the Digital Twin of the Smart Building.

Various environment sensors are deployed in the buildings for real-time monitoring of the building, and they feed the Digital Twin of the Building. The measurement sensors include CO2, humidity, building/room occupancy, parking, solar energy, temperature and mobile sensors, among others. Anomalies are predicted from the sensor data, where what counts as an anomaly situation can be defined by the building management. For different anomaly situations, e.g., a high-occupancy situation vs. unexpectedly low attendance at an event, different HVAC decisions can be implemented according to the detected situation.
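The mapping from detected anomaly situations to HVAC decisions could look like the following sketch. The event fields, the anomaly kinds, the action strings and the June-to-August definition of summer are all hypothetical. Only the rule that high occupancy in summer activates ventilation or air conditioning comes from the text; the actions for other seasons and for low attendance are assumptions.

```python
from dataclasses import dataclass


@dataclass
class AnomalyEvent:
    room: str
    kind: str   # hypothetical kinds: "high_occupancy", "low_attendance"
    month: int  # 1-12


SUMMER_MONTHS = {6, 7, 8}  # assumption: Northern-hemisphere summer


def hvac_action(event: AnomalyEvent) -> str:
    """Map a detected anomaly to a (hypothetical) HVAC command."""
    if event.kind == "high_occupancy":
        # From the text: high occupancy in summer activates ventilation/air conditioning.
        # The action outside summer is an assumption.
        return "activate_air_conditioning" if event.month in SUMMER_MONTHS else "increase_ventilation"
    if event.kind == "low_attendance":
        return "reduce_hvac_output"  # assumption: save energy in an emptier room
    return "no_change"


print(hvac_action(AnomalyEvent("conference-room-1", "high_occupancy", 7)))
# prints: activate_air_conditioning
```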

Since collecting ground-truth data for anomalies is costly and, in most cases, infeasible because collecting real anomalies takes a long time, domain experts can write heuristic rules to label data through “labeling functions”. For instance, a labeling function can be written on each sensor's data (time series) that fires when the measurement falls within a certain range. The label can indicate either an anomaly or a non-anomaly situation.
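A range-based labeling function of this kind can be written as a small factory. The sensor keys, the thresholds and the ANOMALY/NON_ANOMALY/ABSTAIN encoding are illustrative assumptions; in practice the ranges would be set by domain experts.

```python
from typing import Callable, Dict, Optional

ABSTAIN, NON_ANOMALY, ANOMALY = -1, 0, 1  # assumed label encoding

Reading = Dict[str, float]


def range_lf(sensor: str, anomaly_above: Optional[float] = None,
             normal_below: Optional[float] = None) -> Callable[[Reading], int]:
    """Build a heuristic LF that votes on one sensor's measurement range."""
    def lf(reading: Reading) -> int:
        value = reading.get(sensor)
        if value is None:
            return ABSTAIN  # sensor missing from this reading
        if anomaly_above is not None and value > anomaly_above:
            return ANOMALY
        if normal_below is not None and value < normal_below:
            return NON_ANOMALY
        return ABSTAIN  # in-between range: no opinion
    return lf


# Illustrative thresholds only; real ranges would be chosen by domain experts.
co2_lf = range_lf("co2_ppm", anomaly_above=1200, normal_below=800)
occupancy_lf = range_lf("occupancy", anomaly_above=80, normal_below=40)

reading = {"co2_ppm": 1500.0, "occupancy": 60.0}
print(co2_lf(reading), occupancy_lf(reading))
# prints: 1 -1
```

Letting an LF abstain in the ambiguous middle range is what leaves unlabeled points for the clustering, augmentation and active learning steps to resolve.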

In Fig. 2, the newly introduced components according to an embodiment are the following. A decision on the labeling percentage is made after the “Augmented labels” have been obtained from the “Generative Process”. Based on the percentage of augmented labels across the complete dataset - sensor data - and the existing labeled data points in the Generative Process, the data clustering can be applied again. The clustering can take into account unlabeled data as well as labeled data for more efficient clustering. Afterwards, given the clusters and their content - labeled and unlabeled data -, a recommender system called “Active Learning Recommender” can make recommendations to the active learning. The active learning can be performed by a human data annotator, e.g., a domain expert, or by an Oracle that makes guesses given the dataset and additional knowledge, such as knowledge from external data sources and knowledge graphs.

Fig. 3 illustrates an embodiment in which the system is applied to occupancy anomaly detection in a smart campus building. High-occupancy situations are considered anomalies, and the detected anomalies are used for HVAC decisions in energy management.

Based on the augmented labels across the complete campus building data, the data clustering can be applied again. The clustering takes into account unlabeled data - no estimation for building occupancy anomaly - as well as labeled data - estimation of building occupancy anomaly or non-anomaly - for more efficient clustering. Afterward, given the clusters and their content - anomaly/non-anomaly or no-estimation data -, a recommender system called “Active Learning Recommender” makes recommendations to the active learning.

The active learning can be performed by a human data annotator, e.g., a domain expert, or by an Oracle that makes guesses given the dataset and additional knowledge, such as knowledge from external data sources and knowledge graphs. For example, the human annotator, e.g., a building manager, might be aware of a specific day on which more people than normal were present because of an exceptional event. After active learning, new labels are generated for the generative process and passed on to the “Label Augmenter”. The label augmentation generates additional labels based on the existing labels - i.e., on the closeness of the no-estimation data points to the anomaly or non-anomaly data points - and on the data annotated through active learning. A new set of Augmented Labels is generated.

After a number of iterations of active labeling, the desired percentage of labeled data points is reached. The anomaly/non-anomaly occupancy labels and the campus building data features are then used by a supervised machine learning model, e.g., a Neural Network model, which predicts the high-occupancy anomalies in the campus building. For the implementation of the “Unlabeled & Labeled Data Clustering”, existing clustering techniques such as DBSCAN or K-nearest neighbors, KNN, are utilized. Since anomaly/non-anomaly points already exist, it is up to the implementation of the clustering whether to include them during the clustering or to exclude them before applying the clustering. Similar decisions can be made for the efficiency and optimization of the clustering component.
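For reference, DBSCAN itself can be written in a few lines. This is a plain O(n^2) textbook version in Python, not tied to any particular library. `eps` and `min_samples` follow the usual DBSCAN meanings, each point counts as its own neighbor, and noise is marked -1. Whether already labeled points are included in `points` is the implementation choice discussed above.

```python
import math
from typing import List, Optional, Sequence


def dbscan(points: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Plain O(n^2) DBSCAN: a cluster id per point, -1 for noise."""
    n = len(points)
    # Each point counts as its own neighbor.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster  # i is a core point that starts a new cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is also core: keep expanding
        cluster += 1
    return labels


pts = [[0, 0], [0, 0.5], [0.5, 0], [10, 10], [10, 10.5], [10.5, 10], [50, 50]]
print(dbscan(pts, eps=1.0, min_samples=3))
# prints: [0, 0, 0, 1, 1, 1, -1]
```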

For the implementation of the “Active Learning Recommender”, techniques such as ranking and uncertainty estimation can be leveraged. The preceding clustering phase gives guidance to the Active Learning Recommender, such that the Active Learning Recommender takes into account the cluster sizes and the content of the clusters, e.g., whether a cluster includes many anomalies/non-anomalies or is fully unestimated. The ranking used for Active Learning is a design choice. For instance, priority can be given to fully unestimated clusters or to clusters with conflicting labels, i.e., including both anomaly and non-anomaly labels.
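One possible ranking that follows these priorities is sketched below. Putting conflicting clusters ahead of fully unestimated ones, and breaking ties by larger cluster size, are design assumptions, as is the label encoding; noise points (cluster id -1) are skipped.

```python
from collections import defaultdict
from typing import List, Sequence

ABSTAIN, NON_ANOMALY, ANOMALY = -1, 0, 1  # assumed label encoding


def rank_clusters(clusters: Sequence[int], labels: Sequence[int]) -> List[int]:
    """Order cluster ids for annotation: conflicting, then unestimated, then the rest."""
    members = defaultdict(list)
    for c, y in zip(clusters, labels):
        if c != -1:
            members[c].append(y)

    def priority(c: int):
        ys = members[c]
        conflicting = ANOMALY in ys and NON_ANOMALY in ys
        unestimated = all(y == ABSTAIN for y in ys)
        tier = 0 if conflicting else 1 if unestimated else 2
        return (tier, -len(ys))  # within a tier, larger clusters first

    return sorted(members, key=priority)


clusters = [0, 0, 1, 1, 1, 2, 2, -1]
labels = [ANOMALY, ANOMALY, ABSTAIN, ABSTAIN, ABSTAIN, ANOMALY, NON_ANOMALY, ABSTAIN]
print(rank_clusters(clusters, labels))
# prints: [2, 1, 0]
```

The Active Learning Recommender could then pick a point, for example a central one, from the first cluster in this order.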

The occupancy anomaly predictor uses the generated ML model to raise anomaly events. These events serve as inputs for controlling the HVAC system more intelligently.

In some embodiments, the loop that generates the model is repeated periodically to include new data and to adapt to new conditions, thereby avoiding model drift, i.e., the degradation of model performance due to changes in the data and in the relationships between input and output variables. Indeed, over time, the cluster calculator might identify a new cluster with high uncertainty, and the active learning recommender might present this cluster for annotation.

In another embodiment, the invention can be used to quick-start HVAC controller operation from unlabeled data, with minimal configuration provided by the labeling functions and the active learning recommender, while still achieving decent HVAC control efficiency. Over time, iteration by iteration, the model, and therefore the HVAC control, improves its performance with minimal human intervention.

This section, with reference to Fig. 4, highlights some numerical details, taking the situation pictured in Fig. 1 - 23 data points labeled by LF1 - and showing the four steps from a second perspective.

Here it can be seen how the whole set of LFs is applied to the dataset - Step 0 - and how these functions can output different labels. Steps 1 and 2 produce new augmented labels and group similar data points; Step 3 proposes an unlabeled data point in a cluster, which results in a new label produced through Active Learning. This label may be valid for each LF, since it has been created by a domain expert. When the previous steps are iterated, new augmented labels are generated starting from this new label, as for data points 19 and 20 in Step 1.

Many modifications and other embodiments of the invention set forth herein will come to mind to one skilled in the art to which the invention pertains having the benefit of the teachings presented in the foregoing description and the associated drawings. Therefore, it is to be understood that the invention is not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.