Title:
A METHOD TO PREVENT CAPTURING OF MODELS IN AN ARTIFICIAL INTELLIGENCE BASED SYSTEM
Document Type and Number:
WIPO Patent Application WO/2020/259946
Kind Code:
A1
Abstract:
The invention discloses a system and a method to prevent unauthorized capturing of models in an Artificial Intelligence based system (AI system) (100). The method comprises the steps: receiving an input (103) from a user; checking whether the input (103) is right input or wrong input; computing information gain extracted by said user when the input (103) is wrong input; locking out said system (100) when said information gain exceeds a pre-defined threshold.

Inventors:
HIMAJIT AITHAL (IN)
PARMAR MANOJKUMAR SOMABHAI (IN)
Application Number:
PCT/EP2020/064838
Publication Date:
December 30, 2020
Filing Date:
May 28, 2020
Assignee:
BOSCH GMBH ROBERT (DE)
ROBERT BOSCH ENGINEERING AND BUSINESS SOLUTIONS PRIVATE LTD (IN)
International Classes:
G06N3/08; G06N7/00; G06N20/00; G06N3/04; G06N5/00
Domestic Patent References:
WO2019014487A1, 2019-01-17
Foreign References:
US20200134391A1, 2020-04-30
US20190095629A1, 2019-03-28
Other References:
MANISH KESARWANI ET AL: "Model Extraction Warning in MLaaS Paradigm", 3 December 2018 (2018-12-03), pages 371 - 380, XP058421558, ISBN: 978-1-4503-6569-7, DOI: 10.1145/3274694.3274740
JUUTI MIKA ET AL: "PRADA: Protecting Against DNN Model Stealing Attacks", 2019 IEEE EUROPEAN SYMPOSIUM ON SECURITY AND PRIVACY (EUROS&P), IEEE, 17 June 2019 (2019-06-17), pages 512 - 527, XP033600710, DOI: 10.1109/EUROSP.2019.00044
DORJAN HITAJ ET AL: "Have You Stolen My Model? Evasion Attacks Against Deep Neural Network Watermarking Techniques", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 3 September 2018 (2018-09-03), XP080913139
Attorney, Agent or Firm:
BEE, Joachim (DE)
Claims:
CLAIMS

We Claim:

1. A method to prevent capturing of a model (112) in an Artificial Intelligence based system (AI system) (100), said method comprising the steps:

- receiving an input (103) from a user;

- checking whether said input (103) is right input or wrong input;

- computing information gain extracted by said user when said input (103) is wrong input;

- locking out said system (100) when said information gain exceeds a pre-defined threshold.

2. A method to prevent capturing of models in an Artificial Intelligence based system (100) according to claim 1, wherein said method further comprises the step of generating a wrong input dataset which is used to train said model (112), wherein said wrong input dataset is generated using random queries.

3. A method to prevent capturing of a model (112) in an Artificial Intelligence based system (100) according to claim 1, wherein said input (103) is treated as right input if said input (103) falls in one of the classes in a data set (104) maintained in said system (100).

4. A method to prevent capturing of a model (112) in an Artificial Intelligence based system (100) according to claim 1, wherein said input is treated as wrong input if said input does not fall in any of the classes in a data set (104) maintained in said system (100).

5. A method to prevent capturing of a model (112) in an Artificial Intelligence based system (100) according to claim 1, wherein said information gain is computed using an information gain model.

6. A method to prevent capturing of a model (112) in an Artificial Intelligence based system (100) according to claim 1, wherein said system (100) locks itself out when cumulative information gain exceeds a pre-defined threshold.

7. A method to prevent capturing of a model (112) in an Artificial Intelligence based system (100) according to claim 1, wherein a profile of said user is computed whenever said user provides a wrong input.

8. A method to prevent capturing of a model (112) in an Artificial Intelligence based system (100) according to claim 1, wherein said profile of said user is computed using at least one of:

- the types of the wrong inputs provided by said user,

- the number of times said user provided said wrong inputs,

- the time of the day when said user provided said wrong inputs,

- the physical location of said user,

- the digital location of said system,

- the similarity of said users,

- the demographic information of said user.

9. A method to prevent capturing of a model (112) in an Artificial Intelligence based system (100) according to claim 6, wherein said profile of said user is used to determine an unlocking event for said system (100).

10. A method to prevent capturing of a model (112) in an Artificial Intelligence based system (100) according to claim 1, wherein a class from said data set (104) is returned for said right input.

11. A method to prevent capturing of models in an Artificial Intelligence based system (100) according to claim 1, wherein a notification (114) is returned for said wrong input.

12. A method to prevent capturing of models in an Artificial Intelligence based system (100) according to claim 9, wherein the unlocking criterion is at least one of a fixed duration of time, a fixed number of right inputs and a manual override.

13. An Artificial Intelligence based system (100) comprising:

an input interface (102) to receive inputs (103);

an output interface (118) to provide outputs (116);

a model (112) to process received inputs (103), said model (112) using artificial intelligence techniques;

a data set (104) where different classes are stored;

the system (100) is adapted to

receive an input (103) from a user;

check whether said input (103) is right input or wrong input;

compute information gain extracted by said user when said input is wrong input;

lock out said system (100) when said information gain exceeds a pre-defined threshold.

14. An Artificial Intelligence based system (100) according to claim 1, wherein said system (100) is further adapted to update said user profile whenever said user provides a wrong input.

15. An Artificial Intelligence based system (100) according to claim 1, wherein said system (100) is further adapted to use said user profile to determine an unlocking event for said system (100).

Description:
Title:

A method to prevent capturing of models in an Artificial Intelligence based system

Field of the invention

[001] The present disclosure relates to processing of data and making decisions using artificial intelligence (AI). In particular, the disclosure relates to preventing attacks on AI based systems, where the attacks are aimed at stealing the models used in the AI systems deployed in a target environment. Here, stealing the models refers to trying to get/copy the functionality of a model through attack vectors using reverse engineering. The model here refers to any logic, algorithms or methods used in the processing of the data.

Background of the invention

[002] Nowadays, most data processing and decision making systems are implemented using artificial intelligence modules. The artificial intelligence modules use different techniques like machine learning, neural networks, deep learning etc.

[003] Most AI based systems receive large amounts of data and process the data to train AI models. Trained AI models generate output based on the use cases requested by the user. Typically, AI systems are used in the fields of computer vision, speech recognition, natural language processing, audio recognition, healthcare, autonomous driving, manufacturing, robotics etc., where they process data to generate the required output based on certain rules/intelligence acquired through training.

[004] To process the inputs, AI systems use various models/algorithms which are trained using training data. Once an AI system is trained using the training data, it uses the models to analyze real time data and generate the appropriate result. The models may be fine-tuned in real time based on the results.

[005] The models in the AI systems form the core of the system. A lot of effort, resources (tangible and intangible), and knowledge go into developing these models.

[006] It is possible that some adversary may try to capture/copy/extract the model from an AI system. The adversary may use different techniques to capture the model. One of the simplest techniques is for the adversary to send different queries to the AI system iteratively, using his own test data. The test data may be designed in a way to extract internal information about the working of the models in the AI system. The adversary uses the generated results to train his own models. By doing these steps iteratively, it is possible to capture the internals of the model, and a parallel model can be built using the same logic. This will cause hardships to the original developer of the AI system. The hardships may be in the form of business disadvantages, loss of confidential information, loss of lead time spent in development, loss of intellectual property, loss of future revenues etc.

[007] There are methods known in the prior art to identify such attacks by adversaries and to protect the models used in the AI system. The prior art US 2019/0095629 A1 discloses one such method.

[008] In the method disclosed in the above prior art, the input data is processed by applying a trained model to generate an output vector having values for each of a plurality of pre-defined classes. A query engine modifies the output vector by inserting a query in a function associated with generating the output vector, to thereby generate a modified output vector. The modified output vector is then output. The query engine modifies one or more values to disguise the trained configuration of the trained model logic while maintaining the accuracy of classification of the input data.

Brief description of the accompanying drawing

[009] Different modes of the invention are disclosed in detail in the description and illustrated in the accompanying drawing:

[010] FIG. 1 illustrates a block diagram of an AI based system capable of preventing stealing of a model implemented in the AI system, according to one embodiment of the invention.

Fig. 2 illustrates a flow chart of a method to prevent stealing of a model implemented in the AI system.

Detailed description of the embodiments

[011] Shown in fig. 1 is a block diagram of an AI based system 100 capable of preventing stealing of a model 112 implemented in the AI system 100, according to one embodiment of the invention. The AI based system 100 is also referred to simply as the AI system, the system, or a data processing system in this document. The term stealing is also referred to as capturing in this document. The model 112 is also referred to as the AI module in this document.

[012] Only the important components of the system 100 are disclosed in this document as all other components are commonly known. The system 100 comprises an input interface 102 to receive inputs 103, a data set 104, a pre-processor 106, an information gain computing module 108, a blocker module 110, the AI module 112, a blocker notifier 114 and an output interface 118. Here the input 103 refers to what is provided by the user to the system. The data set 104 refers to a set of classes stored in the system 100. The output 116 refers to one of the classes out of the data set 104, which is provided to the user through the output interface 118.
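
By way of a non-limiting illustration only, the composition described above may be sketched as the following Python snippet. The class and attribute names are illustrative assumptions and do not form part of the disclosed implementation.

```python
# Illustrative sketch only: one possible wiring of the components 102-118.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AISystem100:
    data_set_104: List[str]                      # known classes
    model_112: Callable[[object], str]           # trained AI module
    pre_processor_106: Callable[[object], bool]  # True = right input
    info_gain_108: Callable[[object], float]     # gain per wrong input
    threshold: float = 1.0                       # lockout threshold
    cumulative_gain: float = 0.0
    locked: bool = False

    def handle(self, input_103):
        """Input interface 102 -> output interface 118."""
        if self.locked:
            return "SYSTEM LOCKED"                # blocker notifier 114
        if self.pre_processor_106(input_103):     # right input
            return self.model_112(input_103)      # class from data set 104
        self.cumulative_gain += self.info_gain_108(input_103)
        if self.cumulative_gain > self.threshold:
            self.locked = True                    # blocker module 110
            return "SYSTEM LOCKED"
        return self.model_112(input_103)          # most approximated class
```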

[013] The AI module 112 processes the inputs using AI techniques and generates the required output. The AI module may be implemented as a set of software instructions, a combination of software and hardware, or any combination of the same. The AI module is also referred to as the module in this document. The input interface 102 may be a keyboard, a touch screen, images, videos, a suitable stimulus etc. It is also possible that the inputs may come over a bus, wirelessly, or through any other communication channel. The output interface 118 may comprise a display or a bus. The output 116 may be displayed on a display or sent through the bus where it can be read by other devices. The module may be implemented using a neural network. A neural network is only an example here, as there are other techniques available for implementing AI modules.

[014] Neural networks are a set of algorithms, modeled after the human brain and cognition theory, that are designed to recognize patterns.

[015] Neural networks help us cluster and classify the data. They help to group unlabeled data according to similarities among the training (example) inputs, and they classify data when they have a labeled dataset to train on. Neural networks can also extract features that are fed to other algorithms for clustering and classification.

[016] The AI module may comprise a neural network according to one embodiment of the invention. It is also possible that the AI module may be implemented using other techniques. The neural network typically has an input layer, hidden layers and an output layer. These layers are commonly known in AI modules and hence not shown in fig. 1. In deep neural networks there may be multiple hidden layers. The data is processed in the hidden layers. The output of one hidden layer is passed as input to the next hidden layer for further processing. There may be different weightages assigned to the inputs at different layers.

[017] The layers are typically made of nodes. A node is just a place where computation happens, loosely patterned on a neuron in the human brain, which fires when it encounters sufficient stimuli. A node combines input from the data with a set of coefficients, or weights, that either amplify or dampen that input, thereby assigning significance to inputs with regard to the task the algorithm is trying to learn, e.g. which input is most helpful in classifying data without error. These input-weight products are summed, and the sum is then passed through the node's so-called activation function, which determines whether and to what extent that signal should progress further through the network to affect the ultimate outcome, say, an act of classification. If the signal passes through, the neuron has been "activated". These techniques are commonly known and are not described in detail in this document.
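
By way of a non-limiting illustration of the weighted sum and activation described above, a single node may be sketched as follows; the sigmoid activation and the example weights are assumptions chosen only for readability.

```python
import math

def node_output(inputs, weights, bias=0.0):
    # Weighted sum of the inputs, then a sigmoid activation: the node "fires"
    # (returns a value close to 1) only for sufficiently strong stimuli.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Example: two inputs, the first weighted more heavily.
print(node_output([0.8, 0.2], [1.5, -0.4]))   # ~0.75
```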

[018] A deep learning neural network maps inputs to outputs. It finds correlations/patterns between the inputs 102 and the outputs 118. The neural network can learn to approximate an unknown function f(x) = y between any set of inputs x and any set of outputs y, assuming they are related. In the process of learning, the neural network finds the right f, or the correct manner of transforming and mapping x into y.

[019] The function which correlates and/or finds the pattern to map the input to the output forms the AI module, which may be stolen by an adversary.

[020] Some of the typical tasks performed by AI systems are classification, clustering, regression etc.

[021] The majority of classification tasks depend upon labeled datasets; that is, the data sets are labelled manually in order for a neural network to learn the correlation between labels and data. This is known as supervised learning. Some of the typical applications of classification are: face recognition, object identification, gesture recognition, voice recognition etc.

[022] Clustering or grouping is the detection of similarities in the inputs. Clustering techniques do not require labels to detect similarities. Learning without labels is called unsupervised learning. Unlabeled data forms the majority of the data in the world. One law of machine learning is: the more data an algorithm can train on, the more accurate it will be. Therefore, unsupervised learning models/algorithms have the potential to produce accurate models as the training dataset size grows.

[023] As the module forms the core of the AI system, the module needs to be protected against stealing by adversaries. The invention proposes a method to prevent any such attempts to steal the model.

[024] A model stealing attack is a kind of attack vector that can make a digital twin/replica/copy of a pre-trained machine learning model. This attack was demonstrated in different research papers, where the model was captured/copied/extracted to build a substitute model with similar performance.

[025] The attacker typically generates random queries of the size and shape of the input specification and starts querying the model with these arbitrary queries. This querying produces input-output pairs for the random queries and generates a secondary dataset inferred from the pre-trained model. The attacker then takes these I/O pairs and trains a new model from scratch using this secondary dataset. This is a black box attack vector where no prior knowledge of the original model is required. As more prior information regarding the model becomes available, the attacker moves towards more intelligent attacks. The attacker chooses a relevant dataset at his disposal to extract the model more efficiently. This is a domain intelligence backed attack vector. A sketch of the basic black box querying loop is given below by way of a non-limiting illustration.
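
The following Python sketch illustrates only the querying loop described above; the function names and dimensions are hypothetical placeholders, and no real model or service is referenced.

```python
# Illustrative sketch of the black box extraction loop described above:
# random queries -> input-output pairs -> secondary dataset.
import random

def random_query(input_dim):
    # Arbitrary query of the size/shape expected by the victim model.
    return [random.uniform(-1.0, 1.0) for _ in range(input_dim)]

def extract_secondary_dataset(query_victim, input_dim, n_queries):
    dataset = []
    for _ in range(n_queries):
        x = random_query(input_dim)
        y = query_victim(x)          # victim model's prediction for the query
        dataset.append((x, y))       # input-output pair
    return dataset                   # later used to train a substitute model
```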

[026] With these approaches, it is possible to demonstrate model stealing attacks across different models and datasets. With regard to this attack, the invention discloses a method to prevent such model stealing attacks.

[027] The invention proposes to counter any attack by an adversary through the pre-processor 106, the information gain computing module 108, the blocker module 110, and the blocker notifier 114.

The working of the invention is explained below.

[028] During the training of the AI module 112, a known set of inputs and a known set of classes are used to train the module. The module is trained to classify the inputs 103 into one of the known classes represented in 104.

[029] As explained, the module 112 will try to approximate any given input 103 to the closest of the known classes in the data set 104.

[030] During the training, the pre-processor 106 is trained to identify right input and wrong input. During the training of the pre-processor 106, any input 103 which falls in any of the known classes in the data set 104 is identified as right input, and any input 103 which does not fall in any of the known classes in the data set 104 is identified as wrong input. The pre-processor 106 is trained using methods such as a simple convolutional neural network to do the binary classification of the input 103 as right input or wrong input. During the training, the wrong input class is sampled so as to ensure that it is sufficiently represented, to avoid any class imbalance problem. Here, pre-labeled data is classified as right input. Then the optimal attack vector and other sub-optimal attack vectors are determined. These attack vectors are random queries intended to capture the model 112, and they are labelled as wrong input. A non-limiting sketch of such a binary pre-processor is given below.
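
One possible realisation of such a binary pre-processor is sketched below in PyTorch; the specific architecture, layer sizes and label convention are assumptions and are not prescribed by this disclosure.

```python
# Hedged sketch of one way pre-processor 106 could be realised as a binary
# classifier (right input vs. wrong input). Architecture is an assumption.
import torch
import torch.nn as nn

class PreProcessor106(nn.Module):
    def __init__(self, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),        # works for any input resolution
        )
        self.classifier = nn.Linear(8, 2)   # class 0 = right, class 1 = wrong

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.classifier(h)           # logits for right/wrong

# Training would use pre-labelled "right" samples plus sampled attack-style
# random queries labelled "wrong", balanced to avoid class imbalance.
```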

During the runtime of the system, there may be two scenarios.

[031] In a first scenario, the user provides right input 103. The pre-processor 106 identifies it as right input. The system 100 then provides the class corresponding to the input 103 as the output 116.

[032] In a second scenario, the user provides wrong input 103 to the system 100. The input 103 is provided to the pre-processor 106, which identifies it as wrong input. The information gain achieved by the user is then computed using the information gain model. This is done by the information gain computing module 108. The information gain model is a simple way to represent the knowledge extracted with random queries.

[033] In information theory and machine learning, information gain is a synonym for Kullback-Leibler divergence: the amount of information gained about a random variable or signal from observing another random variable. However, in the context of decision trees, the term is sometimes used synonymously with mutual information, which is the conditional expected value of the Kullback-Leibler divergence of the univariate probability distribution of one variable from the conditional distribution of this variable given the other one. This is one example of an information gain model. As information gain models are well known, they are not discussed in detail here; a small illustrative computation is shown below.
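
For illustration only, the Kullback-Leibler divergence between two discrete distributions can be computed as follows. Treating the divergence between a prior and a posterior class distribution as a proxy for the information gained from a query is an assumption made for this sketch, not the disclosed information gain model.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    # D_KL(P || Q) = sum_i p_i * log(p_i / q_i), for discrete distributions.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Example: gain attributed to a query that shifts the believed class
# distribution from q (prior) to p (posterior).
prior = [0.25, 0.25, 0.25, 0.25]
posterior = [0.70, 0.10, 0.10, 0.10]
print(kl_divergence(posterior, prior))   # ~0.45 nats
```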

[034] Apart from computing the information gain, the user's profile is computed by the pre-processor 106. The user profile may be computed by studying the user's behavior. The factors used in studying the user's behavior for profiling may include the types of the wrong inputs provided by the user, the number of times the user provided the wrong inputs, the time of the day when the user provided the wrong inputs etc. The cumulative information gain of the user is computed using these factors.
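
A non-limiting sketch of one way to accumulate such profiling factors is shown below; the field names and the choice of recorded factors are assumptions rather than the disclosed method.

```python
# Sketch only: one possible accumulator for the profiling factors above.
import time
from collections import Counter

class UserProfile:
    def __init__(self):
        self.wrong_input_types = Counter()   # types of wrong inputs provided
        self.wrong_input_count = 0           # number of wrong inputs
        self.timestamps = []                 # time of day of each wrong input
        self.locations = []                  # physical/digital location, if known
        self.cumulative_gain = 0.0           # feeds the lockout decision

    def record_wrong_input(self, input_type, gain, location=None):
        self.wrong_input_types[input_type] += 1
        self.wrong_input_count += 1
        self.timestamps.append(time.time())
        self.locations.append(location)
        self.cumulative_gain += gain
```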

[035] If the cumulative information gain extracted by the user does not exceed a pre-defined threshold, then the system 100 returns the most approximated class corresponding to the wrong input as the output 116.

[036] If the cumulative information gain extracted by the user exceeds a pre-defined threshold, then the blocker 110 locks the system 100 so that the user is not able to use the system 100 further. The system 100 may be unlocked only after an unlocking criterion is met. The unlocking criterion may be a certain event, for example a fixed duration of time, a fixed number of right inputs, a manual override etc. Manual override is where unlocking is done by providing manual inputs. A non-limiting sketch of such lock/unlock logic is given below.
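
The sketch below illustrates the lock/unlock behaviour described above; the concrete values (one hour, fifty right inputs) are arbitrary assumptions used only to make the example runnable.

```python
# Illustrative blocker logic for paragraph [036]; values are assumptions.
import time

class Blocker110:
    def __init__(self, lock_seconds=3600, right_inputs_needed=50):
        self.lock_seconds = lock_seconds                 # fixed duration of time
        self.right_inputs_needed = right_inputs_needed   # fixed number of right inputs
        self.manual_override = False                     # manual override flag
        self.locked_at = None
        self.right_inputs_seen = 0

    def lock(self):
        self.locked_at = time.time()
        self.right_inputs_seen = 0
        self.manual_override = False

    def note_right_input(self):
        if self.locked_at is not None:
            self.right_inputs_seen += 1

    def is_locked(self):
        if self.locked_at is None:
            return False
        if (self.manual_override
                or time.time() - self.locked_at > self.lock_seconds
                or self.right_inputs_seen >= self.right_inputs_needed):
            self.locked_at = None                        # unlocking criterion met
            return False
        return True
```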

[037] Once the system 100 is locked, a notification is generated by the blocker notifier 114. The notification may be in the form of a message to the user. The message may be displayed on the display or sent out through the port.

[038] In addition, the user profile may be used to determine whether the user is a habitual attacker, or whether it was a one time attack or only an incidental attack, etc. Depending upon the user profile, the steps for unlocking the system may be determined. If it was a first time attacker, the user may be locked out temporarily. If the attacker is a habitual attacker, then stricter unlocking steps may be suggested.

[039] In the absence of valid input-output pairs, the attacker cannot carry out a model stealing attack.

[040] Fig. 2 shows a flow chart of the proposed invention according to one embodiment of the invention.

[041] In step S1, the user input is received. In step S2, the pre-processor checks whether the input 103 is right input or wrong input. If the input is right input, in step S3, the matching class is returned as output 116. If the input 103 is wrong input, in S4, the information gain is computed and the user profile is updated. In S5, it is checked whether the cumulative information gain exceeds the predefined threshold. If yes, in S6, the system is locked. In S7, the criteria to unlock are received. In S8, it is checked whether the unlocking criteria are met. If yes, then the system is unlocked and the flow returns to S1. If not, the system remains locked. A non-limiting sketch of this flow is given below.
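
The flow of Fig. 2 may be sketched, for illustration only, as the following loop; the helper callables are hypothetical stand-ins for the modules described above and not the patent's implementation.

```python
def run_system(get_input, is_right_input, matching_class, info_gain,
               unlock_criteria_met, emit, threshold=1.0):
    """Illustrative sketch of steps S1-S8."""
    cumulative_gain = 0.0
    locked = False
    while True:
        x = get_input()                          # S1: receive user input
        if locked:
            locked = not unlock_criteria_met()   # S7/S8: check unlock criteria
            if locked:
                emit("system locked")            # remains locked
                continue
            cumulative_gain = 0.0                # unlocked, resume at S1
        if is_right_input(x):                    # S2: right or wrong input?
            emit(matching_class(x))              # S3: matching class as output 116
        else:
            cumulative_gain += info_gain(x)      # S4: compute gain, update profile
            if cumulative_gain > threshold:      # S5: threshold exceeded?
                locked = True                    # S6: lock the system
                emit("system locked")
            else:
                emit(matching_class(x))          # below threshold: approximated class
```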

[042] The invention proposes a simple system which has the capability to prevent any attack from an adversary to steal a module which is based on AI. The system comprises: an input interface (102) to receive inputs (103); an output interface (118) to provide outputs (116); a model (112) to process received inputs (103), the model (112) using artificial intelligence techniques; and a data set (104) where different classes are stored. The system (100) is adapted to: receive an input (103) from a user; check whether the input (103) is right input or wrong input; compute the information gain extracted by the user when said input is wrong input; and lock out the system when the cumulative information gain exceeds a pre-defined threshold.

[043] The pre-processor 106, the information gain computing module 108, the blocker module 110 and the blocker notification module 114 may be implemented as a set of instructions stored and executed in the memory of the AI system 100. The same may also be implemented as a combination of software and hardware logic. The implementation of these modules is commonly known. It is also possible that some or all of these modules may be combined.

[044] The invention proposes a method to prevent an attack from an adversary to steal a model. The method comprises the steps of: receiving an input (103) from a user; checking whether said input (103) is right input or wrong input; computing the information gain extracted by said user when said input (103) is wrong input; and locking out said system (100) when said information gain exceeds a pre-defined threshold.

[045] The system and method are simple and can be implemented as a function in the AI module, without needing any special hardware or software.




 