
Title:
DATA ANONYMIZATION METHOD
Document Type and Number:
WIPO Patent Application WO/2022/108562
Kind Code:
A1
Abstract:
The present invention relates to a method that classifies data anonymization requests coming from a closed or open network connected to the Internet, determines the anonymization levels for classes and transmits the anonymous data to clients by performing anonymization according to the determined anonymization levels. In said method, the anonymizer (1), which represents the anonymizing device, creates an encryption key according to the information of the client (2) and transmits the same securely to the client (2) together with the data.

Inventors:
ŞENOL SINAN (TR)
TÜRK YEKTA (TR)
Application Number:
PCT/TR2021/051226
Publication Date:
May 27, 2022
Filing Date:
November 17, 2021
Assignee:
ASELSAN ELEKTRONIK SANAYI VE TICARET ANONIM SIRKETI (TR)
International Classes:
G06F21/62; G06F21/10; H04W12/02; H04W12/08
Foreign References:
US20140047551A1 2014-02-13
US20150186674A1 2015-07-02
US20170124336A1 2017-05-04
EP2228735A2 2010-09-15
Attorney, Agent or Firm:
DESTEK PATENT, INC. (TR)
Claims:
CLAIMS

1. A data anonymization method, characterized by comprising the following process steps:

• creating the anonymization level search table,

• determining the anonymization level of the client (2) by means of the anonymizer (1) according to the information found in the anonymization level search table of the client (2) requesting the anonymized data,

• receiving the original data from the server (3) and anonymizing the data according to the specified level of the client (2) and below the anonymization process by means of the anonymizer (1),

• generating a unique encryption key and securely transmitting the same to the client (2) by means of the anonymizer (1),

• encrypting the anonymized data to be transmitted to the client (2) with the generated key and sending the same to the client (2).

2. The method according to claim 1, characterized in that the generated keys are checked with a control mechanism to determine whether they are repeated or not.

3. The method according to claim 1, characterized in that the generated encryption key is stored in the anonymizer (1).

4. The method according to claim 1, characterized in that it uses a key exchange protocol or a customized protocol for secure key sending.

5. The method according to claim 1, characterized in that the anonymization level search table is created manually by an operator (4) or automatically by interactions between the anonymizer (1) and the application/database servers.

Description:
DATA ANONYMIZATION METHOD

Field of the Invention

The present invention relates to a method which performs the classification of data anonymization requests coming from a closed or open network connected to the Internet, the determination of anonymization levels for classes and the transmission of anonymous data to clients by performing anonymization according to the determined anonymization levels.

State of the Art

Data anonymization is the process of hiding an object by protecting the identifiers that may be associated with it. Anonymization processes replace the content of the data with other values, remove, delete or replace some parts of the data, or encrypt the data with cryptographic operations. Therefore, individuals or software that need to analyse the values of the data (for example, for statistical or behavioural analysis) operate on data that has been encrypted, modified or partially modified by anonymization, instead of the actual data values. An example flow for the anonymization process is shown in Figure 1. The database on the left has 4 columns. The first column is the ID column used to identify each row in the database. The other columns contain the person's name, surname and date of birth, respectively. The database on the right of Figure 1 shows the result after data anonymization: a hash operation was applied to the second column, a permutation (shuffling) operation to the third column, and the values of the fourth column were changed by renumbering.
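The three column operations described for Figure 1 can be sketched as follows. This is a minimal illustration, not the applicants' implementation: the function name `anonymize_table`, the field names, the sample rows, the truncated SHA-256 hash, and the reading of "renumbering" as a random shift of each date are all assumptions made for the example.

```python
import hashlib
import random
from datetime import date, timedelta

def anonymize_table(rows, seed=0):
    """Hash names, permute surnames across rows, and change birth dates."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    # Column 2 (name): one-way hash of each value, truncated for readability.
    names = [hashlib.sha256(r["name"].encode("utf-8")).hexdigest()[:12]
             for r in rows]
    # Column 3 (surname): permutation, i.e. shuffle the values across rows.
    surnames = [r["surname"] for r in rows]
    rng.shuffle(surnames)
    # Column 4 (date of birth): assumed here to mean shifting each date
    # by a random number of days.
    births = [(date.fromisoformat(r["birth"])
               + timedelta(days=rng.randint(-365, 365))).isoformat()
              for r in rows]
    # Column 1 (ID) is kept so rows can still be identified.
    return [{"id": r["id"], "name": n, "surname": s, "birth": b}
            for r, n, s, b in zip(rows, names, surnames, births)]

# Hypothetical sample rows in the layout of Figure 1.
sample = [
    {"id": 1, "name": "Ada", "surname": "Yilmaz", "birth": "1980-01-15"},
    {"id": 2, "name": "Can", "surname": "Demir", "birth": "1975-06-30"},
    {"id": 3, "name": "Ece", "surname": "Kaya", "birth": "1990-11-02"},
]
anonymized = anonymize_table(sample)
```

The ID column is left untouched, matching the description of Figure 1, while every other column is transformed by a different operation.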

Similar information can be stored in a plurality of sources. If similar identifiers are used in databases from different sources, a malevolent individual or software can correlate them to obtain information about the true value of the anonymized data. For example, with access to a database of names and addresses of people in a city and another database of addresses and illnesses of patients in a hospital, it is possible to identify who is sick, which is private information. Similar situations apply to the storage of trade secret data. For this reason, anonymization must be done very carefully, and the newly assigned anonymous values must have no connection to the object they replace. Anonymization methods should not produce results that allow cross-referencing between sources and reveal personal/private information.
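The cross-referencing risk in the city/hospital example can be illustrated with a short sketch. The function name `link_records`, the field names and the sample records are hypothetical and serve only to show how a shared identifier (here the address) links two sources.

```python
def link_records(residents, patients):
    """Join two datasets on the shared 'address' field."""
    names_by_address = {}
    for person in residents:
        names_by_address.setdefault(person["address"], []).append(person["name"])
    # Every patient record whose address also appears in the residents
    # dataset yields a (name, illness) pair.
    return [(name, patient["illness"])
            for patient in patients
            for name in names_by_address.get(patient["address"], [])]

# Hypothetical data: neither source alone links a name to an illness.
residents = [
    {"name": "A. Resident", "address": "1 Example St"},
    {"name": "B. Resident", "address": "2 Example St"},
]
patients = [{"address": "2 Example St", "illness": "condition X"}]

linked = link_records(residents, patients)
```

Here `linked` pairs a name with an illness even though no single source contains both, which is the kind of result an anonymization method should prevent.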

Many studies in the literature aim to increase the degree of anonymization of data by using different solutions and algorithms. There are also studies that use different encryption methods developed for use in anonymization processes. Providing diversity by developing these methods has been the main focus of much academic and industrial research. Furthermore, the classification methods used for anonymizing data and separating sensitive data by anonymizing devices positioned as a separate node in a network have also been the subject of patents and articles.

When studies in the state of the art are examined, no system is found that detects requests from a network, classifies the requests and performs anonymization processes customized according to these classes. In practice, software/hardware that enables data anonymization must first detect and respond to anonymization requests. However, incoming requests may come from the internal or external network, as well as from people with different levels of authority and/or from different people/institutions.

It is important to note that anonymizing a database (or database table) requires processing time in terms of software and hardware. It would be a mistake to apply the same anonymization process to every request and thus spend the same amount of processing time on each. In addition, anonymizing data in a very complex or less complex way regardless of the intended use will weaken the analyses to be made for different tasks. For this reason, it is important to subject the data to a classification process for requests from different demand levels and to create anonymizing responses according to these requests. The application numbered WO2001018631 A1 in the state of the art relates to a method for identifying the source of data in databases containing anonymized data. It is intended to allow a patient or medical provider to access patient data over a network where the data is stored in an anonymized data file. However, said application does not mention the classification of anonymous data requests or the realization of customized anonymization processes according to these classes.

As a result, due to the abovementioned disadvantages and the insufficiency of the current solutions regarding the subject matter, a development is required to be made in the relevant technical field.

Aim of the Invention

Within the scope of the invention, the responses to anonymization requests coming from a network are securely delivered to the requesting host/server/IP address. The proposed solution allows data anonymization depending on the authorization levels of the requesters and provides an access control mechanism for these requests. Cryptographic principles are used to ensure the security of anonymization. With the help of these cryptographic principles, the client cannot obtain sensitive information. Anonymization processing time will be shortened, as different anonymous data levels will be created for different demand classes. A hierarchical relationship will be established between the anonymization levels of the requesters and the columns in the mapped database tables. This hierarchical association is also used to establish the complexity level of the data anonymization process. For example, for someone with the highest security clearance within an organization, anonymization will be less complex, which will reduce processing time.

The invention enables the anonymization process to work automatically in a network and provides data anonymization as a network service for users.

The structural and characteristic features of the present invention will be understood clearly by the following drawings and the detailed description made with reference to these drawings.

Description of the Figures

Figure 1 is an exemplary flow chart of the anonymization process in a database table.

Figure 2a shows the data flow when the data anonymizer is located as a separate network object in a network environment interacting with the client (requester).

Figure 2b shows the data flow when the data anonymizer is located inside a server in a network environment interacting with the client (requester).

Figure 3 shows the flow chart of the inventive method.

Figure 4a shows the flowchart of manually generating the anonymization level table by an operator.

Figure 4b shows the flowchart of the anonymization level table being automatically generated by interactions between the anonymizer and other application/database servers.

The figures are not required to be scaled and the details which are not necessary for understanding the present invention may be neglected.

Description of the Part References

1. Anonymizer

2. Client

3. Server

4. Operator

5. CRM Server

6. SAP Server

7. SharePoint Server

8. DNS Server

Detailed Description of the Invention

In this detailed description, the preferred embodiments of the invention are described only to clarify the subject matter, in a manner such that no limiting effect is created.

The data anonymization process is triggered by a user request within the network. Figure 2a and Figure 2b show the interactions of an anonymizer (1) with the other units when responding to a request from a client (2). Here, anonymizer (1) refers to a device in a network that anonymizes data. This anonymizer (1) can be located as a separate device within the network as shown in Figure 2a or can be inside a server (3) as in Figure 2b. The task of the anonymizer (1) in the designed method, wherever it is located in the network and with whatever architecture (in the server (3) or as a separate device), is to send the anonymized data to the client (2) (requester).

Continuing with the high-level design, as shown in Figure 2a, a client (2) sends an anonymization request for some data to be processed by him/her or by the software he/she will use. The anonymizer (1) receives this request and determines which tables in the database, and which columns in those tables, the request belongs to (note that this table and column information may also be forwarded to the anonymizer (1) on a predetermined basis). Then, the anonymizer (1) requests the original data from the database server (3). The server (3) transmits the data to the anonymizer (1). The data is then anonymized and sent to the client (2). The same process is also described in Figure 2b, but this time the anonymizer (1) is located in a server (3) (as software/hardware).

Table 1: A sample table consisting of client (2) IP addresses stored in the anonymizer (1), indicating the levels of anonymization by client (2)

Table 2: A sample table consisting of client (2) domain names stored in the anonymizer (1), indicating the levels of anonymization by client (2)

Table 3: A sample table consisting of detailed network information of the client (2) stored in the anonymizer (1), indicating the levels of anonymization by client (2)

In the proposed solution, the anonymization level is determined by examining a table stored in the anonymizer (1). Table 1 is an example showing the levels of anonymization. As can be seen, the clients (2) assigned to different anonymization levels have IP address and subnet mask values. The anonymization levels indicate the complexity (or degree of entropy) of the anonymized data. High complexity anonymization is performed for users with the highest value according to their anonymization level. Another enumeration method can be selected according to the anonymization level (for example, performing the least complex action for the user with the lowest anonymization level). As an example, consider a request from the IP address 192.168.10.1/24. This IP address is within the range of IP addresses defined by the first row in Table 1. For this reason, this requester, that is, the IP address 192.168.10.1/24, is assigned the value "1" as the anonymization level. Table 2, which also shows the levels of anonymization stored in the anonymizer (1), is created according to domain names instead. Finally, Table 3 is a detailed table (IP address, domain name, user name, computer name, DNS name, geographical location, etc.) created in the anonymizer (1), which assigns different anonymization levels specifically to each client (2). Suppose that one or more rows that duplicate, or are contained in, other rows were accidentally written in said tables. In this case, to avoid confusion, a principle can be adopted whereby the generalized row containing the other similar rows applies. Alternatively, the table may have a control mechanism that detects similar rows.

The tables above can be created by an operator (4) as shown in Figure 4a, or can be created automatically by interactions between the anonymizer (1) and other servers (3) in the network such as the CRM server (5), SAP server (6), SharePoint server (7), Email and DNS server (8) as shown in Figure 4b. Configuration information can be sent to the anonymizer (1) as a result of this interaction.
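The IP-based lookup described for Table 1 can be sketched with Python's standard `ipaddress` module. The contents of Table 1 are not reproduced in this text, so the rows below are hypothetical, apart from the stated fact that 192.168.10.1 falls in the first row and receives level "1". The name `lookup_level` and the handling of overlapping rows are one possible reading of the "generalized row applies" principle.

```python
import ipaddress

# Hypothetical rows (network, anonymization level); not the real Table 1.
LEVEL_TABLE = [
    ("192.168.10.0/24", 1),
    ("192.168.20.0/24", 3),
    ("10.0.0.0/8", 8),
    ("10.1.0.0/16", 5),  # contained in the more general 10.0.0.0/8 row
]

def lookup_level(client_ip, table=LEVEL_TABLE):
    """Return the anonymization level for client_ip, or None if no row matches."""
    addr = ipaddress.ip_address(client_ip)
    matches = []
    for cidr, level in table:
        network = ipaddress.ip_network(cidr)
        if addr in network:
            matches.append((network.prefixlen, level))
    if not matches:
        return None  # no match: the request would be cancelled
    # When rows overlap, the most general row (shortest prefix) applies.
    return min(matches)[1]

level_for_example = lookup_level("192.168.10.1")
```

A table keyed on domain names (Table 2) or on several client attributes (Table 3) would follow the same pattern with a different matching rule.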

Two methods can define the association between the anonymization level and the database, as follows:

1. Predefined rules can be used to anonymize data. For example, consider anonymizing the data for a user whose anonymization level value is "1" (assume this is the highest level, for which the most complex anonymization is performed). In this case, the strongest methods, such as encryption or hash functions, or complete masking, can be applied to the columns containing sensitive information. Now consider anonymizing the data for a requester with an anonymization level value of 8 (assume this is the lowest level, for which the least complex anonymization is performed). In this case, simple permutation or replacement with a different value can be applied to each column. These rules and the operation options to be applied to the columns can be determined by the operator (4).

2. A tree-based graph structure is created for the columns of the database. In other words, the columns of the database are organized in a hierarchical tree structure. Table 4 depicts this hierarchical structure as an example, using a database table stored in the human resources department of a company. In the proposed method, the relevant columns of the database are labelled (by an operator (4) or automatically) as columns containing sensitive data. A tree structure is then created based on the relationship between the top and bottom layer columns. The relationship creation process can be performed by an operator (4) or by means of software. The levels of the anonymization process and the related anonymization operations are defined by the operator (4), as can be seen in the left part of Table 4. Suppose that the operator (4) arranges the columns in order of importance from highest to lowest: sensitive value, text value, numeric value, date/time/timestamp. In this case, the complexity of the anonymization operations will vary according to the defined levels.

For example, at the lowest anonymization level, all data values (sensitive, text, numeric, date/time/timestamp) are processed with the anonymization operations defined for the first level. The operations are mask, replacement, permutation and renumbering, which are applied to the different column types, respectively. At higher levels, stronger anonymization processes are implemented. For example, let level 4 in Table 4 be the highest anonymization level. The operator (4) then decides not to send sensitive data and numerical data to the client (2) at this level. Therefore, these two columns are hidden during the anonymization process. Also, complex operations such as noise or encryption for numeric and date/time/timestamp columns are implemented to make anonymization more complex for those making such requests.
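The association between anonymization levels and column operations can be sketched as a rule table. Everything below is an assumption made for illustration: the two levels follow the numbering of the first method (1 = most complex, 8 = least complex), the operation choices per column type are hypothetical, and the names `RULES` and `apply_rules` do not come from the application.

```python
import hashlib
import random

def mask(values):
    """Replace every character of every value with '*'."""
    return ["*" * len(str(v)) for v in values]

def hash_values(values):
    """Replace every value with its SHA-256 hex digest."""
    return [hashlib.sha256(str(v).encode("utf-8")).hexdigest() for v in values]

def permute(values, seed=0):
    """Shuffle the values of a column across rows (seeded for reproducibility)."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled

HIDE = None  # marker meaning "do not send this column at all"

# Hypothetical rules: level -> {column type -> operation}.
RULES = {
    1: {"sensitive": HIDE, "text": hash_values, "numeric": hash_values, "date": mask},
    8: {"sensitive": mask, "text": permute, "numeric": permute, "date": permute},
}

def apply_rules(columns, column_types, level, rules=RULES):
    """columns: {name: [values]}; column_types: {name: type label}."""
    result = {}
    for name, values in columns.items():
        operation = rules[level][column_types[name]]
        if operation is HIDE:
            continue  # hidden columns are dropped from the response
        result[name] = operation(values)
    return result

# Hypothetical employee table, stored column by column.
columns = {"national_id": ["123", "456", "789"],
           "name": ["Ann", "Bob", "Cy"],
           "salary": [100, 200, 300]}
column_types = {"national_id": "sensitive", "name": "text", "salary": "numeric"}

strict = apply_rules(columns, column_types, level=1)
light = apply_rules(columns, column_types, level=8)
```

Keeping the rules in a plain mapping reflects the statement that the operator (4) determines which operation applies to which column at each level.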

Table 4: Hierarchical structure of columns of a sample database containing employee information in a company

After the anonymizer (1) completes processing the data according to the anonymization level, the processed data must be delivered to the client (2). Prior to delivery, an encryption key is generated that is unique to the client (2) and specific to this delivery. The generated encryption key is also stored in the database in the anonymizer (1). Alternatively, the key can be stored in a separate table. Because this key is delivered only to the client (2), only the client (2) that holds the key can access the anonymized data released by the anonymizer (1). In an environment where keys are continuously generated for continuously incoming requests, a control mechanism that detects whether the generated keys are repeated should be implemented. When a key is repeated, or some columns in the row are omitted in the key calculation, a new key must be generated for security purposes. Once the key is generated from the requester information, only the requester with this generated key can access the anonymous data. Secure delivery of the key can be achieved with secure key exchange protocols such as IKE, or other mechanisms can be used.
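The per-delivery key generation and the repeat check can be sketched as below. The application does not specify how the key is computed from the requester information, so the derivation used here (HMAC-SHA256 over the client information plus a random nonce, under a secret held by the anonymizer) is an assumption, as are the names `KeyIssuer` and `issue`. Secure delivery of the key (for example via a key exchange protocol) is outside the scope of the sketch.

```python
import hashlib
import hmac
import secrets

class KeyIssuer:
    """Issues one key per delivery and refuses to issue a repeated key."""

    def __init__(self, master_secret):
        self._master_secret = master_secret
        self._issued = {}  # key -> (client_info, nonce); stored in the anonymizer

    def issue(self, client_info):
        while True:
            # A fresh nonce makes the key specific to this delivery.
            nonce = secrets.token_bytes(16)
            message = client_info.encode("utf-8") + nonce
            key = hmac.new(self._master_secret, message, hashlib.sha256).digest()
            # Control mechanism: generate again if the key was seen before.
            if key not in self._issued:
                self._issued[key] = (client_info, nonce)
                return key

    def issued_count(self):
        return len(self._issued)

issuer = KeyIssuer(master_secret=secrets.token_bytes(32))
first_key = issuer.issue("192.168.10.1")
second_key = issuer.issue("192.168.10.1")
```

Two deliveries to the same client receive different keys, and every issued key is retained so that repeats can be detected.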

The general flow chart of the method is presented in Figure 3. When the general flow chart is examined, the process begins with the creation of the anonymization level search table. This table is then examined to determine the level of anonymization based on the requester's information. If there is no match for the requester in the anonymization level search table, the request is cancelled and no further action is taken on it. In the next step, the anonymizer retrieves the original data from the server and anonymizes it. An encryption key is generated and securely delivered to the requester before the anonymized data is sent. As mentioned previously, any secure key exchange protocol or customized protocol can be used for key exchange. The anonymized data to be transmitted to the client is encrypted with this generated key. If the key has been securely delivered, the requester gets access to the anonymized data.
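The order of the steps in Figure 3 can be sketched as a single function that receives each step as a callable. The function name `handle_request` and all parameter names are hypothetical, and the stand-in callables in `demo` only record the order of calls; in particular, the stand-in `encrypt` performs no real encryption.

```python
def handle_request(client_info, lookup_level, fetch, anonymize,
                   issue_key, send_key, encrypt):
    """Run the steps of the method in order; return None if the request is cancelled."""
    level = lookup_level(client_info)
    if level is None:
        return None  # no match in the level search table: cancel the request
    original = fetch()                      # original data from the server
    anonymized = anonymize(original, level)
    key = issue_key(client_info)            # unique key for this delivery
    send_key(client_info, key)              # secure key delivery happens first
    return encrypt(anonymized, key)         # payload sent to the client

def demo(client_ip):
    """Run the flow with stand-in callables; return (result, steps taken)."""
    steps = []

    def lookup_level(ip):
        return 1 if ip.startswith("192.168.10.") else None

    def fetch():
        steps.append("fetch")
        return ["original row"]

    def anonymize(data, level):
        steps.append("anonymize")
        return ["anonymized row"]

    def issue_key(ip):
        steps.append("issue_key")
        return b"stand-in key"

    def send_key(ip, key):
        steps.append("send_key")

    def encrypt(data, key):
        steps.append("encrypt")
        return ("sealed", data)  # stand-in only: no real encryption

    result = handle_request(client_ip, lookup_level, fetch, anonymize,
                            issue_key, send_key, encrypt)
    return result, steps
```

Passing the steps in as callables keeps the sketch independent of where the anonymizer (1) is located, whether as a separate device or inside a server (3).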