

Title:
METHOD AND SYSTEM FOR DATA SANITISATION
Document Type and Number:
WIPO Patent Application WO/2022/258293
Kind Code:
A1
Abstract:
A method (200) for sanitising data (110), wherein said data comprises a plurality of Personally Identifiable Information (PII) elements, the method comprising the steps of: identifying within the data at least one PII element (220); assigning, to each identified PII element, a permission level from a plurality of permission levels for defining access rights to PII elements; sanitising the data so as to hide each identified PII element, thereby to generate sanitised data (230); receiving a request from a user to retrieve the data, the user having assigned to them a permission level from the plurality of permission levels (240); in response to said request, reversing the sanitising of the data for only the PII element/s to which the user has permission to access in dependence on the permission levels assigned to the user and to each identified PII element, thereby to generate un-sanitised data; and outputting the un-sanitised data to the user (270).

Inventors:
ABU-TAIR MAMUN (GB)
PERRY PHILIP (GB)
VARGHESE BLESSON (GB)
ZOUALFAGHARI MOHAMMAD HOSSEIN (GB)
Application Number:
PCT/EP2022/062855
Publication Date:
December 15, 2022
Filing Date:
May 11, 2022
Assignee:
BRITISH TELECOMM (GB)
International Classes:
G06F21/62; H04L9/40
Foreign References:
US20200074105A1 (2020-03-05)
US20180285591A1 (2018-10-04)
GB2535183A (2016-08-17)
US20200028823A1 (2020-01-23)
US20190377901A1 (2019-12-12)
Attorney, Agent or Firm:
BRITISH TELECOMMUNICATIONS PUBLIC LIMITED COMPANY, INTELLECTUAL PROPERTY DEPARTMENT (GB)
Claims:
CLAIMS

1. A method for sanitising data, wherein said data comprises a plurality of Personally Identifiable Information, PII, elements, the method comprising the steps of: identifying within the data at least one PII element; assigning, to each identified PII element, a permission level from a plurality of permission levels for defining access rights to PII elements; sanitising the data so as to hide each identified PII element, thereby to generate sanitised data; receiving a request from a user to retrieve the data, the user having assigned to them a permission level from the plurality of permission levels; in response to said request, reversing the sanitising of the data for only the PII element/s to which the user has permission to access in dependence on the permission levels assigned to the user and to each identified PII element, thereby to generate un-sanitised data; and outputting the un-sanitised data to the user.

2. A method according to Claim 1, further comprising the steps of: assigning a set of unique identifiers for each PII element; assigning a unique identifier for the sanitised data; and associating the set of unique identifiers with the unique identifier, said association being used for reversing the sanitising of the data.

3. A method according to any preceding claim, further comprising the steps of: identifying location information of each PII element, said location information representing a location of the PII element in the data; and arranging a PII element in the data using said location information when reversing the sanitising of the data.

4. A method according to any preceding claim, wherein the step of sanitising the data comprises deleting in the data a part that represents each PII element, such that the sanitised data is devoid of said part.

5. A method according to any preceding claim, further comprising the step of storing each PII element and storing the sanitised data, wherein the step of reversing the sanitising is performed by retrieving and recombining the PII element/s and the sanitised data.

6. A method according to any preceding claim, further comprising the step of deleting the data upon generating the sanitised data.

7. A method according to any preceding claim, wherein the un-sanitised data comprises none of the PII elements in response to identifying that the permission level of the user does not authorise access to any of the PII elements.

8. A method according to any of Claims 1 to 6, wherein the un-sanitised data comprises some, but not all, of the PII elements in response to identifying that the permission level of the user authorises access only to said some PII elements.

9. A method according to any of Claims 1 to 6, wherein the un-sanitised data comprises all of the PII elements in response to identifying that the permission level of the user authorises access to all of the PII elements.

10. A method according to any preceding claim, further comprising the step of a recording device generating the data, wherein at least the step of sanitising each identified PII element is performed by or local to the recording device.

11. A method according to any preceding claim, wherein the data is uncompressed data.

12. A method according to any preceding claim, wherein said method is performed by a single apparatus.

13. A method according to any preceding claim, wherein each PII element is encrypted using an encryption key, and wherein said encryption key is stored in a blockchain system.

14. A computer-readable carrier medium comprising a computer program, which, when the computer program is executed by a computer, causes the computer to carry out the steps of any one of the preceding claims.

15. An apparatus for sanitising data, wherein said data comprises a plurality of Personally Identifiable Information (PII) elements, the apparatus comprising: a processor configured to: identify within the data at least one PII element; assign, to each identified PII element, a permission level from a plurality of permission levels for defining access rights to PII elements; sanitise the data so as to hide each identified PII element, thereby to generate sanitised data; receive a request from a user to retrieve the data, the user having assigned to them a permission level from the plurality of permission levels; and in response to said request, reverse the sanitising of the data for only the PII element/s to which the user has permission to access in dependence on the permission levels assigned to the user and to each identified PII element, thereby to generate un-sanitised data; and output the un-sanitised data to the user.

16. An apparatus according to Claim 15, wherein the apparatus is further configured to generate the data.

Description:
METHOD AND SYSTEM FOR DATA SANITISATION

Field of Invention

The present invention relates to a method of reversible and selective data sanitisation for data comprising Personally Identifiable Information (PII).

Background

Personally Identifiable Information (PII) is information which can identify or permit tracking of an individual; in the interest of privacy, such information may be deemed sensitive, and its distribution should therefore be limited only to authorised parties. Different kinds of PII may have different levels of sensitivity, for example an individual’s medical records may be deemed more sensitive than their contact details.

Data sanitisation is a process of concealing sensitive information (including PII) in data, leaving only non-PII data accessible.

As data-gathering becomes more widespread (not least, from increasing deployment of data-capture devices such as CCTV systems, IoT and wearable medical devices, as well as from digitisation of personal records) there is an ever-increasing body of data comprising PII. Furthermore, each item of data may comprise multiple sets of PII, with each set being associated with a different individual and/or having a different level of sensitivity. Still further, as digitisation advances, a greater number of parties may seek (legitimately or otherwise) to access PII-comprising data.

As such, a challenge arises in efficiently and effectively managing legitimate access to PII by only parties entitled to do so.

It is an aim of the present invention to at least alleviate some of the aforementioned problems.

Statements of Invention

According to a first aspect of the present invention, there is provided: a method for sanitising data, wherein said data comprises a plurality of Personally Identifiable Information (PII) elements, the method comprising the steps of: identifying within the data at least one PII element; assigning, to each identified PII element, a permission level from a plurality of permission levels for defining access rights to PII elements; sanitising the data so as to hide each identified PII element, thereby to generate sanitised data; receiving a request from a user to retrieve the data, the user having assigned to them a permission level from the plurality of permission levels; in response to said request, reversing the sanitising of the data for only the PII element/s to which the user has permission to access in dependence on the permission levels assigned to the user and to each identified PII element, thereby to generate un-sanitised data; and outputting the un-sanitised data to the user.

According to a second aspect of the invention, there is provided: a method for sanitising data, wherein said data comprises a plurality of Personally Identifiable Information (PII) elements, the method comprising the steps of: identifying within the data at least one PII element; assigning, to each identified PII element, a permission level from a plurality of permission levels for defining access rights to PII elements; receiving a request from a user to retrieve the data, the user having assigned to them a permission level from the plurality of permission levels; in response to said request, sanitising the data for only the PII element/s to which the user does not have permission to access in dependence on the permission levels assigned to the user and to each identified PII element, thereby to generate un-sanitised data; and outputting the un-sanitised data to the user.

As used herein the terms “sanitising”, or similar such terms, preferably connote reversible encryption, removal and/or obfuscation of information; such a process may include data “censoring”, “erasure”, “denaturisation” and/or “redaction”.

Preferably, the plurality of permission levels are hierarchical. Optionally, the plurality of permission levels are compartmentalised permissions, for example such that a given permission level allows access to all, and only all, PII elements for a particular data subject and/or of a given type (regardless of the sensitivity of that PII), but not another data subject. Optionally, the permission levels correspond to a subject of the PII element and/or a type of PII element.
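By way of illustration only, the two permission schemes can be sketched as two access checks. The names `PiiTag`, `hierarchical_allows` and `compartment_allows` are hypothetical, and the rule that a user at level n may access elements at level n or below is an assumption; the description states only that more sensitive elements carry a higher level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiiTag:
    """Hypothetical label attached to one PII element."""
    subject: str  # data subject the element relates to
    kind: str     # type of PII, e.g. "face" or "name"
    level: int    # hierarchical permission level; higher means more sensitive


def hierarchical_allows(user_level: int, tag: PiiTag) -> bool:
    # Assumed rule: a user may access elements at or below their own level.
    return user_level >= tag.level


def compartment_allows(user_subjects: frozenset, tag: PiiTag) -> bool:
    # Compartmentalised: access depends on the data subject,
    # regardless of how sensitive the element is.
    return tag.subject in user_subjects


record = PiiTag(subject="alice", kind="medical", level=3)
print(hierarchical_allows(1, record))                   # low-level user
print(compartment_allows(frozenset({"alice"}), record))  # alice's carer
```

Under these assumptions the first check fails and the second succeeds: a compartmentalised permission can grant access to a highly sensitive element that a low hierarchical level would not.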

Preferably, the data is in the form of: an image; video or video frame; audio; and/or text document. Preferably, the request and the un-sanitised data are received and output, respectively, over a network, and in particular a wide area network.

Preferably, the method further comprises the steps of: assigning a set of unique identifiers for each PII element; assigning a unique identifier for the sanitised data; and associating the set of unique identifiers with the unique identifier, said association being used for reversing the sanitising of the data.

Preferably, the method further comprises the steps of: identifying location information of each PII element, said location information representing a location of the PII element in the data; and arranging a PII element in the data using said location information when reversing the sanitising of the data. In this way, the PII element may be returned to the correct original position in the data. Preferably, the location information is a positional location and/or temporal location.

Preferably, the step of sanitising the data comprises deleting in the data a part that represents each PII element, such that the sanitised data is devoid of said part. Optionally, the step of sanitising the data comprises: redaction and/or ‘filleting’; masking and/or replacement; and/or graphic filtering.

Preferably, the method further comprises the step of storing each PII element and storing the sanitised data, wherein the step of reversing the sanitising is performed by retrieving and recombining the PII element/s and the sanitised data. Preferably, each PII element is stored separately and/or independently of the data and of the sanitised data. Optionally, the step of reversing the sanitising requires both the PII elements and the sanitised data, that is the data is sanitised such that the sanitising cannot be reversed solely from the sanitised data. Preferably, each PII element is stored in a secure (optionally, encrypted) data store.

Preferably, the method further comprises the step of deleting the data upon generating the sanitised data.

Preferably, the un-sanitised data comprises none of the PII elements in response to identifying that the permission level of the user does not authorise access to any of the PII elements.

Preferably, the un-sanitised data comprises some, but not all, of the PII elements in response to identifying that the permission level of the user authorises access only to said some PII elements.

Preferably, the un-sanitised data comprises all of the PII elements in response to identifying that the permission level of the user authorises access to all of the PII elements.

Preferably, the method further comprises the step of a recording device generating the data, wherein at least the step of sanitising each identified PII element is performed by or local to the recording device. Preferably, the recording device is in the form of a sensor, and in particular in the form of a: camera, and in particular a CCTV camera; and/or microphone.

Preferably, the data is uncompressed data.

Preferably, the aforementioned method/s is/are performed by a single apparatus. Preferably, each PII element is encrypted using an encryption key, and said encryption key is stored in a blockchain system.

According to another aspect of the invention, there is provided a computer-readable carrier medium comprising a computer program, which, when the computer program is executed by a computer, causes the computer to carry out the steps of the method/s described above.

According to yet another aspect of the invention, there is provided an apparatus for sanitising data, wherein said data comprises a plurality of Personally Identifiable Information (PII) elements, the apparatus comprising: a processor configured to: identify within the data at least one PII element; assign, to each identified PII element, a permission level from a plurality of permission levels for defining access rights to PII elements; sanitise the data so as to hide each identified PII element, thereby to generate sanitised data; receive a request from a user to retrieve the data, the user having assigned to them a permission level from the plurality of permission levels; and in response to said request, reverse the sanitising of the data for only the PII element/s to which the user has permission to access in dependence on the permission levels assigned to the user and to each identified PII element, thereby to generate un-sanitised data; and output the un-sanitised data to the user.

Preferably, the apparatus is further configured to generate the data. Alternatively, the apparatus may receive the data from another device.

The invention includes any novel aspects described and/or illustrated herein. The invention also extends to methods and/or apparatus substantially as herein described and/or as illustrated with reference to the accompanying drawings. The invention is also provided as a computer program and/or a computer program product for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein, and a computer-readable medium storing thereon a program for carrying out any of the methods and/or for embodying any of the apparatus features described herein. Features described as being implemented in hardware may alternatively be implemented in software, and vice versa.

The invention also provides a method of transmitting a signal, and a computer product having an operating system that supports a computer program for performing any of the methods described herein and/or for embodying any of the apparatus features described herein. Any apparatus feature may also be provided as a corresponding step of a method, and vice versa. As used herein, means plus function features may alternatively be expressed in terms of their corresponding structure, for example as a suitably-programmed processor.

Any feature in one aspect of the invention may be applied, in any appropriate combination, to other aspects of the invention. Any, some and/or all features in one aspect can be applied to any, some and/or all features in any other aspect, in any appropriate combination. Particular combinations of the various features described and defined in any aspects of the invention can be implemented and/or supplied and/or used independently.

As used throughout, the word 'or' can be interpreted in the exclusive and/or inclusive sense, unless otherwise specified.

The invention extends to a method, an apparatus and a system as described herein and/or substantially as illustrated with reference to the accompanying drawings. The present invention is now described, purely by way of example, with reference to the accompanying diagrammatic drawings, in which:

Figure 1 shows a system for sanitising data;

Figure 2 shows a process for sanitising data; and

Figures 3 show representations of data that is subject to the process for sanitising data.

Specific Description

Figure 1 is a schematic representation of a system 100 for sanitising data, in which the system comprises: data 110; a Sanitising Unit (SU) 120; and a plurality of users 130.

The data 110 is in the form of media (i.e. image, video and/or audio) and/or text data that comprises Personally Identifiable Information (PII) elements.

PII elements are items of information that may allow (themselves or in combination with other information) identification and/or tracking of an individual. Accordingly, to help maintain the privacy of an individual, distribution of PII elements should be limited to only authorised parties. An authorised party may be, for example, a/an: medical practitioner; care worker; law enforcement worker; and/or individual that a PII element identifies.

There may be different types of PII elements. For example, a PII element is available to be in the form of: personal records, including medical, employment and criminal; information as to personal characteristics and appearance, including a face, build, gender, tattoos or other distinguishing features; biometric information, including fingerprints, genetic information, voice information; assigned identifiers, including a name and an ID (e.g. as part of social security, vehicle registration, passport and national insurance); contact details, including an address, email and telephone number; and/or location information, such as current or future location of an individual.

The SU 120 comprises a: permissions module 140; PII identifier 150; processor 160; PII store 170; and sanitised data store 180.

By means of these components, the SU 120 is configured to receive the data 110, sanitise (or anonymise) the data, and then selectively to un-sanitise (or de-anonymise) only PII elements that a user is authorised to access according to process 200, as shown in, and described with reference to, Figure 2.

Figures 3 show exemplary representations of data processed according to process 200, and in particular: an original (pre-sanitised) item of data 110 in Figure 3a; a sanitised version 310 of the data in Figure 3b; and an un-sanitised version 320 of the data (for a given one of the users 130) in Figure 3c.

Process 200 commences with a step 210 of receiving the data 110 at the SU 120. As exemplarily shown in Figures 3, the data 110 is in the form of an image (which is available to form part of a video) that comprises PII elements 330 in the form of faces of individuals.

At a next step 220, the SU 120 performs, by means of the PII identifier 150, identification of the PII elements 330 within the data 110. Such identification is available to be performed automatically, for example by detecting the presence of information in a predefined field (such as text in a form), and/or by using pattern detection or machine vision (in particular face detection and Optical Character Recognition) in the case of image or video data.
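As a minimal sketch of pattern-based identification, and assuming text data rather than the image data of Figures 3, PII elements might be located with regular expressions. The patterns and the function name `identify_pii` are illustrative assumptions and are deliberately simplistic; the description does not prescribe any particular detector.

```python
import re

# Illustrative patterns only; real detectors would be far more thorough.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d ]{7,}\d"),
}


def identify_pii(text):
    """Return (start, end, kind, value) for each match, ordered by position."""
    found = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), kind, match.group()))
    return sorted(found)


sample = "Contact jane@example.com or +44 20 7946 0000."
for start, end, kind, value in identify_pii(sample):
    print(kind, start, end, value)
```

Each result carries the position of the element in the data, which is the kind of location information that a later un-sanitising step could use.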

Having identified PII elements 330 within the data 110, the PII identifier 150 assigns metadata to each PII element. The metadata includes a unique identifier and a permission level. The unique identifier is used by the SU to uniquely identify each PII element. The permission level is a grading, according to a hierarchy of grades, for selectively restricting access to PII, in which corresponding access rights are required by a user so that they may access a given PII element. More sensitive PII elements are attributed a higher permission level (so that, generally, fewer users may have access) than less sensitive PII elements. For example, with reference to Figures 3, the data 110 shows PII elements 330 in the form of faces of a minor 330-1 and of an adult 330-2; the former may be considered more sensitive than the latter, and so the PII identifier 150 allocates to data representing the face of the minor 330-1 a higher permission level than the data representing the face of the adult 330-2. The permission level of a given PII element is stored in the permissions module 140 in association with the unique identifier of said PII element.
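The metadata described above could be represented as follows. This is a sketch under stated assumptions: the class `PiiElement`, the dictionary `permissions` standing in for the permissions module 140, the use of random UUIDs as unique identifiers, and the numeric levels 1 and 2 are all hypothetical choices.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class PiiElement:
    payload: bytes          # underlying data forming the PII element
    permission_level: int   # higher means more sensitive
    # Assumed identifier scheme: a random UUID per element.
    element_id: str = field(default_factory=lambda: uuid.uuid4().hex)


# Stand-in for the permissions module: element identifier -> permission level.
permissions = {}


def register(element: PiiElement) -> str:
    """Record the element's permission level against its unique identifier."""
    permissions[element.element_id] = element.permission_level
    return element.element_id


# The minor's face is treated as more sensitive than the adult's face.
minor = PiiElement(payload=b"<face of minor>", permission_level=2)
adult = PiiElement(payload=b"<face of adult>", permission_level=1)
register(minor)
register(adult)
```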

The PII elements (and specifically the underlying data that forms the PII elements) are identified to the processor 160 by the PII identifier 150. Each PII element 330 is then individually stored in the PII store 170, which is a secure form of storage. Each PII element is also available to be encrypted.

The processor 160 subsequently manipulates the data 110 so as to sanitise the PII elements, for example by performing: excision; redaction and/or ‘filleting’; masking and/or replacement; and/or filtering (such as pixelization or high-pass filtering).

By manipulating the data so as to sanitise the PII elements, the processor 160 thereby generates a version of the data in which all identified PII elements are sanitised (also referred to as “fully sanitised data”) 310. Figure 3b shows an example of such fully sanitised data 310, in which the PII elements 330 (i.e. both faces) have been removed 340.

The fully sanitised data is assigned, by the processor 160, a unique identifier. The fully sanitised data is then stored in the sanitised data store 180. The unique identifiers for the fully sanitised data and for the PII elements are then associated with one another and said association is stored in the SU 120.
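The storing, sanitising and associating steps can be sketched for text data as below. The in-memory dictionaries standing in for the PII store 170 and the sanitised data store 180, the function name `sanitise`, and the choice of masking with asterisks (one of the listed options, used here so that character positions are preserved) are assumptions made for illustration.

```python
import uuid

pii_store = {}        # element_id -> (start, end, original text, level)
sanitised_store = {}  # data_id -> fully sanitised text
associations = {}     # data_id -> list of element_ids


def sanitise(text, elements):
    """Mask each (start, end, level) span and record how to reverse it."""
    chars = list(text)
    element_ids = []
    for start, end, level in elements:
        element_id = uuid.uuid4().hex
        # Store the PII element separately from the sanitised data.
        pii_store[element_id] = (start, end, text[start:end], level)
        # Mask in place with the same length so positions stay valid.
        chars[start:end] = "*" * (end - start)
        element_ids.append(element_id)
    data_id = uuid.uuid4().hex
    sanitised_store[data_id] = "".join(chars)
    associations[data_id] = element_ids
    return data_id


data_id = sanitise("Alice met Bob", [(0, 5, 2), (10, 13, 1)])
print(sanitised_store[data_id])
```

The stored text contains neither name; recovering them requires both the association and the separate PII store.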

At a next step 240, one of the users 130 submits a request to the SU 120 to access the data 110. To submit a request, the users 130 are provided with access credentials for accessing the SU 120, thereby to help verify their identities. Furthermore, the users 130 are pre-assigned access rights for denoting which permission level(s) of PII element the user is authorised to access; these access rights are stored in the permissions module 140. For example, a first user 130-1 is assigned access rights that authorise them to access PII elements having a higher permission level than a second user 130-2, who is assigned access rights that authorise them to access PII elements having a lower permission level.

In response to the request, the processor 160 retrieves from the permissions module 140 access rights for the user from whom the request was received. The processor 160 then retrieves the fully sanitised data 310 corresponding to the requested data and proceeds to identify the PII elements 330 associated with the fully sanitised data 310 by retrieving the stored association between the unique identifiers of the fully sanitised data and the PII elements. Having identified the PII elements, the processor 160 retrieves the permission levels for each PII element.

Next, the processor 160 compares the access rights of the user with the permission levels of each of the PII elements 330, thereby identifying the PII elements that the user is authorised to access.

Subsequently, the processor 160 retrieves (and decrypts, as appropriate) from the PII store 170 only the PII elements that the user is authorised to access; these PII elements are subsequently reinstated into the fully sanitised data 310, thereby reversing sanitisation (i.e. “un-sanitising”) of only PII elements that the user has sufficient access rights to access and generating the un-sanitised version 320 of the data for the user. The un-sanitised version of the data is then output to the user.

With reference to Figures 3, and by way of example, the first user 130-1 may access the faces of the adult 330-2 and of the minor 330-1, whereas the second user 130-2 may only access the face of the adult 330-2. A third user 130-3 may have access rights that deny them access to any PII element, thereby only being able to retrieve the fully sanitised data 310. As such, the image of Figure 3c is available to be output to the second user, whereas the first user is available to retrieve an image corresponding to that of Figure 3a.
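The three-user example can be mirrored in a short sketch, using text in place of the image of Figures 3. The function `unsanitise`, the tuple layout of the stored elements, the numeric levels, and the rule that a user may access elements at or below their own level are assumptions for illustration.

```python
def unsanitise(sanitised, elements, user_level):
    """Reinstate only the elements the user's level permits.

    elements: list of (start, end, original, level) tuples.
    """
    chars = list(sanitised)
    for start, end, original, level in elements:
        if user_level >= level:  # assumed comparison rule
            chars[start:end] = original
    return "".join(chars)


fully_sanitised = "***** and ***** walk"
stored_elements = [
    (0, 5, "MINOR", 2),    # stands in for the face of the minor
    (10, 15, "ADULT", 1),  # stands in for the face of the adult
]

first_user = unsanitise(fully_sanitised, stored_elements, user_level=2)
second_user = unsanitise(fully_sanitised, stored_elements, user_level=1)
third_user = unsanitise(fully_sanitised, stored_elements, user_level=0)
```

Here `first_user` is the complete original, `second_user` shows only the adult, and `third_user` is unchanged from the fully sanitised data.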

Alternatives and Modifications

In one example, the data 110 is deleted upon storing the PII elements in the PII store 170 and storing the fully sanitised data 310 in the sanitised data store 180. In this way, no complete version of the data 110 is stored, and a process, such as per process 200, is required in order to reconstruct the data 110, thereby helping to reduce the risk of unauthorised access of PII elements.

The data 110 is available to be generated by a data capture system or data storage system (not shown in the figures). For example, the data capture system is available to be in the form of a CCTV system (in which case the data is in the form of video, images and/or audio) and the data storage system is a database or a file system. In one example, the SU 120 (or parts thereof, including at least the processor 160 and PII store 170) forms part of the data capture system from which the data is generated. In this way, sanitising of the data 110 is available to be performed locally at the data capture system, therefore avoiding wide area communication of non-sanitised data, in turn helping to reduce the risk of unauthorised access of PII elements.

In one example, the metadata associated with each PII element is further available to include: positional information as to where the PII element is located in the data (e.g. pixel co-ordinates in relation to an image); and/or temporal information as to where the PII element is arranged in the data (e.g. a timestamp, duration and/or frame number in relation to video and audio). This metadata is available to be used by the processor 160 in order to assist reconstruction of the un-sanitised data 320 at step 270 (where multiple PII elements are associated with corresponding fully sanitised data).
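For image data, the use of positional metadata might look like the following sketch, in which an image is a list of pixel rows and the location of a PII element is a bounding box. The `(top, left, height, width)` box layout, the function names `excise` and `reinstate`, and the use of zero as the blank pixel value are hypothetical.

```python
def excise(image, box):
    """Return (sanitised image, removed patch) for a (top, left, height, width) box."""
    top, left, height, width = box
    patch = [row[left:left + width] for row in image[top:top + height]]
    sanitised = [row[:] for row in image]  # copy so the input is untouched
    for r in range(top, top + height):
        sanitised[r][left:left + width] = [0] * width
    return sanitised, patch


def reinstate(sanitised, patch, box):
    """Put the patch back at the pixel co-ordinates recorded in the metadata."""
    top, left, height, width = box
    restored = [row[:] for row in sanitised]
    for i, patch_row in enumerate(patch):
        restored[top + i][left:left + width] = patch_row
    return restored


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
face_box = (1, 1, 2, 2)
blanked, face = excise(image, face_box)
rebuilt = reinstate(blanked, face, face_box)
```

Because each patch carries its own box, several PII elements can be reinstated independently and in any order.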

In one example, the PII store 170 and the sanitised data store 180 are in the form of distributed remote storage (e.g. Cloud storage) systems.

In one example, and in particular where data compression is used, at least step 230 is performed on raw data, such that PII elements may be extracted from the data 110 and the extracted PII elements and the sanitised data are entirely readable independently of one another.

Although reference has been made to PII, it will be appreciated that the aforementioned may be used instead, or additionally, to sanitise any form of confidential or sensitive information.

Where the PII elements are encrypted for storage in the PII store 170, an encryption key used for such encryption is stored in a public or private blockchain.

Each feature disclosed herein, and (where appropriate) as part of the claims and drawings may be provided independently or in any appropriate combination.

Any reference numerals appearing in the claims are for illustration only and shall not limit the scope of the claims.