Title:
METHOD AND SYSTEM FOR UNIFYING DE-IDENTIFIED DATA FROM MULTIPLE SOURCES
Document Type and Number:
WIPO Patent Application WO/2023/096870
Kind Code:
A1
Abstract:
A method and a system are disclosed for generating a global identifier for linking or unifying a plurality of de-identified customer data received from multiple source environments. The plurality of customer data is de-identified based on a master salt and a master token is generated. The master token is encrypted using a source-encryption key to generate a source token. The source token is also encrypted using a target-encryption key to generate a transfer token. At a central environment or a central storage unit, the transfer token is decrypted and the source token is obtained. Thereafter, source token is decrypted to obtain the master token. At the central storage unit, the master token is hashed with a target salt to generate the global identifier which is subsequently used to unify the plurality of de-identified customer data.

Inventors:
KUMAR PRASHANT (IN)
JAIN VARUN (IN)
SARAN MRIDUL (IN)
Application Number:
PCT/US2022/050616
Publication Date:
June 01, 2023
Filing Date:
November 21, 2022
Assignee:
INNOVACCER INC (US)
International Classes:
G06F21/10; G06Q20/06; H04L9/08; H04L9/14; H04W12/02; H04L9/30
Foreign References:
US20200074461A1 2020-03-05
US20160156611A1 2016-06-02
US20080147554A1 2008-06-19
Attorney, Agent or Firm:
LANDAU, Daniel H. et al. (US)
Claims:
CLAIMS

What is claimed is:

1. A method for unifying a plurality of de-identified customer data from one or more source storage units, the method comprising: accessing, via the one or more source storage units over a communication network, a plurality of customer data; configuring a source processing unit, the source processing unit being in communication to the one or more source storage units and a source server, the source processing unit configured for: de-identifying the plurality of customer data based on a master salt to generate a master token; encrypting the master token using a source-encryption key to generate a source token; encrypting the source token using a target-encryption key to generate a transfer token; and allowing an authorized access of the transfer token via a secured channel at a central storage unit; and configuring a central processing unit, the central processing unit being in communication to the central storage unit and a central server, the central processing unit configured for: decrypting the transfer token using a target decryption key to obtain the source token; decrypting the source token using a source decryption key to obtain the master token; hashing the master token with a target salt to generate a global identifier; and unifying the plurality of de-identified customer data by using the global identifier.

2. The method as claimed in claim 1, wherein each of the plurality of customer data is associated with a corresponding identifier containing person identity information (PII) and non-PII information of respective customers.

3. The method as claimed in claim 2, wherein the master salt generates same master token at the one or more source storage units for any of the corresponding identifiers.

4. The method as claimed in claims 1, 2 or 3, wherein at least one of the master salt or target salt is a randomly generated multi-digit alphanumeric code.

5. The method as claimed in claims 1, 2, 3, or 4, wherein the source-encryption key is a private key to encrypt the master token.

6. The method as claimed in claims 1, 2, 3, 4, or 5, wherein the target encryption key is a public key to encrypt the source token.

7. The method as claimed in claims 1, 2, 3, 4, 5, or 6, wherein the source token is used by an authorized user to link respective customer data by accessing the source decryption key.

8. A system for unifying a plurality of de-identified customer data from one or more source storage units, the system comprising: one or more source storage units 104 storing a plurality of customer data; a source processing unit connected in communication to the one or more source storage units and a source server in a communication network, the source processing unit configured for: de-identifying the plurality of customer data based on a master salt to generate a master token; encrypting the master token using a source-encryption key to generate a source token; encrypting the source token using a target-encryption key to generate a transfer token; and allowing an authorized access of the transfer token via a secured channel at a central storage unit; and a central processing unit connected in communication to the central storage unit and a central server in the communication network, the central processing unit is configured for: decrypting the transfer token using a target decryption key to obtain the source token; decrypting the source token using a source decryption key to obtain the master token; hashing the master token with a target salt to generate a global identifier; and unifying the plurality of de-identified customer data by using the global identifier.

9. The system as claimed in claim 8, wherein each of the plurality of customer data is associated with a corresponding identifier containing person identity information (PII) and non-PII information of respective customers.

10. The system as claimed in claim 9, wherein the master salt generates same master token at the one or more source storage units for any of the corresponding identifiers.

11. The system as claimed in claims 8, 9, or 10, wherein the master salt is a randomly generated multi-digit alphanumeric code.

12. The system as claimed in claims 8, 9, 10, or 11, wherein the target salt is a randomly generated multi-digit alphanumeric code.

13. The system as claimed in claims 8, 9, 10, 11, or 12, wherein the source-encryption key is a private key to encrypt the master token.

14. The system as claimed in claims 8, 9, 10, 11, 12, or 13, wherein the target encryption key is a public key to encrypt the source token.

15. The system as claimed in claims 8, 9, 10, 11, 12, 13, or 14, wherein the source token is used by an authorized user to link respective customer data by accessing the source decryption key.


Description:
METHOD AND SYSTEM FOR UNIFYING DE-IDENTIFIED DATA FROM MULTIPLE SOURCES

[0001] The present subject matter relates to data analytics, and more particularly to methods and systems for providing a global identifier for unifying de-identified data received from multiple sources.

[0002] Various business sectors perform data analytics for the purpose of improvement and better management. The data being generated, transacted, or stored during a business process also carry sensitive information. In order to perform data analytics, it is often required to move the data from a source environment to one target environment or a central environment. In other words, the data is aggregated to a central environment for the purpose of analysis on a larger set of data. While moving the data to a central environment, the sensitive information of the data must be protected from getting mishandled and any unauthorized access of data must also be avoided. In some sectors, government regulatory organizations are set up to provide strict norms and standards to protect identity information of the customers associated with a business sector or organization. For example, analytics on healthcare data help to assess problems in a particular treatment process or to identify steps for improvisation, and accordingly the analytics on healthcare data must be performed only when sensitive information, such as Personal Identifiable Information (PII) or Personal Health Information (PHI) of a patient is protected when moving the data between the environments as per the standards and protocols required by medical regulatory authorities.

[0003] The personal identity information of customers may include their name, unique identification number issued to health care providers (for example, National Provider Identifier or NPI), identification number, account number etcetera. Such identity information is generally associated with any data as unique identifiers which remains unique at the different data sources located at different geographical locations. The various identifiers associated with respective data are de-identified before carrying out any data processing or analytics that is external to that organization or environment. The de-identification process has to be carried out to hide the personal identity information of a customer, user or organization.

[0004] The de-identification process is an encryption process that may be performed by using hash keys at the customer end before the data is moved to another environment or central environment. Although the de-identification process encrypts the data along with its associated identifier, de-identified data at the customer end or at the source environment is often prone to dictionary attacks by unauthorized entities. Further, if the salt used to hash (or the encryption keys used to encrypt) gets leaked, the entire de-identified data will be at risk, which poses a single point of failure. The unauthorized entities may attempt to obtain the key to decrypt the customer information. Further, for performing analytics, data from multiple sources are de-identified and merged together. Some of the data coming from different sources may have common identifiers which may be hidden or de-identified when moved to a target environment. Therefore, at the target environment, it becomes difficult to identify and link the data which have the same identifiers.

[0005] In view of the above limitations, a heretofore unaddressed need exists in the industry to address the aforementioned deficiencies and inadequacies.

[0006] In order to provide a holistic solution to the above-mentioned limitations, it is necessary to unify de-identified data received from multiple sources.

[0007] An object of the present subject matter is to facilitate de-identification of identifiers associated with multiple data across multiple sources.

[0008] An object of the present subject matter is to de-identify the identifiers such that the universal identifiers associated with multiple data remain unique after de-identification.

[0009] Another object of the present subject matter is to transfer the de-identified data to a central environment through a secure channel.

[0010] Yet another object of the present subject matter is to prevent unauthorized linking of de-identified data.

[0011] Yet another object of the present subject matter is to facilitate a ‘No single point of failure’.

[0012] Yet another object of the present subject matter is to eliminate redundant computation during de-identification execution for making the overall solution time and space efficient.

[0013] Yet another object of the present subject matter is to enable analytics on de-identified data linked from multiple sources while still hiding the identity of sensitive data.

[0014] According to an embodiment of the present subject matter, there is provided a method for unifying a plurality of de-identified customer data from one or more source storage units. The method comprises: accessing, via the one or more source storage units over a communication network, a plurality of customer data; configuring a source processing unit, the source processing unit being in communication to the one or more source storage units and a source server, the source processing unit is configured for: de-identifying the plurality of customer data based on a master salt to generate a master token; encrypting the master token using a source-encryption key to generate a source token; encrypting the source token using a target-encryption key to generate a transfer token; allowing an authorized access of the transfer token via a secured channel at a central storage unit; configuring a central processing unit, the central processing unit being in communication to the central storage unit and a central server, the central processing unit is configured for: decrypting the transfer token using a target decryption key to obtain the source token; decrypting the source token using a source decryption key to obtain the master token; hashing the master token with a target salt to generate a global identifier; and unifying the plurality of de-identified customer data by using the global identifier.

[0015] According to an embodiment of the present subject matter, each of the plurality of customer data is associated with a corresponding identifier containing person identity information (PII) and non-PII information of respective customers.

[0016] According to another embodiment of the present subject matter, the first salt generates same master token at the one or more source storage units for any of the corresponding identifiers.

[0017] According to yet another embodiment of the present subject matter, the first salt is a randomly generated multi-digit alphanumeric code.

[0018] According to yet another embodiment of the present subject matter, the target salt is a randomly generated multi-digit alphanumeric code.

[0019] According to yet another embodiment of the present subject matter, the source-encryption key is a private key to encrypt the master token.

[0020] According to yet another embodiment of the present subject matter, the target encryption key is a public key to encrypt the source token.

[0021] According to yet another embodiment of the present subject matter, the source token is used by an authorized user to link respective customer data by accessing the source decryption key.

[0022] According to an embodiment of the present subject matter, a system for unifying a plurality of de-identified customer data from one or more source storage units is provided. The system comprises: one or more source storage units storing a plurality of customer data; a source processing unit connected in communication to the one or more source storage units and a source server in a communication network, the source processing unit is configured for: de-identifying the plurality of customer data based on a master salt to generate a master token; encrypting the master token using a source-encryption key to generate a source token; encrypting the source token using a target-encryption key to generate a transfer token; allowing an authorized access of the transfer token via a secured channel at a central storage unit; a central processing unit connected in communication to the central storage unit and a central server in the communication network, the central processing unit is configured for: decrypting the transfer token using a target decryption key to obtain the source token; decrypting the source token using a source decryption key to obtain the master token; hashing the master token with a target salt to generate a global identifier; and unifying the plurality of de-identified customer data by using the global identifier.

[0023] The afore-mentioned objectives and additional aspects of the embodiments herein will be better understood when read in conjunction with the following description and accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. This section is intended only to introduce certain objects and aspects of the present invention, and is therefore, not intended to define key features or scope of the subject matter of the present invention.

[0024] The figures mentioned in this section are intended to disclose exemplary embodiments of the claimed system and method. Further, the components/modules and steps of a process are assigned reference numerals that are used throughout the description to indicate the respective components and steps. Other objects, features, and advantages of the present invention will be apparent from the following description when read with reference to the accompanying drawings:

[0025] Figure 1 illustrates system architecture, according to an exemplary embodiment of the present subject matter.

[0026] Figure 2 illustrates a process flow for generating various tokens by the system architecture, according to an exemplary embodiment of the present subject matter.

[0027] Figure 3 illustrates a method for providing global identifiers, according to an exemplary embodiment of the present subject matter.

[0028] Figure 4 illustrates a computer environment according to an exemplary embodiment of the present subject matter.

[0029] Like reference numerals refer to like parts throughout the description of several views of the drawings.

[0030] This section is intended to provide explanation and description of various possible embodiments of the present invention. The embodiments used herein, and various features and advantageous details thereof are explained more fully with reference to non-limiting embodiments illustrated in the accompanying drawings and detailed in the following description. The examples used herein are intended only to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable the person skilled in the art to practice the embodiments used herein. Also, the examples/embodiments described herein should not be construed as limiting the scope of the embodiments herein. Corresponding reference numerals indicate corresponding parts throughout the drawings. Use of the term “exemplary” means illustrative or by way of example only, and any reference herein to “the invention” is not intended to restrict or limit the invention to exact features or steps of any one or more of the exemplary embodiments disclosed in the present specification. References to “exemplary embodiment,” “one embodiment,” “an embodiment,” “various embodiments,” and the like, may indicate that the embodiment(s) of the invention so described may include a particular feature, structure, or characteristic, but not every embodiment necessarily includes the particular feature, structure, or characteristic. Further, repeated use of the phrase “in one embodiment,” or “in an exemplary embodiment,” do not necessarily refer to the same embodiment, although they may.

[0031] The specification may refer to “an”, “one”, “different” or “some” embodiment(s) in several locations. This does not necessarily imply that each such reference is to the same embodiment(s), or that the feature only applies to a single embodiment. Single features of different embodiments may also be combined to provide other embodiments.

[0032] Embodiments of the present subject matter disclose an optimized, deterministic, and secure mechanism to link de-identified universal attributes. Accordingly, a method and a system are provided for generating a global identifier for linking a plurality of de-identified customer data received from multiple source environments or customer environments. A plurality of customer data is de-identified based on a master salt and thereafter a master token is generated. The master token is used to generate a source token which is encrypted to generate a transfer token. At a central environment, the transfer token is decrypted to obtain the source token. Thereafter, the source token is decrypted to obtain the master token. At the central server, the master token is hashed with a target salt to generate the global identifier. At the central environment, the global identifier facilitates linking of the de-identified data coming from the multiple source environments.
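
As an illustration of the final linking step only, the following minimal sketch groups de-identified records by global identifier once they have arrived at the central environment. The record layout, the field names (`source`, `global_id`, `payload`), the placeholder identifier values, and the `unify` helper are all assumptions made for illustration; the specification does not prescribe a record format.

```python
from collections import defaultdict

# Hypothetical de-identified records as they might look at the central environment.
# The global_id values are placeholders, not real hashes.
records = [
    {"source": "A", "global_id": "g-001", "payload": "visit summary"},
    {"source": "B", "global_id": "g-001", "payload": "lab result"},
    {"source": "B", "global_id": "g-002", "payload": "claim"},
]

def unify(records):
    """Group records from all sources under their shared global identifier."""
    linked = defaultdict(list)
    for record in records:
        linked[record["global_id"]].append(record)
    return dict(linked)

unified = unify(records)
for global_id, group in unified.items():
    print(global_id, [r["source"] for r in group])
```

Records from sources A and B that carry the same global identifier end up in one group, without the original identifier ever being present at the central environment.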

[0033] As used herein, ‘processing unit’ is an intelligent device or module, that is capable of processing digital logics and also possesses analytical skills for analyzing and processing various data or information, according to the embodiments of the present subject matter.

[0034] As used herein, ‘storage unit’ refers to a local or remote memory device; docket systems; databases; each capable to store information including, data files, Person Identification Information (PII), Personal/Protected Health Information (PHI), metadata, transaction details, location details, GPS data, and encryption protocol information. In an embodiment, the storage unit may be a database server, a cloud storage, a remote database, a local database.

[0035] As used herein, ‘user device’ is a smart electronic device capable of communicating with various other electronic devices and applications via one or more communication networks. Examples of said user device include, but are not limited to, a wireless communication device, a smart phone, a tablet, a desktop, a laptop, etcetera. The user device comprises: an input unit to receive one or more input data; an operating system to enable the user device to operate; a processor to process various data and information; a memory unit to store initial data, intermediary data and final data; and an output unit having a graphical user interface (GUI).

[0036] As used herein, ‘module’ or ‘unit’ refers to a device, a system, a hardware, a computer application configured to execute specific functions or instructions according to the embodiments of the present invention. The module or unit may include a single device or multiple devices configured to perform specific functions according to the present invention disclosed herein.

[0037] As used herein, ‘communication network’ includes a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a virtual private network (VPN), an enterprise private network (EPN), the Internet, and a global area network (GAN).

[0038] Terms such as ‘connect’, ‘integrate’, ‘configure’, and other similar terms include a physical connection, a wireless connection, a logical connection or a combination of such connections including electrical, optical, RF, infrared, or other transmission media, and include configuration of software applications to execute computer program instructions, as specific to the presently disclosed embodiments, or as may be obvious to a person skilled in the art.

[0039] Terms such as ‘send’, ‘transfer’, ‘transmit’ and ‘receive’, ‘collect’, ‘obtain’, ‘access’ and other similar terms refer to transmission of data between various modules and units via wired or wireless connections across a communication network.

[0040] Figure 1 illustrates architecture of a system 100 for unifying a plurality of de-identified customer data from one or more source storage units, according to an exemplary embodiment of the present subject matter. The system 100 according to the present subject matter comprises a plurality of components. For example, and by no way limiting the scope of the present subject matter, the system 100 comprises a central environment 116 receiving a plurality of customer data from multiple source environments (customer environments) 114. Each of the source environments comprises a source server 102, at least one source storage unit 104, and at least one source processing unit 106. The central environment 116 comprises a central storage unit 108, a central processing unit 112, and a central server 110. Each of the source server 102 and the central server 110 comprises a separate key management unit (not shown in the Figure) for providing necessary encryption and decryption keys.

[0041] The at least one source storage unit 104, the at least one source processing unit 106, and the source server 102 constitute a source data environment 114, whereas the central storage unit 108, the central processing unit 112, and the central server 110 constitute a central environment or central data environment 116. The central data environment 116 may also be termed the target environment or global data lake. The source data environment 114 may include the at least one source storage unit 104 communicatively connected to at least one source processing unit 106. The source server 102 may also be communicatively connected to the at least one source storage unit 104 and the at least one source processing unit 106 for enabling various encryptions and data transactions within the source data environment 114. Similarly, the central data environment 116 may include the at least one central storage unit 108 communicatively connected to the at least one central processing unit 112. The central server 110 may also be communicatively connected to the at least one central storage unit 108 and the at least one central processing unit 112 for enabling various decryptions and data transactions within the central data environment 116. The source data environment 114 and the central data environment 116 may be communicatively connected to each other via a communication network.

[0042] The source storage unit 104 may include one or more databases for accommodating data of one or more organizations, companies, institutions and individuals, hereinafter termed as ‘customer(s)’. The source database contains a plurality of customer data which are required to be de-identified before being moved to the central environment in order to perform analytics on a larger set of data, for example, to calculate benchmarks (cost, quality, risk, utilisation) aggregating data from multiple sources (customers) to give a more realistic picture. The source database may encompass multiple source data environments or customer environments pertaining to various sectors, such as the healthcare sector, which need to comply with strict laws around data sharing and privacy. In the healthcare sector, large longitudinal datasets may be required to be built for training machine learning and deep learning models in order to be able to predict better healthcare outcomes. Pharma and life science companies also need to run clinical trials on de-identified data. Accordingly, the embodiments of the present subject matter provide the system 100 that uses a combination of masking, hashing, and encryption to perform a de-identification process. The de-identification process hides or de-identifies the identity information in the customer data as may be required by the authorized regulatory organizations. The de-identified customer data is then transferred to the target environment through a secure channel for protecting the data in transit. Further, each of the plurality of customer data is associated with an identifier. An identifier contains identity information (ID) of respective customers. The identifiers associated with any customer data may therefore include person identifiable information (PII) containing person/organisation entities with other non-PII entities. The de-identification process is carried out to hide or mask the associated identifiers of the plurality of data by encrypting the data.
Also, there may be different customer data having the same identifier, and these different customer data may be residing at different databases at different geographical locations. For example, ‘Customer Data 1’ with ‘Identifier 1’ may be stored at location A. Another data item, ‘Customer Data 2’, associated with the same identifier ‘Identifier 1’, may be stored at location B. The Customer Data 1 and Customer Data 2 may be de-identified and moved to a target location. According to the embodiments of the present subject matter, such data having the same identifiers are de-identified by maintaining uniqueness of the identifiers across all the sources even after de-identification. Therefore, both ‘Customer Data 1’ and ‘Customer Data 2’ that are associated with ‘Identifier 1’ will be identified as data having identical identifiers that can be linked together at the target environment. Thus, at the target location, the linking process of customer data having universal attributes may be facilitated according to the embodiments disclosed herein. Any unauthorized linking is also prevented at the target location. Further, a caching mechanism may be used to eliminate any redundant computation during de-identification execution, thereby making the overall solution time and space efficient. According to the embodiment of the present subject matter, the de-identified data is hashed and encrypted at multiple sources by multiple parties. Therefore, there is no ‘single point of failure’ and the de-identified data will be available at other points in case any of the points fails.
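
The caching mechanism is not detailed in the specification. One possible reading is a memoized de-identification function inside a source environment, so that an identifier that recurs across many records is hashed only once. The sketch below assumes SHA-256 as the hash, a hard-coded hypothetical master salt, and Python's `functools.lru_cache` as the cache; none of these choices come from the source.

```python
import hashlib
from functools import lru_cache

# Hypothetical master salt; in the described system it would be provided
# by the key management unit rather than hard-coded.
MASTER_SALT = b"hypothetical-master-salt"

@lru_cache(maxsize=None)
def deidentify(identifier: str) -> str:
    """Hash an identifier with the master salt (SHA-256 is an assumed choice)."""
    return hashlib.sha256(MASTER_SALT + identifier.encode("utf-8")).hexdigest()

# Identifiers attached to four records in one source environment;
# 'Identifier 1' recurs, so its hash is computed only once.
identifiers = ["Identifier 1", "Identifier 2", "Identifier 1", "Identifier 1"]
tokens = [deidentify(identifier) for identifier in identifiers]

stats = deidentify.cache_info()
print(f"hash computations: {stats.misses}, cache hits: {stats.hits}")
```

An unbounded in-memory cache is the simplest option for a sketch. Because such a cache maps raw identifiers to tokens, a real deployment would need to treat it as sensitive and bound its size and lifetime.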

[0043] Figure 2 illustrates process flow 200 for generating various tokens by the system architecture, according to an exemplary embodiment of the present subject matter. With reference to FIGS. 1-2, the source data environment 114 includes the at least one source storage unit 104 and the at least one source processing unit 106. The at least one source storage unit 104 stores a plurality of customer data and associated identifiers. The plurality of customer data is accessed via the one or more source storage units 104 over the communication network. The source processing unit 106 is connected to the one or more source storage units 104 to access the plurality of customer data. The source server 102 includes the first key management unit to provide source hardware salt for hash functions and primary and secondary encryption keys. The source processing unit 106 is configured to de-identify the plurality of customer data based on a master salt to generate a master token. The master token is a primary token used for de-identifying the customer data received from multiple sources or the source environments. At the source data environment 114, the universal identifiers of multiple customer data are de-identified by hashing universal attributes using the master salt. Thus, the master token is generated at the source data environment 114.

[0044] As shown in the Figure, for a first customer data ‘Customer 1’ having an associated identifier ‘12131919131’, the master token (MM) is encrypted using a source-encryption key (Ka) to generate a source token (S1). The source token is a site token. The source token (S1) or the site token is encrypted using a target-encryption key (Pg) to generate a transfer token (T1). An authorized access of the transfer token (T1) is allowed via a secured channel at the central storage unit 108, also termed the global data lake. The central processing unit 112 is connected in communication to the central storage unit 108 and the central server 110. The central server 110 includes the second key management unit to provide target hardware salt for hash functions and primary and secondary decryption keys. Thus, each server, whether at customer level or at the central environment, will have its own key management unit. The central processing unit 112 is configured for decrypting the transfer token (T1) using a target decryption key (Kg) to obtain the source token (S1). The source token (S1) is decrypted using a source decryption key (Pa) to obtain the master token (MM). The master token (MM) is hashed with a target salt (also termed as Global Data Lake Salt or GDL Salt) to generate a global identifier (GG).

[0045] Similarly, for second customer data ‘Customer Data 2’, which has the same identifier ‘12131919131’, the de-identification process is carried out at the source data environment 114. The source processing unit 106 encrypts the master token (MM) using a source-encryption key (Kb) to generate a source token (S2). The source token (S2) is encrypted using a target-encryption key (Pg) to generate a transfer token (T2). An authorized access of the transfer token (T2) is allowed via a secured channel at the central storage unit 108 or the global data lake. The central processing unit 112 is configured for decrypting the transfer token (T2) using a target decryption key (Kg) to obtain the source token (S2). The source token (S2) is decrypted using a source decryption key (Pb) to obtain the master token (MM). The master token (MM) is hashed with a target salt (GDL Salt) to generate the global identifier (GG).

[0046] Once the global identifier (GG) is generated, the plurality of de-identified customer data can be linked or unified by using the global identifier. According to the embodiments of the present subject matter, Ka, Kb, Pa, and Pb are the source-specific keys. Kg and Pg are global data lake keys. K denotes a private key, which remains in its environment. P denotes a public key, which can be shared with authorized environments. Each customer uses their own private keys. The master salt is a randomly generated code and is a single universal common salt that is deployed with the de-identification service. The GDL salt is used in the central (global data lake) environment, where it is kept in the key management unit and is not present in the central storage. The key management unit is different from the central storage, which stores the de-identified data. Further, the outputs of the intermediate steps are not persisted or stored. Only the final output (i.e., the transfer token in the case of customers and the global identifier in the case of the global data lake) is persisted and stored.

[0047] The source token ensures that only authorized entities, with whom the customer has shared the decryption key, are allowed to link the de-identified data. The transfer token ensures that the data intended for a target location can only be linked in the target location. For example, in a scenario, Party A agreed with Party B and Party E to share its de-identified data. Party E’s agreement expires while Party B’s stays in force. Party E is no longer getting any new data from Party A. However, if Party E somehow manages to get Party A’s data while it is being transmitted to Party B, Party E would not be able to link it with the source. The transfer token thus helps secure the data in transit.

[0048] According to the embodiments of the present subject matter, the master salt generates the same master token at the one or more source storage units 104 for any of the corresponding identifiers. The master salt and the target salt are randomly generated multi-digit alphanumeric codes. Further, the source-encryption key is a private key used to encrypt the master token, and the target-encryption key is a public key used to encrypt the source token. As described above, the source token is used by an authorized user or entity to link respective customer data by accessing the source decryption key.

[0049] Figure 3 illustrates a method for unifying the plurality of de-identified customer data from one or more source storage units 104, according to an exemplary embodiment of the present subject matter.

[0050] At step 302, the plurality of customer data is abstracted using a master salt and the resulting data is de-identified using a hashing algorithm to generate the master token. One or more source storage units 104 may store the plurality of customer data. The plurality of customer data may be accessed via the one or more source storage units 104 over the communication network. The master salt generates the same master token at the one or more source storage units 104 for any of the corresponding identifiers. The master salt is a randomly generated multi-digit alphanumeric code.
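Step 302 may be sketched as follows, assuming SHA-256 as the hashing algorithm and simple prefixing of the salt to the identifier; the specification names neither. The salt value and the function name master_token are hypothetical.

```python
import hashlib

# Hypothetical master salt; in the described system it is a randomly generated
# multi-digit alphanumeric code deployed with the de-identification service.
MASTER_SALT = "a1B2c3D4e5F6"

def master_token(identifier: str, salt: str = MASTER_SALT) -> str:
    """Salt the identifier and hash it (SHA-256 is an assumed choice)."""
    return hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()

# The same identifier yields the same master token at every source,
# because every source uses the same master salt.
token_at_source_a = master_token("12131919131")
token_at_source_b = master_token("12131919131")
token_for_other_customer = master_token("12131919132")

print(token_at_source_a == token_at_source_b)         # same identifier
print(token_at_source_a == token_for_other_customer)  # different identifier
```

Because the salt is common to all sources, the master token is deterministic per identifier, which is what later allows records from different sources to be linked.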

[0051] At step 304, the master token is encrypted by using a source-encryption key to generate a source token at the source environment. The source environment includes the source processing unit 106 in communication to the one or more source storage units 104 and the source server 102 that provides a key management unit to encrypt the data at the source data environment 114. The source processing unit 106 is configured for encrypting the master token and the source token. Encryption of the source token takes place at the source data environment 114 by using a target-encryption key to generate the transfer token.

[0052] At step 304, an authorized access of the transfer token is allowed via a secured channel at a central storage unit 108 to merge all the de-identified customer data at one place, i.e., into the central storage unit 108. The central processing unit 112 is configured to decrypt the transfer token at the central data environment 116 or the global data lake.

[0053] At step 306, the transfer token is decrypted by using a target decryption key to obtain the source token. Thereafter, the source token is decrypted using a source decryption key to obtain the master token.

[0054] At step 306, the master token is hashed with a target salt to generate a global identifier. The global identifier is then used to unify the plurality of de-identified customer data at the central data environment 116. The target salt is a randomly generated multi-digit alphanumeric code.
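The hashing and unification described above may be sketched as follows, assuming SHA-256 as the hash and assuming that unification means grouping de-identified records under their shared global identifier. The salt value, the record layout, and the names global_identifier and unify are hypothetical.

```python
import hashlib
from collections import defaultdict

# Hypothetical target salt (GDL Salt); in the described system it is held in
# the key management unit of the central environment.
GDL_SALT = "hypothetical-gdl-salt"

def global_identifier(master_token: str, target_salt: str = GDL_SALT) -> str:
    """Hash the recovered master token with the target salt."""
    return hashlib.sha256((target_salt + master_token).encode("utf-8")).hexdigest()

def unify(records):
    """Group de-identified payloads by their global identifier."""
    unified = defaultdict(list)
    for master_token, payload in records:
        unified[global_identifier(master_token)].append(payload)
    return dict(unified)

# Illustrative (master token, de-identified payload) pairs from two sources.
records = [
    ("mm-abc", {"source": "A", "visits": 3}),
    ("mm-abc", {"source": "B", "visits": 1}),
    ("mm-xyz", {"source": "B", "visits": 7}),
]
unified = unify(records)
print(len(unified))
```

In this sketch only the global identifier is used as the key of the unified output, so the master token itself need not be stored, in line with the statement that intermediate outputs are not persisted.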

[0055] Figure 4 illustrates a computer environment according to an embodiment of the present subject matter. The system is implemented in a computer environment 400 comprising a processor unit connected to a memory 404. The computer environment may have additional components including one or more communication channels, one or more input devices, and one or more output devices. The processor unit executes program instructions and may include a computer processor, a microprocessor, a micro-controller, and other devices or arrangements of devices that are capable of implementing the steps that constitute the method of the present subject matter. The memory 404 stores an operating system, program instructions, data transaction information, and metadata for implementing the system and the method according to the present subject matter. The memory 404 also participates in providing an environment for operation of the de-identification and linking process steps in a non-volatile storage medium.

[0056] The input unit 408 may include, but is not limited to, a keyboard, mouse, pen, a voice input device, a scanning device, or any other device that is capable of providing input to the computer system. In an embodiment of the present subject matter, the input unit 408 may be a sound card or similar device that accepts audio input in analog or digital form. The output unit 406 may include, but is not limited to, a user interface on a CRT or LCD screen, printer, speaker, CD/DVD writer, or any other device that provides output from the computer system.

[0057] It will be understood by those skilled in the art that the figures are only a representation of the structural components and process steps that are deployed to provide an environment for the solution of the present subject matter discussed above and do not constitute any limitation. The specific components and method steps may include various other combinations and arrangements than those shown in the figures.

[0058] The term exemplary is used herein to mean serving as an example. Any embodiment or implementation described as exemplary is not necessarily to be construed as preferred or advantageous over other embodiments or implementations. Further, the use of terms such as including, comprising, having, containing and variations thereof, is meant to encompass the items/components/process listed thereafter and equivalents thereof as well as additional items/components/process.

[0059] Although the subject matter is described in language specific to structural features and/or acts, it is to be understood that the subject matter defined in the claims is not necessarily limited to the specific features or processes described above. In fact, the specific features and acts described above are disclosed as mere examples of implementing the claims, and other equivalent features and processes are intended to be within the scope of the claims.