Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
METHODS AND SYSTEMS FOR CLUSTERING A PLURALITY OF USERS
Document Type and Number:
WIPO Patent Application WO/2024/076267
Kind Code:
A1
Abstract:
A method (1000) for clustering a plurality of users (5), comprising obtaining (110) a data set (7) for each user operating in a framework (30), calculating (120) a correlation coefficient (9) between the data sets of each user, thereby obtaining a correlation coefficient vector (3) for each user, encrypting (130) the correlation coefficient vectors using a homomorphic encryption scheme, thereby obtaining a homomorphically encrypted correlation coefficient vector (3') for each user, uploading (140) the homomorphically encrypted correlation coefficient vectors from the client to a server (20), clustering (150) the plurality of users into a plurality of clusters (13) based on the respective homomorphically encrypted correlation coefficient vector of each user using a CURE algorithm, and the respective homomorphically encrypted correlation coefficient vector(s) of at least one user of each cluster of the plurality of clusters is assigned as a representative point of said cluster.

Inventors:
CHENG ZHUAN (CN)
GAN SHUYU (CN)
WANG RUIDA (CN)
DUAN JINQIAO (CN)
GAO TING (CN)
JIAO YIN (SE)
LIU YANG (CN)
ZENG BIYUN (CN)
LI PENGBO (CN)
Application Number:
PCT/SE2022/050898
Publication Date:
April 11, 2024
Filing Date:
October 06, 2022
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
PRIVASEA AB (SE)
International Classes:
H04L9/00
Other References:
ANGELA JSCHKE ET AL: "Unsupervised Machine Learning on Encrypted Data", vol. 20180510:202908, 3 May 2018 (2018-05-03), pages 1 - 30, XP061025739, Retrieved from the Internet [retrieved on 20180503]
ALABDULATIF ABDULATIF ET AL: "Towards secure big data analytic for cloud-enabled applications with fully homomorphic encryption", JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, ELSEVIER, AMSTERDAM, NL, vol. 137, 5 November 2019 (2019-11-05), pages 192 - 204, XP085994929, ISSN: 0743-7315, [retrieved on 20191105], DOI: 10.1016/J.JPDC.2019.10.008
WEN-JIE LU ET AL: "PEGASUS: Bridging Polynomial and Non-polynomial Evaluations in Homomorphic Encryption", vol. 20201227:131823, 25 December 2020 (2020-12-25), pages 1 - 22, XP061041876, Retrieved from the Internet [retrieved on 20201225]
LI YU ET AL: "Privacy-Preserving Clustering Using Representatives over Arbitrarily Partitioned Data", INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, vol. 4, no. 9, 1 January 2013 (2013-01-01), XP093046190, ISSN: 2158-107X, Retrieved from the Internet DOI: 10.14569/IJACSA.2013.040932
ANONYMOUS: "Pearson correlation coefficient", WIKIPEDIA, 20 September 2022 (2022-09-20), XP093046183, Retrieved from the Internet [retrieved on 20230511]
ANONYMOUS: "CURE algorithm", WIKIPEDIA, 29 April 2022 (2022-04-29), XP093046328, Retrieved from the Internet [retrieved on 20230511]
BODDETI V N: "Secure Face Matching Using Fully Homomorphic Encryption", IEEE, 2018.VISHNU N B. SECURE FACE MATCHING USING FULLY HOMOMORPHIC ENCRYPTION[C]// 2018 IEEE 9TH INTERNATIONAL CONFERENCE ON BIOMETRICS THEORY, APPLICATIONS AND SYSTEMS, 1 October 2018 (2018-10-01)
BRAKERSKI, ZVIKAGENTRY, CRAIGVAIKUNTANATHAN, VINOD: "Fully Homomorphic Encryption without Bootstrapping", ELECTRONIC COLLOQUIUM ON COMPUTATIONAL COMPLEXITY, 2011
ZVIKA BRAKERSKIVINOD VAIKUNTANATHAN: "Fully homomorphic encryption from ring LWE and security for key dependent messages", CRYPTO, vol. 6841, 2011, pages 501
ZVIKA BRAKERSKIVINOD VAIKUNTANATHAN: "Efficient fully homomorphic encryption from (standard) LWE", OSTROVSKY, pages 97 - 106
Attorney, Agent or Firm:
AWA SWEDEN AB (SE)
Download PDF:
Claims:
CLAIMS

1. A method (1000) for clustering a plurality of users (5), the method comprising the steps of: obtaining (110), at a client (10), a data set (7) for each user (5) of a plurality of users (5) operating in a framework (30); calculating (120), at the client, a correlation coefficient (9) between the data sets (7) of each user (5) of the plurality of users, thereby obtaining a correlation coefficient vector (3) for each user of the plurality of users; encrypting (130), at the client, the correlation coefficient vectors (3) using a homomorphic encryption scheme, thereby obtaining a homomorphically encrypted correlation coefficient vector (3') for each user (5); uploading (140) the homomorphically encrypted correlation coefficient vectors (3') from the client (10) to a server (20); clustering (150), at the server (20), the plurality of users (5) into a plurality of clusters (13) based on the respective homomorphically encrypted correlation coefficient vector (3') of each user (5) of the plurality of users using a Clustering Using REpresentatives, CURE, algorithm, and wherein the respective homomorphically encrypted correlation coefficient vector(s) (3') of at least one user (5) of each cluster (13) of the plurality of clusters is assigned as a representative point of said cluster (13).

2. The method (1000) according to claim 1, further comprising the step of: recommending (160) data to users (5) of a cluster (13) of the plurality of clusters based on preferences of the at least one user (5) which has its homomorphically encrypted correlation coefficient vector (3') assigned as a representative point of said cluster (13).

3. The method (1000) according to claim 1 or 2, wherein the data set (7) for each user (5) comprises a first data set (71) and a second data set (72).

4. The method (1000) according to claim 3, wherein the first data set (71) comprises basic user data, and the second data set (72) comprises historical behavior data, such as web browsing time, web page dwell time, residence time, and/or user clicks.

5. The method (1000) according to any of the preceding claims, the method further comprises the step of: prior to the step of calculating (120), performing (115) preliminary clustering of the data sets (7).

6. The method (1000) according to any of the preceding claims, the method further comprises the step of: prior to the step of encrypting (130), performing (125) binary decomposition of the correlation coefficient vectors (3).

7. The method (1000) according to any of the preceding claims, wherein the homomorphic encryption scheme is the Brakerski-Gentry-Vaikuntanathan, BGV, homomorphic encryption scheme.

8. The method (1000) according to any of the preceding claims, wherein the correlation coefficient (3) is the Pearson correlation coefficient.

9. The method (1000) according to any of the preceding claims, wherein the framework (30) is an application, a program or a website.

10. A system (100) for clustering a plurality of users (5), the system (100) comprising a client (10) and a server (20), wherein the client (10) comprises first circuitry (11) configured for: obtaining (Sil) a data set (7) for each user (5) of a plurality of users (5) operating in a framework (30); calculating (S12) a correlation coefficient (3) between the data sets (7) of each user (5) of the plurality of users, thereby obtaining a correlation coefficient vector (3) for each user (5) of the plurality of users; encrypting (S13) the correlation coefficient vectors (3) using a homomorphic encryption scheme, thereby obtaining a homomorphically encrypted correlation coefficient vector (3') for each user (5); and uploading (S14) the homomorphically encrypted correlation coefficient vectors (3') from the client (10) to a server (20); and wherein the server (20) comprises second circuitry (22) configured for: clustering (S15) the plurality of users (5) into a plurality clusters (13) based on the respective homomorphically encrypted correlation coefficient vector (3') of each user (5) of the plurality of users (5), using a CURE algorithm, and wherein the respective homomorphically encrypted correlation coefficient vector(s) (3') of at least one user (5) of each cluster (13) of the plurality of clusters is assigned as a representative point of said cluster (13).

11. The system (100) according to claim 10, wherein the second circuitry (22) is further configured for: recommending data to users (5) of a cluster (9) of the plurality of clusters based on preferences of the at least one user (5) which has its homomorphically encrypted correlation coefficient vector (3') assigned as a representative point of said cluster (13).

12. The system (100) according to claim 10 or 11, wherein the first circuitry (11) is further configured for: prior to the calculating (S12), performing (S17) preliminary clustering of the data sets.

13. The system (100) according to any of claims 10 to 12, wherein the first circuitry (11) is further configured for: prior to the encrypting (S13), performing (S18) binary decomposition of the correlation coefficient vectors (3).

14. The system (100) according to any of the claims 10 to 13, wherein the homomorphic encryption scheme is the Brakerski-Gentry-Vaikuntanathan, BGV, homomorphic encryption scheme.

15. The system (100) according to any of the claims 10 to 14, wherein the correlation coefficient (3) is the Pearson's correlation coefficient.

Description:
METHODS AND SYSTEMS FOR CLUSTERING A PLURALITY OF USERS

TECHNICAL FIELD

The present disclosure generally relates to the field of clustering of users. Specifically, the present disclosure relates to the field of privacy-ensured clustering of encrypted user data.

BACKGROUND

In digital spaces, such as within platforms, frameworks, software, and/or software engines, there is a great interest in providing recommendations to users which are deemed relevant to those users. The recommendations may relate a wide selection of topics or content, such as video clips, songs, products in online stores, and/or articles, to name a few. Further, recommendations to a specific user are often made based on the behavior of other users which have behaved similar to the specific user. In order to be able to provide recommendations, behavioral data needs to be gathered and analyzed. Such method and systems often make use of collaborative filtering and/or data clustering. However, a problem with known methods and systems is that they may be inefficient, as in requiring a high amount of computational power, and/or may pose a privacy risk in that the gathered user data may be leaked. Privacy becomes especially important when the recommendations and/or the behavioral data comprises sensitive information. Further, data protection and privacy is increasingly becoming a regulatory matter, such as GDPR, thereby increasing the need of improved privacy.

It is therefore of interest to provide improved user recommendation methods which may alleviate the above-mentioned problems.

SUMMARY

An object of the present disclosure is to provide user recommendation methods and systems which may provide an increased level of security and/or privacy, and an improved accuracy of the recommendations.

According to first aspect of the present disclosure, a method for clustering a plurality of users is provided. The method according to the first aspect comprises the steps of obtaining, at a client, a data set for each user of a plurality of users operating in a framework, calculating, at the client, a correlation coefficient between the data sets of each user of the plurality of users, thereby obtaining a correlation coefficient vector for each user of the plurality of users. The method further comprises encrypting, at the client, the correlation coefficient vectors using a homomorphic encryption scheme, thereby obtaining a homomorphically encrypted correlation coefficient vector for each user, uploading the homomorphically encrypted correlation coefficient vectors from the client to a server, and clustering, at the server, the plurality of users into a plurality of clusters based on the respective homomorphically encrypted correlation coefficient vector of each user of the plurality of users using a Clustering Using REpresentatives, CURE, algorithm. The respective homomorphically encrypted correlation coefficient vector(s) of at least one user of each cluster of the plurality of clusters is assigned as a representative point of said cluster.

According to a second aspect of the present disclosure, a system for clustering a plurality of users is provided. The system comprises a client and a server. The client comprises first circuitry configured for obtaining a data set for each user of a plurality of users operating in a framework, calculating a correlation coefficient between the data sets of each user of the plurality of users, thereby obtaining a correlation coefficient vector for each user of the plurality of users. The first circuitry is further configured for encrypting the correlation coefficient vectors using a homomorphic encryption scheme, thereby obtaining a homomorphically encrypted correlation coefficient vector for each user, and uploading the homomorphically encrypted correlation coefficient vectors from the client to a server. The server comprises second circuitry configured for clustering the plurality of users into a plurality clusters based on the respective homomorphically encrypted correlation coefficient vector of each user of the plurality of users, using a CURE algorithm. The respective homomorphically encrypted correlation coefficient vector(s) of at least one user of each cluster of the plurality of clusters is assigned as a representative point of said cluster.

The present disclosure is, at least partly, based on the concept of homomorphically encrypting the user data sets. Thereby, the homomorphic encryption ensures that the calculation (when using a homomorphized function or algorithm) of the encrypted data is consistent with a corresponding calculation of the corresponding unencrypted data. Further, the homomorphic encryption ensures that the unencrypted data, i.e. the plaintext, can't be accessed by an untrusted third party. More specifically, the present disclosure, at least partly by the use of homomorphic encryption, ensures that user data is not compromised during the clustering of said user data, thereby providing greatly improved privacy.

The present disclosure is further based on the concepts of using the CURE algorithm which enables using representative points, instead of single points, to make recommendations. Thereby, the computational complexity is reduced, which reduces the time needed to compute the method, in comparison to other clustering algorithms (such as, for example, the K-mean clustering algorithm). Further, the complexity of other clustering algorithms increases dramatically with increased numbers of users, due to the use of single points, which leads to a lower efficiency when compared to the CURE algorithm. The CURE algorithm may be a CURE hierarchical algorithm, which may have an even higher efficiency when compared to a conventional CURE algorithm.

The method according to the first aspect may further comprise, and/or the second circuitry of the system of the second aspect may be further configured for, recommending data to users of a cluster of the plurality of clusters based on preferences of the at least one user which has its homomorphically encrypted correlation coefficient vector assigned as a representative point of said cluster.

The data which may be, or has been, recommended may relate to the data sets of the user which has its homomorphically encrypted correlation coefficient vector assigned as a representative point, which may be referenced to as a representative user of a cluster. For example, the data sets of a representative user may relate to content which has been visited and/or consumed by the representative user within the framework. Thereby, the data which may be, or has been, recommended may be related to the content which has been visited and/or consumed by the representative user within the framework.

Basing the recommendation of data to users on homomorphically encrypted correlation coefficient vector ensures that the privacy of the users is ensured. Thereby, the present disclosure provides an increased degree of privacy in comparison to the prior art. The data sets of the plurality of users may be arranged, or stored, as vectors. The vectors of the data sets of the plurality of users may have the same length. The data set for each user may comprise a first data set and a second data set. The first data set may comprise basic user data. The basic user data may comprise, for example, a user's age, a user's gender, a user's nationality, a user's location, and/or a type of device which the user is using. The second data set may comprise historical behavior data, such as web browsing time, web page dwell time, residence time, and/or user clicks. The data which may be, or has been, recommended may be comprised by a second data set.

The method may further comprise, and/or the first circuitry of the system of the second aspect may be further configured for, prior to the step of calculating, performing preliminary clustering of the data sets. The preliminary clustering may be performed using a clustering algorithm that is computationally cheaper and/or that is less run time expensive than the CURE clustering algorithm. The use of preliminary clustering may reduce the total amount of run time required and/or the computational cost. However, such reduction(s) may be gained by having the client performing a portion of the clustering, i.e. the preliminary clustering, rather than the server performing all of the clustering.

The method may further comprise, and/or the first circuitry of the system of the second aspect may be further configured for, prior to the step of encrypting, performing binary decomposition of the correlation coefficient vectors. By performing binary decomposition the resulting decomposed correlation coefficient vectors may reduce the complexity of subsequent calculations using the decomposed correlation coefficient vectors. Therefore, performing binary decomposition may improve the efficiency of the methods and systems of the present disclosure, specifically the efficiency after the step of encrypting may be improved. However, the gain in efficiency may be at the cost of additional computation by the client.

The preliminary clustering and binary decomposition may be performed separately or in series. Thereby, it is possible to determine that varying degrees of computation shall be made at the client in order to subsequently improve the efficiency of the subsequent calculations at the server. Further, the client and the server may be in communication with regards to a present, and/or a future or incoming, computational load. The server may further request that the client performs any of the abovementioned steps in order to reduce the incoming computational load for the server.

The homomorphic encryption scheme may be the Brakerski-Gentry- Vaikuntanathan, BGV, homomorphic encryption scheme. The BGV scheme may be implemented using the appropriate algorithms discussed in, for example, any of the following publications: Boddeti V N . Secure Face Matching Using Fully Homomorphic Encryption^]. IEEE, 2018. VISHNU N B. Secure face matching using fully homomorphic encryption [C]// 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS), October 01,2018; Brakerski, Zvika & Gentry, Craig & Vaikuntanathan, Vinod. (2011). (Leveled) Fully Homomorphic Encryption without Bootstrapping. Electronic Colloquium on Computational Complexity (ECCC). 18. 111. 10.1145/2090236.2090262:]; Zvika Brakerski and Vinod Vaikuntanathan. Fully homomorphic encryption from ring LWE and security for key dependent messages. In CRYPTO, volume 6841, page 501, 2011; and Zvika Brakerski and Vinod Vaikuntanathan. Efficient fully homomorphic encryption from (standard) LWE. In Ostrovsky [Ostll], pages 97-106. References are to full version: http://eprint.iacr.org/2011/344.

The correlation coefficient may be the Pearson correlation coefficient. The Pearson coefficient may be calculated according to the following equation: wherein r is Pearson correlation coefficient between a user * and a user y, Xj,yj are the i-th elements of the datasets of the user x and the user , respectively, and x,y are the mean of values of the datasets of the user x and the user y.

The framework may be an application, a program, a software or a website. For example, the framework may be running on a mobile phone, a tablet, a computer, and/or on server. The framework may be configured to collect, obtain or gather the data sets for the plurality of users operating in the framework. Further, the framework may be configured to send the data sets for the plurality of users the system, the client and/or the server. Other objectives, features and advantages of the enclosed embodiments will be apparent from the following detailed disclosure, from the attached dependent claims as well as from the drawings.

It is noted that embodiments of the invention relate to all possible combinations of features recited in the claims. Further, it will be appreciated that the various embodiments described for the method as defined in accordance with the first aspect, the embodiments described for the fault locator system according to the second aspect and the fault locator device according to the third aspect are all combinable with each other.

BRIEF DESCRIPTION OF THE DRAWINGS

This and other aspects of the present disclosure will now be described in more detail, with reference to the appended drawings showing embodiment(s) of the disclosure.

Fig. 1 shows flowchart of a method according to an exemplifying embodiment of the present disclosure.

Fig. 2 show a system according to an exemplifying embodiment of the present disclosure.

DETAILED DESCRIPTION

Fig. 1 shows flowchart of a method 1000 for clustering a plurality of users according to an exemplifying embodiment of the present disclosure.

The method 1000 comprises the steps of obtaining 110 a data set (not shown; see Fig. 2) for each user (not shown; see Fig. 2) of a plurality of users operating in a framework (not shown; see Fig. 2), calculating 120 a correlation coefficient between the data sets of each user of the plurality of users, thereby obtaining a correlation coefficient vector for each user of the plurality of users, encrypting 130 the correlation coefficient vectors using a homomorphic encryption scheme, thereby obtaining a homomorphically encrypted correlation coefficient vector for each user. The method 1000 further comprises uploading 140 the homomorphically encrypted correlation coefficient vectors from the client to a server (not shown; see Fig. 2), clustering 150 the plurality of users into a plurality of clusters based on the respective homomorphically encrypted correlation coefficient vector of each user of the plurality of users using a Clustering Using REpresentatives, CURE, algorithm, and wherein the respective homomorphically encrypted correlation coefficient vector(s) of at least one user of each cluster of the plurality of clusters is assigned as a representative point of said cluster.

The method 1000 may comprise the step of preliminary clustering 115 of the data sets, which may be performed before the step of calculating 120. Further, the step of preliminary clustering 115 may be performed after the step of obtaining 110. Furthermore, the method 1000 may comprise the step of performing binary decomposition 125 of the correlation coefficient vectors, which may be performed before the step of encrypting 130 and/or after the step of calculating 120.

Fig. 2 show a system 100 according to an exemplifying embodiment of the present disclosure.

The system 100 comprises a client 10 and a server 20. The client 10 may configured as, for example, a smartphone, a tablet, a computer. The client 10 is configured to be able to operate in a framework 30. The framework 30 may be, for example, a website, an application, or a digital platform, in which a user 5 may operate in via the client 10. By the term operate it may be meant, for example, browse, consume media, and/or access information. Thereby, a potential example is a user 5 accessing a website, i.e. a framework, via an application installed on a user device such as a smartphone, i.e. a client 10. The client 10 comprises at least first circuitry 11 which is configured for a plurality of functions. However, it is to be understood that the present disclosure is not limited to a client 10 comprising solely the first circuitry 11. Moreover, the first circuitry 11 is not limited to the functions described in the following. For example, the first circuitry 11 may be configured as a, for example, central processing unit, CPU, which may be configured to perform a vast set of potential operations based on received instructions. Therefore, the client 10 may, alternatively, be understood as being configured to perform a set of functions, or steps, based on the first circuitry 11 receiving instructions comprising the said set of functions, or steps.

In Fig. 2, the framework 30 is shown as housing a plurality of users 5, wherein each user 5 comprises a data set 7. The first circuitry 11 is configured to obtain Sil data sets 7 for the user 5 operating in the framework 30, and is shown in Fig. 2 as obtaining Sil data sets 7 from the framework 30. The first circuitry 11 is shown to be configured to calculate S12 a correlation coefficient 3 between the data sets 7 of the users 5. The correlation coefficient 3 may be the Pearson's correlation coefficient. However, Fig. 2 further shows that the first circuitry 11 may perform preliminary clustering S17 prior to the calculation S12 of the correlation coefficients 3, which may make the subsequent calculations S12 of correlation coefficients 3 require less processing power, i.e. being more efficient. The first circuitry 11 is further shown in Fig. 2 as being configured to encrypt S13 the correlation coefficient vectors 3 using a homomorphic encryption scheme, which may be the BGV, homomorphic encryption scheme, in order to obtain a homomorphically encrypted correlation coefficient vector 3'. Thereby, for a homomorphically encrypted correlation coefficient vector 3' is created for each user 5 of which its data set 7 has been obtained by the first circuitry 11. Further, the first circuitry 11 may be configured to perform binary decomposition S18 of the correlation coefficient vectors 3 before the encrypting S13 of the correlation coefficient vectors 3. Hence, the first circuitry 11 may configured to encrypt S13 binary decomposed correlation coefficient vectors 3.

The server 20 may be understood as, for example, a cloud server or a cloud service. The server 20 may be configured for performing operations involving data which has been received from one or more clients 10, of which only one is shown in Fig 2, and/or data which pertains to a plurality of users 5. Further, the present disclosure is based at least partly on the concept of using encryption of data sets 7 pertaining to users 5 to allow non-privacy invasive operations on said data sets 7 on a common server 20.

The server 20 comprises second circuitry 22 configured for clustering S15 the plurality of users 5 into a plurality clusters 13 based on the respective homomorphically encrypted correlation coefficient vector 3' of each user 5 of the plurality of users 5, using a CURE algorithm, and wherein the respective homomorphically encrypted correlation coefficient vectors 3' of at the plurality of at least one user 5 of each cluster 13 of the plurality of clusters 13 is assigned as a representative point of said cluster 13.

The server 20 may further be configured for recommending data to users 5 of a cluster 13 based on preferences of the at least one user 5 which has its homomorphically encrypted correlation coefficient vector 3' assigned as a representative point of said cluster 13.

While the present invention has been illustrated in the appended drawings and the foregoing description, such illustration is to be considered illustrative or exemplifying and not restrictive; the present invention is not limited to the disclosed embodiments. Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims. In the appended claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. Any reference signs in the claims should not be construed as limiting the scope.