Title:
A METHOD AND USER INTERFACE FOR DISPLAYING AND CONTROLLING PRIVACY RISKS DURING MEDIA CONSUMPTION
Document Type and Number:
WIPO Patent Application WO/2015/175622
Kind Code:
A1
Abstract:
The present principles provide a method and user interface for displaying and controlling privacy risks that provides a user with privacy, transparency and control of personal attributes, while maintaining the quality of personalized recommendations the user receives. A user is informed about the risk of releasing data related to, for example, media preferences with respect to attributes the user considers private (e.g., political views, age, gender).

Inventors:
BHAMIDIPATI SUBRAHMANYA SANDILYA (US)
FAWAZ NADIA (US)
KVETON BRANISLAV (US)
ZHANG AMY (US)
Application Number:
PCT/US2015/030506
Publication Date:
November 19, 2015
Filing Date:
May 13, 2015
Assignee:
THOMSON LICENSING (FR)
International Classes:
G06F21/62; G06Q30/02; G06Q30/06; H04W12/02
Other References:
SALAMATIAN SALMAN ET AL: "How to hide the elephant- or the donkey- in the room: Practical privacy against statistical inference for large data", 2013 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING, IEEE, 3 December 2013 (2013-12-03), pages 269 - 272, XP032566685, DOI: 10.1109/GLOBALSIP.2013.6736867
B. FUNG; K. WANG; R. CHEN; P. YU: "Privacy Preserving Data Publishing: A Survey of Recent Developments", ACM COMPUTING SURVEYS, 2010
Attorney, Agent or Firm:
SHEDD, Robert D. et al. (4 Research Way, Third Floor, Princeton, New Jersey, US)
Claims:
CLAIMS

1. A method, comprising the steps of:

receiving a user input including at least one user attribute that is to remain private and data that is to be made public;

determining a risk of inferring the at least one user attribute that is to remain private from the data that is to be made public;

distorting the data that is to be made public using a previously determined mapping;

determining a risk of inferring the at least one user attribute that is to remain private from the distorted data that is to be made public; and

displaying at least one of the risk of inference determined from the data that is to be made public and the risk of inference determined from the distorted data that is to be made public.

2. The method of claim 1, comprising displaying selectable user attributes from which a user can select which user attributes are to remain private.

3. The method of claim 1, comprising displaying selectable rating values from which a user can select a rating for content.

4. The method of claim 3, wherein the data that is to be made public comprises rating values assigned to content by a user and the method comprises displaying at least one of the rating values assigned to content by a user and distorted ratings.

5. The method of claim 4, comprising displaying at least one of content recommendations before user ratings are distorted and content recommendations after user ratings are distorted.

6. The method of claim 1, wherein the risk of inferring a private attribute is determined according to the following equation:

$$\mathrm{Risk}(A, b) = \left(1 - \frac{H(A \mid B = b)}{H(A)}\right) \times 100.$$

7. The method of claim 1, wherein the user attributes that are to remain private comprise at least one of the group consisting of gender, age and political affiliation.

8. The method of claim 1, wherein said mapping is determined using the following algorithm:

Input: prior p_{A,C}

Solve: the convex optimization

    minimize over p_{B̂|C}:  E_{p_{C,B̂}}[ d(C, B̂) ]

    subject to I(A; B̂) ≤ ε, and p_{B̂|C} ∈ Simplex

Remap: p_{B̂|B} = p_{B̂|C(B)}

Output: mapping p_{B̂|B}.

9. An apparatus, comprising:

a display;

a memory for storing at least one of program routines, content and data; and

a processor for executing said program routines;

said apparatus configured to:

receive a user input including at least one user attribute that is to remain private and data that is to be made public;

determine a risk of inferring the at least one user attribute that is to remain private from the data that is to be made public;

distort the data that is to be made public using a previously determined mapping;

determine a risk of inferring the at least one user attribute that is to remain private from the distorted data that is to be made public; and

display on the display at least one of the risk of inference determined from the data that is to be made public and the risk of inference determined from the distorted data that is to be made public.

10. The apparatus of claim 9, wherein said display comprises a web interface.

11. The apparatus of claim 10, wherein said display further displays at least one of selectable user attributes, selectable rating values for content, at least one of the rating values assigned to content by a user and distorted rating values and at least one of content recommendations before user ratings are distorted and content recommendations after user ratings are distorted.

12. A user interface, comprising:

selectable user attributes from which a user can select which user attributes are to remain private;

selectable rating values from which a user can select a rating for content; and

risk values depicting at least one of the risk of inference of a user attribute that is to remain private determined from at least one selected content rating and the risk of inference of a user attribute that is to remain private determined from at least one distorted content rating.

13. The user interface of claim 12, further comprising a display of at least one of the rating values assigned to content by a user and distorted rating values and at least one of content recommendations before user ratings are distorted and content recommendations after user ratings are distorted.

14. The user interface of claim 12, wherein said user interface comprises a web page.

Description:
A METHOD AND USER INTERFACE FOR DISPLAYING AND CONTROLLING PRIVACY RISKS DURING MEDIA CONSUMPTION

TECHNICAL FIELD

[0001] The present principles relate to statistical inference and privacy-preserving techniques. More particularly, the present principles relate to a user interface for displaying and controlling privacy risks during media consumption.

BACKGROUND

[0002] With the advent of targeted advertising and the popularity of mining user data, users of content consumption systems are finding their privacy threatened. To address this rising concern, some privacy-preserving mechanisms have been proposed, such as the mechanism described in B. Fung and K. Wang and R. Chen and P. Yu, "Privacy Preserving Data Publishing: A Survey of Recent Developments," ACM Computing Surveys, 2010.

[0003] Such mechanisms provide strong theoretical guarantees, but often lack practicality. For instance, reaching a sufficiently high level of privacy often requires that the user data be distorted to the point where it is not usable. What is needed is an interactive privacy system, which can implement information-theoretic privacy to provide practical policies for protecting user profiles, while maintaining the utility of sanitized user data.

SUMMARY

[0004] The present principles propose a method and apparatus for displaying and controlling user privacy risks while enabling content consumption.

[0005] In one embodiment of the present principles a method of the present principles includes receiving a user input including at least one user attribute that is to remain private and data that is to be made public, determining a risk of inferring the at least one user attribute that is to remain private from the data that is to be made public, distorting the data that is to be made public using a previously determined mapping, determining a risk of inferring the at least one user attribute that is to remain private from the distorted data that is to be made public and displaying at least one of the risk of inference determined from the data that is to be made public and the risk of inference determined from the distorted data that is to be made public.

[0006] In an alternate embodiment of the present principles, an apparatus includes a display, a memory for storing at least one of program routines, content and data and a processor for executing said program routines. In such embodiments the apparatus is configured to receive a user input including at least one user attribute that is to remain private and data that is to be made public, determine a risk of inferring the at least one user attribute that is to remain private from the data that is to be made public, distort the data that is to be made public using a previously determined mapping, determine a risk of inferring the at least one user attribute that is to remain private from the distorted data that is to be made public and display on the display at least one of the risk of inference determined from the data that is to be made public and the risk of inference determined from the distorted data that is to be made public.

[0007] In an alternate embodiment, a user interface of the present principles includes selectable user attributes from which a user can select which user attributes are to remain private, selectable rating values from which a user can select a rating for content and risk values depicting at least one of the risk of inference of a user attribute that is to remain private determined from at least one selected content rating and the risk of inference of a user attribute that is to remain private determined from at least one distorted content rating.

[0008] These and other aspects, features and advantages of the present principles will become apparent from the following detailed description of exemplary embodiments, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] The teachings of the present principles can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:

FIG. la depicts a high level block diagram of a system for displaying and controlling user privacy risks while enabling content consumption and recommendation in accordance with an embodiment of the present principles;

FIG. lb depicts a functional diagram of the system of FIG. la in accordance with an embodiment of the present principles;

FIG. 2 depicts a high level block diagram of a user interface suitable for implementation in the system of FIG. 1 in accordance with an embodiment of the present principles;

FIG. 3 depicts a high level illustrative depiction of a privacy dashboard usable in the user system of FIG. 1 in accordance with an embodiment of the present principles;

FIG. 4 depicts a high level illustrative depiction of a program display page usable in the user system of FIG. 1 in accordance with an embodiment of the present principles;

FIG. 5 depicts a high level illustrative depiction of a program history display page including ratings usable in the user system of FIG. 1 in accordance with an embodiment of the present principles;

FIG. 6 depicts an illustrative graph of the privacy-utility tradeoff in accordance with an embodiment of the present principles;

FIG. 7 depicts an illustrative receiver operating characteristic (ROC) curve showing the performance of, for example, a logistic regression classifier attempting to infer a user's political views in accordance with an embodiment of the present principles;

FIG. 8a depicts the six (6) top TV show recommendations based on actual ratings given to programming by a user in accordance with an embodiment of the present principles;

FIG. 8b depicts the six (6) top TV show recommendations based on ratings distorted for privacy in accordance with an embodiment of the present principles; and

FIG. 9 depicts a flow diagram of a method for displaying and controlling user privacy risks in accordance with an embodiment of the present principles.

It should be understood that the drawing(s) are for purposes of illustrating the concepts of the various described principles and are not necessarily the only possible configuration for illustrating the principles.

To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.

DETAILED DESCRIPTION

[0010] Embodiments of the present principles advantageously provide a method and apparatus for displaying and controlling risks to user privacy while enabling content consumption. Although the present principles will be described primarily within the context of video content consumption, such as TV programming, and program recommendations and program ratings, the specific embodiments of the present principles should not be treated as limiting the scope of the invention. It will be appreciated by those skilled in the art and informed by the teachings of the present principles that the concepts of the present principles can be advantageously applied to any content including audio, video and any combination thereof and can be interfaced with online video services, as well as TV and VoD services and can be implemented with publicly released data other than program ratings. The present principles can also be extended to other media content, such as music, books, news, and to other products, services, or locations rated online by users. The concepts of the present principles can also be adapted to display and control privacy risks in the context of social networks. For example, users can be informed of the privacy risks of actions such as likes, connecting to friends, etc., in embodiments, prior to taking those actions, and users can be provided means to control these risks. In such a context, data distortion could for example amount to simply avoiding taking some actions, or avoiding the release of some data.

[0011] In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The present principles as defined by such claims reside in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.

[0012] Moreover, all statements herein reciting principles, aspects, and embodiments of the present principles, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

[0013] The functions of the various elements shown in the figures can be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions can be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which can be shared. Moreover, explicit use of the term "processor" or "controller" should not be construed to refer exclusively to hardware capable of executing software, and can implicitly include, without limitation, digital signal processor ("DSP") hardware, read-only memory ("ROM") for storing software, random access memory ("RAM"), and non-volatile storage. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future (i.e., any elements developed that perform the same function, regardless of structure).

[0014] Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative system components and/or circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

[0015] Furthermore, because some of the constituent system components and methods depicted in the accompanying drawings can be implemented in software, the actual connections between the system components or the process function blocks may differ depending upon the manner in which the present principles are programmed. Given the teachings herein, one of ordinary skill in the pertinent art will be able to contemplate these and similar implementations or configurations of the present principles.

[0016] Embodiments of the present principles provide an interactive privacy-preserving personalized video consumption system, which enables the display and control of risks to a user's privacy during content consumption and interaction. Embodiments of the present principles implement an information theoretic framework to display and control the results of a utility-aware privacy-preserving mapping that perturbs a user's video ratings to prevent inference of user attributes that can be considered by the user as private, such as political affiliation, age, gender, while, in one embodiment, maintaining the utility of released perturbed/distorted ratings for recommendation.

[0017] FIG. la depicts a high level block diagram of a system 100 for displaying and controlling risks to user privacy while enabling content consumption in accordance with an embodiment of the present principles. The system 100 of FIG. la illustratively comprises a user client 105, a privacy server 110, and a recommendation server 115. In one embodiment of the system 100 of FIG. la, the user client 105 can comprise a web interface. In various embodiments of the present principles such web interface can be written in HTML5 and JavaScript. Similarly, in an embodiment of the system 100 of FIG. la, the privacy server 110 and the recommendation server 115 can comprise Flask interfaces; Flask is a Python-based micro web framework. In the embodiment of FIG. la, the privacy server 110 and the recommendation server 115 comprise a privacy agent 125. Although in the illustrative embodiment of FIG. la, the user client 105, the privacy server 110, and the recommendation server 115 are depicted as separate components, in alternate embodiments of the present principles, the functionality of each of the components can be integrated into a single component or any other combination of components. The components of FIG. la can, in one embodiment, comprise at least a portion of a content recommendation system.
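The application specifies only that the privacy server and recommendation server are Flask services; it gives no routes or payloads. Purely as an illustrative sketch, a privacy-server endpoint might look as follows, where the route name, JSON fields, and the helper functions inference_risk and distort_ratings are hypothetical placeholders for the operations described later in this disclosure:

from flask import Flask, request, jsonify

app = Flask(__name__)

def inference_risk(private_attributes, ratings):
    # Placeholder: would evaluate Risk(A, b) of equation (1) from the estimated prior.
    return {attribute: 0.0 for attribute in private_attributes}

def distort_ratings(ratings):
    # Placeholder: would apply the privacy-preserving mapping of Algorithm 1.
    return ratings

@app.route("/privatize", methods=["POST"])
def privatize():
    body = request.get_json()
    private_attributes = body["private_attributes"]   # e.g. ["age", "gender", "political_views"]
    ratings = body["ratings"]                          # e.g. {"show_123": 4}
    distorted = distort_ratings(ratings)
    return jsonify({
        "risk_original": inference_risk(private_attributes, ratings),
        "risk_distorted": inference_risk(private_attributes, distorted),
        "distorted_ratings": distorted,
    })

# app.run(port=5000)  # the recommendation server would expose a similar Flask interface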

[0018] FIG. lb depicts a functional diagram of the system of FIG. la in accordance with an embodiment of the present principles. More specifically, in FIG. lb, data from at least one user, which includes personal user attributes and data a user might consider public, such as program ratings, is communicated to the privacy agent. The privacy agent performs quantization of the data to quantize B into C, which will be described in greater detail below. In the embodiment of FIG. lb, an estimation of a distribution that links the quantized data to data at least one user considers private, such as user attributes including age, gender and political preference, is performed. Again, such process is described in greater detail below. The estimated distribution is then implemented to determine and design a privacy mapping of data a user is willing to make public to a distorted version of such data, such that inference of user attributes that might be considered private from data that a user is willing to make public becomes much more difficult or specifically cannot outperform an uninformed random guess. In the embodiment of FIG. lb, such privacy mapping is determined as a result of an Algorithm described in greater detail below. As such and as depicted in FIG. la, user data a user is willing to make public, such as program ratings, can be manipulated by the mapping of the present principles to create a distorted version of such data to be communicated to, for example, a service provider/content source, to be used by the service provider/content source for providing, for example, personalized content recommendations for the user, without being able to determine from the distorted version of the data, personal attributes of the user, which the user has indicated the user wants to remain private. In the embodiment of FIG. lb, such information is also communicated back to the user to be presented to the user via, for example, a user interface to inform the user of privacy risks associated with the data released by the user (described in greater detail below). The user can further use the user interface to communicate data to a system of the present principles; the data including at least which attributes a user wants to keep private and data a user is willing to publicly release, for example, program ratings. The user interface can also be implemented to display to a user the differences in user privacy risks/threats if data is made public by a user versus releasing a version of the same data distorted in accordance with the present principles.

[0019] In the system 100 of FIG. 1 (FIG. la and FIG. lb, collectively), the user client 105 enables a user to interact with available privacy settings, provide ratings for consumed content, such as TV shows, and displays recommendations based on the user's privacy settings and privatized ratings. In the system 100 of FIG. 1, the privacy server 110 and the recommendation server 115 serve client requests (web pages) and store and retrieve data from databases, such as user and privacy mapping data for the privacy server and content and data for the recommendation server 115. The privacy server 110 further performs rating privatization based on the user's privacy settings and sends privatized ratings to the recommendation server 115 and to the user client 105, which will be described in further detail below. The recommendation server 115 generates recommendations based on the user's privatized ratings, and communicates such recommendations to the user client 105.

[0020] The system 100 of FIG. 1 can further comprise at least one database 120. In one embodiment of the present principles the database can comprise one or more MongoDB databases. The database(s) 120 can store user privacy settings and user interactions with the content (e.g. ratings) and data related to privacy mapping. Such database(s) 120 are accessed by the privacy server 110 to gain access to such data. The database(s) 120 can further store content metadata used to display on the web interface at the client side and as such can be accessed by the user client 105. The database(s) 120 can further store content profiles for recommendation purposes and can be accessed by the recommendation server 115. Although in FIG. la, the database is illustratively a separate component, in alternate embodiments of the present principles, a database of the present principles can be an integrated component of at least one of the privacy server 110, the recommendation server 115 and the user client 105.
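MongoDB is named as the database but no schema is given; the snippet below is a hypothetical illustration of how user privacy settings and ratings could be stored and read back, with collection and field names invented for the example:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["privacy_system"]

# Store the attributes a user has marked private (hypothetical schema).
db.privacy_settings.update_one(
    {"user_id": "user42"},
    {"$set": {"private_attributes": ["political_views", "gender"]}},
    upsert=True,
)

# Store a rating the user entered for a show, together with its distorted counterpart.
db.ratings.insert_one(
    {"user_id": "user42", "show_id": "show_123", "rating": 4, "distorted_rating": 3}
)

# The privacy server would read these back when computing inference risks.
settings = db.privacy_settings.find_one({"user_id": "user42"})
print(settings)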

[0021] FIG. 2 depicts a high level block diagram of a user client/interface 105 in accordance with an embodiment of the present principles. The user client 105 of FIG. 2 comprises a processor 210 as well as a memory 220 for storing control programs, instructions, software, video content, data, user ratings, determined risks and the like. The processor 210 cooperates with conventional support circuitry 230 such as power supplies, clock circuits, cache memory and the like as well as circuits that assist in executing the software routines stored in the memory 220. As such, it is contemplated that some of the process steps discussed herein as software processes can be implemented within hardware, for example, as circuitry that cooperates with the processor 210 to perform various steps. The user client 105 also contains input-output circuitry 240 forming an interface between the respective functional elements communicating with the user client 105. As noted throughout this disclosure, the memory 220 can be a hard disk storage device, a static RAM, a DRAM, ROM, etc., or combinations of the same. The user client 105 can include a display 145.

[0022] Although the user client 105 of FIG. 2 is depicted as a general purpose computer that is programmed to perform various control functions in accordance with the present invention, the invention can be implemented in hardware, for example, as an application specific integrated circuit (ASIC). As such, the process steps described herein are intended to be broadly interpreted as being equivalently performed by software, hardware, or a combination thereof.

[0023] In the system 100 of FIG. 1, the user client 105 can include a privacy dashboard which displays the privacy settings of the user and a privacy monitor. FIG. 3 depicts a high level illustrative depiction of a privacy dashboard 300 usable in the user client 105 of the system 100 of FIG. 1 in accordance with an embodiment of the present principles. The privacy settings of the privacy dashboard 300 in accordance with the illustrative embodiment of FIG. 3 enable a user to select any combination of attributes that a user deems private and wishes to protect. For example, in the embodiment of FIG. 3, the privacy dashboard 300 illustratively includes three attributes (age, gender, and political views) from which a user can select and indicate that the user wishes to keep private and protect from, for example, the threat of inference. It should be noted that in accordance with the present principles, it is not required for the user to reveal what his political views, age, or gender are, but only whether the user considers any of these features as sensitive information that the user wants to remain private.

[0024] In the embodiment of the privacy dashboard 300 of FIG. 3, the privacy monitor 305 displays the inference threat for each private attribute from the actual TV show ratings entered by the user, and from the distorted privacy-preserving ratings. Thus, the privacy monitor 305 enables the user to compare what his privacy risk would have been if he did not activate the privacy protection of the present principles with the risk to his privacy after the inference that could be made from his ratings after they were sanitized using the privacy-preserving mechanism of the present principles.

[0025] In considering privacy, it is considered that a user has two types of data: some data that the user would like to remain private, such as political views, age, and gender, and some data that the user is willing to release publicly and from which the user will derive some utility, such as the release of media preferences (TV show ratings) to a service provider. The release of such data, for example, enables the user to receive content recommendations. In one embodiment, A denotes the vector of personal attributes that the user wants to keep private, and B denotes the vector of data the user is willing to make public. The user private attributes, A, are linked to B by the joint probability distribution p_{A,B}. Thus, a party interested in determining personal information about the user, such as a service provider or a third party with whom the user may exchange data, can observe the B data and infer some information about the personal attributes, A, that a user wishes to keep private.

[0026] In accordance with embodiments of the present principles, to reduce this inference threat, instead of releasing B, the user will release a distorted version of B, denoted B̂. In one embodiment of the present principles, B̂ is generated according to a conditional probabilistic mapping p_{B̂|B}, considered herein as the privacy-preserving mapping. The privacy-preserving mapping, p_{B̂|B}, is designed in such a way that it renders any statistical inference of the A data based on the observation of B̂ harder, yet, at the same time, preserves some utility to the released data B̂, by limiting the distortion generated by the mapping. The privacy-preserving mapping is designed to control the privacy leakage, modeled as the mutual information I(A; B̂) between the private attributes, A, and the publicly released data B̂, subject to a utility requirement, modeled by a constraint on the average distortion E[d(B, B̂)]. In the perfect privacy regime, where I(A; B̂) = 0, the mapping renders the released data B̂ statistically independent from the private data A, and any inference algorithm that tries to infer the private data, A, from the released data, B̂, cannot outperform an uninformed random guess. The privacy and utility metrics and the design of the privacy-preserving mapping are discussed in greater detail below.
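For concreteness, the design problem described in the preceding paragraph can be stated as an optimization over the mapping itself. The following display is a standard information-theoretic formulation consistent with the text above; the distortion budget Δ is a symbol assumed here for illustration and is not defined in the application:

$$\min_{p_{\hat{B} \mid B}} \; I(A; \hat{B}) \quad \text{subject to} \quad \mathbb{E}_{p_{B, \hat{B}}}\!\left[ d(B, \hat{B}) \right] \le \Delta .$$

Algorithm 1, discussed below, works with the complementary form in which the expected distortion is minimized subject to a leakage constraint I(A; B̂) ≤ ε.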

[0027] It should be noted that in the local setting, perfect privacy I(A; B̂) = 0 is equivalent to statistical independence between A and B̂. That is, p_{B̂|A}(b̂ | a) = p_{B̂|A}(b̂ | a') = p_{B̂}(b̂), for all a, a' and b̂, which in turn is equivalent to B̂ being locally 0-differentially private with respect to A. Indeed, in the local setting, on one hand the local database A is of size 1 as it contains only the data of a single individual user, thus all databases a, a' are neighboring databases. On the other hand, the service provider asks for the query B, which due to its correlation with A can be considered as a randomized function of A, and receives the sanitized version, B̂. Thus, in the local privacy setting at perfect privacy, the information theoretic privacy metric and the differential privacy metric are equivalent with respect to the private data, A.
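The equivalence stated in paragraph [0027] follows from standard definitions; as a brief sketch (added here for clarity, not taken verbatim from the application):

$$I(A; \hat{B}) = 0 \;\Longleftrightarrow\; p_{A,\hat{B}}(a, \hat{b}) = p_A(a)\, p_{\hat{B}}(\hat{b}) \;\;\forall a, \hat{b} \;\Longleftrightarrow\; p_{\hat{B} \mid A}(\hat{b} \mid a) = p_{\hat{B} \mid A}(\hat{b} \mid a') \;\;\forall a, a', \hat{b},$$

and the last condition is exactly ε-local differential privacy with ε = 0, since the largest ratio of conditional output probabilities over neighboring inputs a, a' is then equal to 1.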

[0028] In one embodiment of the present principles, to model the inference threat for each private attribute from a particular rating vector representing the user history of ratings, a privacy risk metric on a scale [0,100] is implemented. For a private attribute, A, and a specific vector of ratings B = b, the privacy risk can be defined according to equation one (1), which follows:

$$\mathrm{Risk}(A, b) = \left(1 - \frac{H(A \mid B = b)}{H(A)}\right) \times 100 \qquad (1)$$

where $H(A) = -\sum_a p_A(a) \log p_A(a)$ denotes the entropy of the variable A distributed according to p_A(a), and represents the inherent uncertainty on A. Similarly, it should be noted that H(A | B = b) denotes the remaining entropy of A given the observation B = b, and represents the remaining uncertainty on A after observing B = b. Intuitively, the privacy risk Risk(A, b) measures the percentage by which the uncertainty on A decreases due to the observation of B = b, relative to the original uncertainty prior to observing B. A privacy Risk(A, b) = 0 means that the rating vector B = b does not provide any information about the private attribute A, while a Risk(A, b) = 100 implies that no uncertainty is left about the attribute A from observing the rating vector B = b. The privacy risk based on the user's actual rating vector B = b is Risk(A, b), while the privacy risk based on the distorted ratings B̂ = b̂ is Risk(A, b̂), obtained by replacing B = b in (1) by B̂ = b̂. The mutual information between the private data A and the distorted data B̂ is given by equation two (2), which follows:

$$I(A; \hat{B}) = \sum_{\hat{b}} p_{\hat{B}}(\hat{b}) \left[ H(A) - H(A \mid \hat{B} = \hat{b}) \right] = H(A) - H(A \mid \hat{B}) \qquad (2)$$

which is related to the average of the privacy risks over all possible distorted rating vectors B̂. Achieving perfect privacy (I(A; B̂) = 0) ensures a 0-privacy risk, meaning that any inference algorithm that would try to infer A from B̂ would not outperform an uninformed random guess.
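To make the metric concrete, the following is a small sketch of how Risk(A, b) from equation (1) and the mutual information of equation (2) could be computed when A and B are discrete and an estimate of the joint distribution p_{A,B} is available as a table; the function names and the toy distribution are illustrative only:

import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector p (zero terms ignored)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def privacy_risk(p_ab, b_index):
    """Risk(A, b) on a 0-100 scale, per equation (1).

    p_ab    : 2-D array, p_ab[a, b] = joint probability of attribute value a and rating vector b
    b_index : column index of the observed rating vector b
    """
    p_a = p_ab.sum(axis=1)                      # marginal p_A
    h_a = entropy(p_a)                          # inherent uncertainty H(A)
    p_a_given_b = p_ab[:, b_index] / p_ab[:, b_index].sum()
    h_a_given_b = entropy(p_a_given_b)          # remaining uncertainty H(A | B = b)
    return (1.0 - h_a_given_b / h_a) * 100.0

def mutual_information(p_ab):
    """I(A; B) = H(A) - H(A | B), per equation (2), in bits."""
    p_a = p_ab.sum(axis=1)
    p_b = p_ab.sum(axis=0)
    h_a_given_b = sum(p_b[j] * entropy(p_ab[:, j] / p_b[j]) for j in range(len(p_b)) if p_b[j] > 0)
    return entropy(p_a) - h_a_given_b

# Example: two equally likely attribute values, ratings weakly correlated with them.
p_ab = np.array([[0.30, 0.20],
                 [0.20, 0.30]])
print(privacy_risk(p_ab, b_index=0))   # risk from observing the rating vector in column 0
print(mutual_information(p_ab))        # average leakage over all rating vectors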

[0029] Referring back to the privacy dashboard 300 of FIG. 3, after entering the user's privacy settings, the user can then access a program guide, such as a TV guide (not shown), and pick a program that the user would like to consume/watch. In various embodiments of the present principles, on a program display page, the user can give a star rating to the program. For example, FIG. 4 depicts a high level illustrative depiction of a program display page usable in the user system 100 of FIG. 1 in accordance with an embodiment of the present principles. On the program display page 400 of FIG. 4, a privacy risk monitor 405 reminds the user of the user's privacy risk based on available history of the user's previous program ratings. When the user hovers a pointing/selection device above the star ratings for the displayed program, for each possible rating, for example {1, ..., 5}, the privacy risk monitor 405 dynamically updates the risk numbers to inform the user of how the privacy risk would evolve if the user selected a particular rating.

[0030] In one embodiment of the present principles, the privacy risk monitor 405 displays the updated risk based on actual ratings, before sanitization in accordance with the present principles. In such an embodiment, once the user selects a rating, the rating value is sanitized in accordance with the present principles. A user is then able to use the privacy dashboard 300 to check that the privacy risk after the sanitization of the rating value has decreased and, in the best case scenario, is equal to zero (0) for the attributes the user selected as private.

[0031] In one embodiment, the system 100 of FIG. 1 implements a privacy-preserving process for the release of the user program ratings to, for example, a service provider, that ensures perfect privacy (I(A; B̂) = 0) against statistical inference of the user's private features, while at the same time minimizing the distortion to the released data/ratings. FIG. 5 depicts a high level illustrative depiction of a program history display page 500 including ratings usable in the user system 100 of FIG. 1 in accordance with an embodiment of the present principles. The program history display page 500 of FIG. 5 displays the user's actual ratings and the distorted ratings on a display of a respective program.

[0032] The privacy-preserving mapping of embodiments of the present principles requires characterizing the value of p_{B̂|B}(b̂ | b) for all possible pairs (b, b̂) ∈ B × B̂, i.e. solving the convex optimization problem over |B| · |B̂| variables. When B̂ = B, and the size of the alphabet |B| = 6^50 is large, solving the convex optimization over |B|^2 variables may become intractable. Quantization can be used to reduce the number of optimization variables from |B|^2 to K^2, where K denotes the number of quantization levels. It should be noted that the choice of K is a tradeoff between the size of the optimization and the additional distortion introduced by quantization. Quantization assumes that vectors B lie in a metric space. Directly applying quantization on the original rating vector, B, where unrated shows are assigned a 0 rating, would cause unrated programs to be perceived as strongly disliked by the user, when such programs actually may not be disliked, but simply unknown to the user, for example. To circumvent this issue, the rating vector B is completed into B_c using low rank matrix factorization, a standard collaborative filtering technique. The completed rating vector B_c is then input into a quantization process that maps B_c to a cluster center, C. In one embodiment of the present principles, for quantization, K-means clustering is used, with K = 75 cluster centers, where the choice of K was guided empirically. The cluster center, C, is then fed to a privacy optimization algorithm that finally outputs a distorted rating vector B̂.
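A minimal sketch of the quantization step follows, using toy data. The completion step here uses a simple per-show mean fill as a stand-in, whereas the application completes B into B_c with low rank matrix factorization (a factorization sketch appears with the recommendation discussion below):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy rating matrix: 500 users x 50 shows, ratings 1-5, with 0 meaning "not rated".
ratings = rng.integers(0, 6, size=(500, 50)).astype(float)

# Stand-in completion step: fill unrated entries with each show's mean observed rating.
completed = ratings.copy()
for j in range(completed.shape[1]):
    observed = completed[:, j] > 0
    mean_j = completed[observed, j].mean() if observed.any() else 3.0
    completed[~observed, j] = mean_j

# Quantization: map each completed rating vector B_c to one of K = 75 cluster centers C.
kmeans = KMeans(n_clusters=75, n_init=10, random_state=0).fit(completed)
cluster_of_user = kmeans.labels_           # index of the cluster center C for each user
cluster_centers = kmeans.cluster_centers_  # the K = 75 centers used by the privacy optimization

print(cluster_of_user[:10])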

[0033] For example, in one embodiment of the present principles, the following algorithm can be used to describe the quantized privacy-preserving mapping of the present principles:

Algorithm 1 Quantized privacy-preserving mapping

Input: prior p_{A,C}

Solve: the convex optimization

    minimize over p_{B̂|C}:  E_{p_{C,B̂}}[ d(C, B̂) ]

    subject to I(A; B̂) ≤ ε, and p_{B̂|C} ∈ Simplex

Remap: p_{B̂|B} = p_{B̂|C(B)}

Output: mapping p_{B̂|B}

In summary, the design of the privacy-preserving mapping described in the embodiment of Algorithm 1 follows the Markov chain A → B → B_c → C → B̂.
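The application does not include an implementation of the convex program in Algorithm 1. The sketch below shows one way it could be posed with the cvxpy modeling library, assuming the prior p_{A,C} has been estimated, that B̂ ranges over the same K cluster centers as C, and that the leakage budget ε is expressed in nats; the mutual information constraint is written with relative entropy, which is jointly convex, so the problem is disciplined-convex. Variable and helper names are this sketch's, not the application's:

import numpy as np
import cvxpy as cp

def solve_privacy_mapping(p_ac, distortion, eps):
    """Sketch of the convex program in Algorithm 1.

    p_ac       : (nA, nC) array, estimated joint prior p_{A,C}
    distortion : (nC, nBhat) array, d(c, b_hat) between cluster centers
    eps        : leakage budget on I(A; B_hat), in nats
    Returns the mapping p_{B_hat | C} as an (nC, nBhat) array.
    """
    nA, nC = p_ac.shape
    nBhat = distortion.shape[1]
    p_c = p_ac.sum(axis=0)                      # marginal p_C
    p_a = p_ac.sum(axis=1)                      # marginal p_A

    P = cp.Variable((nC, nBhat), nonneg=True)   # p_{B_hat | C}

    # Expected distortion E[d(C, B_hat)] = sum_{c,b_hat} p_C(c) P[c,b_hat] d(c,b_hat).
    objective = cp.sum(cp.multiply(np.diag(p_c) @ distortion, P))

    # Joint p_{A,B_hat} and marginal p_{B_hat} are affine in P (Markov chain A - C - B_hat).
    p_abhat = p_ac @ P                          # shape (nA, nBhat)
    p_bhat = p_c @ P                            # shape (nBhat,)

    # I(A; B_hat) = sum rel_entr(p_{A,B_hat}, p_A * p_{B_hat}), jointly convex in its arguments.
    ref = p_a.reshape(nA, 1) @ cp.reshape(p_bhat, (1, nBhat))
    mutual_info = cp.sum(cp.rel_entr(p_abhat, ref))

    constraints = [cp.sum(P, axis=1) == 1, mutual_info <= eps]
    cp.Problem(cp.Minimize(objective), constraints).solve(solver=cp.SCS)
    # Per Algorithm 1, p_{B_hat|B} is obtained by remapping: a user whose completed ratings
    # fall in cluster C(B) draws B_hat from row C(B) of this matrix.
    return P.value

# Tiny example: 2 attribute values, 3 cluster centers, absolute-difference distortion.
p_ac = np.array([[0.20, 0.10, 0.20], [0.10, 0.25, 0.15]])
d = np.abs(np.arange(3)[:, None] - np.arange(3)[None, :]).astype(float)
mapping = solve_privacy_mapping(p_ac, d, eps=1e-2)   # small budget; eps -> 0 approaches perfect privacy
print(mapping)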

[0034] It should be noted that computing the privacy Risk(A, b), as well as finding the privacy-preserving mapping as a solution to the privacy convex optimization, rely on the fundamental assumption that the prior distribution p_{A,B} that links private attributes A and data B is known and can be used as an input to the algorithm. In practice, the true distribution may not be known, but can rather be estimated from a set of sample data that can be observed, for example from a set of users who do not have privacy concerns and publicly release both their attributes A and their original data B. However, such a dataset may contain a small number of samples or be incomplete, which makes the estimation of the prior distribution challenging.

[0035] In Algorithm 1 illustrated above, the prior distribution between the private data and the quantized completed data is used. The distribution is estimated using Kernel Density Estimation, with a Gaussian kernel of width σ = 9.5. In Algorithm 1, ε bounds the amount of information about the private data A that is leaked by the distorted data B̂, and thus represents the level of privacy requirement on the user side. Varying ε enables the study of the tradeoff between privacy requirement and distortion. FIG. 6 depicts an illustrative graph of the privacy-utility tradeoff in accordance with an embodiment of the present principles. The graph of FIG. 6 depicts mutual information I(A; B̂) plotted against end-to-end distortion (quantization + privacy mapping) per rating. K-means quantization introduces a distortion of 1.08 per rating and yields a mutual information I(A; C) = 0.2. With 0.14 additional distortion, the privacy-preserving mapping achieves perfect privacy I(A; B̂) = 0 for an end-to-end distortion of 1.22.
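The application states only that the prior is estimated by Kernel Density Estimation with a Gaussian kernel of width σ = 9.5. As one plausible, hypothetical reading, the density of completed rating vectors could be fit per attribute value and evaluated at the cluster centers to obtain p_{A,C}; scikit-learn's KernelDensity is used here as a stand-in for whatever estimator was actually employed:

import numpy as np
from sklearn.neighbors import KernelDensity

def estimate_prior(attr_values, completed_ratings, cluster_centers, sigma=9.5):
    """Estimate p_{A,C} from a sample of users without privacy concerns (sketch).

    attr_values       : (n_users,) array of discrete attribute labels A
    completed_ratings : (n_users, n_shows) completed rating vectors B_c
    cluster_centers   : (K, n_shows) K-means centers defining the alphabet of C
    """
    attr_values = np.asarray(attr_values)
    labels = np.unique(attr_values)
    p_ac = np.zeros((len(labels), len(cluster_centers)))
    for i, a in enumerate(labels):
        members = completed_ratings[attr_values == a]
        # Fit a Gaussian KDE to the rating vectors of users with attribute value a ...
        kde = KernelDensity(kernel="gaussian", bandwidth=sigma).fit(members)
        # ... and evaluate it at each cluster center to get an unnormalized p(C | A = a).
        density = np.exp(kde.score_samples(cluster_centers))
        p_ac[i] = (len(members) / len(attr_values)) * density / density.sum()
    return p_ac  # rows indexed by attribute value, columns by cluster center; sums to 1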

[0036] In one embodiment of the present principles, the focus is on perfect privacy, I(A; B̂) = 0, and thus on ε being close to 0. At perfect privacy, any inference algorithm that tries to infer A from B̂ can only perform as well as an uninformed random guess. In such an embodiment, intuitively, B̂ is statistically independent from A and, thus, the privacy mapping of the present principles has statistically 'erased' any information about the private data A that can be inferred from B̂.

[0037] FIG. 7 depicts an illustrative receiver operating characteristic (ROC) curve showing the performance of, for example, a logistic regression classifier attempting to infer a user's political views from the original rating vector, from a binarized version of the rating vector where ratings >= 4 are mapped to 1 (like) and ratings <= 3 are mapped to 0 (dislike), or from rating vectors distorted according to a privacy-preserving mapping with average distortion <= 1 (second curve from the bottom) or distortion <= 2 (bottom curve). In the embodiment of FIG. 7, 10-fold cross validation was used and the false positive rate (e.g., Democrats falsely classified as Republicans) was plotted against the true positive rate (e.g., Republicans correctly classified). The top curve in FIG. 7 depicts the privacy risk of inferring the political views from the original rating vectors, and the second curve from the top shows that merely binarizing the ratings is not enough to ensure privacy. The bottom, straight diagonal line of FIG. 7 depicts the results of an uninformed random guess. As seen in FIG. 7, the bottom curve is very close to the straight diagonal line, which demonstrates that with distortion <= 2 the privacy-preserving mechanism of the present principles successfully ensures privacy against logistic regression of political views from distorted ratings. Further inference attacks were performed with other classifiers, including Naive Bayes and SVM, and similar results were observed.
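An inference attack of the kind plotted in FIG. 7 could be reproduced roughly as follows; the dataset, labels, and the additive-noise stand-in for the privacy mapping are placeholders, and only the evaluation procedure (logistic regression, 10-fold cross validation, ROC) mirrors the description above:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)
n_users, n_shows = 400, 50
political_views = rng.integers(0, 2, size=n_users)                   # 0/1 labels (placeholder)
original = rng.integers(1, 6, size=(n_users, n_shows)).astype(float)
distorted = original + rng.normal(0, 2.0, size=original.shape)       # stand-in for the privacy mapping

def attack_auc(features, labels):
    """Area under the ROC curve of a logistic-regression attacker, 10-fold cross validation."""
    scores = cross_val_predict(LogisticRegression(max_iter=1000), features, labels,
                               cv=10, method="predict_proba")[:, 1]
    fpr, tpr, _ = roc_curve(labels, scores)
    return auc(fpr, tpr)

print("AUC from original ratings :", attack_auc(original, political_views))
print("AUC from distorted ratings:", attack_auc(distorted, political_views))
# An AUC near 0.5 on the distorted ratings corresponds to the uninformed-guess diagonal of FIG. 7.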

[0038] The privacy-preserving process of the present principles preserves the relevance of recommendations even when ratings distorted for privacy are used. FIG. 8 depicts a high level illustrative depiction of a recommendations page in accordance with an embodiment of the present principles. In FIG. 8, FIG. 8a depicts the six (6) top TV show recommendations based on actual ratings given to programming by a user and FIG. 8b depicts the six (6) top TV show recommendations based on ratings distorted for privacy in accordance with the present principles. In one embodiment of the present principles, low rank matrix factorization (MF) is used to predict, from the ratings provided by the user, the missing ratings for shows the user has not rated. The MF recommender engine (not shown) was trained by alternating regularized least squares. In one embodiment of the present principles, the recommender engine can be a component/function of the recommendation server 115 of FIG. 1. As depicted in FIG. 8, there is an overlap of 4 out of 6 recommendations without and with privacy, which illustrates that the privacy-preserving process of the present principles maintains utility while protecting user privacy.
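Low rank matrix factorization trained by alternating regularized least squares, as named above, can be sketched as follows; the rank, regularization weight, iteration count, and toy data are illustrative choices rather than values from the application:

import numpy as np

def als_complete(ratings, rank=10, reg=0.1, iters=20):
    """Complete a partially observed rating matrix (0 = unrated) by alternating ridge regressions."""
    n_users, n_items = ratings.shape
    observed = ratings > 0
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    eye = reg * np.eye(rank)
    for _ in range(iters):
        for u in range(n_users):                      # fix V, solve for each user factor
            idx = observed[u]
            if idx.any():
                Vo = V[idx]
                U[u] = np.linalg.solve(Vo.T @ Vo + eye, Vo.T @ ratings[u, idx])
        for i in range(n_items):                      # fix U, solve for each item factor
            idx = observed[:, i]
            if idx.any():
                Uo = U[idx]
                V[i] = np.linalg.solve(Uo.T @ Uo + eye, Uo.T @ ratings[idx, i])
    return U @ V.T                                    # predicted ratings for every (user, show)

# Top-6 recommendations for one user: highest predicted ratings among unrated shows.
ratings = np.random.default_rng(1).integers(0, 6, size=(200, 40)).astype(float)
predicted = als_complete(ratings)
user = 0
scores = np.where(ratings[user] == 0, predicted[user], -np.inf)   # ignore shows already rated
top6 = np.argsort(-scores)[:6]
print(top6)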

[0039] Further testing was performed to illustrate that the privacy-preserving process of the present principles is able to eliminate the privacy threat from B for chosen attributes A with little effect on the quality of recommendations. For example, in one experiment, 5-fold cross validation was used to split a dataset into a training set containing 80% of the data, and a test set containing the remaining 20% of the data on which the MF recommender engine was tested both with and without privacy activated to compare the relevance of recommendations in these two cases. The random splitting into training and test sets was performed 5 times, as shown in the first row of Table I, below.

Set         1        2        3        4        5
RMSE1 (r)   1.2434   1.3208   1.2657   1.3359   1.2928
RMSE2 (r̂)   1.3469   1.3522   1.4182   1.3969   1.3708

Table I: Rating prediction RMSE

[0040] More precisely, in each test set, 10% of the ratings were removed and then predicted. Table I above depicts the RMSE in rating prediction based on actual ratings and on distorted ratings. In Table I, r denotes predicted ratings based on the actual ratings provided by users for other shows, while r̂ denotes predicted ratings based on the ratings distorted for privacy. The prediction RMSE for r (RMSE1, privacy not activated) and for r̂ (RMSE2, privacy activated) are calculated on the 10% of ratings that were removed. Table I shows that the RMSE for rating prediction does not degrade much when privacy protection is activated, compared to rating prediction without privacy. It should be noted that the results presented above are for the case of perfect privacy (I(A; B̂) = 0), meaning that any inference algorithm that would try to infer A, e.g. political views, from the ratings B̂ would not outperform an uninformed random guess. If the privacy requirements were less stringent, for example I(A; B̂) <= ε for some ε > 0, then the RMSE for rating prediction with privacy protection would be even closer to the RMSE without privacy. Finally, it should be noted that using a more advanced and optimized recommendation engine, instead of the aforementioned standard MF recommendation engine, would result in better rating prediction quality both without and with privacy protection.
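The hold-out protocol of Table I could be reproduced along these lines; the helper is a sketch in which the predictor (for example, the als_complete function sketched earlier) and the rating matrices are supplied by the caller:

import numpy as np

def holdout_rmse(train_source, truth, predictor, holdout_frac=0.10, seed=0):
    """Hide holdout_frac of the observed ratings, predict them from train_source, return the RMSE.

    train_source : matrix the recommender is trained on (actual ratings for RMSE1, distorted for RMSE2)
    truth        : the actual ratings against which held-out predictions are scored
    predictor    : callable mapping a rating matrix (0 = unrated) to a fully predicted matrix
    """
    rng = np.random.default_rng(seed)
    users, items = np.nonzero(truth)
    pick = rng.choice(len(users), size=int(holdout_frac * len(users)), replace=False)
    held_u, held_i = users[pick], items[pick]
    train = train_source.copy()
    train[held_u, held_i] = 0                       # hide the held-out entries
    predicted = predictor(train)
    err = predicted[held_u, held_i] - truth[held_u, held_i]
    return float(np.sqrt(np.mean(err ** 2)))

# Usage (with the als_complete sketch above as the predictor):
# rmse1 = holdout_rmse(actual_ratings, actual_ratings, als_complete)     # privacy not activated
# rmse2 = holdout_rmse(distorted_ratings, actual_ratings, als_complete)  # privacy activated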

[0041] FIG. 9 depicts a flow diagram of a method for displaying and controlling risks/threats to user privacy in accordance with an embodiment of the present principles. The method 900 begins at step 902 in which a user input is received regarding at least one user attribute that is to remain private and data that is to be made public. The method 900 can then proceed to step 904.

[0042] At step 904, a risk of inference of the at least one user attribute that is to remain private from the data that is to be made public is determined. The method 900 can then proceed to step 906.

[0043] At step 906, the data that is to be made public is distorted using a previously determined mapping. The method 900 can then proceed to step 908.

[0044] At step 908, a risk of inference of the at least one user attribute that is to remain private from the distorted data that is to be made public is determined. The method 900 can then proceed to step 910.

[0045] At step 910, at least one of the risk of inference determined from the data that is to be made public and the risk of inference determined from the distorted data that is to be made public is displayed. The method 900 can then be exited.

[0046] In alternate embodiments of the present principles, the method 900 can optionally further include displaying selectable user attributes from which a user can select which user attributes the user wishes to keep private.

[0047] In alternate embodiments of the present principles, the method 900 can optionally further include displaying selectable rating values from which a user can select a rating for content and/or content programming. In such embodiments, the user ratings would be considered data that the user is willing to make public.

[0048] In alternate embodiments of the present principles, the method 900 can optionally further include displaying at least one of rating values assigned to content by a user and the distorted ratings distorted in accordance with the teachings herein.

[0049] In alternate embodiments of the present principles, the method 900 can optionally further include displaying at least one of content recommendation before the user ratings are distorted and content recommendations after the user ratings are distorted.
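Gathering steps 902 through 910 into one routine, purely as an illustrative sketch in which every helper is a caller-supplied placeholder rather than a function defined by the application:

def run_method_900(private_attributes, public_data, inference_risk, apply_mapping, display):
    """Steps 902-910 of FIG. 9 expressed as one function (sketch).

    inference_risk(attributes, data) -> risk value(s); apply_mapping(data) -> distorted data;
    display(...) presents results to the user. All three are supplied by the caller.
    """
    # Step 902: user input: attributes to keep private and data to be made public (arguments above).
    # Step 904: risk of inferring the private attributes from the public data.
    risk_original = inference_risk(private_attributes, public_data)
    # Step 906: distort the public data using the previously determined mapping.
    distorted = apply_mapping(public_data)
    # Step 908: risk of inferring the private attributes from the distorted data.
    risk_distorted = inference_risk(private_attributes, distorted)
    # Step 910: display at least one of the two risks.
    display(risk_original, risk_distorted)
    return distorted

# Example with trivial stand-ins:
# run_method_900(["gender"], {"show_1": 5},
#                inference_risk=lambda a, d: 0.0,
#                apply_mapping=lambda d: d,
#                display=print)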

[0050] The present description illustrates embodiments of the present principles. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the present principles and are included within its scope. That is, having described various embodiments of a method and apparatus for displaying and controlling risks to user privacy while enabling content consumption and recommendation (which are intended to be illustrative and not limiting), it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes can be made in the particular embodiments of the present principles disclosed which are within the scope and spirit of the invention. While the foregoing is directed to various embodiments of the present principles, other and further embodiments of the invention may be devised without departing from the basic scope thereof.