


Title:
METHOD AND APPARATUS FOR DETECTING LATENT SOURCES AND USER PREFERENCES
Document Type and Number:
WIPO Patent Application WO/2015/057356
Kind Code:
A1
Abstract:
In a particular implementation, we decompose a sequence of arrival data representing users' responses to common stimuli as a function of latent sources and a weight loading matrix, wherein the latent sources relate to the inherent characteristics of the stimuli, and the loading matrix reflects users' tastes or preferences associated with the latent source characteristics. In one embodiment, we model the arrival data for a user as an inhomogeneous Poisson process. The rates of the Poisson processes for different users share information through the latent sources, and also depend on the users' preferences. The latent sources may be modeled as binary signals, using binary semi-Markov Jump Processes (bsMJP). The loading matrix may be modeled as Gaussian variables. The model parameters can be solved using a Markov Chain Monte Carlo algorithm combined with a continuous-time thinning-based sampling approach.

Inventors:
ERIKSSON BRIAN CHARLES (US)
LIAN WENZHAO (US)
Application Number:
PCT/US2014/056885
Publication Date:
April 23, 2015
Filing Date:
September 23, 2014
Assignee:
THOMSON LICENSING (FR)
International Classes:
G06K9/00; G06K9/62
Other References:
FERNANDO SILVEIRA ET AL: "Predicting audience responses to movie content from electro-dermal activity signals", PROCEEDINGS OF THE 2013 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING, 8 September 2013 (2013-09-08), pages 707 - 716, XP055160960, ISBN: 978-1-45-031770-2, DOI: 10.1145/2493432.2493508
WENZHAO LIAN ET AL: "Modeling Correlated Arrival Events with Latent Semi-Markov Processes", INTERNATIONAL CONFERENCE ON MACHINE LEARNING, 27 January 2014 (2014-01-27), XP055160973
MONEY A G ET AL: "Analysing user physiological responses for affective video summarisation", DISPLAYS DEVICES, DEMPA PUBLICATIONS, TOKYO, JP, vol. 30, no. 2, 1 April 2009 (2009-04-01), pages 59 - 70, XP026007488, ISSN: 0141-9382, [retrieved on 20090318], DOI: 10.1016/J.DISPLA.2008.12.003
SILVEIRA, F.; ERIKSSON, B.; SHETH, A.; SHEPPARD, A.: "A biometrics based approach to characterize viewer responses to movies", ACM UBICOMP CONFERENCE, 2013
RIEHLE, A.; GRUN, S.; DIESMANN, M.; AERTSEN, A.: "Spike synchronization and rate modulation differentially involved in motor cortical function", SCIENCE, vol. 278, 1997, pages 1950 - 1953
SMITH, A.; BROWN, E.: "Estimating a state-space model from point process observations", NEURAL COMPUTATION, vol. 15, 2003, pages 965 - 991
ESCOLA, S.; FONTANINI, A.; KATZ, D.; PANINSKI, L.: "Hidden Markov models for the stimulus-response relationships of multistate neural systems", NEURAL COMPUTATION, vol. 23, 2011, pages 1071 - 1132
YU, M BYRON.; CUNNINGHAM, J. P; SANTHANAM, G.; RYU, S. I; SHENOY, K. V; SAHANI, M.: "Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity", JOURNAL OF NEUROPHYSIOLOGY, vol. 102, no. 1, 2009, pages 614 - 635
SCOTT, S. L.; SMYTH, P.: "The Markov modulated Poisson process and Markov Poisson cascade with applications to web traffic modeling", BAYESIAN STATISTICS, vol. 7, 2003, pages 1 - 10
FELLER, W.: "On semi-markov processes", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, vol. 51, no. 4, 1964, pages 653
GRIFFITHS, T. L.; GHAHRAMANI, Z.: "Infinite latent feature models and the Indian buffet process", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, vol. 18, 2006
BHATTACHARYA, A.; DUNSON, D.: "Sparse bayesian infinite factor models", BIOMETRIKA, vol. 98, no. 2, 2011, pages 291 - 306
RAO, V.; TEH, Y.: "MCMC for continuous-time discrete state systems", NIPS CONFERENCE, 2012
FRÜHWIRTH-SCHNATTER: "Data augmentation and dynamic linear models", J. TIME SER. ANAL., vol. 15, 1994, pages 183 - 202
Attorney, Agent or Firm:
SHEDD, Robert, D. et al. (2 Independence Way, Suite 200, Princeton, New Jersey, US)
CLAIMS:

1. A method for providing recommendations, comprising:

accessing (210) a sequence of data, the sequence of data representing a particular user's response to a set of latent sources;

generating (220, 230) a model for the sequence of data responsive to the set of latent sources weighted by a respective weight of a set of weights, each weight indicating relevance of a respective latent source to the particular user;

determining preference of the particular user responsive to the set of weights; and generating the recommendations for the particular user responsive to the determined preference.

2. The method of claim 1, wherein the generating a model is based on at least one of a Markov Chain Monte Carlo (MCMC) algorithm and a sampling approach.

3. The method of claim 1, wherein the set of latent sources is modelled as a Poisson process, an arrival rate of the Poisson process being modeled responsive to the set of latent sources weighted by the respective weight of the set of weights.

4. The method of claim 1, wherein the set of latent sources are modelled as binary semi-Markov Jump Processes.

5. The method of claim 1, wherein there are a plurality of users, the plurality of users including the particular user, and wherein the accessing includes accessing a respective sequence of data for each of the plurality of users, and wherein the generating a model includes generating a model for the respective sequence of data responsive to the set of latent sources and a respective set of weights for each of the plurality of users.

6. The method of claim 5, further comprising:

clustering the plurality of users based on the respective sets of weights.

7. The method of claim 5, further comprising:

predicting whether the particular user provides similar user ratings as another user from the plurality of users based on the set of weights for the particular user and a set of weights for the another user.

8. The method of claim 1, further comprising:

determining the number of latent sources in the set of latent sources using a shrinkage approach.

9. The method of claim 1, wherein the sequence of data corresponds to a sequence of arrival data and each latent source of the set of latent sources corresponds to a plurality of external events, an external event resulting in arrivals in the sequence of arrival data.

10. The method of claim 9, further comprising:

monitoring the sequence of data; and

modifying the plurality of external events responsive to the preference of the particular user.

11. The method of claim 9, wherein each one of the plurality of external events is modelled as a binary signal, indicating a presence or absence of the each one of the plurality of external events.

12. An apparatus for providing recommendations, comprising:

a data collection unit (1112, 1114) configured to access a sequence of data, the sequence of data representing a particular user's response to a set of latent sources;

a latent source detector (1120) configured to generate a model for the sequence of data responsive to the set of latent sources weighted by a respective weight of a set of weights, each weight indicating relevance of a respective latent source to the particular user; and

a recommendation engine (1130) configured to determine preference of the particular user responsive to the set of weights and to generate the recommendations for the particular user responsive to the determined preference.

13. The apparatus of claim 12, wherein the latent source detector is configured to generate the model based on at least one of a Markov Chain Monte Carlo (MCMC) algorithm and a sampling approach.

14. The apparatus of claim 12, wherein the set of latent sources is modelled as a Poisson process, an arrival rate of the Poisson process being modeled responsive to the set of latent sources weighted by the respective weight of the set of weights.

15. The apparatus of claim 12, wherein the set of latent sources are modelled as binary semi-Markov Jump Processes.

16. The apparatus of claim 12, wherein there are a plurality of users, the plurality of users including the particular user, and wherein the data collection unit is configured to access a respective sequence of data for each of the plurality of users, and wherein the latent source detector is configured to generate the model for the respective sequence of data responsive to the set of latent sources and a respective set of weights for each of the plurality of users.

17. The apparatus of claim 16, wherein the latent source detector is further configured to cluster the plurality of users based on the respective sets of weights.

18. The apparatus of claim 16, wherein the recommendation engine is further configured to predict whether the particular user provides similar user ratings as another user from the plurality of users based on the set of weights for the particular user and a set of weights for the another user.

19. The apparatus of claim 12, wherein the latent source detector is further configured to determine the number of latent sources in the set of latent sources using a shrinkage approach.

20. The apparatus of claim 12, wherein the sequence of data corresponds to a sequence of arrival data and each latent source of the set of latent sources corresponds to a plurality of external events, an external event resulting in arrivals in the sequence of arrival data.

21. The apparatus of claim 20, further comprising

a processor configured to monitor the sequence of data, and to modify the plurality of external events responsive to the preference of the particular user.

22. The apparatus of claim 20, wherein each one of the plurality of external events is modelled as a binary signal, indicating a presence or absence of the each one of the plurality of external events.

Description:
Method and Apparatus for Detecting Latent Sources and User Preferences

CROSS-REFERENCE TO RELATED APPLICATIONS

[1] This application claims the benefit of the filing date of the following U.S. Provisional Application, which is hereby incorporated by reference in its entirety for all purposes: Serial No. 61/892,809, filed on October 18, 2013, and titled "Method and Apparatus for Detecting Latent Sources and User Preferences."

TECHNICAL FIELD

[2] This invention relates to a method and an apparatus for detecting latent sources and user preferences.

BACKGROUND

[3] The study and development of complex dynamical systems has led to the availability of increasingly large datasets, recording events evolving asynchronously and at multiple time-scales. Modeling such data by discretizing time at the resolution of the fastest events is usually inefficient and inelegant, especially when the relevant time-scales themselves must be inferred. It is much more natural to work directly in continuous time, and there has been a growth of such applications in the statistics and machine learning communities.

SUMMARY

[4] The present principles provide a method for providing recommendations, comprising: accessing a sequence of data, the sequence of data representing a particular user's response to a set of latent sources; generating a model for the sequence of data responsive to the set of latent sources weighted by a respective weight of a set of weights, each weight indicating relevance of a respective latent source to the particular user; determining preference of the particular user responsive to the set of weights; and generating the recommendations for the particular user responsive to the determined preference. The present principles also provide an apparatus for performing these steps.

[5] The present principles also provide a method for providing recommendations, comprising: accessing a sequence of data, the sequence of data representing a particular user's response to a set of latent sources; generating a model for the sequence of data using a Poisson process based on at least one of a Markov Chain Monte Carlo (MCMC) algorithm and a sampling approach, wherein an arrival rate of the Poisson process is modeled in response to the set of latent sources weighted by a respective weight of a set of weights, each weight indicating relevance of a respective latent source to the particular user, and wherein the set of latent sources are modelled as binary semi-Markov Jump Processes and the set of weights are modeled as Gaussian variables; determining preference of the particular user responsive to the set of weights; and generating the recommendations for the particular user responsive to the determined preference. The present principles also provide an apparatus for performing these steps.

[6] The present principles also provide a computer readable storage medium having stored thereon instructions for providing recommendations according to the methods described above.

BRIEF DESCRIPTION OF THE DRAWINGS

[1] FIG. 1A is a pictorial example depicting a sequence of point process data based on GSR (Galvanic Skin Response) arousal reactions of a first user; FIGs. 1B and 1C are pictorial examples depicting arrival rates, of the first user, which are estimated jointly (based on all users) and separately (based on the first user alone), respectively; FIG. 1D is a pictorial example depicting a sequence of point process data based on GSR arousal reactions of a second user; FIGs. 1E and 1F are pictorial examples depicting arrival rates, of the second user, which are estimated jointly (based on all users) and separately (based on the second user alone), respectively.

[2] FIG. 2 is a flowchart depicting an exemplary method for determining latent sources and a loading matrix, according to an embodiment of the present principles.

[3] FIG. 3 is a flowchart depicting an exemplary method for estimating the parameter values of all the non-bsMJP variables, according to an embodiment of the present principles.

[4] FIG. 4 is a graphical representation of a mathematical model, according to an embodiment of the present principles.

[5] FIG. 5 is a pictorial example illustrating the normalized estimation error of instantaneous Poisson rate for a proposed method and binning methods.

[6] FIGs. 6A and 6B are pictorial examples illustrating estimation error of latent sources and estimation error of factor loadings, respectively, with different observation lengths and base rates.

[7] FIG. 7A is a pictorial example illustrating ground truth for sources, FIGs. 7B and 7C are pictorial examples illustrating inferred sources using bsMJP and bMJP models, respectively, and FIG. 7D is a pictorial example illustrating the posterior distribution of a shape parameter inferred using bsMJP, according to an embodiment of the present principles.

[8] FIG. 8 is a pictorial example depicting posterior distribution of the number of latent sources, according to an embodiment of the present principles.

[9] FIGs. 9A and 9B are pictorial examples depicting ROC curves predicting user arousal intensity and user rating similarity from loading vectors w_u, respectively, according to an embodiment of the present principles.

[10] FIG. 10A is a pictorial example depicting the posterior distribution over the possible number of dominant sources, and FIG. 10B is a pictorial example depicting 5 dominant latent sources, according to an embodiment of the present principles.

[11] FIG. 11 is a block diagram depicting an exemplary recommendation system 1100, according to an embodiment of the present principles.

[12] FIG. 12 is a block diagram depicting an exemplary system that has multiple user devices connected to a recommendation engine according to an embodiment of the present principles.

DETAILED DESCRIPTION

[13] The analysis of correlated point process data has wide applications, ranging from biomedical research to network analysis. In one embodiment, we analyze biometric galvanic skin response (GSR) data. User arousal (or excitation) events translate to the generation of specific observed GSR waveforms, as described in an article by Silveira, F., Eriksson, B., Sheth, A., and Sheppard, A., entitled "A biometrics based approach to characterize viewer responses to movies," in ACM Ubicomp Conference, Zurich, Switzerland, 2013 (hereinafter "Silveira"). Here, we consider GSR observations taken from a number of users exposed simultaneously to a common video stimulus. The ability to accurately characterize the latent stimulus events generating the observed biometric excitation reactions has applications in the areas of, for example, but not limited to, recommendation, market research and advertising. More generally, the present principles are also applicable to point process data from other domains, like neuroscience, biometrics, network data analysis, and ecology.

[14] Early work on analyzing arrival data did not exploit correlation across multiple arrival streams (see Riehle, A., Grun, S., Diesmann, M., and Aertsen, A., "Spike synchronization and rate modulation differentially involved in motor cortical function," Science, 278:1950-1953, 1997). The work described in an article by Smith, A. and Brown, E., entitled "Estimating a state-space model from point process observations," Neural Computation, 15:965-991, 2003, introduces a single first-order autoregressive latent state model with a proposed EM-algorithm to estimate parameter values from multi-user arrival data. A more complex hidden Markov Model (HMM) approach using multiple latent states is presented in an article by Escola, S., Fontanini, A., Katz, D., and Paninski, L., entitled "Hidden Markov models for the stimulus-response relationships of multistate neural systems," Neural Computation, 23:1071-1132, 2011, wherein the time-evolving stimulus has a simple structure and is explicitly known.

[15] In the Gaussian process factor analysis (GPFA) approach, as described in an article by Yu, B. M., Cunningham, J. P., Santhanam, G., Ryu, S. I., Shenoy, K. V., and Sahani, M., entitled "Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity," Journal of Neurophysiology, 102(1):614-635, 2009, the intensities of a series of correlated neural spike trains are modeled by linearly transforming a small set of latent Gaussian processes. By contrast, the binary switching signals in our model capture specific aspects of the latent structure, and our shrinkage approach allows us to infer the number of sources. Finally, the simpler structure of the bsMJP means our model is scalable to longer observation intervals.

[16] Our modeling also relates to previous work on finite state systems in discrete and continuous time. We generalize the Markov-modulated Poisson process (MMPP) (see an article by Scott, S. L. and Smyth, P., entitled "The Markov modulated Poisson process and Markov Poisson cascade with applications to web traffic modeling," Bayesian Statistics, 7:1-10, 2003, hereinafter "Scott") by allowing correlated Poisson intensities, and by allowing more flexible (i.e., non-exponential) holding times. While we took a shrinkage approach to coupling the latent sources, our methods easily extend to genuine, truncation-free nonparametric models based on the IBP (Indian Buffet Process).

[17] In all these applications, one observes point process data exhibiting significant variability, or inhomogeneity, over time. In the context of GSR observations from movie stimulus, an intense action scene may have many GSR arousal reactions (i.e., arrivals) in a short time, while a quiet dialog scene may elicit very few arrivals over a potentially long interval. In other applications, the rate of webpage requests can vary with external events, while the spiking of a neuron can vary with stimulus. Without explicit knowledge of the external events, we look to extract the relevant characteristics of these hidden external events, or latent structure, which affect the observed point process data.

[18] For a single observed point process, an appropriate modeling framework is that of Markov-modulated Poisson processes (MMPP) (e.g., as described in the Scott reference). This is a doubly-stochastic Poisson process, where the unknown arrival rate is modeled as a realization of a continuous-time Markov jump process (MJP). We observe a number of correlated inhomogeneous Poisson processes, which we couple via a common low-dimensional underlying process. Specifically, we assume that the arrival rates are described by a small number of binary switching signals, indicating the presence or absence of a relevant external event. Examples for biometric point processes from movie stimuli could include action scenes, dramatic dialog, comedic events, and a main character appearing. We model these switching signals as a collection of continuous-time binary signals linked by a factor-loading matrix to the intensities of the observed processes. Rather than risking under-fitting with a small number of factors, we allow the model to choose from a large number of relevant sources by placing a shrinkage prior on the source loading matrix. Furthermore, rather than modeling the binary sources as memoryless (as in MMPPs), we allow more flexible dynamics via general hazard functions.

[19] FIG. 1A illustrates an exemplary sequence of point process data based on GSR arousal reactions, wherein each vertical bar represents arrival of a GSR-extracted arousal reaction. The sequence of point process data is recorded when a single user watches a film. Around t = 20 minutes, there is an action scene which elicits dense arrivals. Between t = 60 and t = 80 minutes, there is a quiet dialogue scene which elicits very few arrivals. Around t = 110 minutes, the main character appears in the film climax, which elicits dense arrivals. Overall we can see that arrivals occur at various rates over time.

[20] FIG. 1D illustrates another exemplary sequence of point process data based on GSR arousal reactions for another user. Even though this user watches the same movie as the user in FIG. 1A, the point process data is quite different because this user may have a different taste. For example, around t = 20 minutes, the arrival data is denser, and around t = 110 minutes, there are fewer arrivals. On the other hand, because both users watch the same movie, there is clearly correlation between these two sequences of point process data. For example, there are very few arrivals at the beginning of the movie for both users, and both have dense arrivals around t = 20 minutes. In one embodiment, the correlation between the two users may be used to derive the inherent characteristics of the movie, and the difference between the two users may be used to derive the users' different tastes.

[21] As can be seen from FIGs. 1A and 1D, some events in the movie may constitute stimuli to certain users and may elicit many GSR arousal reactions. In the present application, we refer to these events as "stimuli" and "external events" interchangeably. Individual users react to the stimuli differently. To model the reaction of an individual user, we decompose the reaction of the individual user as a function of the latent sources and a user-specific weight loading matrix, wherein the latent sources relate to the inherent characteristics of the stimuli, and the loading matrix reflects users' tastes or preferences associated with the latent source characteristics.

[22] If there are K latent sources, we denote the latent sources by K functions, {s_k(t)}, k = 1, ..., K, where s_k(t) represents the arrival of the k-th latent source over time. The value of s_k(t) at a given time may be binarized, representing whether or not the k-th latent source occurs.

[23] Because users may react to the stimulus differently, we introduce a weight loading matrix to account for the variation among the users' preferences. For example, a user's weight may be 0 for a certain type of latent source derived from the stimulus (e.g., action scene) because the user is indifferent to action scenes. However, this particular user's weight for another type of latent source (e.g., main character appearing) may be 1 as the user enjoys the main character.

[24] In the present application, we assume the arrival data can be represented by a function of latent sources ({s_k(t)}) and a loading matrix (W), wherein the latent sources relate to the inherent characteristics of the movie, and the loading matrix accounts for variation among the users. After decomposing the user arrival data into latent sources and loading factors, we exploit these two latent structures (i.e., the latent sources and the user weights) to better analyze the stimulus content and better cluster the audience members into different groups.

[25] To ease understanding, we summarize some terminology in the left column of Table 1, and provide corresponding exemplary explanations in the right column. The explanations are based on the example where users watch the same movie simultaneously and their GSR data is recorded. Table 2 summarizes some abbreviations used in the present application. Table 3 summarizes some notations used in the present application.

Table 1

Term: Description (Example terminology)

external events (stimuli): Event that results in user arrival data (Film)

arrival data (observed data, point process data): GSR-extracted arrival data (user reaction, user response)

correlated point process data: GSR-extracted arrival data from a number of users exposed simultaneously to a common movie (audience reaction)

latent sources: Occurrence of a certain type of event (for example, occurrence of an action scene or comedic events)

user-specific weight loading matrix (loading matrix, loading factor): Weight associated with each latent source (user's preferences, user's tastes)

Table 2

GSR galvanic skin response

MMPP Markov-modulated Poisson processes

MCMC Markov Chain Monte Carlo

MJP Markov jump process

bsMJP binary semi-Markov Jump Processes

IBP Indian Buffet Process

MGP multiplicative gamma process

HMM hidden Markov Model

GPFA Gaussian process factor analysis

FFBS forward filtering backward sampling

MH Metropolis-Hastings

ACF autocorrelation functions

ESS effective sample size

ROC Receiver Operating Characteristic

[26] The present principles are directed to a method and apparatus for detecting the latent sources and the user-specific loading matrix. In one embodiment, we collect arrival data from one or more users. Based on the collected arrival data (also denoted as observed data), we estimate the latent sources and the user-specific loading matrix, assuming the arrival data follows a Poisson process, by using a modified Markov Chain Monte Carlo (MCMC) algorithm.

Table 3

y_{u,·}: entire arrival stream of user u

γ_u(t): instantaneous rate

λ_u: user-specific base Poisson rate

s_k(t): latent sources

W: loading matrix

h_k^(0)(v) (h_k^(1)(v)): hazard function giving the rate of transition from state 0-to-1 (1-to-0), v time units after entering state 0 (1)

π_k: a distribution over initial states

α^(s): shape parameter of the Weibull distribution for state s

β^(s): scale parameter of the Weibull distribution for state s

[27] We use an audience viewing a common film as an example in the above explanations, but the present principles can be applied to any observation of arrival data generated by a common external event.

[28] Mathematically, we wish to estimate the latent sources and the user-specific loading matrix from a collection of U sequences of events over an interval [0, T]. For "user" u ∈ {1, ..., U}, we denote the set of arrival times as {y_{u,j}}, with y_{u,j} being the time stamp of the j-th event in stream u (i.e., the sequence of events for user u) and y_{u,·} representing the entire arrival stream of user u. Each sequence y_{u,·} is an inhomogeneous Poisson process with instantaneous rate γ_u(t). The latter is expressed as a user-specific base Poisson rate λ_u modulated by a stimulus-determined function over time. As discussed above, a user's reaction is based on the latent sources in the stimuli and the user's preferences. Mathematically, we can represent the arrival rate for a particular user as a function of the latent sources and the user's preferences. The rates of these U Poisson processes share information through K binary latent sources s_k(t) with k ∈ {1, ..., K}. Below, we use s.(t) to denote, at time t, the column vector consisting of all K sources s_k(t). Calling the loading matrix W (with w_u ∈ ℝ^K, a row vector specific to user u), we have

y_{u,·} ~ PP(γ_u(·)), u = 1, ..., U    (1)

γ_u(t) = λ_u exp(w_u s.(t)), t ∈ [0, T]    (2)

Here, λ_u represents the baseline arrival rate which captures the underlying variability of the arrival rates between users. Meanwhile, the elements of w_u indicate how relevant each of the K sources is to user u and may assign different weights to different sources. The weights also vary with users, and thus are user-specific parameters. Our goal is to estimate both these user-specific weighting parameters as well as the latent sources s_k(t) from the set of arrival times {y_{u,j}}.
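To make equations (1) and (2) concrete, the following sketch simulates one user's arrival stream by thinning: candidate events are proposed from a homogeneous process at an upper-bound rate and accepted with probability γ_u(t) divided by that bound. The two example sources ("action" and "climax") and all parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def rate(t, lam_u, w_u, sources):
    """Instantaneous rate gamma_u(t) = lam_u * exp(w_u . s(t)), per Eq. (2)."""
    s_t = np.array([s(t) for s in sources])
    return lam_u * np.exp(w_u @ s_t)

def sample_arrivals(T, lam_u, w_u, sources):
    """Draw one user's arrival stream on [0, T] by thinning."""
    # since each s_k(t) is binary, gamma_u(t) <= lam_u * exp(sum of positive weights)
    bound = lam_u * np.exp(np.maximum(w_u, 0.0).sum())
    t, arrivals = 0.0, []
    while True:
        t += rng.exponential(1.0 / bound)  # next candidate event time
        if t > T:
            return np.array(arrivals)
        if rng.random() < rate(t, lam_u, w_u, sources) / bound:
            arrivals.append(t)  # accept with probability gamma_u(t) / bound

# hypothetical binary sources: an "action scene" on during [18, 25] minutes
# and a "climax" on during [105, 115]; weights and base rate are illustrative
action = lambda t: 1.0 if 18.0 <= t <= 25.0 else 0.0
climax = lambda t: 1.0 if 105.0 <= t <= 115.0 else 0.0
y_u = sample_arrivals(T=120.0, lam_u=0.5, w_u=np.array([1.2, 0.8]),
                      sources=[action, climax])
```

The thinning construction is exact for a bounded intensity, which holds here because the sources are binary and the weights are finite.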

[29] In one embodiment, we model each of the K latent sources as a binary signal, switching on and off depending on whether the associated external characteristic is active. While it is simplest to model the latent sources, s_k(t), as Markov jump processes (MJPs), the resulting memoryless property can be inappropriate, allowing unnecessary switching between states. Thus, we model these as binary semi-Markov Jump Processes (bsMJP) (see an article by Feller, W., entitled "On semi-markov processes," Proceedings of the National Academy of Sciences of the United States of America, 51(4):653, 1964). Realizations of a bsMJP are right-continuous, piecewise constant binary functions where, unlike an MJP, the rate of state transitions varies with the time since the last transition. This is formalized by a hazard function h_k^(0)(v), giving the rate of transition from state 0-to-1, v time units after entering state 0 (h_k^(1)(v) is similarly defined for transitions from 1-to-0, and we do not allow for self-transitions). A constant hazard function results in a memoryless property; in contrast, by making h small for small values of v, we can discourage the system from leaving a state it has just entered. In one embodiment, we assume h belongs to the Weibull family, with

h_k^(s)(v) = (α^(s) / β^(s)) (v / β^(s))^(α^(s) - 1)    (3)

This corresponds to the interval length between two state transitions following a Weibull distribution, where α^(s) and β^(s) are the shape and scale parameters of the Weibull distribution for state s. Setting α^(s) to 1 recovers the exponential distribution (and our switching process reduces to an MJP).

[30] The hazard functions govern the dynamics of the bsMJP; we also need a distribution π_k over initial states. Then, we have

s_k(·) ~ bsMJP(π_k, h_k^(0), h_k^(1))    (4)
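A path from equation (4) can be drawn forward in time: sample a Weibull holding time for the current state, then switch to the other state. The sketch below assumes the Weibull shape/scale parametrization of equation (3); the parameter values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_bsmjp(T, pi0, shape, scale):
    """Sample one binary semi-Markov path s_k(.) on [0, T].
    With a Weibull hazard, the holding time in state s is simply a
    Weibull(shape[s], scale[s]) draw; shape > 1 discourages an
    immediate re-switch (non-memoryless), shape == 1 recovers an MJP.
    Returns (jump_times, states): states[i] holds on
    [jump_times[i], jump_times[i+1])."""
    state = int(rng.random() < pi0)  # pi0 = probability of starting in state 1
    times, states = [0.0], [state]
    t = 0.0
    while True:
        # numpy's weibull() draws at scale 1; multiply by the scale parameter
        hold = scale[state] * rng.weibull(shape[state])
        t += hold
        if t >= T:
            return np.array(times), np.array(states)
        state = 1 - state  # binary process: no self-transitions
        times.append(t)
        states.append(state)

times, states = sample_bsmjp(T=120.0, pi0=0.2,
                             shape=[3.0, 3.0], scale=[20.0, 10.0])
```

With shape 3.0, very short holding times are rare, so the path avoids the rapid state-flipping an MJP would allow.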

[31] In most applications, the number of latent sources is unknown, and must be inferred. We may place a prior on this quantity, and try to infer it from the data. A more flexible approach is a nonparametric solution: allow for an infinite number of potential sources, with each user influenced by a finite, and unknown, subset of them. This corresponds to a binary matrix with infinite columns, and with element (u, k) indicating whether or not the bsMJP k is relevant to user u. A popular prior on the resulting binary association matrix is the Indian Buffet Process (IBP) (see an article by Griffiths, T. L. and Ghahramani, Z., entitled "Infinite latent feature models and the Indian buffet process," in Advances in Neural Information Processing Systems, volume 18, 2006). However, despite its elegance, inference of the IBP is complicated and the resulting Markov chain Monte Carlo algorithm mixes poorly as it explores the combinatorial space of binary matrices. This raises a need for alternate approaches to flexibly model the unobserved latent structure.

[32] We control the complexity of the factor loading matrix using a multiplicative gamma process (MGP) shrinkage prior (see an article by Bhattacharya, A. and Dunson, D., entitled "Sparse bayesian infinite factor models," Biometrika, 98(2):291-306, 2011). Rather than placing a spike-and-slab prior on the columns of W, we model each row vector of the factor loading matrix (w_u ∈ ℝ^K) as a collection of Gaussian variables increasingly concentrated around the origin. Specifically,

w_u ~ N(0, Λ^(−1)),  Λ = diag(τ_1, τ_2, …)

τ_k = ∏_{l=1}^{k} ξ_l,  ξ_l ~ Gamma(α, 1)    (5)

[33] By choosing α > 1, the ξ_l are on average larger than 1, encouraging τ_k to increase with k. This in turn causes the amplitudes of the w_uk to shrink close to (while not exactly equaling) 0. The construction results in stochastically ordering the latent sources, with early components having large variance and corresponding to sources relevant to most users. As k increases, these weights are shrunk more and more strongly towards zero, while still allowing a few to escape this pull (allowing us to model sources specific to certain users). Thus, we potentially obtain an infinite number of latent sources {s_k(t)}, stochastically ranked by their contribution to the rate function {γ_u(t)}. In one embodiment, we truncate at a sufficiently large K.
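A minimal sketch of the MGP construction in Eq. (5), using only the Python standard library; the function name and the Monte Carlo check at the end are illustrative, not part of the model.

```python
import random

def sample_mgp_precisions(K, alpha, rng):
    """Sample tau_k = prod_{l<=k} xi_l with xi_l ~ Gamma(alpha, 1).

    For alpha > 1 the xi_l are on average > 1, so the precisions tau_k
    tend to grow with k and the weights w_uk ~ N(0, 1/tau_k) shrink."""
    taus, prod = [], 1.0
    for _ in range(K):
        prod *= rng.gammavariate(alpha, 1.0)
        taus.append(prod)
    return taus

rng = random.Random(0)
# Averaged over many draws the precisions grow roughly like alpha**k,
# so the check below is True (deterministic for this seed).
K, n = 6, 2000
avg = [0.0] * K
for _ in range(n):
    t = sample_mgp_precisions(K, 3.0, rng)
    for k in range(K):
        avg[k] += t[k] / n
print(all(avg[k] < avg[k + 1] for k in range(K - 1)))
```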

[34] Finally, we describe prior distributions placed over the remaining variables. We model the base Poisson rate λ_u of each user as independent gamma variables:

λ_u ~ Gamma(c, d)    (6)

[35] Using standard Bayesian plate notation, FIG. 4 illustrates the mathematical models used in the present application. Smaller squares indicate fixed deterministic parameters, while circles indicate random variables. Filled-in shapes indicate observed, known values. The plate indication [K] means the variables inside the specified plate are vectors of size K.

[36] Model Inference

[37] To infer the numerous variables in our model, the central problem reduces to sampling from the posterior given the Poisson arrival times, {y_u}. A standard challenge in continuous-time models is the problem of posterior inference. To solve it, we adopt a Markov Chain Monte Carlo (MCMC) algorithm combined with a continuous-time thinning-based sampling approach. In summary, we wish to infer the posterior distribution over the variables {{w_u}, {s_k(·)}, {λ_u}, {β_k^(s)}, {μ_k^(s)}}. Our algorithm is a Gibbs sampler, sampling each variable conditioned on the remaining variables. For ease of notation, we use p(· | ~) to denote the posterior distribution of one variable conditioned on all other variables.

[38] FIG. 2 illustrates an exemplary method 200 for determining latent sources and the loading matrix. Method 200 starts at step 205. At step 210, the arrival data from each user is gathered, and we repeatedly sample (220) possible bsMJP trajectories of the latent sources using our modified MCMC technique. After a specified number of iterations (240), the posterior distribution for all the unknown variables is estimated (230). Method 200 ends at step 299.

[39] FIG. 3 illustrates an exemplary method 300 for estimating the parameter values of all the non-bsMJP variables in one embodiment. At step 310, we perform Gibbs sampling. At step 320, we infer the factor loading matrix based on a Metropolis-Hastings (MH) approach. At step 330, we infer hyperparameters β, μ, ξ and λ, wherein a hyperparameter refers to a parameter of a prior distribution. Step 330 may be performed before step 320. Method 300 can be used in step 230 for determining the loading matrix variables and other parameters. In the following, we discuss inference of hyperparameters (330) and inference of the factor loading matrix (320) in further detail.

[40] Inference of hyperparameters (β_k^(s), μ_k^(s), ξ_l, λ_u)

[41] From the mathematical model in FIG. 4 and our choice of prior distributions, sampling from the resulting posterior distributions results in inference of the hyperparameters of our model.

[42] We apply the following transformation: β' = β, μ' = μ^(−β), to obtain a new representation of the Weibull distribution with p.d.f. and c.d.f. for the length of time between state changes, v:

p(v) = β' μ' v^(β'−1) exp(−μ' v^(β')),   F(v) = 1 − exp(−μ' v^(β'))

[43] Using these representations, a gamma prior on μ' is the conjugate, and with hyperparameters (e, f), we can directly sample μ' from the posterior:

μ' | ~ ~ Gamma(e + n_k, f + Σ_i (Δ_{k,i})^(β'))

Here Δ_{k,i} = φ_{k,i+1} − φ_{k,i}.

[44] The components of the MGP shrinkage prior, {ξ_l}, can be sampled from the conditional posterior:

[45] We place a uniform prior for β on a set of discretized grid points and sample from the candidates according to their posterior mass.

[46] Finally, the base arrival rates for each user, {λ_u}, can be calculated from counting the number of arrivals for each user, {y_u}.
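For a homogeneous base rate with the Gamma(c, d) prior of Eq. (6), the count-based update mentioned above is the standard gamma-Poisson conjugacy. The sketch below assumes a constant exposure T; in the full model the rate is modulated by the latent sources, so the exposure term would be an integral rather than T.

```python
def base_rate_posterior(c, d, arrivals, T):
    """Conjugate update for a homogeneous-Poisson base rate.

    Prior lambda_u ~ Gamma(c, rate=d); observing N_u arrivals in [0, T]
    yields the posterior Gamma(c + N_u, rate=d + T).  This is a
    standard gamma-Poisson sketch, not the patent's full update."""
    n = len(arrivals)
    return c + n, d + T

# Non-informative prior c = d = 1e-3, 120 arrivals over T = 1000:
shape, rate = base_rate_posterior(1e-3, 1e-3, list(range(120)), 1000.0)
print(shape, rate)
print(round(shape / rate, 3))  # posterior mean ~ 0.12
```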

[47] Inference of Factor Loadings (w_u)

[48] The Gaussian prior on w_u and the nonlinear likelihood in Eq. (2) result in a nonstandard posterior distribution. While a generic Metropolis-Hastings (MH) approach could be used to infer the factor loadings, we instead use a proposal distribution specifically tailored to our model in order to converge faster.

[49] Noting that p(w_u | ~) is log-concave, we use Newton's method to find the MAP estimate ŵ_u, with gradient and Hessian

g(w_u) = Σ_{j=1}^{N_u} s(y_{u,j}) − ∫_0^T λ_u exp(w_u^T s(t)) s(t) dt − Λ w_u    (11)

H(w_u) = − ∫_0^T λ_u exp(w_u^T s(t)) s(t) s(t)^T dt − Λ    (12)

Here, N_u is the number of arrivals observed for user u. Because the latent sources, s_k(t), are binary functions, the integrals above reduce to finite summations and are easily computable. Thus, at each iteration of the sampler, we propose

w_u* ~ N(ŵ_u, −q H(ŵ_u)^(−1))    (13)

where q is a tuning parameter, which we set in the experiments to 5.

[50] Specific to Metropolis-Hastings inference is the need to accept or reject samples from this proposal distribution with some probability (to avoid fitting poorly to the true distribution). Given our model, the new proposal w_u* is accepted with probability

a(w_u → w_u*) = min{1, [p(w_u* | ~) q(w_u)] / [p(w_u | ~) q(w_u*)]}    (14)

where q(·) denotes the proposal density of Eq. (13). We sample from the distribution in Eq. (13) until an accepted sample is found (with probability drawn from Eq. (14)).
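The tailored-proposal idea can be sketched in one dimension on a toy log-concave target. This is an illustration of Newton's method plus an independence MH step on a Poisson-style log posterior; the target, the constants n and T, and the variance scaling are all illustrative, not the patent's exact posterior.

```python
import math, random

# Toy log-concave target: Gaussian N(0, 1) prior times a Poisson-style
# likelihood term n*w - T*exp(w) (n events, exposure T).  This mimics
# the log-concavity of p(w_u | ~); it is not the exact model posterior.
n, T = 30, 20.0
logp = lambda w: n * w - T * math.exp(w) - 0.5 * w * w
grad = lambda w: n - T * math.exp(w) - w
hess = lambda w: -T * math.exp(w) - 1.0

def newton_map(w=0.0, iters=50):
    """Newton's method on the log-concave target: a few steps converge."""
    for _ in range(iters):
        step = grad(w) / hess(w)
        w -= step
        if abs(step) < 1e-12:
            break
    return w

w_map = newton_map()
print(abs(grad(w_map)) < 1e-8)  # True: stationary point found

# Independence MH with proposal N(w_map, -q / hess(w_map)), q = 5:
q = 5.0
rng = random.Random(1)
sd = math.sqrt(-q / hess(w_map))
logq = lambda x: -0.5 * ((x - w_map) / sd) ** 2
w, accepted = w_map, 0
for _ in range(2000):
    w_new = rng.gauss(w_map, sd)
    # Independence-sampler acceptance: both target and proposal terms.
    log_a = (logp(w_new) - logp(w)) + (logq(w) - logq(w_new))
    if math.log(rng.random()) < log_a:
        w, accepted = w_new, accepted + 1
print(accepted > 0)  # True
```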

[51] Inference of Latent Sources (s_k(·))

[52] Here we discuss the inference of the binary, continuous-time latent sources, {s_k(·)}. Modeling dynamics in continuous time is a difficult challenge, and prior methods often depend on performing inference on a single discretization of the continuous-time data, which has the possibility of discarding valuable data points. By contrast, we take repeated discretizations of the continuous-time data, with sampling points informed by prior sampling observations. This repeated refinement approach allows for faster and more accurate estimation of the true underlying latent sources.

[53] Given our specified bsMJP model, an estimated latent source s_k is entirely determined by the set of transition times Φ_k = {φ_{k,1}, …, φ_{k,n_k}} and the states evaluated at these times {s_k(t), t ∈ Φ_k}. The bsMJP model is parameterized by hazard functions h_k^(s)(v), where at v time-units after entering state 0, the rate of transitioning from state 0 to 1 is given by h_k^(0)(v) (and similarly h_k^(1)(v) for state 1). We define an equivalent continuous-time system but with self-transitions, occurring with constant rates Ω_k^(0) and Ω_k^(1) for states 0 and 1. These self-transitions will serve as auxiliary variables facilitating the easy resampling of new state values. This offers a clear distinction from prior work described in an article by Rao, V. and Teh, Y., entitled "MCMC for continuous-time discrete state systems," In NIPS Conference, Lake Tahoe, NV, 2012, where a multiplicative bound on the transition rates resulted in self-transition rates depending on the current holding time.

[54] Using these transition probabilities, we initially describe our generative process for sampling a feasible bsMJP-generated latent source. For state s, candidate transition times (whether self-transitions or not) are drawn from a total hazard function H_k^(s)(v) = h_k^(s)(v) + Ω_k^(s). From this hazard function, candidate state events can be determined sequentially. For example, assuming l time units have passed since the last state transition (when we entered state s), we sample the next candidate event time from H_k^(s)(·), conditioning on it being larger than l. Given sampled value l + Δ, we advance time by Δ, and assign a transition out of state s with probability h_k^(s)(l + Δ) / H_k^(s)(l + Δ) (otherwise this event is treated as a self-transition). We update the value of l, and repeat this process until the current time exceeds the specified time limit, T. Algorithm 1 described in Table 4 gives details of this generative process to sample a bsMJP latent source ({s_k(·)}) from the hazard functions.

Table 4

Algorithm 1 Generative process for a K-dimensional bsMJP path in [0, T]

Input: Hazard functions of each state and each latent feature h_k^(0)(·), h_k^(1)(·), k = 1, …, K, constant hazard rates Ω^(0), Ω^(1), and initial state distribution π_0.

1: for k ∈ {1, 2, …, K} do
2:   Initialize l_0 = 0, i = 0, φ_{k,0} = 0, Φ_k = {φ_{k,0}}, s_k(φ_{k,0}) ~ π_0
3:   while φ_{k,i} < T do
4:     Increment i
5:     Sample Δ_i ~ H_k^(s)(· | · > l_{i−1}), where s = s_k(φ_{k,i−1}). Set φ_{k,i} = φ_{k,i−1} + Δ_i
6:     Draw δ ~ Unif(0, 1)
7:     if δ < h_k^(s)(l_{i−1} + Δ_i) / (h_k^(s)(l_{i−1} + Δ_i) + Ω_k^(s)) then
8:       Set l_i = 0, s_k(φ_{k,i}) = 1 − s_k(φ_{k,i−1}), Φ_k = Φ_k ∪ {φ_{k,i}}
9:     else
10:      Set l_i = l_{i−1} + Δ_i, s_k(φ_{k,i}) = s_k(φ_{k,i−1})
11:    end if
12:  end while
13:  Φ_k = Φ_k ∪ {T}; (Φ_k, s_k(Φ_k)) is a generated bsMJP path
14: end for

Output: A K-dimensional bsMJP path (Φ_k, s_k(Φ_k)), k = 1, …, K
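A self-contained sketch of the generative process of Algorithm 1 for a single source (K = 1), assuming Weibull state hazards as in the text. The bisection-based inversion of the cumulative hazard in step 5 is an implementation choice, not something the algorithm prescribes.

```python
import math, random

def sample_bsmjp_path(T, shape, scale, omega, p_init, rng):
    """Generative sketch of one bsMJP path on [0, T].

    Hazard of leaving state s after holding time v is the Weibull
    hazard h(v) = (shape[s]/scale[s]) * (v/scale[s])**(shape[s]-1);
    candidate events (real or self-transitions) occur with total
    hazard H(v) = h(v) + omega.  Assumed single source (K = 1)."""
    def cum_h(s, v):                       # integral of h from 0 to v
        return (v / scale[s]) ** shape[s]

    def next_delta(s, l):
        # Invert cum_H(l + d) - cum_H(l) = E, E ~ Exp(1), by bisection.
        e = -math.log(rng.random())
        f = lambda d: cum_h(s, l + d) - cum_h(s, l) + omega * d - e
        lo, hi = 0.0, 1.0
        while f(hi) < 0:
            hi *= 2
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
        return hi

    t, l = 0.0, 0.0
    state = 0 if rng.random() < p_init else 1
    times, states = [0.0], [state]
    while True:
        d = next_delta(state, l)
        if t + d > T:
            break
        t += d
        h = (shape[state] / scale[state]) * ((l + d) / scale[state]) ** (shape[state] - 1)
        if rng.random() < h / (h + omega):   # real transition: reset holding time
            state, l = 1 - state, 0.0
            times.append(t)
            states.append(state)
        else:                                # thinned self-transition: keep holding time
            l += d
    return times, states

rng = random.Random(0)
times, states = sample_bsmjp_path(
    T=500.0, shape=[2.0, 2.0], scale=[50.0, 50.0],
    omega=0.05, p_init=0.5, rng=rng)
print(len(times) > 1, all(t < 500.0 for t in times))
```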

[55] From the construction above, we denote the set of self-transitions as Φ̃_k, with the set of actual transitions given by Φ_k. It follows that, given the current bsMJP trajectory s_k, the set Φ̃_k is conditionally distributed as an inhomogeneous Poisson process with a piecewise-constant rate: whenever the system is in state s, the Poisson rate is Ω_k^(s). Using this fact, we can reconstruct Φ̃_k.

[56] Denote all the candidate times as Θ_k = Φ_k ∪ Φ̃_k. Having introduced Φ̃_k, we sample a new path, now restricting to paths that change state at some subset of Θ_k (rather than searching over all paths with all possible transition times). Consequently, sampling a new path conditioned on Θ_k amounts to reassigning labels "transition" or "self-transition" to each of the elements of Θ_k. Our problem now reduces to sampling a trajectory of a discrete-time model, and can be carried out using a forward filtering backward sampling (FFBS) algorithm (see an article by Frühwirth-Schnatter, entitled "Data augmentation and dynamic linear models," J. Time Ser. Anal., 15:183-202, 1994). The forward filtering computes the distribution over the state at each candidate time given the arrival data up to that time, and the backward sampling then draws the states in reverse order. The emission likelihood of the interval [θ_{k,i}, θ_{k,i+1}) is

B(θ_{k,i}, θ_{k,i+1}) ∝ ∏_{u=1}^{U} exp(w_{uk} s_i N_{u,i}) exp(−∫_{θ_{k,i}}^{θ_{k,i+1}} λ_u exp(w_u^T s(t)) dt)

where N_{u,i} is the number of arrivals of user u in the interval. The waiting-time likelihood, P(Δ_{i+1} | s_{k,i}, l_i), and the transition probability, P(s_{k,i+1} | s_{k,i}, l_i), can be computed from the hazard functions h_k^(s) and the self-transition rates Ω_k^(s).
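The discrete-time resampling step can be illustrated with a generic forward filtering backward sampling routine for a binary chain. The slice structure and emission terms here are simplified stand-ins for the arrival, waiting-time, and transition likelihoods above; the routine itself is standard FFBS, not the patent's exact implementation.

```python
import random

def ffbs(trans, emit, rng):
    """Forward filtering, backward sampling for a binary-state chain.

    trans[i][a][b]: P(state b at step i+1 | state a at step i)
    emit[i][a]:     likelihood of the observation in slice i given state a
    In the text the slices are the candidate times Theta_k and the
    emissions combine arrival and waiting-time likelihoods."""
    n = len(emit)
    # Forward pass: alpha[i][a] proportional to P(state a | obs up to i),
    # starting from a uniform initial distribution.
    alpha = [[0.5 * emit[0][a] for a in (0, 1)]]
    for i in range(1, n):
        a_i = [emit[i][b] * sum(alpha[i - 1][a] * trans[i - 1][a][b]
                                for a in (0, 1)) for b in (0, 1)]
        z = sum(a_i)
        alpha.append([x / z for x in a_i])
    # Backward pass: sample states in reverse order.
    path = [None] * n
    w = alpha[-1]
    path[-1] = 0 if rng.random() < w[0] / (w[0] + w[1]) else 1
    for i in range(n - 2, -1, -1):
        w = [alpha[i][a] * trans[i][a][path[i + 1]] for a in (0, 1)]
        path[i] = 0 if rng.random() < w[0] / (w[0] + w[1]) else 1
    return path

# Strongly informative emissions force an alternating path.
trans = [[[0.5, 0.5], [0.5, 0.5]]] * 3
emit = [[1.0, 1e-9], [1e-9, 1.0], [1.0, 1e-9], [1e-9, 1.0]]
print(ffbs(trans, emit, random.Random(0)))  # [0, 1, 0, 1]
```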

[57] We cycle through all K sources, resampling each bsMJP path s_k conditioned on all other variables. To do so, we use the algorithm described before: for each source, we conditionally introduce the Poisson-distributed thinned events Φ̃_k, and then conditionally resample the new trajectory using the modified FFBS algorithm.

[58] Experiments

[59] We consider two synthetic datasets of arrival data to study our model and sampler, before considering a biometric application. For the latter, we assess the model's ability to capture and explain observed explicit user feedback (i.e., user ratings). As a baseline, we used kernel methods to estimate the inhomogeneous Poisson rates, and thus the underlying model parameters (e.g., instantaneous rate, factor loadings, etc.). We found that with a carefully chosen kernel width, the kernel shape does not matter much, and for simplicity, we used a uniform kernel (giving a time-discretized binning method). A bin size was selected after testing multiple sizes.
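The binning baseline amounts to a uniform-kernel rate estimate: count arrivals per bin and divide by the bin width. A minimal sketch (function name and example data are illustrative):

```python
def binned_rate(arrivals, T, bin_size):
    """Time-discretized binning baseline: a uniform-kernel estimate of
    an inhomogeneous Poisson rate (count per bin / bin width)."""
    n_bins = int(T / bin_size)
    counts = [0] * n_bins
    for t in arrivals:
        if 0 <= t < T:
            counts[min(int(t / bin_size), n_bins - 1)] += 1
    return [c / bin_size for c in counts]

# 20 arrivals in [0, 10) then silence in [10, 20): rates 2.0 and 0.0.
arrivals = [i * 0.5 for i in range(20)]      # 0.0, 0.5, ..., 9.5
print(binned_rate(arrivals, 20.0, 10.0))     # [2.0, 0.0]
```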

[60] For our first dataset, we generated K = 3 bsMJP paths over an interval of length T with initial state distribution π_k = [0.5, 0.5]^T and Weibull hazard rates. Weibull shape and scale parameters were uniformly drawn from [1,5] and [50,100], while each row of the loading matrix W was independently generated from a standard normal, N(0_K, I_K), where 0_K is a K-dimensional vector of zeros, and I_K is the K x K identity matrix. The columns of W were reordered, so that the energy contained in w_{·k} decreased monotonically. Our observations consisted of U = 12 sequences of event arrivals, each with a base Poisson rate λ_u drawn from a truncated Gaussian distribution centered at λ_0 with small variance (≪ λ_0). As explained below, we varied both the average base rate, λ_0, and the observation time length, T. For inference, the fixed hyperparameters of the sampler were set as: α = 3, c = d = e = f = 10^(−3), and π_k = [0.5, 0.5]^T. We ran 5000 iterations of our MCMC sampler, discarding the first 2000 as burn-in, with posterior samples collected every 5 iterations. The running time of a typical trial (with T = 1000 and about 120 event arrivals for each user) was about 3000 seconds with unoptimized Matlab code on a computer with a 2.2GHz CPU and 8GB RAM.

[61] To evaluate the mixing behavior of the sampler, we use R-coda to compute the effective sample size (ESS), as well as Markov chain autocorrelation functions (ACF) of various model parameters. Table 5 shows these statistics, with the ACF shown for the parameters λ_1, w_1 and n_1, the number of transitions of the first latent source. These numbers are typical of MCMC samplers, and show the sampler mixes well.

Table 5. Effective sample size (ESS) and autocorrelation function (ACF) of the base rate (λ_u), number of transitions of source s_k(t) (n_k), and the factor loadings (w_uk)

PARAMETERS        λ_u      n_k      w_uk
ESS/ITERATION     0.046    0.278    0.161
ESS/SECOND        0.076    0.467    0.266
ACF (LAG 5)       0.609    0.101    0.225
ACF (LAG 10)      0.415    0.049    0.100
ACF (LAG 50)      -0.047   -0.016   -0.039

[62] Instantaneous Rate Estimation: One of the main advantages of the proposed model is the ability to exploit correlations across streams of observed arrival data by considering more than a single data stream. This is especially important when the base arrival rate of each user is quite low. In this experiment, we examine the ability to accurately recover {γ_u(t)}, the instantaneous arrival rate of each user, choosing the mean of base rates λ_0 from values in (0.01, 0.02, 0.05, 0.10, 0.20). We keep the observation time length constant at T = 2000.

[63] We measured the estimation error of the instantaneous rate by discretizing the interval [0, T] using N = 1000 evenly spaced grid points. We compute the posterior mean estimation error at each grid point, normalizing with respect to that point's true rate, and record the average normalized estimation error over 15 repeats. Rate inference is performed using both the proposed model and the binning approach, as described in an article by Zhang, T. and Kou, S. C., entitled "Nonparametric inference of doubly stochastic Poisson process data via the kernel method," The Annals of Applied Statistics, 4(4):1913, 2010, using bandwidth values of 1, 3, and 10 times the inverse of the mean arrival rate. The results are shown in FIG. 5.
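The normalized grid error can be sketched as follows. The function signature is illustrative; in the experiments this quantity is additionally averaged over repeats and posterior samples.

```python
def normalized_rate_error(true_rate, est_rate, T, n_grid=1000):
    """Average |estimate - truth| / truth over an even grid on [0, T],
    mirroring the normalized estimation error used in the experiments."""
    err, dt = 0.0, T / n_grid
    for i in range(n_grid):
        t = (i + 0.5) * dt
        err += abs(est_rate(t) - true_rate(t)) / true_rate(t)
    return err / n_grid

# An estimate off by a constant factor 1.1 gives 10% normalized error.
print(round(normalized_rate_error(lambda t: 2.0, lambda t: 2.2, 100.0), 6))  # 0.1
```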

[64] For very low base arrival rates (i.e., λ_0 = 0.01), all methods perform similarly. As λ_0 increases, our model performs significantly better than the competing binning method. For a mean base rate of λ_0 = 0.05, we find that the error rate is less than half the error rate of the best choice of binning bandwidth.

[65] Factor Loading Matrix Inference: As FIG. 5 suggests, at low arrival rates there is not enough information to recover the instantaneous Poisson rates (and thus the state of the latent sources). This is shown in FIG. 6A: for base rates λ_0 = (0.01, 0.02, 0.10), as T increases (taking values in T_cand = (200, 500, 1000, 2000, 5000)), the error in the latent sources (estimated over a grid as in the previous section) increases slightly: this is because the 'number of variables' in a bsMJP path increases with T.

[66] In such situations, it is still of interest to estimate parameters like the factor loading matrix W: even if we cannot reconstruct exactly what a user had in mind at any previous time, we would still like to characterize their behaviour to make predictions in the future. FIG. 6B shows that the estimation error of W decreases monotonically as the observation interval increases, implying that these parameters can be recovered even if the posterior distribution over latent sources never concentrates. Here, for each posterior sample, Ŵ, the estimation error with respect to the true factor loadings, W, is computed as ‖Ŵ − W‖² / ‖W‖².

[67] Deviation from Markovianity: The Weibull distribution includes the exponential as a special case, and setting the shape parameter β to 1 reduces the latent sources to memoryless binary MJPs (bMJPs). To demonstrate the flexibility afforded by this parameter, we consider latent sources that are square waves, switching between '1' and '0' at a fixed frequency. FIG. 7A shows the ground truth, and FIGs. 7B and 7C show inferences over a latent source using a bMJP and a bsMJP prior, respectively. We see that the state intervals inferred by the bMJP are more irregular, showing unnecessary switching between states. For the bsMJP, we placed a uniform prior on [1,5] on the shape parameter, allowing the model to estimate the state persistence. FIG. 7D shows the posterior over this parameter places significant mass away from 1, forcing more regular state holding times.

[68] Latent Factor Number Estimation: In many applications, the number of latent sources is unknown a priori. Our MGP shrinkage prior allows us to infer the number of dominant sources. Again, we vary the observation length over T_cand = (500, 2000, 5000), set the base arrival rate λ_0 = 0.10, and set the true number of latent sources as K = 3. When doing inference, we truncate the number of sources, picking a large enough number K⁺ = 10 to avoid under-fitting. Though the sampler is not sensitive to α, a setting α → 1 leads to a higher chance of sampling precision sequences which are not monotonically increasing. As mentioned before, α is set as 3 in our experiments. For each posterior sample, we identify the relevant sources by thresholding the associated weight-vector w_{·k}, picking the smallest collection of weights containing 90 percent of the total energy of W. FIG. 8 demonstrates the behavior of the posterior distribution with respect to the inferred number of latent sources. We find that as an increasing number of observations become available, the posterior mass quickly concentrates around the true value K = 3.

[69] We also apply our model to a dataset from a real-world biometrics application. This dataset was collected with 10 volunteers (users) watching a feature-length film while wearing Galvanic Skin Response (GSR) sensors (the Affectiva Q-Sensor, (aff, 2012)) measuring skin conductance. Using a current state-of-the-art decomposition approach as described in the Silveira reference, we extract user arousal responses from the observed skin conductance signals to obtain arrival data for each user across the two-hour, eighteen-minute film. As shown in FIG. 1, a typical user has about 150 event arrivals during the recording, similar to the λ_0 = 0.02 scenario in the synthetic experiments.
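The 90-percent-energy rule for counting dominant sources can be sketched as follows; the squared-column-norm definition of "energy" is an assumption consistent with the text.

```python
def count_dominant_sources(W, energy_frac=0.9):
    """Smallest set of columns of W holding `energy_frac` of the total
    squared-column energy, used to count relevant latent sources."""
    energies = [sum(w * w for w in col) for col in zip(*W)]
    total = sum(energies)
    kept, acc = 0, 0.0
    for e in sorted(energies, reverse=True):
        if acc >= energy_frac * total:
            break
        acc += e
        kept += 1
    return kept

# 3 strong columns and 2 near-zero ones -> 3 dominant sources.
W = [[3.0, 2.0, 1.5, 0.01, 0.02],
     [2.5, 2.2, 1.0, 0.02, 0.01]]
print(count_dominant_sources(W))  # 3
```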

[70] Below, we analyze users' arousal levels based on their GSR signals. We also analyze the similarity between users and explore dominant latent events in the movie using our model. We validate our analyses using two types of explicit feedback obtained from this experiment. First, all the users were asked to recall 10 scenes from the film and rate their arousal intensity from 1 to 10 for each scene ("1" being "tedious" and "10" representing "exciting/tense"). Second, a general rating of the movie from 1 to 5 was also collected, indicating the user's overall opinion of the film ({"Poor", "Fair", "Good", "Very Good", "Excellent"}).

[71] We estimate the rate function, factor loading matrix, and latent sources using MCMC samples from our model. As discussed before, a non-informative prior Ga(10^(−3), 10^(−3)) is put on λ_u and μ', and a uniform prior on [1,5] is put on β. The maximum number of latent sources is set as K⁺ = 10 and α = 3. Producing 10000 samples took about 12000 seconds.

[72] Instantaneous Rate Inference: To qualitatively demonstrate the advantage of sharing information across users, we ran our model using both multiple traces (U = 10) and a single trace (U = 1) (both with a maximum of K = 10 latent sources). FIGs. 1B and 1C show the posterior mean of the estimated rate function (along with an uncertainty interval equal to one posterior standard deviation) for the shared and separate cases. We see that using multiple-trace information gives estimates of the rate function that are piecewise stable, unlike the single-trace case where the estimated rate function tends to follow the empirical Poisson rate. The inferred hazard function tends to be simpler in terms of state changes as well, as it is constrained to explain more data. Such information is useful to extract responses to common stimuli from user-specific responses, and we look at this in more detail below.

[73] To quantitatively measure our rate estimation, we correlated our estimated rate values with the explicit arousal feedback for the 10 scenes. We transformed the recalled arousal intensity (1-10) into binary labels by comparing each to the average arousal intensity of each user (such that each user has 5 scenes with a "0" class label and 5 scenes with a "1" class label). Using these binary labels as ground truth, we compare the instantaneous rate from the posterior mode estimate of our proposed method against that derived from the binning approach and a Markovian version of our methodology using a bMJP. For all approaches, we evaluate the inferred rate function at time points corresponding to the 10 recalled scenes for all users and then infer the binary labels at those time points by classifying all scenes with an inferred instantaneous rate above a specified threshold as being from the "1" class. Varying the threshold, we plot the ROC (Receiver Operating Characteristic) curves in FIG. 9A.
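Sweeping the classification threshold to trace an ROC curve can be sketched as follows (a generic ROC computation; the data below is illustrative, not from the experiment):

```python
def roc_points(scores, labels):
    """(false-positive rate, true-positive rate) pairs obtained by
    sweeping a threshold over the inferred rates, as in FIG. 9A."""
    p = sum(labels)
    n = len(labels) - p
    pts = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        pts.append((fp / n, tp / p))
    return pts

# Perfectly separated scores trace the ideal ROC corner (0, 1).
scores = [0.9, 0.8, 0.2, 0.1]
labels = [1, 1, 0, 0]
print(roc_points(scores, labels))  # [(0.0, 0.5), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
```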

[74] As shown in FIG. 9, the instantaneous rate inferred by the proposed model conforms to the user explicit feedback better than the rate estimated via the binning approach and the simplified model with a bMJP. Specifically, our proposed algorithm is able to correctly classify almost 40% of the user explicit scene ratings with no false alarms, while the binning approach only classifies 10% of the scenes with no false alarms.

[75] Factor Loading Matrix Inference: Each user has their own factor loading vector, which can be used to calculate the distance between pairs of users. Thus, we compute the pairwise Euclidean distance between users using the posterior mode estimate of the factor loading matrix. We then test how well this user-similarity metric predicts the user ratings. Using all 45 possible pairs of users, we plot two sets of ROC curves: in the first, we compare pairs of users with the same rating (e.g., both users rate the film at "4") versus users that differ by a single rating point (e.g., "3" and "4"); in the second, we compare pairs of users with the same rating versus users that differ by two rating points (e.g., "3" and "5"). As illustrated in FIG. 9B, the proposed method does well at predicting user rating similarity, with the ability to classify over 55% of the users with the same rating from the set of users two ratings apart, with no false alarms.
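The user-similarity metric is simply the Euclidean distance between rows of the loading matrix, giving U(U−1)/2 pairs (45 for U = 10). A minimal sketch with illustrative data:

```python
import math

def pairwise_user_distances(W):
    """Euclidean distance between every pair of rows of the factor
    loading matrix: one similarity score per pair of users."""
    dists = {}
    for u in range(len(W)):
        for v in range(u + 1, len(W)):
            dists[(u, v)] = math.sqrt(sum((a - b) ** 2
                                          for a, b in zip(W[u], W[v])))
    return dists

# 3 users -> 3 pairs; users 0 and 1 share tastes, user 2 differs.
W = [[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0]]
d = pairwise_user_distances(W)
print(len(d), round(d[(0, 1)], 3), round(d[(0, 2)], 3))  # 3 0.1 2.0
```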

[76] Latent Source Inference: Finally, we analyze the dominant latent sources underlying the observed arousal responses returned by our inferences. FIG. 10A shows the posterior distribution over the possible number of dominant sources, defined as the minimum number of columns of the factor loading matrix containing 90 percent of the total energy. The posterior mass concentrates around 4 and 5, and we plot the 5 dominant latent sources in FIG. 10B. The first source is an offset of the baseline Poisson rate. We found the following 4 sources had fairly clear interpretations. For the second source, the elements in the corresponding column of the factor loading matrix are all positive, indicating this factor enhances arousal intensity. Specifically, it is activated at around 20 minutes for scenes about a plane crash, around 55 minutes and 65 minutes for key turning points of the plot, and around 115 minutes and 125 minutes for a climax and a surprising denouement, respectively. Taking the third source as another example, both positive and negative factor loadings exist among the users, indicating this factor enhances arousal intensity for some users but suppresses it for others. It is activated for the scene when the main actor first meets the main actress, and for the occurrence of their last dialogue. Such information can be used along with user information to better understand users, the stimuli, and the interaction between the two.

[77] There can be a number of variations and extensions to our modeling choices. While we placed a multiplicative gamma shrinkage prior on the factor loading matrix, other choices are possible: we can allow added structure in the loading matrix W (e.g., clustering its rows, to infer clusters of users), or allow W to vary with time (modeling the evolution of a user's tastes). Another extension is to incorporate user covariates like age, sex, or profession to construct more elaborate extensions of our model. Ultimately, the goal is not just to understand users and stimuli, but to use models like ours to adaptively modify the stimulus by monitoring user response (for example, placing advertisements or removing part of a movie's content based on the user's preferences). This is central to applications ranging from brain-computer interfaces to recommendation and targeted advertising.

[78] FIG. 11 illustrates an exemplary system 1100 that uses biometric data from multiple (U) users to improve recommendation. Data collection unit 1112 collects biometric data (y_1) from user 1, for example, GSR data when user 1 watches a movie. Similarly, data collection unit 1114 collects biometric data (y_U) from user U. These data collection units may be situated in separate devices or may be integrated in one device. Latent source detector 1120 detects latent sources, for example, the occurrence of events such as action scenes or a main character appearing. In addition, latent source detector 1120 also estimates the relevance of each latent source to an individual user. Knowing where the stimuli occur in the movie and how a user responds to the stimuli, recommendation engine 1130 may generate recommendations more specific to the user, for example, around an event for which a user has a higher response weight (i.e., a higher preference). The recommendation engine may make recommendations to the U users, as well as other users whose biometric data are not collected.

[79] FIG. 12 illustrates an exemplary system 1200 that has multiple user devices connected to a recommendation engine according to the present principles. In FIG. 12, one or more user devices (1210, 1220, 1230) can communicate with recommendation engine 1240. The recommendation engine is connected to multiple users, and each user may communicate with the recommendation engine through multiple user devices. The user interface devices may be remote controls, smart phones, personal digital assistants, display devices, computers, tablets, computer terminals, digital video recorders, or any other wired or wireless devices that can provide a user interface.

[80] The recommendation engine 1240 may implement method 1000, and it may correspond to recommendation engine 1130. The recommendation engine 1240 may also correspond to other modules in recommender system 1200. Recommendation engine 1240 may obtain latent sources or the user-specific weight loading matrix from biometric data analyzer 1270. Recommendation engine 1240 may also interact with social network 1260, for example, to determine social influence or to suggest new friends based on similar reactions to content. Recommendation item database 1250 contains one or more databases that can be used as a data source for recommendation items, personalization of content, or feedback in content creation.

[81] In one embodiment, a user device may request a recommendation to be generated by recommendation engine 1240. Upon receiving the request, the recommendation engine 1240 analyzes the users' inherent interests (for example, obtained from the requesting user device or another user device that contains user profiles), users' social interactions (for example, through access to a social network 1260) and users' interactions with the recommender system. After the recommendation is generated, the recommendation item database 1250 provides the recommended item to the requesting user device or another user device (for example, a display device).

[82] In addition to recommendation, the present principles can also be applied to, for example, but not limited to, personalization of content and feedback for editing/modification of content.

[83] In the above discussion, the latent sources are binarized. However, the present principles can be applied when the latent sources can take a finite number of discrete states. The modified inference methodology would proceed as specified here with minimal modifications.

[84] The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.

[85] Reference to "one embodiment" or "an embodiment" or "one implementation" or "an implementation" of the present principles, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one implementation" or "in an implementation", as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.

[86] Additionally, this application or its claims may refer to "determining" various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.

[87] Further, this application or its claims may refer to "accessing" various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.

[88] Additionally, this application or its claims may refer to "receiving" various pieces of information. Receiving is, as with "accessing", intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, "receiving" is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.

[89] As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry the bitstream of a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.