

Title:
BAYESIAN CONTENT DISCOVERY
Document Type and Number:
WIPO Patent Application WO/2014/088564
Kind Code:
A1
Abstract:
A Bayesian learning based content discovery system uses an adaptive searcher and a belief determinator to adapt a set of user questions to be asked of a user. When the user answers a set of questions, the responses are used by the belief determinator to update the probability of finding a target item in a subsequent search using the set of questions. The updated belief is then used by the adaptive searcher to adapt the set of questions for subsequent searches.

Inventors:
KVETON BRANISLAV (US)
BHAMIDIPATI SANDILYA (US)
WEN ZHENG (US)
Application Number:
PCT/US2012/067859
Publication Date:
June 12, 2014
Filing Date:
December 05, 2012
Assignee:
THOMSON LICENSING (FR)
KVETON BRANISLAV (US)
BHAMIDIPATI SANDILYA (US)
WEN ZHENG (US)
International Classes:
G06N7/00
Other References:
Z. WEN, B. KVETON, B. ERIKSSON, S. BHAMIDIPATI: "Sequential Bayesian search", PAPERS ACCEPTED FOR THE 30TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING (ICML'13), TO BE HELD 16-21 JUNE 2013, 19 March 2013 (2013-03-19), XP055075569, Retrieved from the Internet [retrieved on 20130816]
Z. WEN, B. KVETON, S. BHAMIDIPATI: "Learning to discover: a Bayesian approach", PAPERS ACCEPTED FOR THE NIPS 2012 WORKSHOP ON BAYESIAN OPTIMIZATION & DECISION MAKING, TO BE HELD ON 7 DECEMBER 2012, 1 December 2012 (2012-12-01), XP055075564, Retrieved from the Internet [retrieved on 20130816]
M. NAGHSHVAR, T. JAVIDI: "Sequentiality and Adaptivity Gains in Active Hypothesis Testing", ARXIV:1211.2291V1, 10 November 2012 (2012-11-10), XP055075851, Retrieved from the Internet [retrieved on 20130816]
L. YANG, S. HANNEKE, J. CARBONELL: "A theory of transfer learning with applications to active learning", MACHINE LEARNING, vol. 90, no. 2, 7 July 2012 (2012-07-07), pages 161 - 189, XP035163789, DOI: 10.1007/S10994-012-5310-Y
G. BELLALA: "Information and decision theoretic approaches to problems in active diagnosis", PH.D. THESIS, UNIVERSITY OF MICHIGAN, 15 June 2012 (2012-06-15), XP055075869, Retrieved from the Internet [retrieved on 20130816]
R. WAEBER, P. I. FRAZIER, S. G. HENDERSON: "A Bayesian approach to stochastic root finding", PROCEEDINGS OF THE 2011 IEEE WINTER SIMULATION CONFERENCE (WSC'11), 11 December 2011 (2011-12-11), pages 4033 - 4045, XP032112293, DOI: 10.1109/WSC.2011.6148093
SANJOY DASGUPTA: "Analysis of a greedy active learning strategy", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, vol. 17, 2005, pages 337 - 344
ROBERT NOWAK: "The geometry of generalized binary search", IEEE TRANSACTIONS ON INFORMATION THEORY, vol. 57, no. 12, 2011, pages 7893 - 7906, XP011389100, DOI: 10.1109/TIT.2011.2169298
THOMAS COVER; JOY THOMAS: "Elements of Information Theory", 2006, JOHN WILEY & SONS
DANIEL GOLOVIN; ANDREAS KRAUSE: "Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization", JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, vol. 42, 2011, pages 427 - 486
PETER AUER; NICOLO CESA-BIANCHI; PAUL FISCHER: "Finite-time analysis of the multiarmed bandit problem", MACHINE LEARNING, vol. 47, 2002, pages 235 - 256
Attorney, Agent or Firm:
SHEDD, Robert, D. et al. (2 Independence Way, Suite 200, Princeton, New Jersey, US)
Claims:

1. A system that discovers content, comprising:

a searcher that provides at least one adaptive question to find an item and receives a response to each question, wherein subsequent adaptive questions are adapted based on learning from prior searched items; and

an updater that calculates and updates a probability of finding an item based on the at least one adaptive question, wherein the probability is determined using Bayesian learning and provided to the searcher to adapt at least one adaptive question.

2. The system of claim 1, wherein the system increases a ranking of a question for a subsequent related item search based on responses to a current item search.

3. The system of claim 1, wherein the searcher determines a search result based on at least one item remaining in a search space that satisfies the adaptive questions.

4. The system of claim 1, wherein the system is a content discovery guide for movie content.

5. The system of claim 1, wherein the questions are binary questions.

6. A method for content searching, comprising:

providing a first set of questions to find an item;

receiving a response to each of the questions until the item is found;

updating a probability of finding an item based on the found item using Bayesian learning; and

adapting a second set of questions based on the updated probability.

7. The method of claim 6 further comprising:

increasing a ranking of a question for a subsequent item search based on responses to a current item search.

8. The method of claim 6 further comprising:

determining a search result based on at least one item remaining in a search space that satisfies the adaptive questions asked of a user.

9. The method of claim 6 further comprising:

providing a first and second set of questions that are binary questions.

10. A system that discovers content, comprising:

a means for providing a first set of questions to find an item;

a means for receiving a response to each of the questions;

a means for updating a probability based on the found item using Bayesian learning; and

a means for adapting a second set of questions based on the updated probability in a subsequent search.

11. The system of claim 10 further comprising:

a means for increasing a ranking of a question for a subsequent item search based on responses to a current item search.

12. The system of claim 10 further comprising:

a means for determining a search result based on at least one item remaining in a search field that satisfies the adaptive questions asked of the user.

Description:
BAYESIAN CONTENT DISCOVERY

BACKGROUND

[0001] Extensive online collections of content exist, such as restaurants, movies, music, and others. Items in these collections are typically described by many attributes, some of which are predefined and common, and others that are freely generated by users, such as reviews. The problem of content discovery in this environment has been traditionally approached in two ways. In one approach, the user types attributes of interest into a search box and a system returns a ranked list of items that are most relevant to the user's query. In the other approach, the user navigates through a series of menus with item categories and progressively refines the selection to a list of desired items. Most recommendation websites combine both approaches.

SUMMARY

[0002] Content discovery is modeled as an interactive game. In this game, the user has a specific target item on their mind, which is drawn i.i.d. from some probability distribution. In each turn of the game, the system asks a (binary) question, the user answers, and this process is repeated until the system identifies a single item that satisfies all of the user's answers so far. The questions are generated with respect to the expected probability of drawing target items. This distribution is updated after each game, so the next time the user searches for content, the questions are better tuned to what the user may be looking for and fewer of them are needed.

[0003] The above presents a simplified summary of the subject matter in order to provide a basic understanding of some aspects of subject matter embodiments. This summary is not an extensive overview of the subject matter. It is not intended to identify key/critical elements of the embodiments or to delineate the scope of the subject matter. Its sole purpose is to present some concepts of the subject matter in a simplified form as a prelude to the more detailed description that is presented later.

[0004] To the accomplishment of the foregoing and related ends, certain illustrative aspects of embodiments are described herein in connection with the following description and the annexed drawings. These aspects are indicative, however, of but a few of the various ways in which the principles of the subject matter can be employed, and the subject matter is intended to include all such aspects and their equivalents. Other advantages and novel features of the subject matter can become apparent from the following detailed description when considered in conjunction with the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] FIG. 1 is an example of a restaurant discovery guide on a smartphone.

[0006] FIG. 2 is an example of a movie discovery guide on a smartphone.

[0007] FIG. 3 is an example of a system that utilizes Bayesian learning to provide a user's target item.

[0008] FIG. 4 is another example of a system that utilizes Bayesian learning to provide a user's target item.

[0009] FIG. 5 is a method of applying Bayesian learning to content discovery.

DETAILED DESCRIPTION

[0010] The subject matter is now described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the subject matter. It can be evident, however, that subject matter embodiments can be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing the embodiments.

[0011] In generalized binary search, the probability of items being of a user's interest is known. However, this is not true in many real-world scenarios. For instance, in the problem of content discovery, this would correspond to knowing the complete profile of the user, which is often not the case. This limitation is addressed while utilizing generalized binary search as a framework for modeling interactive search. It is assumed that the probability with which searched items are drawn is unknown. An efficient technique is employed that learns this probability over time as the user interacts with the search engine. The techniques are Bayes optimal, and their regret increases only sublinearly with time.

[0012] The number of searchable items is typically very large. Therefore, the items are organized in a tree-like taxonomy, where each node corresponds to a selection in a menu done by the user (FIGS. 1 and 2). In an example cell phone discovery guide interface 100 of FIG. 1, for example, restaurant review website users can choose from over 100 cuisines - but only 9 can be shown simultaneously 102 on the screen of a smartphone. Additionally, in many interfaces, like a movie discovery guide interface of a cell phone 200, the selections 202 are limited to the screen size, and the user is shown movies that satisfy the criteria but also many unwanted items.

[0013] In one instance, the content discovery problem is viewed as an interactive game. In this game, the user has a specific target item on their mind, which is drawn from some probability distribution π*. In each turn of the game, the system asks a (binary) question, the user answers, and this process is repeated until the system identifies a single item that satisfies all of the user's answers so far. The goal is to minimize the expected number of asked questions. This problem is known as generalized binary search (see, Sanjoy Dasgupta; Analysis of a greedy active learning strategy; In Advances in Neural Information Processing Systems 17, pages 337-344, 2005 and Robert Nowak; The geometry of generalized binary search; IEEE Transactions on Information Theory, 57(12):7893-7906, 2011), and it can be solved nearly optimally by a greedy algorithm. In each turn of the game, the algorithm selects the question that divides the version space up to that time closest to two halves. The probability π* can be viewed as the user's preferences, and it is typically unknown. The efficient techniques disclosed herein learn this probability over time as the user interacts with a system. The techniques are Bayes optimal and their regret increases only sublinearly with time.
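In one instance, the greedy selection described above can be sketched as follows. This is an illustrative sketch only, not the disclosed system's code: the item set, prior, and function names are assumptions, and questions are taken to test a single binary feature.

```python
# Illustrative sketch of greedy generalized binary search: in each turn,
# ask the question whose "yes" mass splits the remaining version space
# closest to two halves. Items, prior, and names are assumptions.

def greedy_gbs(items, prior, answer):
    """items: list of 0/1 feature tuples; prior: dict item -> probability;
    answer: callable(feature_index) -> bool for the hidden target item."""
    version_space = list(items)
    n_questions = 0
    while len(version_space) > 1:
        total = sum(prior[it] for it in version_space)
        # choose the feature whose positive mass is closest to half
        best_q, best_gap = None, float("inf")
        for q in range(len(items[0])):
            mass_yes = sum(prior[it] for it in version_space if it[q] == 1)
            gap = abs(mass_yes - total / 2)
            if gap < best_gap:
                best_q, best_gap = q, gap
        reply = answer(best_q)  # the user's yes/no answer
        version_space = [it for it in version_space
                         if (it[best_q] == 1) == reply]
        n_questions += 1
    return version_space[0], n_questions

# toy run: four items over two binary features, uniform prior
items = [(0, 0), (0, 1), (1, 0), (1, 1)]
prior = {it: 0.25 for it in items}
target = (1, 0)
found, asked = greedy_gbs(items, prior, lambda q: target[q] == 1)
```

With a uniform prior over four items, the sketch halves the version space each turn, so the target is identified in two questions.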

[0014] In one instance, a mathematical model is used for the interactive game. It is assumed that the system has a collection of M items, and each item m = 1, 2, ···, M is identified by a feature vector φ(m) ∈ {0,1}^K. The item feature space Φ is defined as:

Φ = {φ : ∃m = 1, 2, ···, M s.t. φ = φ(m)}.

|Φ| is used to denote the cardinality of Φ. The user's preference is modeled as a probability distribution π* over Φ. That is, each time the user interacts with the system (henceforth referred to as a user-system interaction), with probability π*(φ) they are looking for the item with feature φ (henceforth it is said that the user's target item feature at that interaction is φ). It is further assumed that the user's preference π* is time-invariant, and their target item features in different interactions are statistically independent. That is, the target item features are drawn i.i.d. from π*.

[0015] It is also assumed that during each user-system interaction, the system is allowed to ask a sequence of yes/no (binary) questions from a question pool Q. Each question in Q is represented as a function q: {0,1}^K → {Y, N}, where "Y" denotes the answer "yes" and "N" denotes the answer "no." Specifically, if the user's target item feature is φ ∈ Φ and the system chooses to ask them question q, then their answer will be q(φ) ∈ {Y, N}. In this instance, the question pool Q can be an arbitrary set of such q's, as long as it satisfies the following assumption: whatever the user's target item feature φ ∈ Φ is, if the system has asked them all the questions in Q, then it is able to discover φ. |Q| is used to denote the number of questions in the question pool Q.

[0016] During each interaction, the system asks questions adaptively and continues asking questions until it can determine the user's target item. Specifically, assume the user's target item feature is φ and the system has asked questions q₁, ···, qₙ in Q; then the system will observe answers q₁(φ), ···, qₙ(φ) ∈ {Y, N}. The current observation state is defined as x = ((qᵢ, qᵢ(φ)))ᵢ₌₁ⁿ. Furthermore, x is a terminal state if the system is able to determine φ based on it; otherwise, it is a nonterminal state and the system needs to choose a new question to ask based on x. A (one-interaction) policy T is a decision rule to choose a new question given a nonterminal state x (notation T is used since this policy can be represented as a decision tree). Obviously, if both the policy T and the user's target item feature φ are given, the sequence of questions the system will ask is completely determined, and N(T, φ) is used to denote the length of this question sequence.
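The observation state and the terminal-state test can be sketched as follows. This is an illustrative sketch under assumed names: Φ is a small toy feature space, and each question is a function from a feature vector to "Y" or "N".

```python
# Sketch of the observation state x = ((q_i, q_i(phi))) and the terminal
# test: x is terminal when exactly one feature vector in Phi is
# consistent with every recorded answer. Names are illustrative.

def consistent_features(Phi, x):
    """Phi: iterable of feature tuples; x: list of (question_fn, answer)."""
    return [phi for phi in Phi if all(q(phi) == a for q, a in x)]

def is_terminal(Phi, x):
    return len(consistent_features(Phi, x)) == 1

Phi = [(0, 0), (0, 1), (1, 1)]
q0 = lambda phi: "Y" if phi[0] == 1 else "N"  # question on feature 0
q1 = lambda phi: "Y" if phi[1] == 1 else "N"  # question on feature 1
x = [(q0, "N"), (q1, "Y")]                    # answers observed so far
```

Here the two answers leave a single consistent feature vector, so x is a terminal state; after the first answer alone, two vectors remain and x is nonterminal.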

[0017] Generally speaking, the system's objective during each interaction is to minimize the expected number of questions it needs to ask to discover the user's target item. For the oracle case, when the system is assumed to know the user preference π*, it aims to solve:

min_T E_{φ~π*}[N(T, φ)]   (Eq. 1)

in each interaction. Of course, in practice, the system does not know π* beforehand and needs to learn it while interacting with the user (discussed infra).

[0018] Before discussion of the more realistic case when the system needs to learn π*, it is worth pointing out that the oracle case of this problem (i.e., when the system is assumed to know π*) is well-studied, and various methods have been proposed. Specifically, from the perspective of stochastic control, the oracle problem can be formulated as a Markov decision process (MDP) and hence can be solved by dynamic programming (DP). While in theory DP can obtain the optimal policy T*, in practice it tends to be computationally intractable due to the curse of dimensionality, as has occurred in many other problems. It has also long been known to information theorists that a special case of this oracle problem can be efficiently solved by Huffman coding (see, Thomas Cover and Joy Thomas; Elements of Information Theory; John Wiley & Sons, Hoboken, New Jersey, 2006); however, this approach does not extend to the general case. In addition to these two exact solution methods, some approximation algorithms aiming for near optimal solutions have also been proposed. For example, it is well known that the oracle case of this problem can be solved nearly optimally and computationally efficiently by a greedy algorithm (see, Dasgupta; Daniel Golovin and Andreas Krause; Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization; Journal of Artificial Intelligence Research, 42:427-486, 2011). Recently, based on the notion of adaptive submodularity, the performance bound on this greedy algorithm has been further tightened (see, Golovin et al.).
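The Huffman-coding special case noted above can be sketched numerically. This is an illustrative sketch under an assumption: when arbitrary subset questions are allowed, the minimal expected number of yes/no questions equals the expected Huffman codeword length under the item distribution; the probabilities below are made up for the example.

```python
# Sketch of the Huffman-coding special case: the expected number of
# binary questions equals the expected Huffman codeword length.
# The probabilities below are illustrative only.
import heapq

def huffman_expected_length(probs):
    """probs: item probabilities; returns the expected codeword length."""
    heap = [(p, 0.0) for p in probs]  # (subtree mass, accumulated cost)
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, c1 = heapq.heappop(heap)
        p2, c2 = heapq.heappop(heap)
        # each merge adds one question for every item under the new node
        heapq.heappush(heap, (p1 + p2, c1 + c2 + p1 + p2))
    return heap[0][1]

# uniform over four items -> two questions on average;
# a skewed distribution needs fewer questions on average
cost_uniform = huffman_expected_length([0.25, 0.25, 0.25, 0.25])
cost_skewed = huffman_expected_length([0.5, 0.25, 0.125, 0.125])
```

The skewed case illustrates why knowing π* helps: frequent items get shorter question sequences, lowering the expected count below the uniform two questions.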

[0019] As discussed supra, in practice, the system needs to learn the user preference π* while interacting with the user. Since it is assumed that the system cannot influence the user's behavior (the user's target item features are drawn from π*, no matter what the system has done), there is no need and no way for the system to explore. This distinguishes this problem from the classical multi-armed bandit problems (see, Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer; Finite-time analysis of the multiarmed bandit problem; Machine Learning, 47:235-256, 2002). Thus, a content discovery algorithm is provided herein based on Bayesian learning, which is referred to as the Bayesian content discovery algorithm.

[0020] Specifically, it is assumed that the system has a prior belief P₀ over the user preference π*, which is a probability distribution over all the possible realizations of π*. The system updates its belief at the end of each interaction, after observing the user's target item feature in that interaction (which is a sample from π*). During interaction t, the system exploits its current belief P_t to derive an adaptive (one-interaction) policy T_t*, which (exactly or approximately) solves:

min_T E_{π~P_t}[E_{φ~π}[N(T, φ)]]   (Eq. 2)

and applies it in this interaction. In other words, T_t* is optimal (or near-optimal) with respect to the system's current belief P_t.

[0021] It is now shown that the optimization problem (Eq. 2) can be further simplified. Specifically, if the certainty-equivalence (CE) user preference in interaction t is defined as π̂_t(φ) = E_{π~P_t}[π(φ)], ∀φ ∈ Φ, then for any policy T, we have:

E_{π~P_t}[E_{φ~π}[N(T, φ)]] = E_{π~P_t}[Σ_{φ∈Φ} π(φ)N(T, φ)]
= Σ_{φ∈Φ} N(T, φ) E_{π~P_t}[π(φ)] = Σ_{φ∈Φ} π̂_t(φ)N(T, φ) = E_{φ~π̂_t}[N(T, φ)].

Thus, min_T E_{π~P_t}[E_{φ~π}[N(T, φ)]] is equivalent to min_T E_{φ~π̂_t}[N(T, φ)]. Furthermore, notice that min_T E_{φ~π̂_t}[N(T, φ)] is mathematically equivalent to the oracle problem (with the true user preference π* replaced by the CE user preference π̂_t); thus, any method (either exact or approximate) discussed supra can be used to solve min_T E_{φ~π̂_t}[N(T, φ)]. The Bayesian content discovery algorithm is detailed in Algorithm 1 (TABLE 1).
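The certainty-equivalence reduction can be checked numerically. This is an illustrative sketch: the belief is supported on two made-up candidate preferences, and the question counts N(T, φ) for a fixed policy T are toy values.

```python
# Numerical check of the certainty-equivalence reduction: the nested
# expectation over the belief P_t equals a single expectation under the
# CE preference. The belief, preferences, and counts are toy values.

Phi = ["a", "b", "c"]
belief = [(0.4, {"a": 0.5, "b": 0.3, "c": 0.2}),  # (P_t weight, candidate pi)
          (0.6, {"a": 0.1, "b": 0.6, "c": 0.3})]
N = {"a": 2, "b": 1, "c": 3}  # N(T, phi) for some fixed policy T

# nested expectation E_{pi~P_t}[E_{phi~pi}[N(T, phi)]]
nested = sum(w * sum(pi[phi] * N[phi] for phi in Phi) for w, pi in belief)

# CE preference pi_hat_t(phi) = E_{pi~P_t}[pi(phi)], then one expectation
pi_hat = {phi: sum(w * pi[phi] for w, pi in belief) for phi in Phi}
flat = sum(pi_hat[phi] * N[phi] for phi in Phi)
```

Both computations agree, confirming that averaging the belief first and then optimizing is equivalent to optimizing the nested objective for any fixed policy.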

TABLE 1 - Bayesian Content Discovery Algorithm

Algorithm 1 Bayesian Content Discovery Algorithm

Input: item feature space Φ, question pool Q, and prior belief P₀

for t = 0, 1, 2, ··· do
    Compute the CE user preference π̂_t(φ) = E_{π~P_t}[π(φ)], ∀φ ∈ Φ
    Solve min_T E_{φ~π̂_t}[N(T, φ)] (exactly or approximately), and denote the solution as T_t*
    Apply T_t* in interaction t, and observe φ_t, the user's target item feature in interaction t
    Update the system's posterior belief P_{t+1} using Bayes' rule
end for
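The loop of Algorithm 1 can be sketched for one simple choice of belief. This is an illustrative reduction, not the disclosed implementation: with a Dirichlet belief over a small finite feature space, the CE preference is the normalized pseudo-count vector, and the Bayes update after observing the target simply increments its count.

```python
# Sketch of Algorithm 1 with a Dirichlet belief over a finite feature
# space: the CE preference is the normalized pseudo-count vector, and
# the posterior update increments the observed target's count.
from collections import Counter

def bayesian_content_discovery(Phi, targets, alpha=1.0):
    counts = Counter({phi: alpha for phi in Phi})  # prior belief P_0
    ce_history = []
    for phi_t in targets:  # one observed target per interaction
        total = sum(counts.values())
        pi_hat = {phi: counts[phi] / total for phi in Phi}  # CE preference
        ce_history.append(pi_hat)
        # (a search policy built from pi_hat would be applied here)
        counts[phi_t] += 1  # posterior update via Bayes' rule
    return ce_history

Phi = ["comedy", "horror"]
hist = bayesian_content_discovery(Phi, ["horror", "horror", "horror"])
```

After repeated "horror" observations, the CE preference shifts toward that feature, so subsequent question policies derived from it favor it earlier.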

[0022] In FIG. 3, an example system 300 is shown which incorporates a Bayesian content discoverer 302 that employs a Bayesian content discovery algorithm to determine a user's target item 304 (or content) based on a series of adaptive user questions 306. In one instance, the example system 300 can be utilized in a processor that is specifically programmed to provide the Bayesian content discoverer 302. Thus, it 302 can also interface with a user interface such as a display device and the like to provide the user questions 306. The user interface device can be hard wired and/or wireless in nature and can be a remote device such as a cell phone and the like that allows interactive input. In this manner, the user questions can be relayed to a user, and the user can then reply to the questions, which are relayed back to the Bayesian content discoverer 302. This allows the Bayesian content discoverer 302 to process the answers to the questions and determine which questions will be asked of a user during a subsequent user search. The questions are optimized so that the next time the user searches for an item, the questions have been adapted based on the algorithm and the user reaches the searched item more quickly. Thus, if a user is searching for sushi restaurants in a first search, the system 300 might rank Asian cuisine higher in the taxonomy for subsequent restaurant searches and make it more easily selectable and the like for the user, reducing user frustration in subsequent searches.

[0023] Another example system 400 is illustrated in FIG. 4. In this instance, a Bayesian content discoverer 402 is comprised of an adaptive searcher 404 and a belief updater 406. The belief updater 406 uses a prior belief or probability based on target items found by a user 414 up to that point. This belief is then used to adapt binary questions in the adaptive searcher 404 to build a set of user questions 408 to be asked of the user 414 to help find a user's desired target item 412 (what the user is searching for). Each time the user 414 answers a set of questions and finds a target item 410, the belief updater 406 updates its belief/probability of finding items by using Bayesian learning (e.g., Algorithm 1, supra), and the updated belief/probability is used by the adaptive searcher 404 to adapt a future set of questions for the user's subsequent target item searches.

[0024] The Bayesian learning is used to adapt the set of questions such that fewer questions need be asked of the user before a target item is found. In one instance, the derived probability can be used to increase the ranking of a question for subsequent searches. This increase in ranking can be implemented such that the question is asked substantially sooner of the user and/or presented (e.g., displayed, etc.) to the user at a higher level. Either means will allow the user to more quickly find their target item. For example, users who frequently search for videos about pets might be presented with a pet video category at a top level and/or asked sooner in the question hierarchy whether they are looking for pet videos.

[0025] In view of the exemplary systems shown and described above, methodologies that can be implemented in accordance with the embodiments will be better appreciated with reference to the flow chart of FIG. 5. While, for purposes of simplicity of explanation, the methodologies are shown and described as a series of blocks, it is to be understood and appreciated that the embodiments are not limited by the order of the blocks, as some blocks can, in accordance with an embodiment, occur in different orders and/or concurrently with other blocks from that shown and described herein. Moreover, not all illustrated blocks may be required to implement the methodologies in accordance with the embodiments.

[0026] FIG. 5 is a flow diagram of a method 500 of discovering content. The method starts by obtaining a prior belief on the probability of a user choosing their content of interest (i.e., their target item) 502. The prior belief is based on target items that the user has selected up to that point in time. A user searches for a target item 504. The user is asked a question with respect to the belief for choosing the target item 506. This can be accomplished by presenting a menu with choices that a user can select and the like. The user responds to the question 508. The response can be a selection of one of the choices in a menu and the like. A determination is then made as to whether the remaining number of items in the selection space is more than one item 510. If more than one item still exists in the space, the user is asked another question with respect to the belief for choosing the target item 506. This question and answer loop can continue until one item remains in the selection space. Thus, at this point, the target item is found 512. The probability of drawing target items based on the found target item is then updated 514, and the user can begin another search 504 for a new target item, where the process is based on updates gained from prior found target items. The method 500 optimizes each new set of adapted questions so that the user arrives at the target item as efficiently as possible.

[0027] The method 500 employs Bayesian learning to adapt the set of questions for the desired user content. For example, if a user constantly watches horror films, a content discovery guide will raise this genre higher in the question hierarchy. Thus, a user may be asked whether they desire to find a horror movie sooner in a set of questions than would be asked of a brand new user without a history of responses to adapted questions. This allows the user to more quickly and efficiently find their target item. In other words, the ranking of the question is moved higher so that if it is relevant to the searched item (target item) in a subsequent user search, it will be more likely to be asked of the user sooner than later. Likewise, the question (or selection) can be presented to a user first in a display of listed genre types and the like. This allows the user to select the correct genre at a higher level and move more quickly to finding their desired item (e.g., a horror movie).
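The question-ranking behavior described above can be sketched as a simple reordering. This is an illustrative sketch: the genre names and learned weights are assumptions, standing in for the probability mass the system has learned for the items behind each menu entry.

```python
# Sketch of the question-ranking idea: menu entries (questions) are
# reordered by the learned probability mass behind them, so a frequently
# searched genre surfaces earlier. Genres and weights are illustrative.

def rank_questions(genres, pi_hat):
    """Order genre questions by descending learned probability."""
    return sorted(genres, key=lambda g: pi_hat.get(g, 0.0), reverse=True)

pi_hat = {"horror": 0.5, "comedy": 0.3, "drama": 0.2}
menu = rank_questions(["comedy", "drama", "horror"], pi_hat)
```

A user with a horror-heavy history would thus see the horror question or category first, while a new user with a flat preference would see an unadapted ordering.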

[0028] What has been described above includes examples of the embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the embodiments, but one of ordinary skill in the art can recognize that many further combinations and permutations of the embodiments are possible. Accordingly, the subject matter is intended to embrace all such alterations, modifications, and variations that fall within the scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim.