Title:
CREATING A CINEMATIC STORYTELLING EXPERIENCE USING NETWORK-ADDRESSABLE DEVICES
Document Type and Number:
WIPO Patent Application WO/2018/071212
Kind Code:
A1
Abstract:
Systems and methods are provided for creating a cinematic storytelling experience using network-addressable technologies. A user sends a request through a client application to retrieve a story from a story database on a server facility. The client device downloads from the server facility the story, a corresponding lexicon, and a set of coordination instructions. Responsive to the user beginning narration of the selected story, a client or a server application detects and interprets the user input and compares the input to a limited set of keywords contained in the lexicon and associated with a specific coordination instruction. When the application detects recitation of a keyword associated with a coordination instruction, the application directs the corresponding network-addressable device to carry out the instruction.

Inventors:
ROTHERA ALEXANDER (US)
LANGE SCOTT D (US)
DROP JENNIFER (US)
Application Number:
PCT/US2017/054623
Publication Date:
April 19, 2018
Filing Date:
September 29, 2017
Assignee:
GOOGLE LLC (US)
International Classes:
G06F3/16; G06F13/10; G06F40/00; G10L15/26
Foreign References:
US20140080109A1 2014-03-20
US20130307856A1 2013-11-21
US20140191976A1 2014-07-10
US20160171767A1 2016-06-16
Other References:
See also references of EP 3523718A4
Attorney, Agent or Firm:
PORTNOVA, Marina (US)
Claims:
CLAIMS

1. A computer-implemented method comprising:

sending, from a client device to a server, a user selection of a story;

receiving, from the server, a lexicon associated with the selected story, the lexicon comprising a limited number of keywords;

receiving coordination instructions associated with the selected story, the coordination instructions including instructions for the client device to instruct one or more network-addressable devices to perform at least one action responsive to the client device detecting a keyword;

receiving user input, the user input including verbal recitation of the selected story;

detecting correspondence between the user input and at least one of the keywords; and

directing the one or more network-addressable devices to perform the at least one action responsive to the coordination instructions for the at least one keyword.

2. The computer-implemented method of claim 1, wherein the detecting is performed by the client device.

3. The computer-implemented method of claim 1 or 2, wherein detecting correspondence between the user input and at least one of the keywords further comprises:

sending the received user input to the server; and

receiving from the server the correspondence between the user input and at least one of the keywords.

4. The computer-implemented method of claim 1, 2 or 3, wherein the detecting comprises using a matching algorithm to match the audio profile of the at least one keyword with the audio profile of the user input.

5. The computer-implemented method of any preceding claim, wherein the verbal recitation comprises playback of a recorded narration of the story.

6. The computer-implemented method of any preceding claim, further comprising:

detecting network-addressable devices within a specified radius of the client device;

responsive to detecting the one or more network-addressable devices within the specified radius, directing the one or more network-addressable devices to perform the at least one action; and

responsive to failing to detect the one or more network-addressable devices within the specified radius, skipping the at least one action.

7. The computer-implemented method of any preceding claim, wherein the coordination instructions for the at least one keyword contain an instruction to perform the at least one action at a plurality of points in the story.

8. A computer system, comprising:

a computer-readable storage medium comprising executable program code for generating an integrated storytelling experience using network-addressable devices, the computer program code:

sending, from a client device to a server, a user selection of a story;

receiving, from the server, a lexicon associated with the selected story, the lexicon comprising a limited number of keywords;

receiving coordination instructions associated with the selected story, the coordination instructions including instructions for the client device to instruct one or more network-addressable devices to perform at least one action responsive to the client device detecting a keyword;

receiving user input, the user input including verbal recitation of the selected story;

detecting correspondence between the user input and at least one of the keywords; and

directing the one or more network-addressable devices to perform the at least one action responsive to the coordination instructions for the at least one keyword.

9. The system of claim 8, wherein the detecting is performed by the client device.

10. The system of claim 8 or 9, wherein detecting correspondence between the user input and at least one of the keywords further comprises:

sending the received user input to the server; and

receiving from the server the correspondence between the user input and at least one of the keywords.

11. The system of claim 8, 9 or 10, wherein the detecting comprises using a matching algorithm to match the audio profile of the at least one keyword with the audio profile of the user input.

12. The system of any one of claims 8 to 11, wherein the verbal recitation comprises playback of a recorded narration of the story.

13. The system of any one of claims 8 to 12, further comprising computer program code for:

detecting network-addressable devices within a specified radius of the client device;

responsive to detecting the one or more network-addressable devices within the specified radius, directing the one or more network-addressable devices to perform the at least one action; and

responsive to failing to detect the one or more network-addressable devices within the specified radius, skipping the at least one action.

14. The system of any one of claims 8 to 13, wherein the coordination instructions for the at least one keyword contain an instruction to perform the at least one action at a plurality of points in the story.

15. A non-transitory computer-readable medium storing computer-executable instructions that, in response to execution, cause a device comprising a processor to perform operations, comprising:

sending, from a client device to a server, a user selection of a story;

receiving, from the server, a lexicon associated with the selected story, the lexicon comprising a limited number of keywords;

receiving coordination instructions associated with the selected story, the coordination instructions including instructions for the client device to instruct one or more network-addressable devices to perform at least one action responsive to the client device detecting a keyword;

receiving user input, the user input including verbal recitation of the selected story;

detecting correspondence between the user input and at least one of the keywords; and

directing the one or more network-addressable devices to perform the at least one action responsive to the coordination instructions for the at least one keyword.

16. The computer-readable medium of claim 15, wherein the detecting is performed by the client device.

17. The computer-readable medium of claim 15 or 16, wherein detecting correspondence between the user input and at least one of the keywords further comprises:

sending the received user input to the server; and

receiving from the server the correspondence between the user input and at least one of the keywords.

18. The computer-readable medium of any one of claims 15 to 17, wherein the detecting comprises using a matching algorithm to match the audio profile of the at least one keyword with the audio profile of the user input.

19. The computer-readable medium of any one of claims 15 to 18, wherein the verbal recitation comprises playback of a recorded narration of the story.

20. The computer-readable medium of any one of claims 15 to 19, wherein the instructions further comprise instructions for:

detecting network-addressable devices within a specified radius of the client device;

responsive to detecting the one or more network-addressable devices within the specified radius, directing the one or more network-addressable devices to perform the at least one action; and

responsive to failing to detect the one or more network-addressable devices within the specified radius, skipping the at least one action.

Description:
CREATING A CINEMATIC STORYTELLING EXPERIENCE USING NETWORK-ADDRESSABLE DEVICES

1. TECHNICAL FIELD

[0001] The disclosure generally relates to the field of voice recognition, and in particular, to detecting user input of keywords corresponding to coordination instructions and instructing network-addressable devices to carry out those instructions to create a cinematic storytelling experience.

2. BACKGROUND INFORMATION

[0002] Network-addressable or "smart home" devices are electronic devices connected to a network that can be controlled through other internet-connected devices, such as smartphones or tablets. These devices have emerged in recent years as a convenient and efficient way for consumers to control the appliances in their homes, often from remote locations. Increasingly, consumers are incorporating Internet of Things (IoT)-enabled devices into their homes, allowing them to control the temperature, lighting, locks, and other gadgets from their smartphones or computers and allowing the devices to communicate across a local area network, such as one found in a consumer's home.

[0003] While network-addressable devices are surging in popularity, they are primarily used by a specific user group of technology-focused individuals. Many consumers choose to adopt network-addressable systems for security, energy-efficiency, and cost-saving purposes. Thus far, there have been few proposals to apply network-addressable technology more widely.

SUMMARY

[0004] Described embodiments enable the integration of network-addressable technologies with user narration of a book to create a cinematic storytelling experience. A user uses an application on a client device to select a story from a story database and download the story, an associated lexicon, and coordination instructions. After selecting the story, in some embodiments the user is given the option to create a custom smart home animation for the story.

[0005] If the user elects not to create a custom animation for the story, the system uses a pre-made set of coordination instructions supplied by a publisher or third party to create the cinematic experience. When the user begins narration of the selected story, a voice recognition module listens and receives input via the microphone on the user's client device. A computer model is run against the input to determine a likelihood that a word spoken by the user is a keyword. If the model determines that a word is a keyword, a rules module matches the keyword to a rule and retrieves a corresponding instruction directing at least one network-addressable device to carry out one or more designated actions. The rules module sends the instruction to a control module, which consults an inventory of the user's network-addressable devices to determine whether the device associated with the designated action is included in the inventory. If the device is included in the inventory, the control module sends an instruction to the device to carry out the designated action(s). Sending the instruction upon detection of the keyword in the input speech from the user ensures that the action is correctly synchronized with the narration of the story.
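The flow from keyword to rule to inventory check described above can be sketched as follows. This is a minimal illustration only: the `Instruction` type, the `RULES` table, the device and action names, and the `send` callback are all hypothetical, and returning the instructions whose device is absent from the inventory is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    device: str  # hypothetical device identifier, e.g. "bedroom_light"
    action: str  # hypothetical action name, e.g. "flicker"


# Hypothetical rules table: keyword -> instructions to carry out on detection.
RULES = {
    "thunder": [
        Instruction("speaker", "play_thunder"),
        Instruction("bedroom_light", "flicker"),
    ],
    "cold": [Instruction("fan", "on")],
}


def handle_keyword(keyword, inventory, send):
    """Look up the rule for a detected keyword and dispatch each instruction
    whose device is in the user's inventory. Returns the instructions that
    were not sent because their device is not in the inventory."""
    skipped = []
    for instruction in RULES.get(keyword, []):
        if instruction.device in inventory:
            send(instruction)
        else:
            skipped.append(instruction)
    return skipped


# Usage: the inventory has a light and a fan but no speaker.
sent = []
skipped = handle_keyword("thunder", {"bedroom_light", "fan"}, sent.append)
```

In this run the light instruction is dispatched and the speaker instruction is returned as skipped; an unknown keyword dispatches nothing.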

[0006] If the user elects to create custom coordination instructions, the user uses an effects module on the application to assign various smart home effects to different portions of the story. When the user begins narration of the selected story and reaches a word or phrase to which the user has assigned a custom coordination instruction, the control module sends an instruction to the corresponding network-addressable device to perform the action(s) associated with the instruction.

[0007] One disclosed aspect includes a computer-implemented method comprising: sending, from a client device to a server, a user selection of a story; receiving, from the server, a lexicon associated with the selected story, the lexicon comprising one or more keywords; receiving coordination instructions associated with the selected story, the coordination instructions including instructions for the client device to instruct one or more network-addressable devices to perform at least one action responsive to the client device detecting a keyword; receiving user input, the user input including verbal recitation of the selected story; detecting correspondence between the user input and at least one of the keywords; and directing the one or more network-addressable devices to perform the at least one action responsive to the coordination instructions for the at least one keyword. In an embodiment the detecting may comprise using a matching algorithm to match the audio profile of the at least one keyword with the audio profile of the user input and optionally may comprise narrowing the matching algorithm, for example based on information about the user's location in the story, thereby reducing or eliminating the likelihood that a word that is not a keyword is wrongly determined to be a keyword.
Another disclosed aspect includes a computer system, comprising: a computer-readable storage medium comprising executable program code that, when executed, causes one or more computing devices to implement a method comprising: sending, from a client device to a server, a user selection of a story; receiving, from the server, a lexicon associated with the selected story, the lexicon comprising one or more keywords; receiving coordination instructions associated with the selected story, the coordination instructions including instructions for the client device to instruct one or more network-addressable devices to perform at least one action responsive to the client device detecting a keyword; receiving user input, the user input including verbal recitation of the selected story; detecting correspondence between the user input and at least one of the keywords; and directing the one or more network-addressable devices to perform the at least one action responsive to the coordination instructions for the at least one keyword. Other disclosed aspects include a corresponding computer-readable medium (which may be a non-transitory computer-readable medium, although the invention is not limited to this). The features and advantages described in this summary and the following description are not all inclusive and, in particular, many additional features and advantages will be apparent in view of the drawings, specification, and claims. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the disclosed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] The disclosed embodiments have other advantages and features that will be more readily apparent from the detailed description and the accompanying figures. A brief introduction of the figures is below.

[0009] FIG. 1 is a block diagram illustrating a system architecture for creating a cinematic storytelling experience using network-addressable technologies, according to an embodiment.

[0010] FIG. 2A is a block diagram illustrating an embodiment of the client device shown in FIG. 1.

[0011] FIG. 2B is a block diagram illustrating an embodiment of the server facility shown in FIG. 1.

[0012] FIG. 3 is an interaction diagram illustrating the interaction between a book, a client application, a server facility, and a network-addressable device, according to an embodiment.

[0013] FIG. 4 is a block diagram illustrating an example computer usable to implement entities of the cinematic storytelling environment, according to an embodiment.

DETAILED DESCRIPTION

[0014] The figures and the following description describe certain embodiments by way of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein. Features described herein with reference to one aspect or implementation may be applied to any other aspect or implementation.

[0015] Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality. The figures depict embodiments of the disclosed system (and method) for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.

[0016] FIG. 1 is a block diagram illustrating a system view of a server facility 100 for creating a cinematic storytelling experience using network-addressable technologies, in accordance with one embodiment. The system is scalable to accommodate multiple users/viewers that use client devices 110 to interact with the server facility 100. While only two client devices 110A and 110B are shown, it is understood that very large numbers (e.g., millions) of client devices are supported and can be in communication with the server facility 100 at any time.

[0017] Each client device 110 is used by a user to communicate with the server facility 100. Examples of client devices 110 include any type of computer device, such as a mobile telephone, a smartphone, a tablet, a personal digital assistant, an IP enabled video player, or a personal computer (e.g., desktop, notebook, laptop). In other embodiments, the client device 110 is a voice-activated home product such as a Google Home device or Amazon Echo.

[0018] Also connected to the server facility 100 through the network 130 are network-addressable devices 120. Network-addressable devices 120 include thermostats, cameras, lights, locks, and other devices that are connected across a home network and are capable of being controlled remotely via the client devices 110. While only two network-addressable devices 120A and 120B are shown, it is understood that very large numbers (e.g., millions) of network-addressable devices are supported and can be in communication with the server facility 100 at any time. In one embodiment, network-addressable devices 120 use a unified operating system such as Google's Brillo IoT platform and communicate via a cross-platform common language such as Google's Weave™ platform.

[0019] The server facility 100 communicates with one or more client devices 110 and one or more network-addressable devices 120 via a network 130. The network 130 enables communications between the client device 110 and the server facility 100, between the network-addressable devices 120 and the server facility 100, and between the client device 110 and the network-addressable devices 120. In one embodiment, the network 130 is the Internet or a mobile network that connects mobile devices to the Internet, and uses standardized internetworking communications technologies and protocols, known now or subsequently developed, that enable the client device 110 and the network-addressable devices 120 to communicate with the server facility 100 and with each other. In another embodiment, the network 130 is a cloud computing network and includes one or more components of the server facility 100. The client device 110, the network-addressable device 120, and the server facility 100 are connected to the network 130 through network interfaces.

Client Device

[0020] As shown in FIG. 2A, the client device 110 includes a client application 210 and a storage facility 235, such as a hard drive or flash memory device, to which the client device 110 stores data used by the user in performing tasks. The client device 110 also typically includes a processor, a display device (or output to a display device), and a network interface for coupling to the server facility 100 via the network 130.

[0021] In some embodiments, the client device 110 includes a video player 205. The video player 205 is used for viewing a video stream. The video player 205 may be a standalone application, or a plug-in to another application such as a network browser. The video player 205 includes user interface controls (and corresponding application programming interfaces) for selecting a video feed, starting, stopping, and rewinding a video. Also, the video player 205 can include in its user interface a video display format selection configured to indicate which video display format to use (e.g., a two-dimensional (2D) video or a three-dimensional (3D) video). Other types of user interface controls (e.g., buttons, keyboard controls) can be used as well to control the playback and video format selection functionality of the video player 205.

[0022] The client device 110 also includes a client application 210 downloaded through the network 130. In another embodiment, the client application 210 is preloaded on the client device 110 such that the user of the client device 110 need not download the client application 210 separately. In one embodiment, when the user downloads the client application 210 onto the client device 110, the user is asked to input information about the network-addressable devices in the user's home or surrounding area. For example, a user might tell the application that the user has a smart thermostat and a wireless music system in the living room and color changing smart bulbs and a wireless fan in the bedroom. In another embodiment, a control module 225 on the client device 110 detects network-addressable devices within a specific radius by searching for devices that use a unified operating system such as Google's Brillo system and a cross-platform common language such as Google's Weave® platform. This avoids the need for the client application to ask a user to input information about available network-addressable devices or for the client application to cause the client device to display an interface enabling the user to enter this information, thereby making the client application smaller, easier to download on the client device and quicker to run. This is of particular advantage in the case where the application is used with a mobile client device.

[0023] In situations in which the systems discussed here collect personal information about users, or may make use of personal information, the users may be provided with an opportunity to control whether programs or features collect user information (e.g., information about a user's social network, social actions or activities, profession, a user's preferences, a user's current location, or devices within a specific radius of the client device 110), or to control whether and/or how to receive content from the content server that may be more relevant to the user. In addition, certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user's identity may be treated so that no personally identifiable information can be determined for the user, or a user's geographic location may be generalized where location information is obtained (such as to a city, zip code, or state level), so that a particular location of a user cannot be determined. Thus, the user may have control over how information is collected about the user and used by a content server.

[0024] The client application 210 is used to download new story experiences from the server facility 100 and to choreograph cues and effects associated with each story. The user of the client device 110 downloads a story from a story database 275 (FIG. 2B) onto the client device 110 through the client application 210. When the user downloads the story, in various embodiments a lexicon and a set of pre-made coordination instructions associated with the story are included in the download, and the user in some embodiments is given the option to create a custom set of coordination instructions. The story, the lexicon, and any pre-made and/or custom coordination instructions are stored in the storage facility 235, such that the client device 110 need not be connected to the server facility 100 to begin the cinematic storytelling experience. In one embodiment, the download also includes a video of a content provider narrating the story.

[0025] The client application 210 includes a voice recognition module 215, a rules module 220, a control module 225, and an effects module 230. The voice recognition module 215 listens to user narration of the selected story and receives the narration as input via the microphone on the client device 110. In one embodiment, the user input is playback of the video containing the content provider's narration of the story.

[0026] Referring still to FIG. 2A and also to FIG. 2B, for each story stored in the story database 275, a corresponding lexicon is stored in a corpus lexicon 280 and a set of pre-made coordination instructions supplied by the publisher of the story or a third party is stored in the story database 275. The lexicon contains a list of all of the words in the story, including a subset of the words that are designated keywords. In various embodiments, a story's lexicon also contains the total number of words in the story and the corresponding numbers of each keyword. For example, the lexicon for "The Tell-Tale Heart" by Edgar Allan Poe might indicate that the story contains a total of 2093 words and that words 72, 268, 599, 1362, 1 87, and 2045 are keywords. In one embodiment, each keyword is a single word in the lexicon. In another embodiment, a keyword is a phrase. In still other embodiments, the keywords may be a combination of single words and phrases. Each keyword is associated with at least one specific instruction directing one or more network-addressable devices to perform at least one action responsive to detection of the keyword. The specific instructions for each story are stored together as a set of coordination instructions in the story database 275. For example, a set of coordination instructions for a ghost story might include the following:

Table 1: Coordination Instructions

[0027] In some embodiments, the user elects to create a custom set of coordination instructions to animate the story. Upon opening the client application 210 and selecting the story, the user interacts with the effects module 230 to assign various effects to different parts of the story. The effects module 230 presents on the client device 110 the text of the selected story. In some embodiments, the effects module 230 indicates which words are listed as keywords in the pre-made set of coordination instructions (e.g., by bolding or underlining the keywords).

[0028] The effects module 230 also displays on the client device 110 a list of pre-made building blocks from which the user can select. For example, the building blocks may contain light patterns such as flickering, fade on, fade off, lighting effect, color controls, etc. Building blocks related to sound effects might include thunder, heartbeats, doorbells, screams, etc. And building blocks related to network-addressable appliances might include control of smart outlets, temperature control, and fan control (e.g., on/off, speed control).

[0029] The user provides input to the effects module 230 in selecting from the list of pre-made building blocks. In one embodiment, the user uses a drag and drop interaction to assign building blocks to words in the story. Alternatively, the user may select building blocks by clicking on an entry in a list of available effects. In some embodiments, the building blocks are assigned to the keywords. Additionally or alternatively, the user may elect to assign the building blocks to words that are not designated keywords in the pre-made set of coordination instructions. In one embodiment, multiple building blocks are assigned to a single word. In another embodiment, a single building block is assigned to multiple words. Responsive to receiving the user input, the effects module 230 consolidates the building blocks into a custom set of coordination instructions and sends the custom coordination instructions for storage on the storage facility 235. The user may create multiple sets of custom coordination instructions for each story.
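One way to picture the consolidation step is as grouping the user's individual assignments by word. The sketch below is illustrative only: it assumes each assignment is a pair of a word number and a building-block name, and the block names (such as `"light:flicker"`) and the `consolidate` function are hypothetical rather than taken from the described embodiments.

```python
from collections import defaultdict


def consolidate(assignments):
    """Group (word_number, building_block) pairs into a mapping from word
    number to the ordered list of blocks assigned to that word, dropping
    repeated assignments of the same block to the same word."""
    instructions = defaultdict(list)
    for word_number, block in assignments:
        if block not in instructions[word_number]:
            instructions[word_number].append(block)
    return dict(sorted(instructions.items()))


# Usage: two blocks on word 72, and one block shared by words 72 and 599.
assignments = [
    (72, "light:flicker"),
    (72, "sound:heartbeat"),
    (599, "sound:heartbeat"),
    (72, "light:flicker"),  # repeated assignment, ignored
]
custom = consolidate(assignments)
```

Keying the result by word number supports both cases mentioned above: several building blocks on a single word, and a single building block on several words.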

[0030] When the user opens the client application 210 and selects a story, the effects module 230 displays on the client device 110 an option to use the pre-made coordination instructions associated with the story or a set of custom coordination instructions.

[0031] If the user elects to use the pre-made set of coordination instructions and begins narration of the selected story, the voice recognition module 215 interprets the user input and determines whether the input contains words in the story lexicon and, more specifically, whether the input contains a keyword. The voice recognition module 215 is trained to interpret the user input by matching audio profiles of keywords to the user input to detect recitation of a keyword.

[0032] The voice recognition module 215 is a speech-to-text system that uses a language model and an acoustic model to convert the user input to written text and compare the text to a story lexicon. The language model contains a list of words and their probability of occurring in a given sequence, and the acoustic model contains a statistical representation of the distinct sounds that comprise each word in the language model. The acoustic model is trained using a set of audio recordings and their corresponding transcripts. For each word included in a story's lexicon, the voice recognition module 215 receives as input hundreds or thousands of iterations of the word to account for variations in accents, dialects, regional differences, and the like. Over time, exposure to these variations in pronunciation will lead to greater certainty in recognizing a particular word and will increase the speed with which the voice recognition module 215 is able to perform the detection. Additional or alternative methods for training voice recognition modules are well known in the art.

[0033] An audio profile of each keyword is stored with a story's lexicon in the corpus lexicon 280. In one embodiment, the voice recognition module 215 does not perform a full speech-to-text analysis of the user input. Rather, the voice recognition module 215 employs a matching algorithm to match the audio profile of a story's keywords with the audio profile of the user input as the user narrates the story. Alternatively, the voice recognition module 215 performs a full speech-to-text analysis and compares the text of the narration with the keywords to determine whether the user input contained a keyword.
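The description does not specify the form of an audio profile or the matching algorithm. Purely as an illustration, the sketch below assumes that each profile has been reduced to a fixed-length feature vector and uses cosine similarity as a stand-in matching measure; the vectors, keyword names, and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_keyword(input_profile, keyword_profiles):
    """Return (keyword, similarity) for the stored profile most similar to
    the input profile. Assumes keyword_profiles is non-empty."""
    return max(
        ((kw, cosine_similarity(input_profile, profile))
         for kw, profile in keyword_profiles.items()),
        key=lambda pair: pair[1],
    )


# Usage with made-up three-dimensional profiles.
profiles = {"heart": [0.9, 0.1, 0.3], "door": [0.1, 0.8, 0.2]}
best, score = match_keyword([0.85, 0.15, 0.25], profiles)
```

Comparing the input against only a story's keyword profiles keeps the search small, which is consistent with the motivation for avoiding a full speech-to-text pass.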

100341 Suitable examples of speech-to-text engines include Google Cloud Speech, Google Web Speech, and Android Speech. Alternative speech-to-text engines may be adapted to operate according to the teachings disclosed herein. In either embodiment, detection by the voice recognition module 215 is improved if the voice recognition module 215 narrows the matching algorithm based on a temporal understanding of the user's location in the story as discussed below.

[0035] In another embodiment, the voice recognition module 215 detects the point in the story at which the user is located based on the number of words that the user has read, the location of the keywords, and the total number of words in the story. The voice recognition module 215 retrieves the story's lexicon from the corpus lexicon 280 and counts the number of words contained in the user input. When the voice recognition module 215 reaches a numbered word corresponding to a keyword, the voice recognition module 215 classifies the input as a match for the keyword. In some embodiments, the user exits the application before completing the story, and the voice recognition module 215 saves the user's place such that when the user resumes narration, the voice recognition module 215 detects the user's position in the story. If the user resumes narration from the beginning of the story instead of the place where he previously left off, the voice recognition module 215 detects that the user input does not correspond to the expected position and uses the matching algorithm to match the audio profile of the user input with the audio profiles of the words in the lexicon to determine the user's position.
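
The word-counting approach of this embodiment can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the class name `PositionTracker`, the `keyword_positions` mapping (1-based word index to keyword), and the `advance` method are hypothetical names chosen for illustration, and the sketch assumes the narration has already been split into words.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PositionTracker:
    """Tracks the reader's place in a story by counting narrated words."""

    keyword_positions: Dict[int, str]  # hypothetical: 1-based word index -> keyword
    total_words: int                   # total number of words in the story
    words_read: int = 0                # saved place; persists between sessions

    def advance(self, narrated_words: List[str]) -> List[str]:
        """Consume newly narrated words; return keywords whose index was reached."""
        matched = []
        for _ in narrated_words:
            if self.words_read >= self.total_words:
                break  # the story is finished; ignore extra input
            self.words_read += 1
            keyword = self.keyword_positions.get(self.words_read)
            if keyword is not None:
                matched.append(keyword)
        return matched
```

Because `words_read` is kept on the object, persisting it would let a later session resume from the saved place. Detecting that the user restarted from the beginning would still require the audio-profile matching described above, which this sketch does not attempt.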

[0036] When a user selects a story from the story database 275, the voice recognition module 215 retrieves the corresponding lexicon from the corpus lexicon 280, and recognition of the user input is restricted to the words contained in the lexicon. Restricting the applicable lexicon based on user selection of a story significantly increases the probability that the voice recognition module 215 will accurately interpret the user input and increases the speed at which the voice recognition module 215 operates.

[0037] Responsive to receiving the user input, the voice recognition module 215 runs the computer model against the input to determine a score indicating the probability that the user input corresponds to a keyword. The highest scoring keyword match above a threshold is designated by the voice recognition module 215 as the match for the user input. The voice recognition module 215 then sends the keyword to the rules module 220 for analysis.
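
The selection rule in this paragraph (highest-scoring keyword above a threshold) can be sketched as follows. The function name `best_keyword_match`, the dictionary of per-keyword scores, and the default threshold of 0.8 are assumptions for illustration only; the disclosure does not specify a threshold value or a data format.

```python
from typing import Dict, Optional


def best_keyword_match(scores: Dict[str, float], threshold: float = 0.8) -> Optional[str]:
    """Return the highest-scoring keyword whose score exceeds the threshold.

    `scores` maps each candidate keyword to the probability (0.0 to 1.0)
    that the user input corresponds to it. Returns None when no candidate
    scores above the threshold (the threshold value is an assumption).
    """
    if not scores:
        return None
    keyword, score = max(scores.items(), key=lambda item: item[1])
    return keyword if score > threshold else None
```

Returning `None` rather than the best low-confidence guess reflects the stated goal of avoiding actions triggered by words that are not keywords.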

[0038] If the user elects to use a set of custom coordination instructions, the voice recognition module 215 interprets the user input and determines whether the input contains words in the story lexicon and, more specifically, whether the input contains a word or phrase for which the user designated a custom coordination instruction. The voice recognition module 215 performs interpretation of the user input in the same manner described above with respect to the pre-made set of coordination instructions.

[0039] After interpreting the user input as a word stored in a story's lexicon, the voice recognition module 215 provides the interpretation to the rules module 220 for processing. The rules module 220 coordinates the cues and effects associated with each story. Responsive to receiving the output from the voice recognition module 215, the rules module 220 accesses and analyzes the pre-made or custom coordination instructions to determine the at least one specific network-addressable device instruction that corresponds to the detected keyword.

[0040] The rules module 220 then sends the at least one coordination instruction corresponding to the keyword to the control module 225 for communication to the network-addressable devices 120.
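
A minimal sketch of the keyword-to-instruction lookup performed by the rules module might look like the following. The `Instruction` type, its field names, and the choice to consult the custom set whenever one is supplied are illustrative assumptions; the disclosure states only that the pre-made or custom instructions are analyzed to find the instruction(s) for the detected keyword.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Instruction:
    """One coordination instruction (hypothetical field names)."""

    device_type: str  # e.g. "smart_lights"
    action: str       # e.g. "turn_off"


def lookup_instructions(
    keyword: str,
    premade: Dict[str, List[Instruction]],
    custom: Optional[Dict[str, List[Instruction]]] = None,
) -> List[Instruction]:
    """Return the instruction(s) for a keyword.

    Assumption: when the user elected a custom set, it is used instead of
    the pre-made set. An unknown keyword yields an empty list.
    """
    rules = custom if custom is not None else premade
    return list(rules.get(keyword, []))
```

A list is returned because one keyword may correspond to more than one instruction.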

[0041] The control module 225 receives the coordination instruction(s) from the rules module 220 and instructs the network-addressable devices 120 to perform the action(s) corresponding to the keyword. In one embodiment, the instruction is to perform the action immediately after the control module 225 provides the instruction to the network-addressable device 120. Additionally or alternatively, the instruction may be to perform the action at a later point in the story or to perform an action multiple times over the course of the story.

[0042] The control module 225 compares the received coordination instruction for a keyword with a list of the network-addressable devices in the user's home or surrounding area as determined by user input or detection by the control module 225. If the coordination instruction contains an instruction corresponding to a network-addressable device that is within a specified radius of the client device 110, the control module 225 instructs the network-addressable device to execute the instruction. For example, if the coordination instruction requires smart lights to turn blue, the control module 225 sends the instruction if it determines that smart lights are present within a specified radius of the client device. In one embodiment, the control module 225 communicates with the network-addressable devices 120 via a communication platform such as Google's Weave™ platform. If the coordination instruction contains an instruction corresponding to a network-addressable device that is not within a certain radius of the client device 110, the control module 225 skips the instruction in one embodiment.
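
The radius check performed by the control module can be sketched as below. The `Device` type, the `distance_m` field, the 10-meter default radius, and the string form of the returned commands are all assumptions for illustration; the sketch does not use the Weave platform or any real device API and simply returns the commands that would be sent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Device:
    """A known network-addressable device (hypothetical fields)."""

    name: str
    device_type: str
    distance_m: float  # assumed: distance from the client device, in meters


def dispatch(device_type: str, action: str, devices: List[Device],
             radius_m: float = 10.0) -> List[str]:
    """Return the commands to send; an empty list means the instruction is skipped."""
    targets = [
        d for d in devices
        if d.device_type == device_type and d.distance_m <= radius_m
    ]
    return [f"{d.name}: {action}" for d in targets]
```

With one lamp 3 meters away and one 25 meters away, `dispatch("smart_lights", "turn_blue", devices)` addresses only the nearby lamp, and an instruction for a thermostat that is not present returns an empty list.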

[0043] For example, assume a user selects the story "The Tell-Tale Heart" from the story database 275. The download onto the user's client device 110 includes a story file, a lexicon containing a list of the words and keywords in the story, and a set of pre-made coordination instructions corresponding to the keywords. If the user elects not to use a set of custom coordination instructions, when the user of the client device 110 begins reading the story aloud, the voice recognition module 215 receives the input and applies a matching algorithm to determine whether the input corresponds to a keyword. In some embodiments, the voice recognition module 215 narrows the matching algorithm based on a temporal understanding of the user's location in the story, as discussed above. Based on the results of the matching algorithm, the voice recognition module 215 determines a score indicating the probability that the input corresponds to a keyword and selects the highest scoring keyword match above a threshold as the match for the user input. The voice recognition module 215 then sends the keyword to the rules module 220, which compares the keyword to the coordination instructions and sends the corresponding coordination instruction to the control module 225 for communication to the network-addressable device 120. Narrowing the matching algorithm based on an understanding of the user's location in the story eliminates or reduces the possibility that a word read by the user is wrongly interpreted as a keyword when it is in fact not a keyword, and so avoids the waste of resources that would occur if an action is performed when it is not wanted.

[0044] For example, the pre-made set of coordination instructions for "The Tell-Tale Heart" may include an instruction that when a user says the keyword "darkness," the lights in the room should go out, or an instruction that when the user says "beating," the sound system in the room should make a loud beating noise. In other embodiments, different keywords in the same story trigger the same instruction or one keyword triggers multiple actions. In still another embodiment, a keyword triggers a network-addressable device 120 to take an action that develops over the course of a sentence, a paragraph, or the entire story. For example, the coordination instructions for "The Tell-Tale Heart" may include an instruction that when a user says the keyword "beating," the sound system in the room starts making a slow and quiet beating noise that becomes faster and louder as the story progresses. If, for example, the coordination instructions include an instruction that when a user says the keyword "cold," the thermostat lowers the temperature in the room, but there is not a smart thermostat within a specified radius of the client device 110, the control module 225 will skip the instruction.
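
An action that develops over the course of the story, such as the beating noise that grows faster and louder, could be parameterized by the reader's progress. The sketch below uses simple linear interpolation; the function name, the tempo range of 50 to 140 beats per minute, and the volume range of 0.2 to 1.0 are illustrative assumptions, not values from the disclosure.

```python
from typing import Tuple


def beating_effect(progress: float,
                   start_bpm: float = 50.0, end_bpm: float = 140.0,
                   start_volume: float = 0.2, end_volume: float = 1.0) -> Tuple[float, float]:
    """Return (beats per minute, volume) for a story progress in [0.0, 1.0].

    Progress could be computed as words read divided by total words.
    Values outside the range are clamped.
    """
    progress = min(max(progress, 0.0), 1.0)
    bpm = start_bpm + (end_bpm - start_bpm) * progress
    volume = start_volume + (end_volume - start_volume) * progress
    return bpm, volume
```

A control module could re-evaluate this function as narration advances and send the updated tempo and volume to the sound system.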

[0045] In one embodiment, some functionality attributed to the client application 210 may be performed on the server facility 100. If greater processing is required than is capable of being performed on the client device 110, the request is sent over the network 130 to the server facility 100 to be performed on a server application 250. Additionally or alternatively, the server application 250 may perform some or all of the tasks affiliated with the client application 210, as described below.

[0046] The client device 110 also includes a storage facility 235 that stores user-selected stories downloaded onto the client device 110 through a server application 250. The storage facility 235 also stores the coordination instructions and the lexicon associated with the selected story as well as a video of a content provider narrating the story. When the user of the client device 110 opens the application to initiate the cinematic storytelling experience, the client application 210 retrieves the story and associated information from the storage facility 235.

Server Facility

[0047] FIG. 2B is a block diagram of a server facility 100 for creating a cinematic storytelling experience using network-addressable technologies. As illustrated in FIG. 2B, the server facility 100 includes a server application 250, a video server 270, a story database 275, and a corpus lexicon 280. Other conventional features, such as firewalls, load balancers, application servers, failover servers, site management tools, and so forth are not shown so as to more clearly illustrate the features of the system. Examples of a suitable server facility 100 for implementation of the system include the YouTube™ and Google Video™ websites; other video hosting sites are known as well, and can be adapted to operate according to the teachings disclosed herein. It will be understood that the term "website" represents any system and method of providing content and is not intended to be limited to systems that support content provided via the Internet or the HTTP protocol. The various servers are conventionally implemented, whether as a single piece of software or hardware or as multiple pieces of software or hardware, and can couple to the network 130 via the server application 250. In general, functions described in one embodiment as being performed on the server side can also be performed on the client side in other embodiments if appropriate.

[0048] Like the client application 210, the server application 250 includes a voice recognition module 255, a rules module 260, a control module 265, and an effects module 270. In one embodiment, the server application 250 receives user queries for videos. In response to a user query, the server application 250 accesses the story and associated coordination instructions from the story database 275 and the lexicon from the corpus lexicon 280. In one embodiment, the server application 250 facilitates download of the story, the coordination instructions, and the lexicon onto the client device 110. In another embodiment, the story, the coordination instructions, and the lexicon remain stored on the server facility 100 during the cinematic storytelling experience and the server application 250 performs the tasks affiliated with the client application 210 described above. In still another embodiment, creation of the cinematic storytelling experience is performed by both the client application 210 and the server application 250.

[0049] The video server 270 receives uploaded media content from content providers and allows content to be viewed on the client device 110. Content may be uploaded to the video server 270 via the Internet from a personal computer, through a cellular network from a telephone or PDA, or by other means for transferring data over the network 130 known to those of ordinary skill in the art. Content may be downloaded from the video server 270 in a similar manner; in one embodiment media content is provided as a file download to a client device 110; in an alternative embodiment, media content is streamed to the client device 110. The means by which media content is received by the video server 270 need not match the means by which it is delivered to the client device 110. For example, a content provider may upload a video via a browser on a personal computer, whereas the client device 110 may view that video as a stream sent to a PDA. Note also that the video server 270 may itself serve as the content provider.

[0050] The story database 275 stores stories and associated metadata, provided by their respective content owners. The story files have metadata associated with each file, such as a story ID, author, title, label, genre, and length.
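
As an illustration only, the per-story metadata listed above could be modeled as a small record type. The field names and types are assumptions; in particular, expressing length as a word count is a guess made here because it pairs naturally with the word-counting embodiment, and the disclosure does not state a unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoryMetadata:
    """Metadata stored with each story file (hypothetical schema)."""

    story_id: str
    author: str
    title: str
    label: str
    genre: str
    length_words: int  # assumption: length expressed as a word count


# Example record; the ID, label, and length values are made up for illustration.
example = StoryMetadata(
    story_id="story-0001",
    author="Edgar Allan Poe",
    title="The Tell-Tale Heart",
    label="classic",
    genre="horror",
    length_words=2000,
)
```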

[0051] The story database 275 also stores pre-made coordination instructions associated with each story that dictate the use of network-addressable devices 120 in the cinematic storytelling experience. For example, assume that a father is reading a story to his son about a small boat traversing the ocean. The coordination instructions may provide that when the voice recognition module 215 detects the keyword "calm" or "peaceful," the smart lights in the room should glow blue and green. The sky grows dark as the story progresses. When the father reads the keyword "storm," the smart speakers in the room switch from calm music and ocean sounds to an ominous score and loud winds. Finally, upon detecting the combination of the keywords "lightning" and "strikes," the server application 250 instructs the smart lights in the room to flash yellow and the speakers to produce a thunder clap.
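
The boat-story example can be written down as a small table of triggers and effects. In the sketch below, the `COORDINATION` table, the effect strings, and the function `triggered_actions` are hypothetical; the sketch also assumes that a keyword "combination" means the words are spoken consecutively, which the disclosure does not specify.

```python
from typing import Dict, List, Tuple

# Hypothetical trigger table: a tuple of consecutive words -> effects to perform.
COORDINATION: Dict[Tuple[str, ...], List[str]] = {
    ("calm",): ["lights: glow blue and green"],
    ("peaceful",): ["lights: glow blue and green"],
    ("storm",): ["speakers: ominous score and loud winds"],
    ("lightning", "strikes"): ["lights: flash yellow", "speakers: thunder clap"],
}


def triggered_actions(words: List[str]) -> List[str]:
    """Return the effects triggered by a sequence of narrated words, in order."""
    # Normalize case and strip common punctuation before matching.
    cleaned = [w.lower().strip('.,!?;:"') for w in words]
    actions: List[str] = []
    for i in range(len(cleaned)):
        for trigger, effects in COORDINATION.items():
            if tuple(cleaned[i:i + len(trigger)]) == trigger:
                actions.extend(effects)
    return actions
```

Note that the word "lightning" alone triggers nothing; both lights and speakers respond only when "strikes" follows it.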

[0052] For each story in the story database 275, the corpus lexicon 280 stores a lexicon containing the words used in the story. The voice recognition module 215 is trained to recognize a limited number of keywords for each story, as described above in conjunction with FIG. 1. Restricting the applicable lexicon increases the probability that the voice recognition module 215 will accurately interpret user input for a story, increases the speed at which the recognition is performed, and/or eliminates or reduces the possibility that a word read by the user is wrongly interpreted as a keyword. In some embodiments, recognition speed and accuracy are also improved by narrowing the matching algorithm based on a temporal understanding of the user's location in the story such that the lexicon of possible words is based on the location.
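
One way to picture narrowing the lexicon by location is to consider only keywords that fall within a window of upcoming words. This is a sketch under stated assumptions: the function name, the `keyword_positions` mapping (1-based word index to keyword), and the 20-word default window are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, Set


def candidate_keywords(keyword_positions: Dict[int, str],
                       words_read: int, window: int = 20) -> Set[str]:
    """Return the keywords expected within the next `window` words.

    Only these candidates would be handed to the matching algorithm, so a
    keyword that occurs far from the reader's current place is not considered.
    """
    return {
        keyword
        for index, keyword in keyword_positions.items()
        if words_read < index <= words_read + window
    }
```

A smaller window makes false matches less likely but tolerates less drift between the counted position and the reader's true position; the right trade-off is left to the implementer.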

Interaction Diagram

[0053] FIG. 3 is an interaction diagram illustrating an example process for creating a cinematic storytelling experience with network-addressable technologies. In the displayed embodiment, creation of the cinematic experience is performed on the client application 210. In other embodiments, the process occurs on the server facility 100 or on both the client application 210 and the server facility 100.

[0054] The user of the client device 110 opens the client application 210 and interacts with the server application 250 on the server facility 100 to select 302 a story from the story database 275. The server application 250 accesses the story and a set of pre-made coordination instructions from the story database 275 and the lexicon from the corpus lexicon 280. The client device 110 downloads 304 the story, the pre-made coordination instructions, and the lexicon from the server facility 100. In one embodiment, the download includes a video of a content provider narrating the selected story. In another embodiment, the video server 270 streams a video to the client device 110.

[0055] The client application 210 displays on the client device 110 an option to create a set of custom coordination instructions. In some embodiments, the user assigns pre-made building blocks associated with light patterns, sound effects, and appliance controls to animate various portions of the story and create the custom coordination instructions. In other embodiments, the user elects not to create custom coordination instructions and instead uses the pre-made set of coordination instructions supplied by the publisher or a third party.

[0056] The client application 210 detects 306 user narration of the selected story. In one embodiment, the user reads aloud from the downloaded story file stored in the storage facility 235. In another embodiment, the narration comprises the user reading from a physical version of the story separate from the client device 110 (e.g., a book, a magazine, another client device, etc.). In still another embodiment, the narration comprises playback of a video file in which a content provider narrates the story.

[0057] The various components of the client application 210 analyze 308 the narration and instruct the network-addressable devices 120 to perform an action responsive to detecting recitation of a keyword. The voice recognition module 215 uses speech-to-text technology to interpret the user input as corresponding to a keyword and provides the interpretation to the rules module 220 for processing. Technology for performing speech-to-text is widely available and an appropriate method may be chosen by the implementer for performing this function. In embodiments where the user has created custom coordination instructions, recognition of the user input proceeds in the same manner.

[0058] The rules module 220 accesses the pre-made and custom coordination instructions from the storage facility 235 and compares the coordination instructions with the received keyword to determine the applicable instruction(s). The rules module 220 then communicates the applicable instruction to the control module 225, which determines whether the corresponding network-addressable device 120 is located within a specified radius of the client device 110. If the control module 225 determines that the network-addressable device 120 is located within the specified radius, the control module 225 instructs 310 the network-addressable device 120 to perform the at least one action associated with the instruction. In one embodiment, the instruction directs one network-addressable device 120 to perform one action. In other embodiments, the instruction directs multiple network-addressable devices 120 to perform an action or directs one network-addressable device 120 to perform multiple actions. At 312, the network-addressable device 120 executes the instruction.

[0059] The client 110 and the server facility 100 are each implemented using computers. FIG. 4 is a block diagram illustrating an example computer 400 usable to implement entities of the cinematic storytelling environment, in accordance with one embodiment. The example computer 400 has sufficient memory, processing capacity, network connectivity bandwidth, and other computing resources to implement the cinematic storytelling experience as described herein.

[0060] The computer 400 includes at least one processor 402 (e.g., a central processing unit, a graphics processing unit) coupled to a chipset 404. The chipset 404 includes a memory controller hub 420 and an input/output (I/O) controller hub 422. A memory 406 and a graphics adapter 412 are coupled to the memory controller hub 420, and a display 418 is coupled to the graphics adapter 412. A storage device 408, a keyboard 410, a pointing device 414, and a network adapter 416 are coupled to the I/O controller hub 422. Other embodiments of the computer 400 have different architectures.

[0061] The storage device 408 is a non-transitory computer-readable storage medium such as a hard drive, compact disk read-only memory (CD-ROM), DVD, or a solid-state memory device. The memory 406 holds instructions and data used by the processor 402. The processor 402 may include one or more processors 402 having one or more cores that execute instructions. The pointing device 414 is a mouse, touch-sensitive screen, or other type of pointing device, and in some instances is used in combination with the keyboard 410 to input data into the computer 400. The graphics adapter 412 displays media and other images and information on the display 418. The network adapter 416 couples the computer 400 to one or more computer networks (e.g., the network 130).

[0062] The computer 400 is adapted to execute computer program modules for providing functionality described herein including presenting media and/or metadata generation. As used herein, the term "module" refers to computer program logic used to provide the specified functionality. Thus, a module can be implemented in hardware, firmware, and/or software. In one embodiment of a computer 400 that implements the server facility 100, program modules such as the server application 250 and the video server 270 are stored on the storage device 408, loaded into the memory 406, and executed by the processor 402.

[0063] The types of computers 400 used by the entities of the cinematic storytelling environment can vary depending on the embodiment and the processing power required by the entity. For example, the client device 110 is a smart phone, tablet, laptop, or desktop computer. In other embodiments, the client device 110 is a voice-activated home product such as a Google Home device or Amazon Echo. As another example, the server facility 100 might comprise multiple blade servers working together to provide the functionality described herein. The computers 400 may contain duplicates of some components or may lack some of the components described above (e.g., a keyboard 410, a graphics adapter 412, a pointing device 414, a display 418). For example, the server facility 100 can run in a single computer 400 or multiple computers 400 communicating with each other through a network such as in a server farm.

[0064] The present invention has been described in particular detail with respect to various embodiments, and those of skill in the art will appreciate that the invention may be practiced in other embodiments. In addition, those of skill in the art will appreciate the following aspects of the disclosure. First, the particular naming of the components, capitalization of terms, the attributes, data structures, or any other programming or structural aspect is not mandatory or significant, and the mechanisms that implement the invention or its features may have different names, formats, or protocols. Second, the system may be implemented via a combination of hardware and software, as described, or entirely in hardware elements. Third, the particular division of functionality between the various system components described herein is merely exemplary, and not mandatory; functions performed by a single system component may instead be performed by multiple components, and functions performed by multiple components may instead be performed by a single component.

[0065] Some portions of the above description describe the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware or hardware.

[0066] In addition, the terms used to describe various quantities, data values, and computations are understood to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.

[0067] The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description above. In addition, the present invention is not described with reference to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any references to specific languages are provided for disclosure of enablement and best mode of the present invention.

[0068] Finally, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.