CUSTOMIZED DIALOGUE SUPPORT - SONY INTERACTIVE ENTERTAINMENT LLC

Title:

CUSTOMIZED DIALOGUE SUPPORT

Document Type and Number:

WIPO Patent Application WO/2024/019817

Kind Code:

Abstract:

Systems and methods for customized dialogue support in virtual environments are provided. Dialogue maps stored in memory may specify dialogue triggers each associated with a corresponding dialogue instruction. Data regarding an interactive session associated with a user device may be monitored based on one or more of the stored dialogue maps. The presence of one of the dialogue triggers specified by the one or more dialogue maps may be detected based on the monitored data. Customized dialogue output may be generated in response to the detected dialogue trigger and based on the dialogue instruction corresponding to the detected dialogue trigger. The customized dialogue output may be provided to the interactive session in real-time with detection of the detected dialogue trigger.

Inventors:

FRYER-MCCULLOCH MORGAN (US)
DORN VICTORIA (US)
POWELL BRIELLE (US)
OSMAN STEVE (US)
NORTON GEOFF (US)

Application Number:

PCT/US2023/023866

Publication Date:

January 25, 2024

Filing Date:

May 30, 2023

Assignee:

SONY INTERACTIVE ENTERTAINMENT LLC (US)
SONY INTERACTIVE ENTERTAINMENT INC (JP)

International Classes:

G10L17/24; G10L21/16; G06F3/16

Foreign References:

US20210151070A1	2021-05-20
US7373300B1	2008-05-13
US20200218780A1	2020-07-09
US20200152184A1	2020-05-14

Attorney, Agent or Firm:

PHAM, Tam, Thanh et al. (US)

Claims:

CLAIMS

WHAT IS CLAIMED IS:

1. A method for customized dialogue support in virtual environments, the method comprising: storing a plurality of dialogue maps in memory, each dialogue map specifying a plurality of dialogue triggers each associated with a corresponding dialogue instruction; monitoring data regarding an interactive session associated with a user device based on one or more of the stored dialogue maps; detecting when data associated with the interactive session indicates a presence of one of the dialogue triggers specified by the one or more dialogue maps; generating customized dialogue output in response to the detected dialogue trigger, wherein generating the customized dialogue output is based on the dialogue instruction corresponding to the detected dialogue trigger; and providing the customized dialogue output to the interactive session in real-time with detection of the detected dialogue trigger.

2. The method of claim 1, wherein detecting the presence of the detected dialogue trigger includes using one or more language usage models to predict the detected dialogue trigger.

3. The method of claim 2, wherein the language usage model is specific to a user of the user device, and wherein detecting the presence of the detected dialogue trigger is based on comparing a predetermined pattern of speech associated with the user to speech of the user during the interactive session.

4. The method of claim 2, wherein the dialogue instruction corresponding to the detected dialogue trigger is executable to modify an audio stream of speech by a user of the user device, and wherein the modified audio stream is provided to one or more other user devices in the interactive session in place of the audio stream.

5. The method of claim 2, wherein the dialogue instruction corresponding to the detected dialogue trigger is executable to modify a display of the user device to include one or more prompts associated with the predicted dialogue trigger.

6. The method of claim 5, wherein the prompts are selectable based on gaze data from gaze-tracking of the user within the virtual environment.

7. The method of claim 1, wherein at least one of the dialogue maps is specific to a predetermined theme, and wherein the dialogue triggers and the corresponding dialogue instruction include words and phrases associated with the predetermined theme.

8. The method of claim 1, wherein at least one of the dialogue maps specifies a dialogue trigger that includes at least one of gesture, controller input, or custom shortcut input.

9. The method of claim 1, wherein at least one of the dialogue maps specifies a dialogue instruction executable to play a pre-recorded audio clip in the interactive session.

10. The method of claim 1, further comprising customizing one of the dialogue maps for a user of the user device, wherein customizing the dialogue map includes: calibrating for one or more identified speech patterns and abilities of the user; and defining one or more of the dialogue triggers based on the calibration.

11. The method of claim 1, further comprising preconfiguring the corresponding dialogue instruction in accordance with a custom shortcut command, and defining one or more of the dialogue triggers based on the custom shortcut command.

12. A system for customized dialogue support in virtual environments, the system comprising: memory that stores a plurality of dialogue maps, each dialogue map specifying a plurality of dialogue triggers each associated with a corresponding dialogue instruction; a communication interface that communicates over a communication network, wherein the communication interface receives data regarding an interactive session associated with a user device; and a processor that executes instructions stored in memory, wherein the processor executes the instructions to: monitor the received data regarding the interactive session based on one or more of the stored dialogue maps; detect when data associated with the interactive session indicates a presence of one of the dialogue triggers specified by the one or more dialogue maps; and generate customized dialogue output in response to the detected dialogue trigger, wherein generating the customized dialogue output is based on the dialogue instruction corresponding to the detected dialogue trigger; wherein the communication interface provides the customized dialogue output to the interactive session in real-time with detection of the detected dialogue trigger.

13. The system of claim 12, wherein the processor detects the presence of the detected dialogue trigger by using one or more language usage models to predict the detected dialogue trigger.

14. The system of claim 13, wherein the language usage model is specific to a user of the user device, and wherein the processor detects the presence of the detected dialogue trigger based on comparing a predetermined pattern of speech associated with the user to speech of the user during the interactive session.

15. The system of claim 13, wherein the dialogue instruction corresponding to the detected dialogue trigger is executable to modify an audio stream of speech by a user of the user device, and wherein the modified audio stream is provided to one or more other user devices in the interactive session in place of the audio stream.

16. The system of claim 13, wherein the dialogue instruction corresponding to the detected dialogue trigger is executable to modify a display of the user device to include one or more prompts associated with the predicted dialogue trigger.

17. The system of claim 16, wherein the prompts are selectable based on gaze data from gaze-tracking of the user within the virtual environment.

18. The system of claim 12, wherein at least one of the dialogue maps is specific to a predetermined theme, and wherein the dialogue triggers and the corresponding dialogue instruction include words and phrases associated with the predetermined theme.

19. The system of claim 12, wherein at least one of the dialogue maps specifies a dialogue trigger that includes at least one of gesture, controller input, or custom shortcut input.

20. The system of claim 12, wherein at least one of the dialogue maps specifies a dialogue instruction executable to play a pre-recorded audio clip in the interactive session.

21. The system of claim 12, wherein the processor executes further instructions to customize one of the dialogue maps for a user of the user device, wherein the processor customizes the dialogue map by: calibrating for one or more identified speech patterns and abilities of the user; and defining one or more of the dialogue triggers based on the calibration.

22. The system of claim 12, wherein the processor executes further instructions to preconfigure the corresponding dialogue instruction in accordance with a custom shortcut command, and define one or more of the dialogue triggers based on the custom shortcut command.

23. A non-transitory, computer-readable storage medium, having embodied thereon a program executable by a processor to perform a method for customized dialogue support in virtual environments, the method comprising: storing a plurality of dialogue maps in memory, each dialogue map specifying a plurality of dialogue triggers each associated with a corresponding dialogue instruction; monitoring data regarding an interactive session associated with a user device based on one or more of the stored dialogue maps; detecting when data associated with the interactive session indicates a presence of one of the dialogue triggers specified by the one or more dialogue maps; generating customized dialogue output in response to the detected dialogue trigger, wherein generating the customized dialogue output is based on the dialogue instruction corresponding to the detected dialogue trigger; and providing the customized dialogue output to the interactive session in real-time with detection of the detected dialogue trigger.

AMENDED CLAIMS received by the International Bureau on 09 Oct 2023 (09-10-2023)

1. A method for customized dialogue support in virtual environments, the method comprising: storing a plurality of dialogue maps in memory, each dialogue map specifying a plurality of dialogue triggers each associated with a corresponding dialogue instruction; monitoring data regarding an interactive session associated with a user device based on one or more of the stored dialogue maps; applying a language usage model to analyze the data associated with the interactive session and data regarding a user associated with the user device, wherein applying the language usage model results in a prediction of when one of the dialogue triggers specified by the one or more dialogue maps is likely to occur; generating customized dialogue output in response to the predicted dialogue trigger, wherein generating the customized dialogue output is based on the dialogue instruction corresponding to the predicted dialogue trigger; and providing the customized dialogue output to the interactive session in real-time in accordance with when the predicted dialogue trigger is predicted to occur.

2. The method of claim 1, wherein the language usage model is a machine-learning model continually refined to analyze speech patterns for at least one of different users and different game titles.

3. The method of claim 1, wherein the language usage model is specific to the user of the user device, and wherein the prediction of when the predicted dialogue trigger is likely to occur is based on comparing a predetermined pattern of speech associated with the user to speech of the user during the interactive session.

AMENDED SHEET (ARTICLE 19)

4. The method of claim 1, wherein the dialogue instruction corresponding to the predicted dialogue trigger is executable to modify an audio stream of speech by a user of the user device, and wherein the modified audio stream is provided to one or more other user devices in the interactive session in place of the audio stream.

5. The method of claim 1, wherein the dialogue instruction corresponding to the predicted dialogue trigger is executable to modify a display of the user device to include one or more prompts associated with the predicted dialogue trigger.

6. The method of claim 5, wherein the prompts are selectable based on gaze data from gaze-tracking of the user within the virtual environment.

8. The method of claim 1, wherein at least one of the dialogue maps specifies a dialogue trigger that includes at least one of gesture, controller input, or custom shortcut input.

9. The method of claim 1, wherein at least one of the dialogue maps specifies a dialogue instruction executable to play a pre-recorded audio clip in the interactive session.

12. A system for customized dialogue support in virtual environments, the system comprising: memory that stores a plurality of dialogue maps, each dialogue map specifying a plurality of dialogue triggers each associated with a corresponding dialogue instruction; a communication interface that communicates over a communication network, wherein the communication interface receives data regarding an interactive session associated with a user device; and a processor that executes instructions stored in memory, wherein the processor executes the instructions to: monitor the received data regarding the interactive session based on one or more of the stored dialogue maps; apply a language usage model to analyze the data associated with the interactive session and data regarding a user associated with the user device, wherein applying the language usage model results in a prediction of when one of the dialogue triggers specified by the one or more dialogue maps is likely to occur; and generate customized dialogue output in response to the predicted dialogue trigger, wherein generating the customized dialogue output is based on the dialogue instruction corresponding to the predicted dialogue trigger; wherein the communication interface provides the customized dialogue output to the interactive session in real-time in accordance with when the predicted dialogue trigger is predicted to occur.

13. The system of claim 12, wherein the language usage model is a machine-learning model continually refined to analyze speech patterns for at least one of different users and different game titles.

14. The system of claim 12, wherein the language usage model is specific to the user of the user device, and wherein the prediction of when the predicted dialogue trigger is likely to occur is based on comparing a predetermined pattern of speech associated with the user to speech of the user during the interactive session.

15. The system of claim 12, wherein the dialogue instruction corresponding to the predicted dialogue trigger is executable to modify an audio stream of speech by a user of the user device, and wherein the modified audio stream is provided to one or more other user devices in the interactive session in place of the audio stream.

16. The system of claim 12, wherein the dialogue instruction corresponding to the predicted dialogue trigger is executable to modify a display of the user device to include one or more prompts associated with the predicted dialogue trigger.

17. The system of claim 16, wherein the prompts are selectable based on gaze data from gaze-tracking of the user within the virtual environment.

19. The system of claim 12, wherein at least one of the dialogue maps specifies a dialogue trigger that includes at least one of gesture, controller input, or custom shortcut input.

20. The system of claim 12, wherein at least one of the dialogue maps specifies a dialogue instruction executable to play a pre-recorded audio clip in the interactive session.

22. The system of claim 12, wherein the processor executes further instructions to preconfigure the corresponding dialogue instruction in accordance with a custom shortcut command, and define one or more of the dialogue triggers based on the custom shortcut command.

23. A non-transitory, computer-readable storage medium, having embodied thereon a program executable by a processor to perform a method for customized dialogue support in virtual environments, the method comprising: storing a plurality of dialogue maps in memory, each dialogue map specifying a plurality of dialogue triggers each associated with a corresponding dialogue instruction; monitoring data regarding an interactive session associated with a user device based on one or more of the stored dialogue maps; applying a language usage model to analyze the data associated with the interactive session and data regarding a user associated with the user device, wherein applying the language usage model results in a prediction of when one of the dialogue triggers specified by the one or more dialogue maps is likely to occur; generating customized dialogue output in response to the predicted dialogue trigger, wherein generating the customized dialogue output is based on the dialogue instruction corresponding to the predicted dialogue trigger; and providing the customized dialogue output to the interactive session in real-time in accordance with when the predicted dialogue trigger is predicted to occur.

Description:

CUSTOMIZED DIALOGUE SUPPORT

BACKGROUND OF THE INVENTION

1. Field of the Invention

[0001] The present invention generally relates to dialogue support. More specifically, the present invention relates to communication analytics and customized dialogue support in virtual environments.

2. Description of the Related Art

[0002] Presently available digital content (e.g., video games) may allow for interaction with other users within a virtual environment. Thus, many users may enjoy playing such digital content titles in social settings that allow for competitive gameplay, team gameplay, and other social interactions with other users (e.g., friends, teammates, competitors, spectators). Such social interaction (which may include text-based, voice, or video chat) may take place on one or more different platforms (e.g., game platform server, lobby server, chat server, other service provider). Thus, the users in the same interactive session may have different options for communicating with each other, as well as interacting with the virtual environment. Such communications may not only enhance success in gameplay, but also enhance social relationships and enjoyment of the interactive content title.

[0003] Some interactive content titles allow for numerous players to engage with each other simultaneously within an in-game virtual environment. In addition, the virtual environment may include a number of in-game events occurring at a fast pace, thereby requiring users to react quickly in order to succeed in gameplay. Thus, efficient and effective player communication may be crucial to successfully navigate increasingly complex virtual environments. Some users, however (particularly those with stutters, lisps, or other speech or verbal fluency conditions and disabilities affecting speech or verbalization), may face obstacles or otherwise have difficulty in communicating effectively at the pace of others. As a result, such communication obstacles and difficulties may adversely affect their gameplay success, enjoyment, and social experience with an interactive game title.

[0004] There is, therefore, a need in the art for improved systems and methods of customized dialogue support in virtual environments.

SUMMARY OF THE CLAIMED INVENTION

[0005] Embodiments of the present invention include systems and methods for customized dialogue support in virtual environments. Dialogue maps stored in memory may specify dialogue triggers each associated with a corresponding dialogue instruction. Data regarding an interactive session associated with a user device may be monitored based on one or more of the stored dialogue maps. The presence of one of the dialogue triggers specified by the one or more dialogue maps may be detected based on the monitored data. Customized dialogue output may be generated in response to the detected dialogue trigger and based on the dialogue instruction corresponding to the detected dialogue trigger. The customized dialogue output may be provided to the interactive session in real-time with detection of the detected dialogue trigger.

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] FIG. 1 illustrates a network environment in which a system for customized dialogue support in virtual environments may be implemented.

[0007] FIG. 2 illustrates an exemplary uniform data system (UDS) that may be used to provide data to a system for customized dialogue support in virtual environments.

[0008] FIG. 3 is a flowchart illustrating an exemplary method for customized dialogue support in virtual environments.

[0009] FIG. 4 is a diagram illustrating an exemplary implementation of customized dialogue support in virtual environments.

[0010] FIG. 5 is a block diagram of an exemplary electronic entertainment system that may be used in embodiments of the present invention.

DETAILED DESCRIPTION

[0011] Embodiments of the present invention include systems and methods for customized dialogue support in virtual environments. Systems and methods for customized dialogue support in virtual environments are provided. Dialogue maps stored in memory may specify dialogue triggers each associated with a corresponding dialogue instruction. Data regarding an interactive session associated with a user device may be monitored based on one or more of the stored dialogue maps. The presence of one of the dialogue triggers specified by the one or more dialogue maps may be detected based on the monitored data. Customized dialogue output may be generated in response to the detected dialogue trigger and based on the dialogue instruction corresponding to the detected dialogue trigger. The customized dialogue output may be provided to the interactive session in real-time with detection of the detected dialogue trigger.

[0012] FIG. 1 illustrates a network environment 100 in which a system for customized dialogue support in virtual environments may be implemented. The network environment 100 may include one or more content source servers 110 that provide digital content (e.g., games, other applications and services) for distribution, one or more content provider server application program interfaces (APIs) 120, content delivery network server 130, dialogue analytics server 140, and one or more user devices 150. The devices in network environment 100 communicate with each other using one or more communication networks, which may include a local, proprietary network (e.g., an intranet) and/or may be a part of a larger wide-area network. The communications networks may be a local area network (LAN), which may be communicatively coupled to a wide area network (WAN) such as the Internet. The Internet is a broad network of interconnected computers and servers allowing for the transmission and exchange of Internet Protocol (IP) data between users connected through a network service provider. Examples of network service providers are the public switched telephone network, a cable service provider, a provider of digital subscriber line (DSL) services, or a satellite service provider. One or more communications networks allow for communication between the various components of network environment 100.

[0013] The servers described herein may include any type of server as is known in the art, including standard hardware computing components such as network and media interfaces, non-transitory computer-readable storage (memory), and processors for executing instructions or accessing information that may be stored in memory. The functionalities of multiple servers may be integrated into a single server. Any of the aforementioned servers (or an integrated server) may take on certain client-side, cache, or proxy server characteristics. These characteristics may depend on the particular network placement of the server or certain configurations of the server.

[0014] Content source servers 110 may maintain and provide a variety of digital content and digital services available for distribution over a communication network. The content source servers 110 may be associated with any content provider that makes its content available for access over a communication network. The content source servers 110 may therefore host a variety of different interactive content titles, which may further be associated with object data regarding a digital or virtual object (e.g., activity information, zone information, character information, player information, other game media information, etc.) displayed in a digital or virtual environment during an interactive session.

[0015] Such content may include not only digital video and games, but also other types of digital applications and services. Such applications and services may include any variety of different digital content and functionalities that may be provided to user devices 150, including providing and supporting chat and other communication channels. The chat and communication services may be inclusive of voice-based, text-based, and video-based messages. Thus, a user device 150 may participate in a gameplay session concurrent and/or associated with one or more communication sessions, and the gameplay and communication sessions may be hosted on one or more of the content source servers 110.

[0016] The content from content source server 110 may be provided through a content provider server API 120, which allows various types of content source servers 110 to communicate with other servers in the network environment 100 (e.g., user devices 150). The content provider server API 120 may be specific to the particular operating language, system, platform, protocols, etc., of the content source server 110 providing the content, as well as the user devices 150 and other devices of network environment 100. In a network environment 100 that includes multiple different types of content source servers 110, there may likewise be a corresponding number of content provider server APIs 120 that allow for various formatting, conversion, and other cross-device and cross-platform communication processes for providing content and other services to different user devices 150, which may each respectively use different operating systems, protocols, etc., to process such content. As such, applications and services in different formats may be made available so as to be compatible with a variety of different user devices 150. In a network environment 100 that includes multiple different types of content source servers 110, content delivery network servers 130, dialogue analytics server 140, user devices 150, and databases 160, there may likewise be a corresponding number of APIs managed by content provider server APIs 120.

[0017] The content provider server API 120 may further facilitate access of each of the user devices 150 to the content hosted or services provided by the content source servers 110, either directly or via content delivery network server 130. Additional information, such as metadata, about the accessed content or service can also be provided by the content provider server API 120 to the user device 150. As described below, the additional information (e.g., object data, metadata) can be usable to provide details about the content or service being provided to the user device 150. In some embodiments, the services provided from the content source servers 110 to the user device 150 via the content provider server API 120 may include supporting services that are associated with other content or services, such as chat services, ratings, and profiles that are associated with a particular game, team, community, etc. In such cases, the content source servers 110 may also communicate with each other via the content provider server API 120.

[0018] The content delivery network server 130 may include a server that provides resources, files, etc., related to the content from content source servers 110, including various content and service configurations, to user devices 150. The content delivery network server 130 can also be called upon by the user devices 150 that request to access specific content or services. Content delivery network server 130 may include universe management servers, game servers, streaming media servers, servers hosting downloadable content, and other content delivery servers known in the art.

[0019] Dialogue analytics server 140 may include any data server known in the art that is capable of communicating with the different content source servers 110, content provider server APIs 120, content delivery network server 130, user devices 150, and databases 160. Such dialogue analytics server 140 may be implemented on one or more cloud servers that carry out instructions associated with interactive content (e.g., games, activities, video, podcasts, User Generated Content ("UGC"), publisher content, etc.). The dialogue analytics server 140 may further carry out instructions, for example, for monitoring one or more interactive sessions based on one or more selected dialogue maps. Specifically, the dialogue analytics server 140 may monitor for one or more dialogue triggers as specified by the selected dialogue maps. When the dialogue triggers are detected, the dialogue analytics server 140 may generate customized dialogue output in accordance with the corresponding dialogue instruction specified by the dialogue map and provide the customized dialogue output within the interactive session.
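
For illustration only, the monitor, detect, and generate sequence described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the dialogue analytics server 140: the names DialogueTrigger, DialogueInstruction, and monitor_session, the event dictionaries, and the example phrases are all hypothetical and do not appear in this disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class DialogueTrigger:
    """A condition to watch for in session data (hypothetical structure)."""
    name: str
    matches: Callable[[dict], bool]


@dataclass(frozen=True)
class DialogueInstruction:
    """What to produce when the paired trigger is detected (hypothetical)."""
    generate: Callable[[dict], str]


# A dialogue map pairs each trigger with its corresponding instruction.
DialogueMap = list[tuple[DialogueTrigger, DialogueInstruction]]


def monitor_session(events: Iterable[dict], dialogue_map: DialogueMap) -> Iterator[str]:
    """Yield customized dialogue output as soon as a trigger is detected."""
    for event in events:
        for trigger, instruction in dialogue_map:
            if trigger.matches(event):
                yield instruction.generate(event)


# Assumed example data: one trigger that fires on an "enemy_visible" event.
enemy_spotted = DialogueTrigger(
    name="enemy_spotted",
    matches=lambda e: e.get("type") == "enemy_visible",
)
call_out = DialogueInstruction(
    generate=lambda e: f"Enemy at {e['location']}!",
)

session_events = [
    {"type": "player_moved", "location": "bridge"},
    {"type": "enemy_visible", "location": "north gate"},
]
outputs = list(monitor_session(session_events, [(enemy_spotted, call_out)]))
print(outputs)  # prints ['Enemy at north gate!']
```

Because monitor_session is a generator, each output is produced while the event stream is still being consumed, which loosely mirrors providing output in real-time with detection.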

[0020] The user device 150 may include a plurality of different types of computing devices. The user device 150 may be a server that provides an internal service (e.g., to other servers) in network environment 100. In such cases, user device 150 may correspond to one of the content source servers 110 described herein. Alternatively, the user device 150 may be a computing device that may include any number of different gaming consoles, mobile devices, laptops, and desktops. Such user devices 150 may also be configured to access data from other storage media, such as, but not limited to, memory cards or disk drives as may be appropriate in the case of downloaded services. Such user devices 150 may include standard hardware computing components such as, but not limited to, network and media interfaces, non-transitory computer-readable storage (memory), and processors for executing instructions that may be stored in memory. These user devices 150 may also run using a variety of different operating systems (e.g., iOS, Android), applications or computing languages (e.g., C++, JavaScript). An exemplary user device 150 is described in detail herein with respect to FIG. 5. Each user device 150 may be associated with participants (e.g., players) or other types (e.g., spectators) of users in relation to a collection of digital content streams.

[0021] While pictured separately, the databases 160 may be stored on any of the servers and devices illustrated in network environment 100, whether on the same server, on different servers, or on any of the user devices 150. Such databases 160 may store or link to various sources and services used for dialogue analytics and customized dialogue output generation. In addition, databases 160 may store dialogue maps, as well as language usage models, which may be specific to a particular user, user group or team, user category or condition, themes, content genre, content title, sound types, etc. Language usage models may be developed and refined over time for use in identifying different types of speech patterns associated with certain speech conditions (e.g., stutter), which may serve as a dialogue trigger of a customized dialogue output that replaces certain speech and/or provides speech assistance or support to the user.

[0022] In some implementations, dialogue maps may be newly generated, stored in databases 160, and shared (including being made available for purchase) with other users, as well as further customized to the specific needs and preferences of an individual user. Thus, dialogue maps may also be stored in the databases 160 in association with each user for whom the maps are customized. Dialogue triggers may also be defined based on gameplay data regarding the user (e.g., user progress in an activity and/or media content title, user ID, user game characters, etc.) or other parameters. The dialogue map may further specify that each dialogue trigger corresponds to a dialogue instruction executable to generate customized dialogue output, which may be provided to the user device of the monitored user and/or to other users in the same interactive session.
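
As a purely illustrative sketch, a shared dialogue map that is further customized and stored per user might be represented as below. The dictionary layout, the field names ("action", "text", "audience", "clip"), the map name "fantasy_raid", the user ID, and the helper customize_map are all assumptions made for this example and are not specified in this disclosure.

```python
import copy

# Shared, theme-specific map: trigger name -> dialogue instruction.
# All names and fields here are hypothetical.
SHARED_MAPS = {
    "fantasy_raid": {
        "boss_phase_change": {
            "action": "speak", "text": "Phase change, regroup!", "audience": "team",
        },
        "low_health": {
            "action": "speak", "text": "I need healing.", "audience": "team",
        },
    }
}


def customize_map(shared_map, overrides):
    """Return a per-user copy of a shared map with the user's overrides applied.

    The shared map is deep-copied so that one user's changes never leak
    into the map that other users share.
    """
    custom = copy.deepcopy(shared_map)
    for trigger, instruction in overrides.items():
        custom.setdefault(trigger, {}).update(instruction)
    return custom


# Per-user storage keyed by a (hypothetical) user ID.
user_maps = {}
user_maps["user_123"] = customize_map(
    SHARED_MAPS["fantasy_raid"],
    {
        # Reword an existing instruction.
        "low_health": {"text": "Heal me, please."},
        # Add a new trigger that plays a pre-recorded clip.
        "quest_complete": {
            "action": "play_clip", "clip": "victory.ogg", "audience": "session",
        },
    },
)

print(user_maps["user_123"]["low_health"]["text"])  # prints Heal me, please.
```

Keeping the shared map immutable from the user's point of view is one possible design choice; it lets the same shared map be distributed to many users while each user's customizations are stored separately.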

[0023] Language usage models may be built and continually refined to analyze speech patterns related to dialogue triggers, such as stutters. Different models may be developed for different users, as well as different game titles (and common words and phrases used in such game titles). Such language models may be used not only to identify when certain audio sounds match a trigger, but also to predict when a trigger is likely to occur based on data regarding the current interactive session and its in-game events and conditions. Artificial intelligence and machine learning techniques may continually refine such models over time as more data is gathered regarding the user or similarly-situated users, as well as new game titles. The language usage models may also be used to analyze new users and make recommendations as to which models may be most applicable to the characteristics and conditions of the user and current interactive session.
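
As a deliberately simple stand-in for the language usage models described above, the following Python sketch keeps per-word counts and estimates how likely a word is to coincide with a trigger, refining the estimate as more observations arrive. The class name, method names, and the use of add-one smoothing are assumptions for illustration; the application does not specify a particular model or learning technique.

```python
from collections import defaultdict


class LanguageUsageModel:
    """Hypothetical count-based model; not the model of this application."""

    def __init__(self):
        self.attempts = defaultdict(int)  # times a word was spoken
        self.triggers = defaultdict(int)  # times speaking it coincided with a trigger

    def observe(self, word, triggered):
        # Each new observation refines the estimate for that word.
        key = word.lower()
        self.attempts[key] += 1
        if triggered:
            self.triggers[key] += 1

    def trigger_probability(self, word):
        # Add-one (Laplace) smoothing, so unseen words get 0.5 rather than 0 or 1.
        key = word.lower()
        return (self.triggers.get(key, 0) + 1) / (self.attempts.get(key, 0) + 2)

    def likely_triggers(self, words, threshold=0.6):
        # Predict which of the given words are likely to produce a trigger.
        return [w for w in words if self.trigger_probability(w) >= threshold]
```

A separate instance may be kept per user or per game title, consistent with the different models described above.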

[0024] For example, certain users with certain speech conditions may find certain sounds, words, or phrases difficult to verbalize. In the context of a game or other interactive activity, the inability to speak effectively to other players in the current game session may represent an accessibility barrier that prevents the user from succeeding at or enjoying session interaction. Such a user may preconfigure customized dialogue maps with dialogue triggers corresponding to sounds, words, or phrases that are particularly challenging to the user (or users having a common speech condition). The dialogue triggers may not necessarily be the sounds, words, or phrases themselves, but rather associated indicators that such sounds, words, or phrases may need to be spoken. In addition, the dialogue trigger may also include physical gestures (e.g., captured by a camera, motion detector, or other sensor) or controller-based inputs (e.g., entered using a game controller, keyboard, keypad, touchscreen, or other input devices), which allow the user to indicate when the user predicts or identifies that a predetermined sound, word, or phrase will need to be spoken. Such mapped inputs may therefore serve as a custom shortcut command to implement associated dialogue support options.

[0025] The user may also preconfigure corresponding dialogue instructions that are executable to process the portion of the session stream including the detected dialogue trigger in customized ways. For example, the dialogue instructions may refer to specific pre-recorded speech audio clips by the user (or other users, computer-generated voices, or combinations of the same) or preconfigured text messages, which may also be stored in databases and retrieved as needed in accordance with the dialogue instruction and associated dialogue trigger. A user may therefore customize a dialogue map by defining new types of dialogue triggers and specifying which dialogue instruction(s) to execute when the dialogue trigger is detected. The dialogue instruction may be executable to output specific types of dialogue, including translation into text-based messages, one or more pre-recorded sounds or clips, or modifications to a current recording. The modifications may include slowing down portions of the recording (e.g., sounds leading up to a predicted stutter), speeding up portions of the recording (e.g., stutter sounds), otherwise standardizing or smoothing the pacing of speech (versus speech that abruptly starts and stops), changes in pitch, tonality, or other character of speech, including modifications to sound like famous voices, fanciful or creature voices, or other voice or sound modifications as desired by the user to adjust how their voice may be presented to others. Such custom dialogue maps may be shared with other users, as well as bought or sold as add-on purchases associated with a game console, game title, or accessory product.
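
The slowing down or speeding up of a portion of a recording may be sketched as follows. This Python example uses naive linear-interpolation resampling on a list of audio samples, which is an assumption chosen for brevity: it also shifts pitch, whereas a practical system would more likely use a pitch-preserving time-stretch. The function names and the sample-index interface are hypothetical.

```python
def change_speed(samples, factor):
    """Resample by linear interpolation; factor > 1 speeds up, factor < 1 slows down."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    n_out = max(1, round(len(samples) / factor))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * factor                 # fractional position in the input
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = min(pos - lo, 1.0)
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def retime_segment(samples, start, end, factor):
    """Apply change_speed only to samples[start:end], leaving the rest untouched."""
    return samples[:start] + change_speed(samples[start:end], factor) + samples[end:]
```

For instance, `retime_segment(recording, start, end, 2.0)` would shorten a stutter segment to roughly half its length, while a factor below 1 would lengthen the sounds leading up to it.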

[0026] In exemplary implementations, a microphone may also be used to capture data regarding speech by the user in real-time during a current interactive session. The speech by the user may also be analyzed by dialogue analytics server 140 in accordance with one or more dialogue maps, the selection of which may be based on user preference or analytics that match the user with dialogue maps selected by or for similar users. In some instances, the user may wish to apply modifications to their own voice as presented to one or more audiences. Different modifications may be applied to different streams. For example, the user's voice may be modified one way in association with one game title (e.g., modified to sound like an in-game character's voice associated with a game title being played in the current session), while being modified in a different way in association with another stream (e.g., modified to lower frequency response, increase or decrease volume, slow down speed, auto-complete or auto-correct broken words or phrases, modified to sound like the voice of a favorite character, celebrity, etc.). In some implementations, the user may further customize a dialogue map upon request based on a calibration session in which the specific speech patterns of the user may be analyzed to set thresholds regarding dialogue triggers and modulate responsive dialogue outputs. As such, user speech may be automatically modified and presented in different ways to different users under different dialogue maps, in accordance with user preferences and characteristics.

[0027] Where the user may not wish to modify the sound of their voice, they may opt to specify that the dialogue support output merely provide support or assistance. For example, where a dialogue trigger is detected (or predicted), the dialogue map may specify dialogue instructions corresponding to presentation of available options for prompts that are less likely to trigger a stutter. The dialogue prompts may be pre-selected or identified based on machine learning applied to developing language usage models as to which sounds, words, and phrases are more or less likely to result in a stutter or other trigger condition. Such dialogue prompts may be provided and presented only within a display of the user device of the monitored user, such that other users in the interactive session may not see such display. Upon being presented with the options, the monitored user may input such selection based on gaze-tracking (e.g., by holding their gaze at the desired option for a predetermined period of time).
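
The selection of dialogue prompts that are less likely to result in a stutter may be sketched as a ranking over candidate phrases. In the following Python example, the per-sound difficulty weights, the function names, and the use of each word's first letter as a crude proxy for its opening sound are all assumptions for illustration; a language usage model as described above would supply better estimates.

```python
def phrase_risk(phrase, difficult_onsets):
    """Score a phrase by the highest difficulty weight among its word onsets."""
    words = phrase.lower().split()
    # First letter stands in for the opening sound (a simplifying assumption).
    return max((difficult_onsets.get(w[0], 0.0) for w in words), default=0.0)


def suggest_prompts(candidates, difficult_onsets, limit=3):
    """Return up to `limit` candidate phrases, lowest estimated risk first."""
    ranked = sorted(candidates, key=lambda p: phrase_risk(p, difficult_onsets))
    return ranked[:limit]
```

The returned phrases could then be displayed as selectable options only on the user device of the monitored user.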

[0028] FIG. 2 illustrates an exemplary uniform data system (UDS) 200 that may be used to provide data to a system for customized dialogue support in virtual environments. Based on data provided by UDS 200, dialogue analytics server 140 can be made aware of the current session conditions, e.g., what in-game objects, entities, activities, and events users have engaged with, and thus support analysis of and coordination of customized dialogue support by dialogue analytics server 140 with current gameplay and in-game activities. Each user interaction may be associated with metadata for the type of in-game interaction, location within the in-game environment, and point in time within an in-game timeline, as well as other players, objects, entities, etc., involved. Thus, metadata can be tracked for any of the variety of user interactions that can occur during a game session, including associated activities, entities, settings, outcomes, actions, effects, locations, and character stats. Such data may further be aggregated, applied to data models, and subject to analytics. Such a UDS data model may be used to assign contextual information to each portion of information in a unified way across games.

[0029] For example, various content titles may depict one or more objects (e.g., involved in in-game activities) with which a user can interact and/or UGC (e.g., screen shots, videos, commentary, mashups, etc.) created by peers, publishers of the media content titles and/or third party publishers. Such UGC may include metadata by which to search for such UGC. Such UGC may also include information about the media and/or peer. Such peer information may be derived from data gathered during peer interaction with an object of an interactive content title (e.g., a video game, interactive book, etc.) and may be "bound" to and stored with the UGC. Such binding enhances UGC as the UGC may deep link (e.g., directly launch) to an object, may provide for information about an object and/or a peer of the UGC, and/or may allow a user to interact with the UGC.

[0030] As illustrated in FIG. 2, an exemplary console 228 (e.g., a user device 130) and exemplary servers 218 (e.g., streaming server 220, an activity feed server 224, a user-generated content (UGC) server 232, and an object server 226) are shown. In one example, the console 228 may be implemented on the platform server 120, a cloud server, or on any of the servers 218. In another example, a content recorder 202 may be implemented on the platform server 120, a cloud server, or on any of the servers 218. Such content recorder 202 receives and records content (e.g., media) from an interactive content title 230 onto a content ring-buffer 208. Such ring-buffer 208 may store multiple content segments (e.g., v1, v2 and v3), start times for each segment (e.g., V1_START_TS, V2_START_TS, V3_START_TS), and end times for each segment (e.g., V1_END_TS, V2_END_TS, V3_END_TS). Such segments may be stored as a media file 212 (e.g., MP4, WebM, etc.) by the console 228. Such media file 212 may be uploaded to the streaming server 220 for storage and subsequent streaming or use, though the media file 212 may be stored on any server, a cloud server, any console 228, or any user device 130. Such start times and end times for each segment may be stored as a content time stamp file 214 by the console 228. Such content time stamp file 214 may also include a streaming ID, which matches a streaming ID of the media file 212, thereby associating the content time stamp file 214 to the media file 212. Such content time stamp file 214 may be uploaded and stored to the activity feed server 224 and/or the UGC server 232, though the content time stamp file 214 may be stored on any server, a cloud server, any console 228, or any user device 130.
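
The content ring-buffer and the content time stamp file described above may be sketched in Python as a fixed-capacity buffer of segments from which the start and end times are exported together with a streaming ID. The class names, the dictionary layout of the time stamp file, and the capacity value are assumptions for illustration only.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ContentSegment:
    name: str        # e.g. "v1"
    start_ts: float  # e.g. V1_START_TS
    end_ts: float    # e.g. V1_END_TS


class ContentRingBuffer:
    """Hypothetical ring buffer: the oldest segment is dropped when full."""

    def __init__(self, capacity):
        self._segments = deque(maxlen=capacity)

    def record(self, segment):
        self._segments.append(segment)

    def timestamp_file(self, streaming_id):
        # The shared streaming ID is what ties this record to the media file.
        return {
            "streaming_id": streaming_id,
            "segments": [(s.name, s.start_ts, s.end_ts) for s in self._segments],
        }
```

With a capacity of three, recording a fourth segment discards the first, so the exported time stamp file always describes only the segments still held in the buffer.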

[0031] Concurrent to the content recorder 202 receiving and recording content from the interactive content title 230, an object library 204 receives data from the interactive content title 230, and an object recorder 206 tracks the data to determine when an object begins and ends. The object library 204 and the object recorder 206 may be implemented on the platform server 120, a cloud server, or on any of the servers 218. When the object recorder 206 detects an object beginning, the object recorder 206 receives object data (e.g., if the object were an activity, user interaction with the activity, activity ID, activity start times, activity end times, activity results, activity types, etc.) from the object library 204 and records the activity data onto an object ring-buffer 210 (e.g., ActivityID1, START_TS; ActivityID2, START_TS; ActivityID3, START_TS). Such activity data recorded onto the object ring-buffer 210 may be stored in the object file 216. Such object file 216 may also include activity start times, activity end times, an activity ID, activity results, activity types (e.g., competitive match, quest, task, etc.), and user or peer data related to the activity. For example, an object file 216 may store data regarding an item used during the activity. Such object file 216 may be stored on the object server 226, though the object file 216 may be stored on any server, a cloud server, any console 228, or any user device 130.
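
One possible way to track when an activity begins and ends, and to turn the recorded data into an object file, is sketched below in Python. The class name, method names, and record fields are hypothetical, and the unbounded list used here stands in for the object ring-buffer 210 only for simplicity.

```python
class ObjectRecorder:
    """Hypothetical recorder that tracks the start and end of activities."""

    def __init__(self):
        self.ring_buffer = []   # (activity_id, start_ts) for activities in progress
        self.object_files = []  # one record per completed activity

    def on_start(self, activity_id, ts):
        # Recorded when an activity is detected as beginning.
        self.ring_buffer.append((activity_id, ts))

    def on_end(self, activity_id, ts, result=None, activity_type=None):
        # Find the matching start entry and emit a completed record.
        for index, (open_id, start_ts) in enumerate(self.ring_buffer):
            if open_id == activity_id:
                del self.ring_buffer[index]
                self.object_files.append({
                    "activity_id": activity_id,
                    "start_ts": start_ts,
                    "end_ts": ts,
                    "result": result,
                    "type": activity_type,
                })
                return True
        return False  # no matching start was recorded
```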

[0032] Such object data (e.g., the object file 216) may be associated with the content data (e.g., the media file 212 and/or the content time stamp file 214). In one example, the UGC server 232 stores and associates the content time stamp file 214 with the object file 216 based on a match between the streaming ID of the content time stamp file 214 and a corresponding activity ID of the object file 216. In another example, the object server 226 may store the object file 216 and may receive a query from the UGC server 232 for an object file 216. Such query may be executed by searching for an activity ID of an object file 216 that matches a streaming ID of a content time stamp file 214 transmitted with the query. In yet another example, a query of stored content time stamp files 214 may be executed by matching a start time and end time of a content time stamp file 214 with a start time and end time of a corresponding object file 216 transmitted with the query. Such object file 216 may also be associated with the matched content time stamp file 214 by the UGC server 232, though the association may be performed by any server, a cloud server, any console 228, or any user device 130. In another example, an object file 216 and a content time stamp file 214 may be associated by the console 228 during creation of each file 216, 214.
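
The two association strategies described above, matching by identifier and matching by time, may be sketched as follows. The dictionary keys and function names in this Python example are hypothetical, and treating a time match as any overlap between the two intervals is an assumption; the application does not state whether an exact or an overlapping match is intended.

```python
def intervals_overlap(a_start, a_end, b_start, b_end):
    # True when the two half-open intervals share any span of time.
    return a_start < b_end and b_start < a_end


def find_by_id(timestamp_files, object_file):
    """Match on streaming ID of the time stamp file vs. activity ID of the object file."""
    return [t for t in timestamp_files
            if t["streaming_id"] == object_file["activity_id"]]


def find_by_time(timestamp_files, object_file):
    """Match on start and end times (assumed here to mean overlapping intervals)."""
    return [t for t in timestamp_files
            if intervals_overlap(t["start_ts"], t["end_ts"],
                                 object_file["start_ts"], object_file["end_ts"])]
```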

[0033] In exemplary embodiments, the media files 212 and object files 216 may provide information to dialogue analytics server 140 regarding current session conditions, which may also be used as another basis for making predictions as to upcoming dialogue triggers, as well as identifying appropriate dialogue support outputs. Dialogue analytics server 140 may therefore use such media files 212 and object files 216 to identify specific conditions of the current session, including currently speaking or noise-producing players, characters, and objects at specific locations and events. Based on such files 212 and 216, for example, dialogue analytics server 140 may identify a significance level of the in-game event (e.g., significant battles, proximity to breaking records), which may be used to interpret user speech within a particular scene and to filter available dialogue instructions for suitability to current status. Such session conditions may drive how the audio of different audio streams may be interpreted, thereby resulting in decisions as to which dialogue instructions are applied, how such dialogue instructions are applied, and to which user devices the resulting output is provided.

[0034] FIG. 3 is a flowchart illustrating an exemplary method 300 for customized dialogue support in virtual environments. The method 300 of FIG. 3 may be embodied as executable instructions in a non-transitory computer readable storage medium including but not limited to a CD, DVD, or non-volatile memory such as a hard drive. The instructions of the storage medium may be executed by a processor (or processors) to cause various hardware components of a computing device hosting or otherwise accessing the storage medium to effectuate the method. The steps identified in FIG. 3 (and the order thereof) are exemplary and may include various alternatives, equivalents, or derivations thereof including but not limited to the order of execution of the same.

[0035] In step 310, dialogue maps may be stored in memory (e.g., databases 160). Each dialogue map may include one or more predetermined dialogue triggers and associated dialogue instructions executable to provide customized dialogue output to an interactive session. Different users may be matched to different dialogue maps in accordance with calibration to assess their speech patterns or in accordance with express request or preference. For example, a user may specify a specific type of neurological or physical stutter, and dialogue analytics server 140 may identify which dialogue maps may be most suitable. In some instances, a new user may develop a new dialogue map or customize a current dialogue map in accordance with their own speech patterns (in defining dialogue triggers), as well as personal preferences and priorities in relation to different sounds (in defining dialogue instructions). Dialogue analytics server 140 may also query the new user in order to identify how to generate the custom dialogue map, or to customize specific dialogue triggers or instructions in an existing dialogue map. Dialogue maps may also be specific to different game titles or genres, correlating different types of dialogue triggers with words and phrases commonly used in relation to such game titles and genres. As discussed above, a different set of dialogue maps may be stored for each user.

[0036] In step 320, one or more audio streams associated with the user device 150 may be monitored by dialogue analytics server 140. A user using user device 150 to stream and play digital content in a current session may provide audio (through speech and other audio sounds) captured in an audio stream associated with the digital content. One or more microphones in the real-world environment of the user and associated with the user device 150 (e.g., embedded in headphones, controllers, video cameras) may capture such audio and provide it to dialogue analytics server 140. The dialogue analytics server 140 may monitor all such audio streams (or any specific subset) based on one or more dialogue maps selected for the user.

[0037] In step 330, the audio streams are analyzed by dialogue analytics server 140 against the selected dialogue maps. In particular, the dialogue triggers specified by the dialogue maps may be detected within each of the audio streams. For example, certain triggers associated with undesired speech results (e.g., lisps, stutters) may be identified or predicted in relation to current and upcoming in-game events. Once a dialogue trigger is detected (e.g., identified or predicted), the dialogue map may be consulted to determine which dialogue instruction may be associated with the detected trigger for application to the current interaction session.
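
The monitoring and detection of steps 320 and 330 may be sketched as a function that reports both identified and predicted dialogue triggers. This Python example assumes that a separate speech-to-text step has already produced a list of words from the audio stream and that upcoming in-game events are available as simple labels; both inputs, the function name, and the event names are hypothetical.

```python
def detect_triggers(transcript_words, upcoming_events, speech_triggers, event_triggers):
    """Return (how, trigger) pairs, where how is "identified" or "predicted"."""
    detected = []
    # Identified: the trigger itself appears in the monitored speech.
    for word in transcript_words:
        key = word.lower()
        if key in speech_triggers:
            detected.append(("identified", key))
    # Predicted: an upcoming in-game event suggests a trigger will soon be needed.
    for event in upcoming_events:
        for phrase in event_triggers.get(event, ()):
            detected.append(("predicted", phrase))
    return detected
```

Each returned trigger could then be looked up in the selected dialogue map to find its corresponding dialogue instruction.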

[0038] In step 340, a custom dialogue output may be generated based on the dialogue instructions specified by the dialogue map as being associated with the detected dialogue trigger. Based on the analysis performed in step 330, dialogue analytics server 140 may determine that a particular dialogue trigger is present (or will soon be present) within the current interaction session and apply the corresponding dialogue instructions to provide dialogue support to the monitored user. Such support may be in the form of substitute or replacement speech by a pre-recorded clip of the user's own natural voice, as well as other prerecorded or computer-generated voices and sounds, which are provided to the other user devices in the same interactive session in place of the identified or predicted dialogue trigger. In some instances, the user may opt to have alternative words and phrases provided as dialogue prompts in lieu of having to speak the triggering words or phrases. The dialogue prompts may also be selectable (based on any combination of input including gaze-tracking) to trigger prerecorded custom audio clips.

[0039] In step 350, the customized dialogue output may be provided to the current interactive session. Depending on the dialogue instruction, such customized dialogue output may be provided only to the user device of the monitored user or to other user devices of other users. For example, selectable dialogue prompts may only be presented to the monitored user in a private window or overlay that is not provided to nor viewable by other users. Where the customized dialogue output may be substitute or replacement audio, however, such audio may be played in the current interactive session in place of the user-generated audio that was identified as a dialogue trigger. Because prediction may be used to identify a dialogue trigger, the associated customized dialogue output may be generated and provided to the interactive session in real-time or near real-time with detection of the dialogue trigger.
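
Steps 340 and 350 may be sketched together as generating an output from the instruction mapped to a detected trigger and then choosing its recipients. In this Python example the `Instruction` fields, the function name, and the output dictionary are hypothetical, and routing non-private output to every device other than that of the monitored user is one reading of the description above, adopted here as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    action: str    # e.g. "show_prompts" or "play_clip" (assumed labels)
    payload: str   # e.g. a prompt list ID or a clip ID
    private: bool  # True: deliver only to the monitored user's device


def generate_output(trigger, dialogue_map, monitored_device, session_devices):
    """Look up the instruction for a detected trigger and pick its recipients."""
    instruction = dialogue_map.get(trigger)
    if instruction is None:
        return None  # the dialogue map defines nothing for this trigger
    if instruction.private:
        recipients = [monitored_device]
    else:
        recipients = [d for d in session_devices if d != monitored_device]
    return {
        "action": instruction.action,
        "payload": instruction.payload,
        "recipients": recipients,
    }
```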

[0040] FIG. 4 is a diagram illustrating an exemplary implementation of customized dialogue support in virtual environments. As illustrated, different data streams 410A-B associated with a current interactive session of a user device may be provided to dialogue analytics server 140 for analysis and processing. Interactive data stream 410A may be associated with an interactive content title and include data regarding play of that interactive content title, which may be used to interpret and detect dialogue triggers. Social communication stream 410B may be associated with communications by the user of the user device 150. The social communication stream 410B may be analyzed by dialogue analytics server 140 in conjunction with the interactive data stream 410A and one or more dialogue maps 420 to assess and detect dialogue triggers and to generate customized dialogue output in accordance with corresponding dialogue instructions.

[0041] Dialogue analytics server 140 may select and obtain the dialogue map(s) from one of the databases 160 based on characteristics of the user or the game title of the current interactive session. Using the selected dialogue map(s), dialogue analytics server 140 may analyze various session parameters that occur concurrently across the different data streams 410A-B. For example, in-game event data from interactive data stream 410A may be occurring concurrently with voice chat messages from social communication stream 410B. Data from the different streams may be analyzed by the dialogue analytics server 140 to identify (or predict) dialogue triggers. The dialogue analytics server 140 may further identify what dialogue instructions are specified by the dialogue map(s) in association with the dialogue trigger. The identified dialogue instructions may then be retrieved, along with any associated pre-recorded clips, and executed by dialogue analytics server 140 to generate one or more custom dialogue outputs 430. The custom dialogue output 430 may then be provided to the user device 150 of the monitored user privately or provided to all user devices in the same current interactive session.

[0042] FIG. 5 is a block diagram of an exemplary electronic entertainment system that may be used in embodiments of the present invention. The entertainment system 500 of FIG. 5 includes a main memory 505, a central processing unit (CPU) 510, vector unit 515, a graphics processing unit 520, an input/output (I/O) processor 525, an I/O processor memory 530, a controller interface 535, a memory card 540, a Universal Serial Bus (USB) interface 545, and an IEEE 1394 interface 550. The entertainment system 500 further includes an operating system read-only memory (OS ROM) 555, a sound processing unit 560, an optical disc control unit 570, and a hard disc drive 565, which are connected via a bus 575 to the I/O processor 525.

[0043] Entertainment system 500 may be an electronic game console. Alternatively, the entertainment system 500 may be implemented as a general-purpose computer, a set-top box, a hand-held game device, a tablet computing device, or a mobile computing device or phone. Entertainment systems may contain more or fewer operating components depending on a particular form factor, purpose, or design.

[0044] The CPU 510, the vector unit 515, the graphics processing unit 520, and the I/O processor 525 of FIG. 5 communicate via a system bus 585. Further, the CPU 510 of FIG. 5 communicates with the main memory 505 via a dedicated bus 580, while the vector unit 515 and the graphics processing unit 520 may communicate through a dedicated bus 590. The CPU 510 of FIG. 5 executes programs stored in the OS ROM 555 and the main memory 505. The main memory 505 of FIG. 5 may contain pre-stored programs and programs transferred through the I/O Processor 525 from a CD-ROM, DVD-ROM, or other optical disc (not shown) using the optical disc control unit 570. I/O Processor 525 of FIG. 5 may also allow for the introduction of content transferred over a wireless or other communications network (e.g., 4G, LTE, 3G, and so forth). The I/O processor 525 of FIG. 5 primarily controls data exchanges between the various devices of the entertainment system 500 including the CPU 510, the vector unit 515, the graphics processing unit 520, and the controller interface 535.

[0045] The graphics processing unit 520 of FIG. 5 executes graphics instructions received from the CPU 510 and the vector unit 515 to produce images for display on a display device (not shown). For example, the vector unit 515 of FIG. 5 may transform objects from three-dimensional coordinates to two-dimensional coordinates, and send the two-dimensional coordinates to the graphics processing unit 520. Furthermore, the sound processing unit 560 executes instructions to produce sound signals that are outputted to an audio device such as speakers (not shown). Other devices, such as wireless transceivers, may be connected to the entertainment system 500 via the USB interface 545 and the IEEE 1394 interface 550; such devices may also be embedded in the system 500 or be a part of some other component such as a processor.
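
The transformation from three-dimensional to two-dimensional coordinates may be illustrated with a simple pinhole perspective projection. The application does not specify which transform the vector unit 515 applies, so the projection model, the function name, and the `focal_length` parameter in this Python sketch are assumptions for illustration.

```python
def project_point(point, focal_length=1.0):
    """Project a 3D point (x, y, z) onto a 2D image plane at distance focal_length."""
    x, y, z = point
    if z <= 0:
        # Points at or behind the viewpoint cannot be projected in this model.
        raise ValueError("point must be in front of the viewpoint (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def project_object(vertices, focal_length=1.0):
    """Project every vertex of an object; the 2D result is what a GPU would draw."""
    return [project_point(v, focal_length) for v in vertices]
```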

[0046] A user of the entertainment system 500 of FIG. 5 provides instructions via the controller interface 535 to the CPU 510. For example, the user may instruct the CPU 510 to store certain game information on the memory card 540 or other non-transitory computer-readable storage media or instruct a character in a game to perform some specified action.

[0047] The present invention may be implemented in an application that may be operable by a variety of end user devices. For example, an end user device may be a personal computer, a home entertainment system (e.g., Sony PlayStation2® or Sony PlayStation3® or Sony PlayStation4®), a portable gaming device (e.g., Sony PSP® or Sony Vita®), or a home entertainment system of a different albeit inferior manufacturer. The present methodologies described herein are fully intended to be operable on a variety of devices. The present invention may also be implemented with cross-title neutrality wherein an embodiment of the present system may be utilized across a variety of titles from various publishers.

[0048] The present invention may be implemented in an application that may be operable using a variety of devices. Non-transitory computer-readable storage media refer to any medium or media that participate in providing instructions to a central processing unit (CPU) for execution. Such media can take many forms, including, but not limited to, non-volatile and volatile media such as optical or magnetic disks and dynamic memory, respectively. Common forms of non-transitory computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM disk, digital video disk (DVD), any other optical medium, RAM, PROM, EPROM, a FLASH-EPROM, and any other memory chip or cartridge.

[0049] Various forms of transmission media may be involved in carrying one or more sequences of one or more instructions to a CPU for execution. A bus carries the data to system RAM, from which a CPU retrieves and executes the instructions. The instructions received by system RAM can optionally be stored on a fixed disk either before or after execution by a CPU. Various forms of storage may likewise be implemented as well as the necessary network interfaces and network topologies to implement the same.

[0050] The foregoing detailed description of the technology has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the technology to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. The described embodiments were chosen in order to best explain the principles of the technology, its practical application, and to enable others skilled in the art to utilize the technology in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the technology be defined by the claims.
