

Title:
INTERACTIVE VOICE RESPONSE DEVICES WITH 3D-SHAPED USER INTERFACES
Document Type and Number:
WIPO Patent Application WO/2019/036569
Kind Code:
A1
Abstract:
Methods, devices, and systems are provided to deliver voice activated response, content, utility functions, and applications for a user's entertainment, engagement, companionship, and task performance via 3D-shaped audio-visual user interfaces on interactive voice response (IVR) devices. The devices and systems include voice activated interactive chat, imaging, visualization, and recognition capabilities. In addition, 3D-shaped audio-visual user interfaces, including one or more than one flat or flexible 2D audio-video display panels, are provided as according to the 3D-shapes of the IVR devices. The IVR devices with 3D-shaped audio-visual user interfaces are able to deliver device responses, content, utility functions, and applications to a user or a group of users in a user preferred environment and ambiance.

Inventors:
FAVIS STEPHEN (US)
SRIVASTAVA DEEPAK (US)
Application Number:
PCT/US2018/046849
Publication Date:
February 21, 2019
Filing Date:
August 17, 2018
Assignee:
TAECHYON ROBOTICS CORP (US)
International Classes:
G06F17/00; G05B15/00; G05B19/418; G06F9/00; G06N3/00
Foreign References:
US8996429B1 (2015-03-31)
US20130054021A1 (2013-02-28)
US20020010584A1 (2002-01-24)
US20150382280A1 (2015-12-31)
Attorney, Agent or Firm:
SCARITO, John, D. et al. (US)
Claims:
CLAIMS

What is claimed is:

1. A method for providing 3D-shape based audio-visual user-interfaces (UIs) for voice activated interactions with users for entertainment, engagement, companionship, and performance of utility functions for user preferred environment, ambiance, functionalities, and applications via interactive voice response (IVR) devices, wherein the method comprises:

providing a device with voice or speech activated IVR capabilities to interactively speak with a user or a group of users to receive their input as a request, engage them in conversation or entertainment, perform utility based tasks and deliver an audio-visual output response back to the user or the group of users for user preferred environment, ambiance, functionalities, and applications during user-device interactions;

providing the device with scanning, imaging, recognition, and perception capabilities using a 360° view camera to scan, image, recognize, and perceive the user or the group of users, their background, environment, and ambiance to select and deliver suitable audio-visual scenarios from a database of such scenarios, in addition to the IVR audio-visual output response of the device, to the user or the group of users for the user preferred environment, ambiance, functionalities, and applications during the user-device interactions;

providing the device with the database and data-management capabilities to write, delete, store, select, and deliver text, image, audio, and audio-video clips, files, characters, scenes, and scenarios to create the audio-visual scenes or scenarios delivered on the 3D-shaped audio-visual user-interfaces (UIs) as according to a 3D-shape of the device for the user preferred environment, ambiance, functionalities, and applications during the user-device interactions;

providing the device with display capabilities to deliver and display the text, image, audio, and audio-video clips, files, characters, scenes, and scenarios on the interactive 3D-shaped audio-visual user-interfaces (UIs) comprising one or more than one flat or flexible 2D display panel configured as according to the 3D-shape of the device for the user preferred environment, ambiance, functionalities, and applications during the user-device interactions; and

providing the device with capabilities to deliver interactive voice activated entertaining content and responses for user engagement and companionship as well as the functionalities of performing task or utility based functions for the user or the group of users in the user preferred environment, ambiance, functionalities, and applications during the user-device interactions.

2. The method of claim 1, wherein the 3D-shape of the device is a cylinder of circular or polygonal cross-section, and wherein a number of vertices in the polygonal cross-section is three or more than three with an equal number of planar surfaces joined with or without seams surrounding the circumferential sides of the cylindrical device.

3. The method of claim 2, wherein each planar surface forming a side of the cylindrical device supports a flat panel display to form an overall 3D-shaped cylindrical audio-visual display comprising three or more than three flat or flexible panel displays to provide a 3D-shaped audio-visual user interface (UI) to provide the text, image, audio, and audio-video clips, files, characters, scenes, and scenarios as output during passive or active IVR interactions between the device and the user or the group of users.

4. The method of claim 2, wherein the 3D-shape of the device is a cylinder of circular cross-section with a cylindrical surface surrounding the circumferential sides of the cylindrical device supporting one or more than one flexible panel display to form a cylindrical 3D-shaped audio-visual user interface (UI) to provide the text, image, audio, and audio-video clips, files, characters, scenes, and scenarios as output during passive or active IVR interactions between the device and the user or the group of users.

5. The method of claim 4, wherein a ratio of a height of the cylindrical device versus a diameter of the cylindrical device of the circular cross-section is any number larger than 0.10 and equal to or smaller than 10.0.

6. The method of claim 2, wherein a ratio of a height of the cylindrical device versus an enveloping diameter of the cylindrical device of the polygonal cross-section is any number larger than 0.10 and equal to or smaller than 10.0, and wherein the enveloping diameter of the polygonal cross-section is the diameter of an imaginary circle completely enclosing the polygonal cross-section with the circumference passing through two or more than two vertices of the polygonal cross-section.

7. The method of claim 2, wherein the sides of the polygonal cross-section are equal to each other to form a square cross-section for a polygonal cross-section of four vertices or not equal to each other to form a rectangular cross-section for a polygonal cross-section of four vertices.

8. The method of claim 2, wherein a height of the cylindrical device is any number larger than 0.10 feet and equal to or smaller than 6.0 feet, and wherein a diameter of the cylindrical device of circular cross section or an enveloping diameter of the cylindrical device of polygonal cross-section is any number larger than 0.10 feet and equal to or smaller than 4.0 feet.

9. The method of claim 1, wherein the 3D-shape of the device is a polygonal pyramid, and wherein a number of vertices in the polygonal base is three or more than three with an equal number of planar surfaces joined with or without seams forming and surrounding the sides of the polygonal pyramidal device.

10. The method of claim 9, wherein each planar surface forming a side of the polygonal pyramidal device supports a flat panel display to form a 3D-shaped polygonal pyramid audio-visual display comprising three or more than three flat panel displays to provide a 3D-shaped audio-visual user interface (UI) to provide the text, image, audio, and audio-video clips, files, characters, scenes, and scenarios as output during passive or active IVR interactions between the device and the user or the group of users.

11. The method of claim 9, wherein a height of the polygonal pyramid is larger than 0.10 feet and equal to or smaller than 6.0 feet.

12. The method of claim 9, wherein an enveloping diameter of a polygonal cross-section at a base of the polygonal pyramid is larger than 0.10 feet and equal to or smaller than 4.0 feet, and wherein the enveloping diameter is defined as the diameter of an imaginary circle completely enclosing the polygonal cross-section of equal sides at the base with the circumference of the imaginary circle passing through all the vertices of the polygonal cross-section of equal sides at the base.

13. The method of claim 1, wherein the 3D-shape of the device is a spherical dome of diameter d with a flat circular base of diameter x, wherein a ratio of x/d is a number larger than 0.10 and equal to or smaller than 1.0, and wherein the diameter d of the spherical dome is a number larger than 0.10 feet and equal to or smaller than 6.0 feet.

14. The method of claim 13, wherein the device includes two or more than two polygonal planar surfaces cut on its side to provide two or more than two polygonal flat panel displays on each such planar surface to form a 3D-shaped spherical dome display comprising two or more than two flat polygonal display panels to provide a 3D-shaped audio-visual user interface (UI) to provide the text, image, audio, and audio-video clips, files, characters, scenes, and scenarios as output during passive or active IVR interactions between the device and the user or the group of users.

15. The method of claim 13, wherein a number of vertices in each polygonal planar surface is larger than three and equal to or smaller than eight.

16. The method of claims 2, 9 or 13, wherein a band of width larger than 0.10 feet and equal to or smaller than 1.0 feet at the top of the device is covered with a one-way see-through mirrored surface, instead of a display, to house hardware for one or more than one camera for the imaging and scanning capabilities, and one or more than one microphone for listening capabilities of the device.

17. The method of claim 2, wherein a full or half spherical dome of a diameter larger than 0.10 feet and equal to or smaller than a diameter of the cylindrical device is included at a top of the device, and wherein the full or half spherical dome is covered with a one-way see-through mirrored surface to house hardware for one or more than one camera for the imaging and scanning capabilities, and one or more than one microphone for listening capabilities of the device.

18. The method of claim 16, wherein the one or more than one camera comprises a single 360° camera, two 180° cameras, four 90° cameras, six 60° cameras, or twelve 30° cameras to provide a 360° view surrounding the device.

19. The method of claims 2, 9 or 13, wherein the device comprises hardware and software for imaging, listening, speaking, display, battery and power, text, audio, audio-video data storage and processing, operating system and applications, and remote or cloud connectivity to provide the voice or speech activated IVR interactions while also delivering back-lighting, stationary or moving images, audio sounds, audio-video clips, files, data, scenes, characters, and scenarios on the 3D-shaped audio-visual interfaces during passive or active IVR interactions between the device and the user or the group of users for user preferred environment, ambiance, functionalities and applications.

20. The method of claims 2, 9, or 13, wherein passive IVR interactions between the device and the user or the group of users comprise the device located in the background or to the side while also providing back-lighting, stationary or moving images, audio sounds, audio-video clips, files, data, scenes, characters, and scenarios delivered on the 3D-shaped audio-visual interfaces to provide the user preferred environment, ambiance, functionalities and applications.

21. The method of claims 2, 9, or 13, wherein active IVR interactions between the device and the user or the group of users comprise the device located in front of the user or the group of users or between two or more than two users, while the user or the group of users actively interact with the device by looking at or touching the displays of the 3D-shaped audio-visual interfaces of the device for at least one of the text, images, audio-video clips, files, data, characters, scenes, and scenario data displayed on the device for visualization, presentation, analysis, and discussion in the user preferred environment and ambiance for the user preferred functionalities and applications.

22. The method of claims 2, 9, or 13, wherein the environment and ambiance scenarios provided on the 3D-shaped audio-visual user interfaces comprise audio-visual scenarios of beaches, rain-forests, rainy days, sunny days, flower fields, countrysides, mountains and hilltops, fireplaces or candle lights, lava-lamps, lighted fish aquariums, lamps of different lighting intensities, disco settings, office settings or city lights for the user preferred environment, ambiance, functionalities and applications.

23. The method of claims 2, 9, or 13, wherein the functionalities and applications of the 3D-shaped audio-visual user interfaces comprise the display, visualization, presentation, analysis, and discussion of at least one of the text, images, audio-video clips, files, data, characters, scenes, and scenario data during the user-device audio-visual and touch based interactions in the user preferred environment and ambiance for the user preferred functionalities and applications.

24. The method of claims 2, 9, or 13, wherein the functionalities and applications of the 3D-shaped audio-visual user interfaces comprise the audio-visual and touch based interactions with at least one of the text, images, audio-video clips, files, data, characters, scenes, and scenario data displayed on the device located in a room, a hallway, a mall, a shopping center, a store, or near a reception desk for greeting, guiding, registration, help, information gathering, and advertising functions during the user-device interactions.

25. The method of claims 2, 9, or 13, wherein the background visual images or video clips, scenes, or scenarios scanned or filmed by the 360° view camera or a plurality of cameras to provide the 360° view during user-device interactions are overlapped as translucent background images or video-clips with foreground in-focus images or video scenes provided by the camera or the plurality of cameras, or obtained from the database, to provide 3D-visual-effects displayed on the 3D-shaped audio-visual user interfaces during the user-device interactions for the user preferred environment, ambiance, functionalities, and applications.

26. The method of claim 25, wherein a degree of depth perception in the 3D-visual-effects displayed on the 3D-shaped audio-visual user interfaces is adjusted or controlled by a degree of translucency of the overlapped background visual images or video clips, scenes, and scenarios.

27. The method of claim 25, wherein the 3D-visual-effects are used during a device-to-device video-conference or video-game-playing session, wherein the user or the group of users around a planar or 3D-visual-effect environment on one device are able to interact or communicate with another user or another group of users around another planar or 3D-visual-effect environment on another device during the device-to-device video conference or video game-playing session within the user preferred environment and ambiance for the user preferred functionalities and applications.

28. The method of claim 27, wherein a full room-to-room single 360° video-conference or video-game-playing audio-visual scene sent from a sending device is displayed on a receiving device as:

a single 360° wrap around scene on a 360° wrap around display of the receiving device;

stretched as two same 180° scenes on two displays forming a front viewing side and a back viewing side of the receiving device;

stretched as three same 180° scenes on three displays forming three viewing sides of the receiving device; or

stretched as four same 180° scenes on four displays forming four viewing sides of the receiving device.

29. The method of claim 28, wherein the full room-to-room single 360° video-conference or video-game playing audio-visual scene sent from the sending device is further simultaneously stretched and displayed as a single 180° scene on a flat display of a mobile device, a lap-top computer, a desk-top computer, a television monitor, or a wall using a projector.

30. The method of claim 28, wherein the full room-to-room single 360° video-conference or video-game playing audio-visual scene sent from the sending device is displayed on part of the 360° wrap around display of the receiving device, and wherein at least one of text, images, audio-video clips, files, data, characters, scenes, or scenario data is simultaneously displayed on the remaining part of the 360° wrap around display of the receiving device.

31. The method of claim 21, wherein the device is capable of interacting with the user or the group of users during passive or active IVR interactions in one, more than one, or any combination of languages including English, French, Spanish, Russian, German, Portuguese, Chinese-Mandarin, Chinese-Cantonese, Korean, Japanese, Hindi, Urdu, Punjabi, Bengali, Gujarati, Marathi, Tamil, Telugu, Malayalam, Konkani, and African sub-continental and Middle Eastern languages.

32. The method of claim 31, wherein spoken accents include localized speaking style or dialect of the one, more than one, or any combination of the languages.

33. The method of claims 2, 9, or 13, wherein the device is capable of computing onboard and is configured to interact within an ambient environment without the user or the group of users present within the ambient environment.

34. The method of claims 2, 9, or 13, wherein the device is configured to interact with other devices within an ambient environment without the user or the group of users present within the ambient environment.

35. The method of claims 2, 9, or 13, wherein the device is configured to interact with other devices within an ambient environment with the user or the group of users present within the ambient environment.

36. The method of claims 2, 9, or 13, wherein the device comprises one or more than one port for connectivity with an HDMI cable, a personal computer, a mobile smart phone, a tablet computer, a telephone line, a wireless mobile, an Ethernet cable, a Bluetooth connection or a Wi-Fi connection.

37. The method of claims 2, 9, or 13, wherein the device is configured to answer functionally useful queries and perform functionally useful tasks, while also providing the suitable environment, ambiance, functionalities, and applications as according to the user's preference during the IVR interactions between the device and the user or the group of users.

38. The method of claims 2, 9, or 13, wherein the device is used by the user or the group of users for at least one of companionship, entertainment, storytelling, education, teaching, training, greeting, guiding, guest service, health-care service, or customer service while also performing functionally useful tasks including interacting with and controlling other devices configured to interact with the device via an HDMI cable, an Ethernet Cable, a Bluetooth connection or a Wi-Fi connection.

39. The method of claims 2, 9, or 13, wherein the device is configured to provide human-like and robot-like multiple interactive personalities (MIP) or animated multiple interactive personalities (AMIP) chat- and chatter-bots on the 3D-shaped audio-visual interfaces to interact with the user or the group of users in the user preferred environment and ambiance for the user preferred functionalities and applications.

40. An interactive voice response (IVR) system, comprising:

an IVR device, comprising:

a central processing unit;

one or more than one camera, wherein the one or more than one camera includes at least one 360° camera, at least two 180° cameras, at least four 90° cameras, at least six 60° cameras, or at least twelve 30° cameras comprising image and video sensing hardware configured to collect or scan an input image or video data from a user, environment, and ambiance within the environment within an interaction range of the device;

one or more than one microphone comprising sound and audio sensing hardware configured to collect or scan input audio, sound, or speech data from the user, the environment, and the ambiance within the environment within the interaction range of the device;

one or more than one display, wherein each display comprises a touch sensitive flat panel display, a touch sensitive flexible panel display, a non-touch sensitive flat panel display, or a non-touch sensitive flexible panel display, wherein the one or more than one display is configured to form a 3D-shaped audio-visual user interface to provide user preferred environment, ambiance, functionalities, and applications to the user within the interaction range of the device;

one or more than one port configured to connect, via a wired or wireless connection, with at least one of an internet system, a mobile system, a cloud computing system, a keyboard, a USB, an HDMI cable, a television, a personal computer, a mobile smart phone, a tablet computer, a telephone line, a wireless mobile, an Ethernet cable, a Bluetooth connection or a Wi-Fi connection;

an infrared universal remote output configured to control an external television, projector, audio equipment, video equipment, augmented-reality (AR) equipment, virtual-reality (VR) equipment, devices and appliances;

a PCI slot configured to receive a single or a multiple carrier SIM card to connect with a direct wireless mobile data line for data and VOIP communication;

an onboard battery or power system configured for wired and inductive charging stations; and

a memory configured to store data related to a previous interaction of the device with the user and to store instructions executable by the central processing unit to process collected input data for the device to:

obtain information from the input data collected by the device;

determine a manner, a mode, and a type of response;

execute, by the device, the response without any overlap or conflict between different responses;

store information to update the data stored on the device;

change information stored on the device;

synchronize information stored on the device with information stored in the cloud; and

create, delete, store, and update multiple interactive personalities (MIP) or animated multiple interactive personalities (AMIP) on the device.

41. The IVR system of claim 40, wherein the input data collected by the device within the vicinity or the interaction range including the device and the user within the environment, comprises:

one or more than one communicated character, word, or sentence relating to written and spoken communication between the user and the device;

one or more than one communicated image, light, or video relating to visual and optical communication between the user and the device;

one or more than one communicated sound or audio related to the communication between the user and the device; and

one or more than one communicated touch related to the communication between the user and the device;

wherein the collected input data communicates the information related to determining the environment, ambiance, functionalities, and applications to be delivered to the user.

42. The IVR system of claims 40 or 41, wherein the IVR device is configured to provide interactive interactions to the user for at least one of entertainment, engagement, companionship, education, storytelling, karaoke, video-game playing, teaching, training, greeting, guiding, registration, help, information and customer services while also performing functionally useful tasks within the interaction range of the user in the user preferred environment, ambiance, functionalities, and applications.

43. A computer readable medium storing instructions executable by an IVR device, the instructions when executed by a processor of the IVR device cause the IVR device to:

provide voice or speech activated IVR capabilities to interactively speak with the user or the group of users to receive their input as a request, engage them in conversation or entertainment, perform utility based tasks and deliver an audio-visual output response back to the user or the group of users for user preferred environment, ambiance, functionalities, and applications during user-device interactions;

provide scanning, imaging, recognition, and perception capabilities using a 360° view camera to scan, image, recognize, and perceive the user or the group of users, their background, environment, and ambiance to select and deliver suitable audio-visual scenarios from a database of such scenarios, in addition to the IVR audio-visual output response of the device, to the user or the group of users for the user preferred environment, ambiance, functionalities, and applications during the user-device interactions;

provide database and data-management capabilities to write, delete, store, select, and deliver text, image, audio, and audio-video clips, files, characters, scenes, and scenarios to create the audio-visual scenes or scenarios delivered on the 3D-shaped audio-visual user-interfaces (UIs) as according to a 3D-shape of the device for the user preferred environment, ambiance, functionalities, and applications during the user-device interactions;

provide display capabilities to deliver and display the text, image, audio, and audio-video clips, files, characters, scenes, and scenarios on the interactive 3D-shaped audio-visual user-interfaces (UIs) comprising one or more than one flat or flexible 2D display panel configured as according to the 3D-shape of the device for the user preferred environment, ambiance, functionalities, and applications during the user-device interactions; and

deliver interactive voice activated content and responses for at least one of entertainment, engagement, companionship, education, storytelling, karaoke, video-game playing, teaching, training, greeting, guiding, registration, help, information, or customer services as well as functionalities of performing useful tasks or utility based functions within the interaction range of the user or the group of users in the user preferred environment, ambiance, functionalities, and applications during the user-device interactions.

Description:
INTERACTIVE VOICE RESPONSE DEVICES WITH 3D-SHAPED USER INTERFACES

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and the benefit of U.S. Provisional Patent Application No. 62/546,840, entitled INTERACTIVE VOICE RESPONSE DEVICES WITH 3D-SHAPED USER INTERFACES, filed August 17, 2017, the entire disclosure of which is hereby incorporated by reference herein.

This application is related to International Patent Application No. PCT/US 17/29385, entitled MULTIPLE INTERACTIVE PERSONALITIES ROBOT and International Patent Application No. PCT/US 17/49458, entitled ROBOTS FOR INTERACTIVE COMEDY AND COMPANIONSHIP, the entire disclosures of which are hereby incorporated by reference herein.

BACKGROUND

1. Technical Field

The present disclosure relates generally to the field of interactive voice response (IVR) and robotic devices, and specifically to the IVR and robotic devices that interact with human users on a regular basis and are also called social IVR or social robotic devices.

2. Background of the Disclosure

Robots have been developed and deployed for the last few decades in a variety of industrial and commercial applications with a focus on replacing many repetitive tasks and communications in pre-determined scenarios. With advances in artificial intelligence (AI), machine learning (ML), and deep learning (DL) capabilities in recent years, robotic and IVR devices have started to move out of commercial and industrial scenarios and into interaction with human users at home or at work on a regular basis. The field of social or human-interfacing robotic or IVR devices is primarily based on speech or voice based communication for eliciting a response, delivering content, or performing a utility based task. With the advent of touch sensitive displays, the interaction or communication between a human and a device is also via human touch based signals on the displays. Early examples of IVR devices comprise smart phones, advanced automobiles, the Internet of Things (IoT), and security applications and devices. Early use of keyboard, mouse, and touch based interaction between a user and robotic, IVR, or computing devices has started to recede. Speech or voice based interactions have been defined as the new interface between users and their devices. Siri® on smart phones was an early commercial implementation of such an IVR based user interface. Recently, purely IVR devices, like Amazon Echo® or Google Home®, for speech based user interactions, such as requesting to listen to a song, finding weather at a location, finding directions to a location, querying for information, or controlling a utility, security, or other IoT function at a home or an office, have been commercially successful. In such applications, the focus of speech based IVR communications between a user and their devices is mostly the control and execution of task based utility functions, such as requesting to play a song, asking for the weather, asking for a map, asking for event and product reviews, or even ordering products and services. Visual IVR devices were also conceptualized, where the IVR between users and their devices is supplemented with visual information on 2D displays such as on mobile smart phones, tablets, and automobiles.

The personalities of IVR or robotic devices interacting with human users are at a nebulous stage. As such, a need exists for an IVR or robotic device able to create a user preferred environment or ambiance in which the IVR or robotic device is capable of interacting with a human user for entertainment, engagement, and communication as well as performing task based utility functions.

BRIEF SUMMARY

Methods, devices and systems for IVR devices with 3D-shaped audio-visual user interfaces are provided, which are capable of interacting with a user or a group of users with user preferred environment, ambiance, functions, and applications.

For example, related International Patent Application No. PCT/US 17/29385, entitled MULTIPLE INTERACTIVE PERSONALITIES ROBOT, and U.S. Provisional Patent Application No. 62/381,976, entitled ROBOTS FOR INTERACTIVE COMEDY AND COMPANIONSHIP, describe multiple voice response based Multiple Interacting Personalities (MIP) robots and their software-based versions as Animated Multiple Interacting Personalities (AMIP) chat-bots or chatter-bots for entertainment, communication and engagement during continuing interactions with a user or a group of users. In addition to performing IVR task based utility functions, a key differentiation of the described MIP robots, and AMIP chat-bots or chatter-bots on web- or mobile interfaces, is the ability to generate and deliver interactive contextual jokes, songs, duets, karaoke, and disco type performances as a part of a personality or personalities of the MIP robot or AMIP chat-bot or chatter-bot interacting with the user or the group of users. In a related way, the present disclosure provides methods, devices, and systems to deliver voice activated interactions with users with user preferred environment, ambiance, and functionalities via 3D-shaped interactive audio-visual interfaces configured as according to the shapes of the devices.

According to one aspect of the present disclosure, without any limitation, the method is to include 3D-shaped audio-visual displays using one or more than one flat or flexible panel display configured as according to the user preferred 3D-shapes of the IVR devices. The 3D-shaped audio-visual displays as according to the shapes of the devices allow the devices to display interactive text, images, audio-video clips, data, colors, lights, scenes, characters, and scenarios to create user preferred environment, ambiance, functions, and applications during passive or active user-device interactions. According to another aspect of the present disclosure, without any limitation, during the passive user-device interactions the device remains in the background to create appropriate environment and ambiance for the IVR interactions or tasks at hand. According to yet another aspect of the present disclosure, without any limitation, during the active user-device interactions the device is in the foreground, in front of a user, or between a group of users while the user or the users actively interact with the text, images, audio-video clips, data, colors, lights, scenes, characters, and scenarios displayed on the 3D-shaped audio-video user interface of the device.
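The passive and active user-device interactions described above amount to a simple mode switch: ambient scenery when the device sits in the background, interactive content when a user engages it. The following Python sketch is purely illustrative and not part of the disclosure; InteractionMode and select_content are hypothetical names.

```python
from enum import Enum, auto

class InteractionMode(Enum):
    PASSIVE = auto()  # device in the background: ambiance and environment scenes
    ACTIVE = auto()   # device in the foreground: user-driven interactive content

def select_content(mode, ambient_scene, interactive_scene):
    """Choose what the 3D-shaped display should render for a given mode."""
    if mode is InteractionMode.PASSIVE:
        return ambient_scene       # e.g. a beach or fireplace video loop
    return interactive_scene       # e.g. text, images, or a video call

print(select_content(InteractionMode.PASSIVE, "beach-loop", "video-call"))
```

In practice the mode would be inferred from the camera and microphone input (user present and facing the device, touch events, a wake word), but that inference is outside the scope of this sketch.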

According to one aspect of the present disclosure, the 3D-shaped user interfaces shaped as according to the shapes of the devices, without any limitation, are cylindrical, pyramidal, spheroidal, cubical, bread-box, or irregular user preferred shapes (e.g., lava-lamp shaped). According to another aspect of the present disclosure, without any limitation, one or more than one touch-sensitive or touch-less 2D flat or flexible panel display are configured with or without seams to provide 3D-shaped audio-visual user interfaces as according to the cylindrical, pyramidal, spheroidal, cubical, bread-box, or irregular user preferred shapes (e.g., lava-lamp shape) of the devices.

According to one aspect of the present disclosure, without any limitation, text strings, images, audio-video clips, data, colors, lights, scenes, characters, and scenarios are uniformly and simultaneously displayed across one or more than one 2D display with or without seams forming a single scene or scenario on the 3D-shaped audio-visual user interface of the IVR device interacting with a user or a group of users. According to another aspect, without any limitation, different text strings, images, audio-video clips, data, colors, lights, scenes, characters, and scenarios are displayed independently on different 2D displays with or without seams forming multiple scenes or scenario on the 3D-shaped audio-visual user interface of the IVR device interacting with a user or a group of users.

According to one aspect of the present disclosure, the method includes, without any limitation, providing software and/or hardware for microphones and speakers configured to provide IVR chat capabilities between a device and a user or a group of users. According to another aspect of the present disclosure, the method also includes providing software and/or hardware for cameras configured to provide interactive image or video scanning, analysis, and recognition capabilities during the interactions between the device and the user or the group of users. According to yet another aspect of the present disclosure, the method also includes, without any limitation, providing software and/or hardware for 3D-shaped visual user interfaces as summarized above.

According to one aspect of the present disclosure, without any limitation, example scenarios could be the audio-visual display of sunny or sunset beaches, rain forests, rooms with romantic fireplaces, back-lit night lights, touristic city streets, night-lights, pre-historic Jurassic parks, zoos, out of this world Sci-Fi sights and sounds on space missions and planets, the sights and sounds of concerts, night-club or disco settings, or simply the sights and sounds of a fish aquarium or a lava lamp sitting on a side table. According to another aspect of the present disclosure, without any limitation, example scenarios could be the audio-visual display of the weather, weather or location finding maps, graphs, charts, and tables, stock tickers, scientific or any other technical data displayed in a class-room, a conference-room, or an office setting for visualization, analysis, and discussion, etc. According to yet another aspect of the present disclosure, without any limitation, example scenarios could be the audio-visual display of information, data and scenes used for entertainment, engagement, companionship, education, storytelling, karaoke, video-game playing, teaching, training, greeting, guiding, registration, help, information and customer services, and any other similar applications.

The above summary is illustrative only and is not intended to be limiting in any way. Details associated with one or more aspects of the present disclosure are set forth in the accompanying drawings and the detailed description below. Other features, objects, and advantages of the present disclosure will be apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF FIGURES

FIG. 1: An example schematic of the components (in block diagram form) of a 3D-shaped audio-visual user interface on an IVR device.

FIG. 2A: An example schematic of a cylindrical shaped IVR device with a single circumferential display forming the 3D-shaped cylindrical audio-visual user interface in the center of the device.

FIG. 2B: The example schematic of the cylindrical shaped IVR device of FIG. 2A with a different aspect ratio between the height and the diameter of the IVR device.

FIG. 2C: An example schematic of a cubical shaped IVR device with square cross-section and 4 flat panel displays forming the 3D-shaped audio-visual user interface.

FIG. 3A: An example user interacting passively with an IVR device.

FIG. 3B: An example user interacting actively with an IVR device.

FIG. 4: An example group of users surrounding an IVR device in a conference or scan mode.

FIG. 5: An example flow chart of an IVR device able to provide user-device audio-visual IVR interactions in addition to the background ambiance and mood audio-visual interactions.

FIG. 6: An example flow chart of an IVR device able to operate in conference, span, and focus modes.

FIGS. 7A-7D: Example 3D-visual effects created on the 3D-shaped audio-visual user interface of an IVR device.

FIG. 8: An example flow chart of an IVR device with 3D-shaped audio-visual user interface with processing, storage, memory, audio-video I/O, connectivity, and power units within the device system.

DETAILED DESCRIPTION

Details of the present disclosure are described with illustrative examples to meet statutory requirements. However, the description itself and the illustrative examples in the figures are not intended to limit the scope of this disclosure. The inventors have contemplated that the subject matter of the present disclosure might also be embodied, in other ways, to include different steps or different combinations of steps similar to the ones described in this document, in conjunction with present and future technological advances. Similar symbols used in different illustrative figures identify similar components unless contextually stated otherwise. The terms "devices," "steps," "block," and "flow" are used below to explain different elements of the method employed, and should not be interpreted as implying any particular order or components unless a specific order is explicitly described.

Aspects of the present disclosure are directed towards methods, devices, and systems for IVR devices to deliver voice activated interactions with users with user preferred environment, ambiance, and functionalities via 3D-shaped interactive audio-visual user-interfaces configured as according to the surfaces enveloping the 3D-shapes of the devices.

Based on various aspects of the present disclosure, without any limitation, the 3D-shaped interactive audio-visual user interfaces are provided by using one or more than one 2D flat or flexible panel display configured as according to the 3D-shapes of the IVR devices to display interactive text, images, audio-video clips, data, colors, lights, scenes, characters, and scenarios to create user preferred environment, ambiance, and functionalities during passive or active user-device interactions. During passive user-device interactions, without any limitations, the device remains on the side or in the background to create appropriate environment and ambiance for the IVR interactions or the tasks at hand. During active user-device interactions, without any limitations, the device is in the foreground or in front of a user or between a group of users while the user or the users actively interact or engage with text, images, audio-video clips, data, colors, lights, scenes, characters, and scenarios displayed on the surfaces enveloping the 3D-shapes of the devices.

Based on various aspects of the present disclosure, the 3D-shaped audio-visual user interfaces as according to the shapes of the devices, without any limitation, are cylindrical, pyramidal, spheroidal, cubical, bread-box, or irregular user preferred shapes (e.g., lava-lamp shape). According to one aspect of the present disclosure and based on the current and future advancements in technology, without any limitation, the 3D-shaped audio-visual user interfaces comprise single touch-sensitive or touch-less 2D flat or flexible panel displays shaped as according to the 3D-shapes of the devices. According to another aspect of the present disclosure, without any limitation, the 3D-shaped audio-visual user interfaces comprise the seamless or with-seam integration of two or more than two touch-sensitive or touch-less 2D flat or flexible panel displays shaped as according to the 3D cylindrical, pyramidal, spheroidal, cubical, bread-box, or irregular (e.g., lava-lamp) shapes of the devices.

Based on various aspects of the present disclosure, without any limitation, text strings, images, audio-video clips, data, colors, lights, scenes, characters, and scenarios may be uniformly, simultaneously, and/or seamlessly displayed across one or more than one 2D display forming a single scene or scenario on the 3D-shaped audio-visual user interface of the IVR device for active or passive interactions with a user or a group of users. Based on other aspects of the present disclosure, without any limitation, different text strings, images, audio-video clips, data, colors, lights, scenes, characters, and scenarios may be displayed independently on different 2D displays with or without seams forming multiple and different scenes or scenarios on the 3D audio-visual user interface of the IVR device for active or passive interactions with a user or a group of users.

Based on various aspects of the present disclosure, the methods, devices, and systems, without any limitation, include providing software and/or hardware for one or more than one microphone and speaker configured to provide for IVR chat capabilities and task-based functionalities between users and their devices, while also providing active or passive interactive 3D-shaped audio-visual scenarios displayed on 3D-shaped audio-visual interfaces for user preferred environment, ambiance, and functionalities.

Based on another aspect of the present disclosure, the methods, devices, and systems, without any limitation, include providing software and/or hardware for one or more than one camera configured to provide interactive image or video scanning, analysis, and recognition capabilities for recognizing users and their environment and ambiance, while also providing IVR chat capabilities and task-based functionalities between users and their devices, and active or passive interactive 3D-shaped audio-visual scenarios displayed on 3D audio-visual interfaces for user preferred environment, ambiance, and functionalities.

Based on various aspects of the present disclosure, without any limitation, example audio-visual scenarios could be the audio-visual display of sunny or sunset beaches, rain forests, rooms with romantic fireplaces, back-lit night lights, touristic spots on city streets, city night-lights, pre-historic Jurassic parks, zoos, out of this world Sci-Fi sights and sounds in space and on distant planets, the sights and sounds of live concerts, night-club or disco settings, or simply the sights and sounds of fish aquariums, gentle rains, water-falls, or lava lamps in the background. According to another aspect, without any limitation, example audio-visual scenarios could also be the audio-visual display of the weather, weather or location finding maps, graphs, charts, and tables, news, stock tickers, sport scores, sport rankings, sport schedules, and scientific or technical data displayed in a class-room, a conference-room, or an office setting for visualization, analysis, and discussion. Based on yet another aspect, without any limitation, example audio-visual scenarios could also be the audio-visual display of information, data and scenes used for entertainment, engagement, companionship, education, storytelling, karaoke, video-game playing, teaching, training, greeting, guiding, registration, help, information and customer services, and any other similar applications.

Having briefly described an example overview of the various aspects of the present disclosure, an example IVR system with 3D-shaped audio-visual user interfaces, and the components in which aspects of the present disclosure may be implemented, are described below in order to provide a general context for various aspects of the present disclosure. Referring now to FIG. 1, an example IVR system for implementing aspects of the present disclosure is shown and designated generally as an IVR device 100. It should be understood that the IVR device 100 and other arrangements described herein are set forth only as examples and are not intended to suggest any limitation as to the scope of use and functionality of the present disclosure. Other arrangements and elements (e.g., machines, surfaces, user interfaces, functions, orders, and groupings, etc.) can be used instead of the ones shown, and some elements may be omitted altogether and some new elements may be added, depending upon the current and future status of relevant technologies, without altering the various aspects of the present disclosure. Furthermore, the blocks, steps, processes, devices, and entities described in this disclosure may be implemented as discrete or distributed components or in conjunction with other components, and in any suitable combination and location. Various functions described herein as being performed by the blocks shown in the figures may be carried out by hardware, firmware, and/or software.

An IVR device 100 in FIG. 1 includes, without any limitation, a top section 102, a middle section 104, and a base section 106. The top section 102 supports and houses one or more than one camera with imaging and recognition capabilities 108, as well as one or more than one microphone 110 with interactive speech or voice recognition and response capabilities, referenced as 112 and 114. The middle section 104 supports and houses the CPU, storage, memory, and all I/O capabilities 116. The base section 106 supports and houses one or more than one speaker system 118 as well as the internal power supply (e.g., battery), charging mechanism, and/or battery storage capabilities, referenced as 120. The surface of the cylindrical middle section 104 supports the cylindrical 3D-shaped audio-visual display capabilities, referenced at 122 (e.g., cylindrical display), which could be made of one or more than one flat or flexible panel touch or touch-less displays configured seamlessly or with seams to provide the 3D-shaped audio-visual user interface of the IVR devices embodied in the present disclosure. In one aspect, the base 106 could itself be supported on another stationary platform to give stability to the device. The camera, imaging, and recognition capabilities 108 could include one or more than one video camera with 360 degrees, 180 degrees, or 90 degrees field of view as according to user preferences for functionalities and applications. The speaker 118 capabilities could include one or more than one speaker and a sub-woofer speaker for a surround sound effect. In addition to the above list of general components and their functions, the IVR device also includes a power unit, a charging unit, a computing or processing unit, a storage unit, memory devices, and ports.
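The FIG. 1 component layout lends itself to a simple data-structure description. The Python sketch below records the section-to-component mapping using the reference numerals from the text; the class and field names are illustrative only and are not part of the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class IVRDeviceLayout:
    """Illustrative section-to-component map of the FIG. 1 IVR device 100.

    Keys are the reference numerals used in the text; the class itself is a
    hypothetical sketch, not an API defined by the disclosure.
    """
    top: dict = field(default_factory=lambda: {
        108: "camera(s) with imaging and recognition capabilities",
        110: "microphone(s)",
        112: "interactive speech/voice recognition",
        114: "voice response",
    })
    middle: dict = field(default_factory=lambda: {
        116: "CPU, storage, memory, and I/O capabilities",
        122: "cylindrical 3D-shaped audio-visual display",
    })
    base: dict = field(default_factory=lambda: {
        118: "speaker system(s)",
        120: "power supply, charging mechanism, battery storage",
    })
```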
The structural and component building blocks of an IVR device described herein represent example logical, processing, sensing, display, detection, control, storage, memory, power, and input/output components, and are not necessarily the actual components of an IVR device. For example, a display device unit 122 may be made of single or multiple touch or touch-less, flat or flexible panel displays; USB, HDMI, and Ethernet cable ports could represent the key I/O components; a processor unit may have memory and storage; etc. FIG. 1 is an illustrative example of an IVR device 100 that can be used with one or more aspects of the present disclosure.

One aspect of the present disclosure may be described in the general context of an IVR device with onboard audio-visual sensors, speakers, computer, power unit, display, and a variety of I/O ports. The computer or computing unit may include, without any limitation, computer codes or machine readable instructions, including computer readable program modules executable by the computer to process and interpret input data generated from an IVR device configured to interact with a user or a group of users and generate an output response through interactive audio-visual responses delivered through a set of speakers and through 3D-shaped audio-visual displays to create user preferred environment, ambiance, functionalities, and applications. Generally, program modules include routines, programs, objects, components, data structures, etc., referring to computer codes that take input data, perform particular tasks, and produce an appropriate response by the IVR device. Through USB, Ethernet, WIFI, modem, and/or HDMI ports the IVR device is also connectable to the Internet and a cloud computing environment capable of uploading and downloading information, content, and data to and from a remote source such as a cloud computing and storage environment, a user or group of users configured to interact with the IVR device in person, and other IVR devices within interaction environments.

FIGS. 2A-2C, without any limitation, show IVR devices 200, 210, and 220 with example 3D-shapes. Each IVR device may comprise internal components similar to those shown in FIG. 1, wherein camera, imaging, recognition, microphone, listening, and speech recognition capabilities are located in the top section; modules for CPU, storage, memory, logic, and I/O capabilities are located in the middle section; and the modules for speaker or audio sound, power, charging, battery storage, and connectivity ports capabilities are located in the bottom or base section. According to aspects of the present disclosure, the 3D-shaped audio-visual user-interfaces (UIs) are provided by the displays on the circumferential surfaces (e.g., 202, 212) surrounding or enveloping the sides of the IVR devices (e.g., 200, 210).

Example 3D-shapes of the IVR devices 200 (FIG. 2A) and 210 (FIG. 2B) with 3D-shaped UIs, without any limitation, are cylinders with circular cross-sections of different aspect ratios, or ratios of the height of the device to the diameter of the device. The values of the height of the device, the diameter of the device, and the aspect ratio between the two, without any limitation, are chosen as according to the functionality or use of the device: i) actively with a user or a group of users in a room sitting or standing around a table at work or in a room at home, ii) passively on a side table at work or in a room at home, and/or iii) actively or passively as a stand-alone device in hallways, lobbies, at reception desks in buildings, shopping centers, malls, hospitals, hotels, airports, or corporations. Other example 3D-shapes of the cylindrical IVR devices 200 and 210, without any limitation, are cylinders with polygonal cross-sections with three (3) or more than 3 vertices with an equal number of planar surfaces surrounding the sides of the cylindrical device of polygonal cross-section. The values of the height of the device, the enveloping diameter of the device, and the aspect ratio between the two, without any limitation, are chosen as according to the functionality or the use of the device: i) actively with a user or a group of users in a room sitting or standing around a table at work or in a room at home, ii) passively on a side table at work or in a room at home, and/or iii) actively or passively as a stand-alone device in hallways, lobbies, at reception desks in buildings, shopping centers, malls, hospitals, hotels, airports, or corporations. The enveloping diameter of the polygonal cross-section of the cylindrical devices is the diameter of an imaginary circle completely enclosing the polygonal cross-section, with its circumference passing through at least two or more than two vertices of the polygonal cross-section.
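For a regular polygonal cross-section with n sides of length s, the enveloping diameter defined above is the diameter of the circumscribed circle, D = s / sin(π/n). A minimal sketch of this computation follows; the function names are illustrative and not part of the disclosure.

```python
import math

def enveloping_diameter(num_sides: int, side_length: float) -> float:
    """Diameter of the imaginary circle completely enclosing a regular
    polygonal cross-section, passing through all of its vertices."""
    if num_sides < 3:
        raise ValueError("a polygonal cross-section needs at least 3 vertices")
    # circumradius of a regular n-gon: R = s / (2 * sin(pi / n)); D = 2 * R
    return side_length / math.sin(math.pi / num_sides)

def aspect_ratio(height: float, num_sides: int, side_length: float) -> float:
    """Height-to-enveloping-diameter ratio used to size the device."""
    return height / enveloping_diameter(num_sides, side_length)
```

For the square cross-section of FIG. 2C (n = 4), this gives D = s·√2, so a cubic device whose height equals its side has a height-to-enveloping-diameter aspect ratio of 1/√2 ≈ 0.707.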

Another aspect of the present disclosure, without any limitation, is a cylindrical IVR device 220 (FIG. 2C) of a polygonal cross-section with four (4) sides of equal length forming a square cross-section of the device, wherein the height of the device is also equal to the side of the square cross-section, forming a 3D-cubic-shaped device with 4 square planar surfaces 222 (e.g., left facing surface), 224 (e.g., front facing surface), 226 (e.g., right facing surface), and 228 (e.g., back facing surface) surrounding the sides of the device, for 4 square planar flat panel displays on surfaces 222, 224, 226, and 228 with seams at the edges forming a four-sided 3D-shaped audio-visual UI for a user or a group of users sitting around the device on a table, or even standing around the device on a table or on a floor, depending on the size or the height of the IVR device.

As an aspect of the present disclosure of an IVR device with polygonal cross-section, without any limitation, the 3D-shaped audio-visual UI could display or show: i) identical copies of the same text, image, audio-video data, characters, scene, or scenario on all the different planar flat panel displays forming the 3D-shaped UI of the device, ii) different text, image, audio-video data, characters, scene, or scenario on all the different planar flat panel displays forming the 3D-shaped UI of the device, or iii) one single text, image, audio-video data, characters, scene, or scenario spread synchronously and seamlessly across all the different planar flat panel displays, forming a single 3D-shaped text, image, audio-video data, characters, scene, or scenario on the 3D-shaped UI of the device, for user preferred environment, ambiance, functionalities, and applications.

Other example aspects of 3D-shaped audio-visual UIs of the present disclosure (not shown), without any limitation, are polygonal pyramid-shaped IVR devices with three (3) or more than 3 vertices with an equal number of planar triangular surfaces surrounding the sides of the IVR device, or a full or half spherical dome shaped IVR device (e.g., with flat planar surfaces cut on the sides). The height and the enveloping diameters of these IVR devices are chosen as according to the user-device interaction preferences: i) with a user or a group of users in a room sitting or standing around a table interacting actively with the device in a room; ii) with a device sitting passively on a side table in a room for user preferred environment or ambiance; or iii) an actively or passively interacting stand-alone device with a user or a group of users in hallways, lobbies, at reception desks in buildings, shopping centers, malls, hospitals, hotels, airports, or corporations for user preferred environment, ambiance, functionalities, and applications for such devices.

According to various aspects of the present disclosure, an IVR device with 3D-shaped audio-visual UIs, without any limitation, has the capability to be used for passive or active interactions of the device with a user or a group of users for user preferred environment, ambiance, functionalities, and applications. Example uses of an IVR device passively or actively interacting with a user or a group of users, without any limitation, are shown in FIGS. 3A-3B and FIG. 4. In an example scenario 300 (FIG. 3A), without any limitation, an IVR device 302 is located on a side-table passively interacting with a user. In such an aspect, the user does not directly interact with the shape based audio-visual UI of the device, but uses only voice-based commands to occasionally interact with the device, and remains busy in their own main activity (e.g., reading a book), while the device provides an audio-visual environment and ambiance for the user. Another passive application of an IVR device with 3D-shaped audio-visual interfaces is therefore to provide example background images or visuals, audio or sounds, or audio-visual clips of scenes or scenarios of beaches, rain-forests, rainy days, sunny days, flower fields, country sides, mountains and hilltops, fire places or candle lights, lava-lamps, lighted fish aquariums, lamps of different lighting intensities, disco settings, or city lights, etc., for user preferred environment and ambiance without any limitations.

In yet another example scenario 310 (FIG. 3B), without any limitation, another IVR device 312 is located on a table in front of a user, wherein the user actively interacts with the audio-visual UI of the device in addition to using voice-based IVR interactions with the device. The active application of an IVR device with 3D-shaped audio-visual interfaces is therefore to provide example audio-visual displays of the text, images, audio sounds, or audio-video clips of example scientific, technology, business, information, knowledge, and analysis data, without any limitation, for user preferred functionalities and applications of the 3D shape based audio-visual UIs of the IVR devices of this disclosure.

As yet another aspect of the present disclosure, an example scenario 400 of active user-device interactions between an IVR device 402 and a user or a group of users is shown in FIG. 4, wherein in a class-room, a conference room, a coffee-house, at an office desk, hallway, lobby, reception desk, etc., without any limitation, an IVR device is located between a user or a group of users, wherein the same or different example scientific, technology, business, information, knowledge, and analysis data could be displayed on the example individual displays 404 and/or 406 of the device 402 for the individual user-device interactions. As an aspect of the present disclosure, two or more than two users can directly interact facing each other, with IVR interactions of the device, as well as with the same or different example scientific, technology, business, information, knowledge, and analysis data displayed on individual displays 404 and/or 406 of the IVR device 402 in a class-room, a conference room, a coffee-house, at an office desk, hallway, lobby, or reception desk setting, without needing a projector, screen, or even a wall, to realize user-to-user meetings and conferences and the sharing of common or individual data as according to user preferred functionalities and applications.

As an aspect of the present disclosure, an example flow chart of the capability of an IVR device to interact with a user or a group of users, with or without the background audio clips or the sound of the audio-video scenes and scenarios displayed on the 3D-shaped audio-visual displays, without any limitation, is shown in FIG. 5. At box 500, an IVR device receives and checks for a new user input or the device play response of a previous step as an input, and decides at box 502 whether the received audio input is a utility command or not. At box 502, if the received input is not a utility command, the IVR device continues to play the previous background audio sound clip or the sound of the audio-video scenes and scenarios displayed on the 3D-shaped audio-visual displays. At box 502, if the received input is a utility command, without any limitation, the IVR device lowers the sound or audio volume of the continuing previous activity at box 504, selects an appropriate sound or audio response from the database at box 506, appropriately mixes the background sound or audio from box 504 with the sound or audio response from the database at box 506, and restores the volume for the output at box 508. The net audio or sound output, including any one or more of the audio response to the utility command played to the user at box 510, the background ambient sound or audio played to the user at box 512, and the mood or effect sound or audio played to the user at box 514, is delivered with relevant sound-less text, image, and video clips shown on the 3D-shaped displays or UIs at box 516 for a user or a group of users for a user preferred environment, ambiance, functionalities, and applications.
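The FIG. 5 flow, one input-handling pass with volume ducking and response mixing, can be sketched as follows. `AmbientPlayer`, `UTILITY_KEYWORDS`, and the dictionary-based response database are hypothetical stand-ins for the device's audio subsystem and database; they are not components named by the disclosure.

```python
UTILITY_KEYWORDS = {"weather", "timer", "play", "stop", "volume"}  # illustrative

def is_utility_command(text):
    """Box 502: crude check of whether the input is a utility command."""
    return any(word in text.lower().split() for word in UTILITY_KEYWORDS)

class AmbientPlayer:
    """Minimal stand-in for the background ambience channel."""
    def __init__(self, clip):
        self.current_clip = clip
        self.volume = 1.0
    def set_volume(self, level):
        self.volume = level

def handle_audio_step(user_input, ambient_player, response_db,
                      duck_level=0.2, full_level=1.0):
    """One pass of the FIG. 5 flow (boxes 500-516), returning the audio
    events the device would emit for this input."""
    events = []
    if not is_utility_command(user_input):
        # box 502 (no): keep playing the previous ambient clip unchanged
        events.append(("continue_ambient", ambient_player.current_clip))
        return events
    ambient_player.set_volume(duck_level)                 # box 504: duck
    response = response_db.get(user_input, "Sorry?")      # box 506: select
    events.append(("mix_and_play", response))             # boxes 508/510: mix
    ambient_player.set_volume(full_level)                 # box 508: restore
    events.append(("ambient", ambient_player.current_clip))   # box 512
    return events
```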

As an aspect of the present disclosure, an example flow chart of the capability of an IVR device to interact with a user or a group of users, with the same or different data displayed on multiple displays of the device in conference, span, and focus modes, without any limitation, is shown in FIG. 6. At any point during the user-device interactions, the user has a choice to make a selection among a conference mode, a span mode, or a focus mode at box 600. In one aspect, in the conference mode at box 602, without any limitation, the same audio-visual data is displayed on different multiple displays facing all individual users, wherein individual users could look at the same data and engage in discussion with each other face to face in a conference. In another aspect, in the span mode at box 604, without any limitation, the same single audio-visual data is seamlessly spanned across multiple displays such as to create the appearance of a seamless single audio-visual scene or scenario displayed around the 3D shaped device for user preferred environment, ambiance, and applications. In yet another aspect, in the focus mode at box 606, without any limitation, different audio-visual data is shown on different multiple displays facing an individual user as according to the individual user's selection or preferences.
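The three FIG. 6 modes amount to three ways of distributing frames over the device's panels. A minimal sketch follows, with frames modeled as nested lists of pixel values; the function name and frame representation are illustrative only.

```python
def render_panels(frame, num_panels, mode, per_panel_frames=None):
    """Distribute content across the device's displays per the FIG. 6 modes.

    conference (box 602): the same frame is copied to every panel.
    span (box 604):       one wide frame is sliced into equal vertical
                          strips, one strip per panel, forming a single
                          seamless scene around the device.
    focus (box 606):      each panel shows its own independently chosen frame.

    A frame is a list of rows, each row a list of pixel values.
    """
    if mode == "conference":
        return [frame] * num_panels
    if mode == "span":
        strip = len(frame[0]) // num_panels
        return [[row[i * strip:(i + 1) * strip] for row in frame]
                for i in range(num_panels)]
    if mode == "focus":
        if per_panel_frames is None or len(per_panel_frames) != num_panels:
            raise ValueError("focus mode needs one frame per panel")
        return list(per_panel_frames)
    raise ValueError(f"unknown mode: {mode!r}")
```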

In one aspect of the present disclosure, the camera 108 of the cylindrical IVR device 100 of FIG. 1 may comprise a 360° camera and the one or more than one display 122 of the cylindrical IVR device 100 may form a 3D-shaped 360° circumferential display, without any limitation, that allows background images, audio-visual scenes, or scenarios around the device, or around a user or a group of users interacting with a device, to be overlapped with the images, audio-visual scenes, or scenarios from the database of the device to provide 3D audio-visual displays or user interfaces, with translucent depth perception, for the IVR device interacting with a user or a group of users for user preferred environment, ambiance, functionalities, and applications.

An example 3D audio-visual user interface, with translucent depth perception, without any limitation, is shown in FIGS. 7A-7D. The IVR device 700 in FIGS. 7A-7D is the same as the IVR device 100 of FIG. 1 and comprises a 360° camera 702 and a 3D-shaped cylindrical circumferential display 704 made of one or more than one flexible or flat panel display put together with or without seams. Additionally, an example chicken wire mesh 706 forms the surrounding background of the device 700, without any limitation. As an aspect of the present disclosure, using the image of the background chicken-wire mesh grabbed from the 360° camera 702, FIG. 7A shows, without any limitation, that the image of the background chicken-wire mesh can be displayed circumferentially around the cylindrical device 700 on the circumferential display 704. As another aspect of the present disclosure, without any limitation, FIG. 7B shows the image of the background chicken-wire mesh overlapped with another image of a candle light lamp and shown on the same circumferential display 704 of the cylindrical device 700, as if the candle light lamp is in the foreground and the chicken wire mesh is in the background. As yet another aspect of the present disclosure, without any limitation, the background image of the chicken wire mesh is made translucent to create a depth perception between the image of the candle light lamp in the foreground and the image of the translucent chicken wire mesh in the background. As yet another aspect of the present disclosure, without any limitation, the degree of depth perception can be changed by changing the degree of translucency of the background chicken wire mesh. FIGS. 7C and 7D show the same IVR device 700 with the example chicken-wire mesh background in perspective view, functioning similarly to FIG. 7A and FIG. 7B, respectively.
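The translucency-controlled depth effect of FIGS. 7B and 7D can be modeled per pixel as an "over" composite in which the camera-grabbed background is first faded according to its translucency. A minimal sketch, with illustrative names (channel values 0-255, alphas 0.0-1.0):

```python
def composite_pixel(fg, fg_alpha, bg, bg_translucency):
    """Blend one RGB pixel of the FIG. 7 effect.

    The background (e.g., the chicken-wire mesh grabbed by the 360° camera)
    is faded according to its translucency; the foreground (e.g., the candle
    light lamp image) is then composited over it.  Raising bg_translucency
    deepens the perceived separation between foreground and background.
    """
    if not 0.0 <= bg_translucency <= 1.0:
        raise ValueError("translucency must be in [0, 1]")
    faded_bg = [c * (1.0 - bg_translucency) for c in bg]
    # standard 'over' operator yielding an opaque result
    return [round(fg_alpha * f + (1.0 - fg_alpha) * b)
            for f, b in zip(fg, faded_bg)]
```

With `bg_translucency = 0.0` the background shows at full strength behind the foreground; at `1.0` it vanishes entirely, pushing it to maximum perceived depth.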

Lastly, as another aspect of the present disclosure, an IVR device (e.g., 700) can use an on-board multicarrier SIM card, Wi-Fi, or Ethernet connected to the Internet to make video-conferencing calls to another IVR device (e.g., 700), without any limitation, such that users on both sides of the two devices may communicate or interact with each other within 3D audio-visual backgrounds on both sides, using the overlap of foreground in-focus images, scenes, or scenarios on background translucent images, scenes, or scenarios, with the degree of translucency defining the degree of depth perception in the 3D audio-visual environment, ambiance, functionalities, or applications according to the preferences of the users interacting with the devices.

Having briefly described an example overview of the various aspects of the present disclosure, an example operating environment, system, and components in which the various aspects of an IVR device may be implemented are described below in order to provide a general device context for various aspects of the present disclosure. It should be understood that the IVR device operating environment and the components in FIG. 8 and other arrangements described herein are set forth only as examples and are not intended to suggest any limitation as to the scope of use and functionality of the present disclosure. The IVR device 800 in FIG. 8 includes one or more than one buses that directly or indirectly couple memory/storage 802, one or more processors 806, sensors and controllers 810, input/output ports 804, input/output components 808, and an illustrative power supply 812. These blocks represent logical, not necessarily actual, components. For example, a display or speakers could be I/O components 808, and a processor could also have memory, as is known in the art. FIG. 8 is an illustrative example of environment, computing, processing, storage, display, sensor, and controller devices that can be used with one or more aspects of the present disclosure.

The components and tools used in the present disclosure may be implemented on one or more computers executing software instructions. According to one aspect of the present disclosure, the tools used may communicate with server and client computer systems that transmit and receive data over a computer network or a fiber- or copper-based telecommunications network. The steps of accessing, downloading, and manipulating the data, as well as other aspects of the present disclosure, are implemented by central processing units (CPUs) in the server and client computers executing sequences of instructions stored in a memory. The memory may be a random-access memory (RAM), a read-only memory (ROM), a persistent store, such as a mass storage device, or any combination of these devices. Execution of the sequences of instructions causes a CPU to perform steps according to aspects of the present disclosure.

The instructions may be loaded into the memory of the server or client computers from a storage device or from one or more other computer systems over a network connection. For example, a client computer may transmit a sequence of instructions to the server computer in response to a message transmitted to the client over a network by the server. As the server receives the instructions over the network connection, it stores the instructions in memory. The server may store the instructions for later execution, or it may execute the instructions as they arrive over the network connection. In some cases, the CPU may directly support the downloaded instructions. In other cases, the instructions may not be directly executable by the CPU, and may instead be executed by an interpreter that interprets the instructions. In other aspects, hardwired circuitry may be used in place of, or in combination with, software instructions to implement the present disclosure. The tools used in the present disclosure are not limited to any specific combination of hardware circuitry and software, nor to any particular source for the instructions executed by the server or client computers. In some instances, the client and server functionality may be implemented on a single computer platform.
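The store-then-execute behaviour described above — instructions arrive over a connection, are held in memory, and may be run later by an interpreter rather than executed natively — can be sketched as follows. This is a hypothetical illustration only; the command names (`show`, `play`) and handlers are invented for the example and are not part of the disclosure.

```python
import queue

class InstructionRunner:
    """Sketch of an interpreter that stores instructions as they
    arrive and executes them later, mirroring the server behaviour
    described above. Commands are simple text lines of the form
    "<name> <argument>"; the command set here is illustrative only."""

    def __init__(self):
        self.pending = queue.Queue()
        self.handlers = {
            "show": lambda arg: f"displaying {arg}",
            "play": lambda arg: f"playing {arg}",
        }

    def receive(self, line):
        # Store the instruction in memory as it arrives.
        self.pending.put(line)

    def run_all(self):
        # Later execution: interpret every stored instruction in order.
        results = []
        while not self.pending.empty():
            cmd, _, arg = self.pending.get().partition(" ")
            results.append(self.handlers[cmd](arg))
        return results
```

Replacing `run_all` with a call made inside `receive` would model the alternative the passage mentions, executing each instruction as it arrives over the connection.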

Various aspects of the present disclosure are described in the following numbered clauses:

1. A method for providing 3D-shape based audio-visual user-interfaces (UIs) for voice activated interactions with users for entertainment, engagement, companionship, and performance of utility functions for user preferred environment, ambiance, functionalities, and applications via interactive voice response (IVR) devices, wherein the method comprises:

providing a device with voice or speech activated interactive voice response (IVR) capabilities to be able to interactively speak with a user or a group of users to receive their input as a request, engage them in conversation or entertainment, perform utility-based tasks, and deliver an audio-visual output response back to the user or the group of users for user preferred environment, ambiance, functionalities, and applications during user-device interactions;

providing a device with scanning, imaging, recognition, and perception capabilities using a 360° view camera to be able to scan, image, recognize, and perceive a user or a group of users, their background, environment, and ambiance to be able to select and deliver suitable audio-visual scenarios from a database of such scenarios, in addition to the IVR needed audio-visual output of the device, to a user or a group of users for user preferred environment, ambiance, functionalities, and applications during user-device interactions;

providing a device with a database and data-management capabilities to write, delete, store, select, and deliver text, image, audio, and audio-video clips, files, characters, scenes, and scenarios to create audio-visual scenes or scenarios delivered on the 3D-shaped audio-visual user-interfaces (UIs) as according to the 3D-shapes of the devices for user preferred environment, ambiance, functionalities, and applications during user-device interactions;

providing a device with display capabilities to deliver and display text, image, audio, and audio-video clips, files, characters, scenes, and scenarios on interactive 3D-shaped audio-visual user-interfaces (UIs) made of one or more than one flat or flexible 2D display panels configured as according to the 3D-shapes of the devices for user preferred environment, ambiance, functionalities, and applications during user-device interactions; and

providing a device with capabilities to deliver interactive voice activated entertaining content and responses for user engagement and companionship, as well as the functionality of performing task- or utility-based functions for a user or a group of users in user preferred environment, ambiance, functionalities, and applications during user-device interactions.

2. The method of clause 1, wherein the shape of the device is a cylinder of circular or polygonal cross-section, the number of vertices in the polygonal cross-section being 3 or more than 3, with an equal number of planar surfaces joined with or without seams surrounding the circumferential sides of the cylindrical device.

3. The method of clause 1 and the cylindrical device of clause 2, wherein each planar surface forming the side of the cylindrical device supports a flat panel display forming an overall 3D-shaped cylindrical audio-visual display, made of 3 or more than 3 flat or flexible panel displays, providing a 3D-shaped audio-visual user interface (UI) for providing text, image, audio, and audio-video clips, files, characters, scenes, and scenarios as output during the passive or active IVR interactions between a device and a user or a group of users.

4. The method of clause 1, wherein the shape of the device is a cylinder of circular cross-section of clause 2 with a cylindrical surface surrounding the side of the cylindrical device supporting one or more than one flexible panel displays forming a cylindrical 3D-shaped audio-visual user interface (UI) for providing text, image, audio, and audio-video clips, files, characters, scenes, and scenarios as output during the passive or active IVR interactions between a device and a user or a group of users.

5. The method of clause 1, and the cylindrical device of clause 2, wherein the ratio of the height of the cylindrical device vs the diameter of the cylindrical device of the circular cross-section of clause 4 is any number larger than 0.10 and equal to or smaller than 10.0.

6. The method of clause 1, and the cylindrical device of clause 2, wherein the ratio of the height of the cylindrical device vs the enveloping diameter of the cylindrical device of the polygonal cross-section of clause 2 is any number larger than 0.10 and equal to or smaller than 10.0, and wherein the enveloping diameter of the polygonal cross-section is the diameter of an imaginary circle completely enclosing the polygonal cross-section with the circumference passing through 2 or more than 2 vertices of the polygonal cross-section.

7. The method of clause 1, and the cylindrical device of clause 2, wherein the sides of the polygonal cross-section are equal to or not equal to each other, forming a square or a rectangular cross-section for a polygonal cross-section of 4 vertices.

8. The method of clause 1, wherein the height of the cylindrical device of the clause 2 is any number larger than 0.10 feet and equal to or smaller than 6.0 feet, and wherein the diameter or the enveloping diameter of the cylindrical device of circular or polygonal cross-section, respectively, of clause 2 is any number larger than 0.10 feet and equal to or smaller than 4.0 feet.

9. The method of clause 1, wherein the shape of a device is a polygonal pyramid, the number of vertices in the polygonal base being 3 or more than 3, with an equal number of planar surfaces joined with or without seams forming and surrounding the sides of the polygonal pyramidal device.

10. The method of clause 1, and the device of clause 9, wherein each planar surface forming the side of the polygonal pyramid device supports a flat panel display forming a 3D-shaped polygonal pyramid audio-visual display, composed of 3 or more than 3 flat panel displays, providing a 3D-shaped audio-visual user interface (UI) for providing text, image, audio, and audio-video clips, files, characters, scenes, and scenarios as output during the passive or active IVR interactions between a device and a user or a group of users.

11. The method of clause 1, and the device of clause 9, wherein the height of the polygonal pyramid of clause 9 is larger than 0.10 feet and equal to or smaller than 6.0 feet.

12. The method of clause 1, and the device of clause 9, wherein the enveloping diameter of the polygonal cross-section at the base of the polygonal pyramid of clause 9 is larger than 0.10 feet and equal to or smaller than 4.0 feet, wherein the enveloping diameter is defined as the diameter of an imaginary circle completely enclosing the polygonal cross-section of equal sides at the base with the circumference of the imaginary circle passing through all the vertices of the polygonal cross-section of equal sides at the base.

13. The method of clause 1, wherein the shape of a device is a spherical dome shaped pyramid of diameter d with a flat circular base of a diameter x with the ratio of x/d is a number larger than 0.10 and equal to or smaller than 1.0, and wherein the diameter d of the spherical dome is a number larger than 0.10 feet and equal to or smaller than 6.0 feet.

14. The method of clause 1, and the spherical dome shaped pyramidal device of clause 13, where the device includes 2 or more than 2 polygonal planar surfaces cut on its side to provide 2 or more than 2 polygonal flat panel displays on each such planar surfaces forming a 3D-shaped spherical dome type display, composed of 2 or more than 2 flat polygonal display panels, providing a 3D-shaped audio-visual user interface (UI) for providing text, image, audio, and audio-video clips, files, characters, scenes, and scenarios as output during the passive or active IVR interactions between a device and a user or a group of users.

15. The method of clause 1, and devices of clauses 13 and/or 14, wherein the number of vertices in polygonal planar surfaces are larger than 3 and equal to or smaller than 8.

16. The method of clause 1, wherein bands of width larger than 0.10 feet and equal to or smaller than 1.0 feet, at the top of the cylindrical, polygonal pyramid, and spherical dome shaped pyramid devices of clauses 2, 9, and/or 13, are covered with one-way see-through mirrored surfaces, instead of displays, to house the hardware for one or more than one camera for imaging and scanning capabilities, and one or more than one microphone for the listening capabilities of the devices.

17. The method of clause 1, and the devices of clause 2, wherein a full or half spherical dome of diameter larger than 0.10 feet and equal to or smaller than the diameter of the cylindrical devices is included at the top, and wherein the full or half spherical dome at the top is covered with a one-way see-through mirrored surface to house the hardware for one or more than one camera for imaging and scanning capabilities, and one or more than one microphone for the listening capabilities of the devices.

18. The method of clause 1, and the devices of clauses 2, 9, and/or 13, wherein the one or more than one camera, located within the camera housing units of clauses 16 and/or 17, comprises a single 360° camera, two 180° cameras, four 90° cameras, six 60° cameras, twelve 30° cameras, or various combinations thereof, to provide a 360° view surrounding the device.

19. The method of clause 1, wherein the devices of clauses 2, 9, and/or 13 with hardware and software for imaging, listening, speaking, display, battery and power, text, audio, audio-video data storage and processing, operating system and applications, and remote or cloud connectivity, have capabilities of voice or speech activated interactive voice response (IVR) interactions, while also providing additional back-lighting, stationary or moving images, audio sounds, audio-video clips, files, data, scenes, characters, and scenarios delivered on 3D-shaped audio-visual interfaces of any of clauses 2-17, without any limitations, are capable of passive or active IVR interactions between the device and a user or a group of users for user preferred environment, ambiance and functionalities and applications.

20. The method of clause 1, and the devices of clauses 2, 9, and/or 13, wherein passive IVR interactions between a device and a user or a group of users means the device remains in the background or on the side while providing additional back-lighting, stationary or moving images, audio sounds, audio-video clips, files, data, scenes, characters, and scenarios delivered on the 3D-shaped audio-visual interfaces of any of clauses 2-17, without any limitation, for providing user preferred environment, ambiance, functionalities, and applications.

21. The method of clause 1, and the devices of clauses 2, 9, and/or 13, wherein the active IVR interactions between a device and a user or a group of users means the device is located in front of a user or a group of users, or between two or more than two users, while the user or the group of users actively interact with the device by looking and or touching the displays of 3D-shaped audio-visual interfaces of the device, without any limitation, for any or all example text, images, audio-video clips, files, data, characters, scenes, and scenario data displayed on the device for visualization, presentation, analysis, and discussion in user preferred environment and ambiance for user preferred functionalities and applications.

22. The method of clause 1, and the devices of clauses 2, 9, and/or 13, wherein the example environment and ambiance scenarios provided on the 3D audio-visual user interfaces of any of clauses 2-15 may include, without any limitation, audio-visual scenarios of beaches, rain-forests, rainy days, sunny days, flower fields, countryside, mountains and hilltops, fireplaces or candle lights, lava lamps, lighted fish aquariums, lamps of different lighting intensities, disco settings, office settings, or city lights for user preferred environment, ambiance, functionalities, and applications.

23. The method of clause 1, and devices of clauses 2, 9, and/or 13, wherein the example functionalities and applications of 3D-shaped audio-visual user interfaces of any of clauses 2-15 may include, without any limitation, the display, visualization, presentation, analysis, and discussion of any or all example text, images, audio-video clips, files, data, characters, scenes, and scenario data during the user-device audio-visual and touch based interactions in user preferred environment and ambiance for user preferred functionalities and applications.

24. The method of clause 1, and devices of clauses 2, 9, and/or 13, wherein the example functionalities and applications of the 3D-shaped audio-visual user interfaces of any of clauses 2-15 may also include, without any limitation, the audio-visual and touch based interactions with any or all example text, images, audio-video clips, files, data, characters, scenes, and scenario data displayed on a device located in a room, hallway, mall, shopping center, store, or near a reception desk, room, or hallway for greeting, guiding, registration, help, information gathering, and advertising functions during the user-device interactions.

25. The method of clause 1, and devices of clauses 2, 9, and/or 13, wherein the background visual images or video clips, scenes, or scenarios scanned or filmed by the 360° view camera, or a plurality of cameras providing the 360° view, during user-device interactions are overlapped as translucent background images or video clips with the foreground in-focus images or video scenes provided by the camera or the plurality of cameras, or obtained from the database, to provide 3D-visual-effects displayed on the 3D-shaped audio-visual user interfaces during the user-device interactions for user preferred environment, ambiance, functionalities, and applications.

26. The method of clause 1, and devices of clauses 2, 9, and/or 13, wherein the degree of depth perception in the 3D-visual-effects displayed on the 3D-shaped audio-visual user interfaces is adjusted or controlled by the degree of translucency of the overlapped background visual images or video clips, scenes, and scenarios of clause 25.

27. The method of clause 1, and devices of clauses 2, 9, and/or 13, wherein the 3D-visual-effects of clauses 25 and/or 26 are used during a device-to-device video-conference or video-game-playing session, wherein a user or a group of users around a planar or 3D-visual-effect environment on one device are able to interact or communicate with another user or a group of users around another planar or 3D-visual-effect environment on another device during the device-to-device video-conference or video-game-playing session within user preferred environment and ambiance for user preferred functionalities and applications.

28. The method of clause 27, wherein a full room-to-room single 360° video-conference or video-game-playing audio-visual scene sent from a sending device is displayed on a receiving device as: a single 360° wrap around scene on a 360° wrap around display of the receiving device; stretched as two same 180° scenes on two displays forming a front viewing side and a back viewing side of the receiving device; stretched as three same 180° scenes on three displays forming three viewing sides of the receiving device; or stretched as four same 180° scenes on four displays forming four viewing sides of the receiving device.

29. The method of clause 28, wherein the full room-to-room single 360° video-conference or video-game-playing audio-visual scene sent from the sending device is further simultaneously stretched and displayed as a single 180° scene on a flat display of a mobile device, a laptop computer, a desktop computer, a television monitor, or a wall using a projector.

30. The method of clause 28, wherein the full room-to-room single 360° video-conference or video-game-playing audio-visual scene sent from the sending device is displayed on part of the 360° wrap around display of the receiving device, and wherein at least one of text, images, audio-video clips, files, data, characters, scenes, or scenario data is simultaneously displayed on the remaining part of the 360° wrap around display of the receiving device.

31. The method of clause 1, and the devices of clauses 2, 9, and/or 13, wherein the devices are capable of interacting with a user or a group of users during passive or active IVR interactions of clauses 20 and/or 21 in one, more than one, or any combination of the major spoken languages including English, French, Spanish, Russian, German, Portuguese, Chinese-Mandarin, Chinese-Cantonese, Korean, Japanese and major South Asian and Indian languages such as Hindi, Urdu, Punjabi, Bengali, Gujrati, Marathi, Tamil, Telugu, Malayalam, and Konkani, and major African sub-continental and Middle Eastern languages.

32. The method of clause 1, and the devices of clauses 2, 9, and/or 13, wherein the spoken accents, without any limitation, include a localized speaking style or dialect of any one or combination of the major spoken languages of clause 31.

33. The method of clause 1, and the devices of clauses 2, 9, and/or 13, wherein the devices are capable of computing on-board and are configured to interact within an ambient environment without a user or group of users present within the environment.

34. The method of clause 1, and the devices of clauses 2, 9, and/or 13, wherein the devices are configured to interact with other devices of the method of clause 1 within an ambient environment without any user or a group of users present within the environment.

35. The method of clause 1, and the devices of clauses 2, 9, and/or 13, wherein the devices are configured to interact with other devices of the method of clause 1 within an ambient environment with a user or a group of users present within the environment.

36. The method of clause 1, and the devices of clauses 2, 9, and/or 13, wherein the devices may include, without any limitation, one or more than one ports for connectivity with an HDMI cable, personal computer, mobile smart phone, tablet computer, telephone line, wireless mobile, an Ethernet cable, a Bluetooth or a Wi-Fi connection.

37. The method of clause 1, and the devices of clauses 2, 9, and/or 13, wherein the devices may answer functionally useful queries and perform functionally useful tasks, while also providing the suitable environment, ambiance, functionalities, and applications as according to a user's preference during the IVR interactions between the device and a user or a group of users.

38. The method of clause 1, and the devices of clauses 2, 9, and/or 13, wherein the devices are used for a user or a group of users for companionship, entertainment, storytelling, education, teaching, training, greeting, guiding, guest service, health-care service, customer service and any other purpose, without any limitation, while also performing functionally useful tasks such as interacting with and controlling other devices via HDMI cable, Ethernet Cable, Bluetooth or Wi-Fi configured to interact with the devices of clauses 2, 9, and/or 13.

39. The method of clause 1, and the devices of clauses 2, 9, and/or 13, wherein the human like and robot like multiple interactive personalities (MIP) or animated multiple interactive personalities (AMIP) chat- and chatter-bots, covered in International PCT/US 17/29385 and Provisional Application US 16/62381976, are configured to be provided on the 3D-shaped audio-visual interfaces of any of clauses 2-15 to interact with a user or a group of users in user preferred environment and ambiance for user preferred functionalities and applications.

40. An IVR device apparatus system, and the devices of clauses 2, 9, and/or 13, comprising:

a physical interactive voice response (IVR) apparatus system;

a central processing unit (CPU);

one or more than one camera, including at least one 360° camera, at least two 180° cameras, at least four 90° cameras, at least six 60° cameras, at least twelve 30° cameras, or various combinations thereof, for image or video sensing hardware that collects or scans input image or video data from users, their environment, and the ambience within the environment within the interaction range of the devices;

one or more than one microphones as sound or audio sensing hardware that collects or scans input audio, sound, or speech data from users, their environment, and the ambience within the environment within the interaction range of the devices;

one or more than one touch sensitive or non-touch sensitive flat or flexible panel displays forming the 3D-shaped audio-visual user interfaces, of any of clauses 2-15, to provide user preferred environment, ambiance, functionalities, and applications to a user or a group of users within the interaction range of the devices;

wired or wireless capability to connect with the Internet, mobile networks, cloud computing systems, and other devices, with ports to connect with a keyboard, USB, HDMI cable, a television, personal computer, mobile smart phone, tablet computer, telephone line, wireless mobile, Ethernet cable, Bluetooth, and Wi-Fi connection;

an infrared universal remote output to control external television, projector, audio, video, and AR/VR equipment, devices, and appliances;

PCI slot for single or multiple carrier SIM card to connect with direct wireless mobile data line for data and VOIP communication;

an onboard battery or power system with wired and inductive charging stations; and

memory including previously stored data related to the interactions of the device with a user or a group of users, and instructions to be executed by the processor to process the collected input data for the device to perform the following functions, without any limitation:

obtain information from all input data;

determine the manner, the mode, and the type of the response;

execute the response by the device without any overlap or conflict between the different possible responses;

store the information related to updating the information needed to be stored on the device;

change any one or all stored information on the device;

synchronize any one or all stored information on the device with the relevant information stored in the cloud; and

create, delete, store, and update MIP and AMIP personalities on the device.

41. An IVR device apparatus system of clause 40, wherein the input data, within the vicinity or the interaction range including the device and a user or a group of users within an environment, comprises:

one or more communicated characters, words, and sentences relating to written and spoken communication between a user and the device;

one or more communicated images, lights, videos relating to visual and optical communication between a user and the device;

one or more communicated sound or audio related to the communication between a user and the device;

one or more communicated touch related to the communication between a user and the device;

to communicate the information related to determining the environment, ambiance, functionalities, and applications needed by a user or a group of users as according to clause 1.

42. An IVR device apparatus system of clauses 40 and/or 41 configured for interactive interactions provided by the system as used for a user or a group of users for entertainment, engagement, companionship, education, storytelling, karaoke, video-game playing, teaching, training, greeting, guiding, registration, help, information and customer services, and any other purpose, without any limitation, while also performing functionally useful tasks within the interaction range of a user or a group of users in user preferred environment, ambiance, functionalities, and applications.

43. Computer readable medium with stored executable instructions in IVR device apparatus system of any of clauses 40-42 that when executed by a computer apparatus, cause the computer apparatus to perform the methods of any of clauses 1-39 to receive input data, process the data to provide information to the device apparatus system to interact with a user or a group of users for entertainment, engagement, companionship, education, storytelling, karaoke, video-game playing, teaching, training, greeting, guiding, registration, help, information, and customer services, and any other purpose, without any limitation, while also performing functionally useful tasks within the interaction range of a user or a group of users in user preferred environment, ambiance, functionalities, and applications.
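The panel mapping described in clauses 28-30 — one 360° scene distributed across several display sides of a receiving device — can be illustrated with a short sketch. This is a hypothetical helper, not the claimed implementation: it slices a 360° equirectangular frame (a NumPy array) into equal-width segments, one per display panel, and ignores seams, panel geometry, and the stretching the clauses describe.

```python
import numpy as np

def split_360_frame(frame, num_panels):
    """Slice a 360-degree equirectangular frame (H x W x 3) into
    `num_panels` equal-width segments, one per display panel of the
    receiving device. Assumes the width divides evenly; a real
    implementation would also handle remainders and panel geometry."""
    height, width, _ = frame.shape
    panel_width = width // num_panels
    return [frame[:, i * panel_width:(i + 1) * panel_width, :]
            for i in range(num_panels)]
```

With `num_panels=1` the whole scene wraps a single 360° circumferential display; with 2, 3, or 4 panels the same scene lands on the front/back or side-by-side displays enumerated in clause 28.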

The present disclosure is not limited to the aspects described herein and the constituent elements can be modified in various manners without departing from the spirit and scope of the disclosure. Various aspects of the disclosure can also be extracted from any appropriate combination of a plurality of constituent elements disclosed. Some constituent elements may be deleted from the constituent elements disclosed in the various aspects. The constituent elements described in different aspects may be combined arbitrarily.

Various aspects of the present disclosure are described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, specific example aspects by which the disclosure may be practiced. Various aspects, however, may be embodied in many different forms and should not be construed as limited to the aspects set forth herein. Rather, the disclosed aspects are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Throughout the specification and claims, the following terms take the meanings explicitly associated herein, unless the context clearly dictates otherwise. The phrase "in one aspect" as used herein does not necessarily refer to the same aspect, though it may. Furthermore, the phrase "in another aspect" as used herein does not necessarily refer to a different aspect, although it may. Thus, as described below, various aspects of the disclosure may be readily combined, without departing from the scope or spirit of the disclosure.

Still further, while certain aspects of the disclosure have been described, these aspects have been presented by way of example only, and are not intended to limit the scope of the disclosure. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the disclosure.

As used in this specification and claims, the terms "for example," "for instance," "such as," and "like," and the verbs "comprising," "having," "including," and their other verb forms, when used in conjunction with a listing of one or more components or other items, are each to be construed as open-ended, meaning that the listing is not to be considered as excluding other, additional components or items. Other terms are to be construed using their broadest reasonable meaning unless they are used in a context that requires a different interpretation.