Title:
METHOD AND SYSTEM FOR OFFERING DIGITAL SERVICES WITHIN A PHONE CALL
Document Type and Number:
WIPO Patent Application WO/2018/068096
Kind Code:
A1
Abstract:
A method of providing access to a subscription service in a telephone call, comprising routing a call through a telephone network device, the call being between a call originating device and a call receiving device; accessing customer information relating to an account associated with at least one of the call originating device and the call receiving device, the customer information comprising subscriber information relating to the subscription service; based on the subscription information, determining whether at least one of the call originating device and the call receiving device is a subscription accessing device; and forwarding audio data generated by the subscription accessing device through an application server in response to determining that at least one of the call originating device and the call receiving device is a subscription accessing device.

Inventors:
BAILEY NICK (AU)
POULET GUILLAUME (AU)
PERERA SUSANTHA (AU)
CHEKALKIN VASILY (AU)
ABRAHAM CHERIAN (AU)
TUCH LAURIE (AU)
Application Number:
PCT/AU2017/051109
Publication Date:
April 19, 2018
Filing Date:
October 13, 2017
Assignee:
OPTUS ADMINISTRATION PTY LTD (AU)
International Classes:
G10L15/30; H04M3/42; H04M7/06; H04M7/10
Foreign References:
US20020094067A1 2002-07-18
US20070127635A1 2007-06-07
US20040247094A1 2004-12-09
Attorney, Agent or Firm:
FB RICE (AU)
Claims:
CLAIMS:

1. A method of providing access to a subscription service in a telephone call, the method comprising: routing a call through a telephone network device, the call being between a call originating device and a call receiving device; accessing customer information relating to an account associated with at least one of the call originating device and the call receiving device, the customer information comprising subscriber information relating to the subscription service; based on the subscription information, determining whether at least one of the call originating device and the call receiving device is a subscription accessing device; and forwarding audio data generated by the subscription accessing device through an application server in response to determining that at least one of the call originating device and the call receiving device is a subscription accessing device; wherein the application server is configured to monitor the audio data generated by the subscription accessing device to allow for the provision of the subscription service to a user of the subscription accessing device.

2. The method of claim 1, wherein the telephone network device forms part of a mobile telephony core network.

3. The method of claim 1 or claim 2, wherein the subscription service comprises a digital assistant service.

4. The method of claim 3, wherein the provision of the subscription service comprises forwarding audio data generated by the subscription accessing device to a digital assistant system.

5. The method of any one of claims 1 to 4, wherein the application server is configured to detect at least one predetermined keyword or phrase in the audio data generated by the subscription accessing device, to provide the subscription service to the user when the predetermined keyword or phrase is detected.

6. The method of any one of claims 1 to 5, wherein the application server is configured to detect and differentiate between at least two predetermined keywords or phrases, and to provide access to at least two different subscription services depending on which keyword or phrase is detected.

7. The method of any one of claims 1 to 6, wherein the telephone network device comprises an IP multimedia subsystem (IMS) and the method comprises routing the call from the call originating device through the IMS and to the call receiving device.

8. The method of any one of claims 1 to 7, wherein the telephone network device comprises a mobile switching centre server (MSS) and the method comprises routing the call from the call originating device through the MSS and to the call receiving device.

9. The method of any one of claims 1 to 8, wherein the telephone network device comprises a media gateway (MGW) and the method comprises routing the call from the call originating device through the MGW and to the call receiving device.

10. The method of any one of claims 1 to 9, wherein the application server comprises a session initiation protocol application server (SIP-AS) and the method comprises forwarding audio data generated by the subscribed subscription accessing device through the SIP-AS in response to determining that at least one of the call originating device and the call receiving device is associated with the subscribed account.

11. The method of claim 10, further comprising the SIP-AS determining a geographic location of the subscription accessing device.

12. The method of claim 11, further comprising the SIP-AS selecting a Real-time Transport Protocol (RTP) engine for routing the call through based on the determined geographic location.

13. The method of claim 12, further comprising the SIP-AS identifying which RTP engine is geographically closest to the subscription accessing device, and routing the call through the identified RTP engine.

14. The method of any one of claims 1 to 13, wherein the application server comprises a keyword spotting (KWS) module configured to monitor the audio data generated by the subscription accessing device to allow for the provision of the subscription service to a user of the subscription accessing device.

15. The method of any one of claims 1 to 14, wherein the call originating device is at least one of a mobile telephone, landline telephone, cell enabled communication device or internet enabled communication device.

16. The method of any one of claims 1 to 15, wherein the call receiving device is at least one of a mobile telephone, landline telephone, cell enabled device or internet enabled device.

17. The method of any one of claims 1 to 16, further comprising accessing subscriber information stored as initial filter criteria (iFC) on the IMS to determine whether at least one of the call originating device and the call receiving device is a subscription accessing device.

18. A method of providing a service within a call, the method comprising: receiving audio data from at least one service accessing device during a telephone call; identifying at least one request in the audio data; generating response audio data in response to the request; and mixing the response audio data with audio data of the telephone call to be received by the at least one subscriber.

19. The method of claim 18, wherein identifying at least one request in the audio data comprises converting the audio data to request text, and identifying at least one request in the request text.

20. The method of claim 19, wherein identifying at least one request in the request text comprises identifying at least one of an intent and an entity in the audio data.

21. The method of any one of claims 18 to 20, wherein generating response audio data in response to the request comprises generating response text, and converting the response text to response audio data.

22. The method of any one of claims 18 to 21, wherein the request is passed to a third party application and the third party application generates a response that is converted to response audio data.

23. The method of claim 22, further comprising providing a developer portal hosted on a server and accessible via a user interface of a computing device, the developer portal comprising executable code configured, when executed, to allow the third party application to be customised by a third party.

24. The method of claim 22 or claim 23, further comprising selecting the third party application from a plurality of third party applications.

25. The method of claim 24, where the third party application is selected based on subscriber preferences.

26. The method of claim 25, wherein the subscriber preferences are stored in a digital assistant database and are associated with a Globally Unique Identifier (GUID) corresponding to the subscriber.

27. The method of claim 25 or 26, further comprising providing a user portal hosted on a server and accessible via a user interface of a computing device, the user portal comprising executable code configured, when executed, to allow the subscriber to modify the subscriber preferences.

28. The method of any one of claims 18 to 27, further comprising receiving a Globally Unique Identifier (GUID) corresponding to the subscriber to allow for personalisation of provided services, wherein the GUID is used to retrieve records from the digital assistant database.

29. The method of claim 28, wherein the only subscriber data received is the GUID.

30. A digital assistant system for providing digital assistant services through a telephone call, the system comprising: a speech module configured to receive, from an application server, audio data generated during a telephone call, and to convert the audio data to text; an intent module configured to receive text from the speech module and to use natural language processing to identify a request based on the text; and a dispatcher module to receive the request and determine an action to take based on the request.

31. The digital assistant system of claim 30, wherein the dispatcher module is configured to generate a response to the request.

32. The digital assistant system of claim 31, wherein the dispatcher module is configured to generate a text response, and communicate the text response to the speech module for conversion to an audio response.

33. The digital assistant system of claim 32, wherein the speech module is configured to forward the audio response to the application server for mixing with the audio of the phone call.

34. The digital assistant system of any one of claims 31 to 33, wherein the speech module is configured to communicate with at least one speech service that provides audio to text and text to audio conversion.

35. The digital assistant system of any one of claims 31 to 34, wherein the intent module is configured to communicate with a natural language processing service that provides natural language processing.

36. The digital assistant system of any one of claims 31 to 35, wherein the dispatcher module is configured to communicate with a developer application that processes the request and generates a response.

37. The digital assistant system of claim 36, further comprising a server hosting a developer portal accessible via a webpage that allows a developer to modify the developer application through a user interface accessible via a computing device, the developer portal being provided by software code executed at the server.

38. The digital assistant system of any one of claims 31 to 37, further comprising a server hosting a user portal accessible via a webpage that allows a user to modify settings and preferences associated with the provision of the digital assistant services through a user interface accessible via a computing device, the user portal being provided by software code executed at the server.

39. The digital assistant system of claim 38, further comprising a digital assistant database for storing the settings and preferences associated with the provision of the digital assistant services.

40. The digital assistant system of any one of claims 31 to 39, further comprising an application programming interface (API) comprising executable code hosted on a server and configured to facilitate communications between the digital assistant system and a telephone infrastructure hosting the application server.

41. The digital assistant system of any one of claims 31 to 40, configured to perform the method of any one of claims 18 to 30.

42. An application server for providing digital assistant services through a telephone call, the application server comprising: a router for forwarding audio data to a digital assistant system, the audio router comprising a keyword spotting module, wherein the keyword spotting module is configured to monitor a telephone call routed through the application server and generate a trigger if a predetermined keyword or phrase is detected; and an audio mixer, the audio mixer configured to mix audio received from the digital assistant system into a phone call; wherein the audio router is configured to forward audio data to the digital assistant system in response to the trigger from the keyword spotting module.

43. The application server of claim 42, wherein the router is an audio router.

44. The application server of claim 42 or claim 43, wherein the keyword spotting module is configured to generate a trigger corresponding to the keyword or phrase detected, and the audio router is configured to select a digital assistant system from a plurality of digital assistant systems based on the trigger, and to forward audio data to the selected digital assistant system in response to the trigger.

45. The application server of any one of claims 42 to 44, wherein the application server supports session initiation protocol (SIP).

46. A telephone network system for providing digital assistant services through a telephone call, the system comprising: the application server of any one of claims 42 to 45, and a telephone network device; wherein the telephone network device is configured to access customer information relating to an account, the customer information comprising subscription information relating to the digital assistant service; and, based on the subscription information, forward audio data generated by a subscription accessing device through the application server.

47. The method of claim 46, wherein the telephone network device forms part of a mobile telephony core network.

48. A method of providing access to a subscription service in a telephone call, the method comprising: receiving audio data from a telephone network device, the audio data having been generated by a subscription accessing device during a telephone call; processing the received audio data to detect at least one predetermined subscription invocation keyword or phrase in the received audio data.

49. The method of claim 48, wherein the telephone network device forms part of a mobile telephony core network.

50. The method of claim 48 or claim 49, further comprising providing access to the subscription service when the predetermined keyword or phrase is detected by forwarding the received audio data to a digital assistant system.

51. The method of any one of claims 48 to 50, wherein the processing comprises monitoring the received audio data to allow for the provision of the subscription service to a user of the subscribed subscription accessing device.

52. A method of providing access to a subscription service in a telephone call, the method comprising: routing a call through a telephone network device, the call being between a call originating device and a service provider device; accessing customer information relating to an account associated with the call originating device, the customer information comprising subscriber information relating to the subscription service; based on the subscription information, determining whether the call originating device is a subscription accessing device; and forwarding audio data generated by the subscription accessing device through an application server in response to determining that the call originating device is a subscription accessing device.

53. The method of claim 52, wherein the telephone network device forms part of a mobile telephony core network.

54. A computer-readable storage medium storing executable program code that, when executed by at least one processor, causes the at least one processor to perform the method of any one of claims 1 to 30, 48 to 53.

Description:
"Method and system for offering digital services within a phone call"

TECHNICAL FIELD

Described embodiments generally relate to systems and methods for offering subscription services within a phone call. In particular, described embodiments are directed to systems and methods of routing calls to allow for digital assistant services to be available during a call.

BACKGROUND

Digital assistants allow users to use natural language to invoke commands and receive information in the form of digitised speech. For example, users may be able to ask a digital assistant to tell them the weather, look up a definition of a word, or book a service such as a taxi. The digital assistant may respond to the user in the form of natural speech.

In the past, access to digital assistants has primarily been through applications on computing devices. It is desired to address or ameliorate one or more shortcomings or disadvantages associated with prior systems for providing digital assistants, or to at least provide a useful alternative thereto.

Any discussion of documents, acts, materials, devices, articles or the like which has been included in the present specification is not to be taken as an admission that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present disclosure as it existed before the priority date of each claim of this application.

Throughout this specification the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.

SUMMARY

Some embodiments relate to a method of providing access to a subscription service in a telephone call, the method comprising:

routing a call through a telephone network device, the call being between a call originating device and a call receiving device;

accessing customer information relating to an account associated with at least one of the call originating device and the call receiving device, the customer information comprising subscriber information relating to the subscription service;

based on the subscription information, determining whether at least one of the call originating device and the call receiving device is a subscription accessing device; and

forwarding audio data generated by the subscription accessing device through an application server in response to determining that at least one of the call originating device and the call receiving device is a subscription accessing device;

wherein the application server is configured to monitor the audio data generated by the subscription accessing device to allow for the provision of the subscription service to a user of the subscription accessing device.
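The routing decision set out in the steps above can be illustrated with a minimal sketch. It assumes the customer information is available as a simple in-memory lookup keyed by telephone number; the names `Account`, `subscribed`, `subscription_accessing_devices` and `route_call` are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """Hypothetical customer record; `subscribed` stands in for the subscriber information."""
    number: str
    subscribed: bool


def subscription_accessing_devices(originating, receiving, accounts):
    """Return the call parties whose account holds a subscription to the service."""
    parties = (originating, receiving)
    return [n for n in parties if n in accounts and accounts[n].subscribed]


def route_call(originating, receiving, accounts):
    """Decide whether audio is forwarded through the application server.

    The audio of each subscription accessing device is the audio that the
    application server would monitor.
    """
    devices = subscription_accessing_devices(originating, receiving, accounts)
    return {"via_application_server": bool(devices), "monitored_devices": devices}
```

In this sketch a call between two parties without a subscription is routed normally, without passing through the application server.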

According to some embodiments, the telephone network device forms part of a mobile telephony core network. According to some embodiments, a subscription accessing device is a device associated with a telephone network account of a subscriber of the subscription service.

In some embodiments, the subscription service is or comprises a digital assistant service. In some embodiments, the provision of the subscription service comprises forwarding audio data generated by the subscription accessing device to a digital assistant system.

According to some embodiments, the application server is configured to detect at least one predetermined keyword or phrase in the audio data generated by the subscribed subscription accessing device, to provide the subscription service to the user when the predetermined keyword or phrase is detected.

According to some embodiments, the application server is configured to detect and differentiate between at least two predetermined keywords or phrases, and to provide access to at least two different subscription services depending on which keyword or phrase is detected.
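As an illustration only, the differentiation between keywords might be sketched as a lookup from keyword to service. For simplicity the sketch operates on a text transcript rather than on audio data, which is an assumption; the keywords, service names and the function `detect_trigger` are likewise hypothetical.

```python
# Hypothetical keyword-to-service table; neither the phrases nor the service
# names come from the specification.
KEYWORD_SERVICES = {
    "hey assistant": "digital_assistant",
    "translate this": "translation",
}


def detect_trigger(transcript, keyword_services=KEYWORD_SERVICES):
    """Return (keyword, service) for the earliest keyword found, or None."""
    text = transcript.lower()
    # Record the position of each keyword present so the earliest one wins.
    hits = [(text.find(k), k) for k in keyword_services if k in text]
    if not hits:
        return None
    _, keyword = min(hits)
    return keyword, keyword_services[keyword]
```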


In some embodiments, the telephone network device comprises an IP multimedia subsystem (IMS) and the method comprises routing the call from the call originating device through the IMS and to the call receiving device. In some embodiments, the telephone network device comprises a mobile switching centre server (MSS) and the method comprises routing the call from the call originating device through the MSS and to the call receiving device. In some embodiments, the telephone network device comprises a media gateway (MGW) and the method comprises routing the call from the call originating device through the MGW and to the call receiving device.

According to some embodiments, the application server is or comprises a session initiation protocol application server (SIP-AS) and the method comprises forwarding audio data generated by the subscribed subscription accessing device through the SIP-AS in response to determining that at least one of the call originating device and the call receiving device is associated with the subscribed account.

According to some embodiments, the method further comprises the SIP-AS determining a geographic location of the subscription accessing device. The SIP-AS may select a Real-time Transport Protocol (RTP) engine for routing the call through based on the determined geographic location. In some embodiments, the SIP-AS identifies which RTP engine is geographically closest to the subscription accessing device, and routes the call through the identified RTP engine.
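One way the closest RTP engine might be identified is sketched below. The specification does not say how distance is measured; the sketch assumes locations are latitude and longitude pairs and uses the great-circle (haversine) distance. The function names and the engine table passed in are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def closest_rtp_engine(device_location, engines):
    """Return the name of the engine nearest to the device.

    `engines` maps a hypothetical engine name to its (latitude, longitude).
    """
    return min(engines, key=lambda name: haversine_km(device_location, engines[name]))
```

For example, with engines located in Sydney and Perth, a device located in Melbourne would be routed through the Sydney engine.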

In some embodiments, the application server comprises a keyword spotting (KWS) module configured to monitor the audio data generated by the subscription accessing device to allow for the provision of the subscription service to a user of the subscription accessing device. In some embodiments, the call originating device is at least one of a mobile telephone, landline telephone, cell enabled communication device or internet enabled communication device. In some embodiments, the call receiving device is at least one of a mobile telephone, landline telephone, cell enabled device or internet enabled device.

Some embodiments further comprise accessing subscriber information stored as initial filter criteria (iFC) on the IMS to determine whether at least one of the call originating device and the call receiving device is a subscription accessing device.

Some embodiments relate to a method of providing a service within a call, the method comprising:

receiving audio data from at least one service accessing device during a telephone call;

identifying at least one request in the audio data;

generating response audio data in response to the request; and

mixing the response audio data with audio data of the telephone call to be received by the at least one subscriber.

In some embodiments, identifying at least one request in the audio data comprises converting the audio data to request text, and identifying at least one request in the request text. According to some embodiments, identifying at least one request in the request text comprises identifying at least one of an intent and an entity in the audio data. According to some embodiments, generating response audio data in response to the request comprises generating response text, and converting the response text to response audio data.
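The identification of an intent and an entity in request text can be illustrated with a small rule-based sketch. The patterns, intent names and the function `identify_request` are assumptions made for illustration; a deployed system would more likely rely on a natural language processing service.

```python
import re

# Hypothetical patterns: each intent is paired with a named group for its entity.
INTENT_PATTERNS = {
    "get_weather": re.compile(r"\bweather in (?P<location>[a-z ]+)"),
    "book_taxi": re.compile(r"\btaxi to (?P<destination>[a-z ]+)"),
}


def identify_request(request_text):
    """Return the intent and entities for the first matching pattern, or None."""
    text = request_text.lower().strip(" ?.!")
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            entities = {k: v.strip() for k, v in match.groupdict().items()}
            return {"intent": intent, "entities": entities}
    return None
```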

According to some embodiments, the request is passed to a third party application and the third party application generates a response that is converted to response audio data. Some embodiments further comprise providing a developer portal hosted on a server and accessible via a user interface of a computing device, the developer portal comprising executable code configured, when executed, to allow the third party application to be customised by a third party. Some embodiments further comprise selecting the third party application from a plurality of third party applications. According to some embodiments, the third party application is selected based on subscriber preferences.

According to some embodiments, the subscriber preferences are stored in a digital assistant database and are associated with a Globally Unique Identifier (GUID) corresponding to the subscriber. Some embodiments further comprise providing a user portal hosted on a server and accessible via a user interface of a computing device, the user portal comprising executable code configured, when executed, to allow the subscriber to modify the subscriber preferences.

Some embodiments further comprise receiving a Globally Unique Identifier (GUID) corresponding to the subscriber to allow for personalisation of provided services, wherein the GUID is used to retrieve records from the digital assistant database. According to some embodiments, the only subscriber data received is the GUID.
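A minimal sketch of GUID-keyed preference lookup follows. It assumes the digital assistant database can be modelled as a dictionary keyed by GUID, and that a default application is used when the subscriber has no stored preference; that fallback, the GUID value and the application names are all hypothetical.

```python
# Hypothetical stand-in for the digital assistant database. Only the GUID
# identifies the subscriber; no other subscriber data is stored here.
PREFERENCES_BY_GUID = {
    "3f2b8c1e-0000-4000-8000-000000000001": {"taxi": "app_b"},
}

# Assumed fallback applications, used when no preference is recorded.
DEFAULT_APPS = {"taxi": "app_a", "weather": "app_w"}


def select_application(guid, category):
    """Pick the third party application for a request category using stored preferences."""
    preferences = PREFERENCES_BY_GUID.get(guid, {})
    return preferences.get(category, DEFAULT_APPS[category])
```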

Some embodiments relate to a digital assistant system for providing digital assistant services through a telephone call, the system comprising:

a speech module configured to receive, from an application server, audio data generated during a telephone call, and to convert the audio data to text;

an intent module configured to receive text from the speech module and to use natural language processing to identify a request based on the text; and

a dispatcher module to receive the request and determine an action to take based on the request.

According to some embodiments, the dispatcher module is configured to generate a response to the request. In some embodiments, the dispatcher module is configured to generate a text response, and communicate the text response to the speech module for conversion to an audio response. According to some embodiments, the speech module is configured to forward the audio response to the application server for mixing with the audio of the phone call. In some embodiments, the speech module is configured to communicate with at least one speech service that provides audio to text and text to audio conversion.
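The interaction between the three modules might be wired together as in the following sketch. The speech and natural language processing services are represented by callables supplied by the caller, which is an assumption; the class and method names, and the fallback response text, are hypothetical.

```python
class SpeechModule:
    """Wraps externally supplied audio-to-text and text-to-audio callables."""

    def __init__(self, speech_to_text, text_to_speech):
        self._speech_to_text = speech_to_text
        self._text_to_speech = text_to_speech

    def to_text(self, audio):
        return self._speech_to_text(audio)

    def to_audio(self, text):
        return self._text_to_speech(text)


class IntentModule:
    """Wraps a callable that turns text into a request dictionary with an 'intent' key."""

    def __init__(self, nlp):
        self._nlp = nlp

    def identify(self, text):
        return self._nlp(text)


class DispatcherModule:
    """Chooses an action for a request and returns a text response."""

    def __init__(self, handlers):
        self._handlers = handlers

    def dispatch(self, request):
        handler = self._handlers.get(request["intent"])
        if handler is None:
            return "Sorry, I cannot help with that."
        return handler(request)


def handle_audio(audio, speech, intent, dispatcher):
    """Audio in, audio out: speech module, then intent module, then dispatcher."""
    text = speech.to_text(audio)
    request = intent.identify(text)
    response_text = dispatcher.dispatch(request)
    return speech.to_audio(response_text)
```

Passing the services in as callables keeps the sketch independent of any particular speech or language processing provider.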

According to some embodiments, the intent module is configured to communicate with a natural language processing service that provides natural language processing.

According to some embodiments, the dispatcher module is configured to communicate with a developer application that processes the request and generates a response. Some embodiments further comprise a server hosting a developer portal accessible via a webpage that allows a developer to modify the developer application through a user interface accessible via a computing device, the developer portal comprising software code executed at the server.

Some embodiments further comprise a server hosting a user portal accessible via a webpage that allows a user to modify settings and preferences associated with the provision of the digital assistant services through a user interface accessible via a computing device, the user portal comprising software code executed at the server.

Some embodiments further comprise an application programming interface (API) configured to facilitate communications between the digital assistant system and a telephone infrastructure hosting the application server.

According to some embodiments, the digital assistant system is configured to perform the method of some other embodiments.

Some embodiments relate to an application server for providing digital assistant services through a telephone call, the application server comprising:

a router for forwarding audio data to a digital assistant system, the router comprising a keyword spotting module, wherein the keyword spotting module is configured to monitor a telephone call routed through the application server and generate a trigger if a predetermined keyword or phrase is detected; and

an audio mixer, the audio mixer configured to mix audio received from the digital assistant system into a phone call;

wherein the router is configured to forward audio data to the digital assistant system in response to the trigger from the keyword spotting module.

In some embodiments, the router is an audio router. According to some embodiments, the keyword spotting module is configured to generate a trigger corresponding to the keyword or phrase detected, and the audio router is configured to select a digital assistant system from a plurality of digital assistant systems based on the trigger, and to forward audio data to the selected digital assistant system in response to the trigger.
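The specification does not describe how the audio mixer combines the two streams. One common approach, shown here purely as an assumption, is to add corresponding signed 16-bit PCM samples and clip the result to the valid range; the function `mix_pcm16` and its `response_gain` parameter are hypothetical.

```python
PCM16_MIN = -32768
PCM16_MAX = 32767


def mix_pcm16(call_samples, response_samples, response_gain=1.0):
    """Mix two sequences of signed 16-bit samples, clipping to the 16-bit range.

    The shorter sequence is treated as silence once it runs out.
    """
    length = max(len(call_samples), len(response_samples))
    mixed = []
    for i in range(length):
        call = call_samples[i] if i < len(call_samples) else 0
        response = response_samples[i] if i < len(response_samples) else 0
        total = call + int(response * response_gain)
        mixed.append(max(PCM16_MIN, min(PCM16_MAX, total)))
    return mixed
```

Clipping prevents integer overflow when both streams are loud, at the cost of some distortion; lowering `response_gain` is one way to reduce it.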

In some embodiments, the application server supports session initiation protocol (SIP).

Some embodiments relate to a telephone network system for providing digital assistant services through a telephone call, the system comprising:

the application server of some other embodiments, and

a telephone network device;

wherein the telephone network device is configured to access customer information relating to an account, the customer information comprising subscription information relating to the digital assistant service; and, based on the subscription information, forward audio data generated by a subscription accessing device through the application server.

Some embodiments relate to a method of providing access to a subscription service in a telephone call, the method comprising:

receiving audio data from a telephone network device, the audio data having been generated by a subscription accessing device during a telephone call;

processing the received audio data to detect at least one predetermined subscription invocation keyword or phrase in the received audio data.

Some embodiments relate to a method of providing access to a subscription service in a telephone call, the method comprising:

routing a call through a telephone network device, the call being between a call originating device and a service provider device;

accessing customer information relating to an account associated with the call originating device, the customer information comprising subscriber information relating to the subscription service;

based on the subscription information, determining whether the call originating device is a subscription accessing device; and

forwarding audio data generated by the subscription accessing device through an application server in response to determining that the call originating device is a subscription accessing device.

Some embodiments relate to a computer-readable storage medium storing executable program code that, when executed by a processor, causes the processor to perform the method of some other embodiments.

BRIEF DESCRIPTION OF DRAWINGS

Embodiments are described in further detail below, by way of example and with reference to the accompanying drawings, in which:

Figure 1 shows a system for offering digital services within a phone call according to some embodiments;

Figure 2 shows a method of routing calls made by a subscriber of the digital services;

Figure 3 shows a method of routing calls to a subscriber of the digital services;

Figure 4 shows a method of processing calls routed through a digital assistant; and

Figure 5 shows a system for offering digital services within a phone call according to some alternative embodiments.

DETAILED DESCRIPTION

Described embodiments generally relate to systems and methods for offering subscription services within a phone call. In particular, described embodiments are directed to systems and methods of routing calls to allow for digital assistant services to be available during a call.

Figure 1 shows a system 100 for the provision of subscription services within a telephone call. In particular, system 100 may be used to provide digital services or digital assistant services within a phone call. An alternative system 500 for the provision of subscription services within a telephone call that may be used to provide digital services or digital assistant services within a phone call is shown in Figure 5. According to some embodiments, system 100 may further be used to provide translation services, voicemail services, call forwarding services, and other subscription services within a phone call.

System 100 includes mobile telephone infrastructure 110, as well as a digital assistant infrastructure 120. Mobile telephone infrastructure 110 and digital assistant infrastructure 120 may each comprise one or more servers, server systems, databases, data stores, memory devices, computing devices, computer networks, telephone exchanges and other network hardware. Digital assistant infrastructure 120 may include infrastructure to support multiple digital assistants, such as digital assistant infrastructure 120a, digital assistant infrastructure 120b, and any other digital assistant infrastructures. Digital assistant infrastructure 120a is described in detail in this document, but it should be understood that digital assistant infrastructure 120b, and any further digital assistant infrastructures, may be configured similarly to digital assistant infrastructure 120a.

Telephone infrastructure 110 may include existing telephone infrastructure used to offer telephone services to customers, as well as infrastructure specifically designed for the provision of digital assistant services. Digital assistant infrastructure 120 provides an interactive voice system accessible from within a phone conversation established using telephone infrastructure 110. Digital assistant infrastructure 120 may comprise one or more physical servers or server systems, and may be a cloud based system in some embodiments. According to some embodiments, digital assistant infrastructure 120 may be hosted on a cloud computing platform, such as the Amazon Web Services (AWS) cloud service, for example.

Telephone infrastructure 110 includes an information technology subsystem 130 having a customer database 132, a front end module 134, and a carrier services module 136. Telephone infrastructure 110 further includes a mobile core 140, which may be a telephone network device or system. The telephone network device or mobile core 140 may form part of a mobile telephony core network. The telephone network device or mobile core 140 may have an IP multimedia subsystem (IMS) core 142, and a session initiation protocol application server (SIP-AS) 144. According to some alternative embodiments, as described below with reference to Figure 5, mobile core 140 may have an IP multimedia subsystem (IMS) core 142, and an application server 510. Telephone infrastructure 110 may be accessed by customer telephone devices 195 through an access network 190, or other existing infrastructure used to connect telephone calls between subscribers, where the customer device 195 is registered to a customer of the telephone network of telephone infrastructure 110. Telephone infrastructure 110 may also be accessed by non-customer telephone devices 180 via a point of interconnection (POI), which may route the call to telephone infrastructure 110. Telephone devices 195 and 180 may be mobile phones, landline phones, or other cell or internet enabled communication devices, including tablets, laptops and personal computers.

During a phone call established over IMS core 142, the voice of one or more phone subscribers that have previously opted-in (or subscribed) to receive digital assistant services can be processed via SIP-AS 144. According to some embodiments, a phone call may be between two or more parties. Where more than one party to the call is an opted-in subscriber, the voices of more than one party to the call may be processed via SIP-AS 144. Each subscriber's voice may be processed separately. According to some embodiments, only the voice of subscribers is processed, in order to comply with privacy regulations and concerns. If SIP-AS 144 detects one or more triggers generated based on digital assistant invocation phrases or keywords detected in the processed voice data of the subscriber, the SIP-AS 144 may route the call to the appropriate digital assistant services hosted on digital assistant infrastructure 120a or 120b, which can handle subsequent commands spoken by the subscriber. Invocation phrases or keywords may be stored in a lookup table, database or other data structure within a memory device associated with SIP-AS 144. According to some embodiments, each digital assistant infrastructure 120 may be associated with one or more invocation phrases or keywords, SIP-AS 144 may be configured to detect and differentiate between at least two predetermined keywords or phrases, and the call may be routed to a different digital assistant infrastructure 120a or 120b depending on which keywords or phrases are detected by SIP-AS 144.
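
By way of illustration only, the lookup table of invocation phrases described above may be sketched as follows. The phrases, the function name `select_infrastructure` and the use of a text transcript in place of processed voice data are assumptions made for this example and do not appear in the described embodiments.

```python
from typing import Optional

# Hypothetical invocation phrases; the described embodiments do not name any.
# Each phrase maps to the label of the digital assistant infrastructure it invokes.
INVOCATION_TABLE = {
    "hello assistant one": "120a",
    "hey assistant two": "120b",
}


def select_infrastructure(transcript: str) -> Optional[str]:
    """Return the infrastructure label whose phrase occurs in the transcript.

    A text transcript stands in for the processed voice data here; returns
    None when no invocation phrase is present, in which case nothing is routed.
    """
    # Lower-case and collapse whitespace so matching ignores spacing and case.
    normalised = " ".join(transcript.lower().split())
    for phrase, infrastructure in INVOCATION_TABLE.items():
        if phrase in normalised:
            return infrastructure
    return None
```

In this sketch, a call in which neither phrase is spoken yields `None`, which corresponds to the case where no trigger is generated and the call continues unchanged.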

Customer database 132 may store information about the customers of telephone infrastructure 110, including information about whether or not each customer is a subscriber to the digital assistant services offered by the digital assistant infrastructure 120. Customer database 132 may further store the Mobile Station International Subscriber Directory Number (MSISDN) used to identify the mobile phone number of each customer, as well as the Globally Unique Identifier (GUID) assigned to each subscriber of the digital assistant services. Customer database 132 may be an existing component of the telephone network, configured to store the additional subscriber data for the digital assistant services, such as the GUID, for each subscribed customer. The GUID allows a particular customer to be identified within digital assistant infrastructure 120a without compromising the customer's privacy, or passing on the customer's details. However, use of the GUID allows for the customer to receive personalised services based on preferences they set with regard to how they wish for the services to be performed. Customer database 132 may be hosted on a cloud based system, or on one or more physical servers or server systems. For example, customer database 132 may be hosted on a relational database such as an ORACLE database in some embodiments. Carrier services module 136 may be in communication with customer database 132, and may be configured to retrieve data from and store data to customer database 132. Front end module 134 provides a computerised service by which customers of telephone infrastructure 110 can become subscribers of the digital assistant services. Front end module 134 may include a server that hosts a website that customers can access over a network such as Internet 199 via a client computing device 198 to subscribe to one or more services offered by digital assistant infrastructure 120. 
According to some embodiments, front end module 134 may alternatively be accessed via a telephone application or mobile application. Front end module 134 may allow existing subscribers to set and alter preferences and settings corresponding to their account and their access to digital assistant infrastructure 120. For example, where digital assistant infrastructure 120 includes two or more digital assistant infrastructures 120a and 120b, a customer may be able to set a preference regarding which digital assistant infrastructure 120a or 120b should be accessed. In some embodiments, customers may be able to set and modify preferences regarding which of their personal information they authorise being shared with digital assistant infrastructure 120. In some embodiments, customers may also be able to subscribe to digital assistant services by calling their telephone services provider, or by filling out a digital or physical form that can be provided to their telephone services provider.

Carrier services module 136 may provide secure services within telephone infrastructure 110 to handle provisioning of the digital assistant services, and may comprise an authorisation server 137. Authorisation server 137 may comprise code executable to provide authorisation services to carrier services module 136. According to some embodiments, authorisation server 137 may provide token-based authentication and authorisation, and may use the Open Authorization (OAUTH) standard, for example. Authorisation server 137 may allow select customer information stored in customer database 132 to be provided to third party services hosted by digital assistant infrastructure 120, while ensuring that sensitive information such as the user's identity or password is not exposed without the customer's authorisation. Carrier services module 136 may update subscriber profile information within customer database 132 by uploading information entered by a user via front end module 134 to customer database 132. When a new user subscribes to the digital assistant services offered by digital assistant infrastructure 120, carrier services module 136 may receive information such as the MSISDN and an authorisation personal identification number (PIN) entered by a subscriber from front end module 134. The subscriber may be required to enter the PIN to authenticate the subscriber's account at the time of opting in to the digital assistant services. Carrier services module 136 may communicate with front end module 134 via Hypertext Transfer Protocol Secure (HTTPS) in order to facilitate the subscription of new customers to the digital assistant services provided by digital assistant infrastructure 120 and to store new subscriber information received from front end module 134 in customer database 132. Carrier services module 136 may also write data to customer database 132 when a customer alters their settings or preferences with regard to the provision of the digital assistant services.

Carrier services module 136 may further communicate with customer database 132 to retrieve the stored MSISDN and GUID for a customer, in order to allow for the anonymisation of subscriber data when a subscriber accesses digital assistant services, and to identify authorised subscribers. Authorisation server 137 may generate a one-time token when a user opts in to use the digital assistant services, to allow the digital assistant infrastructure to access specific user information stored in customer database 132, such as the GUID associated with the user. The one-time token may be an OAUTH token in some embodiments.
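
By way of illustration only, the single-use property of such a token may be sketched as follows. This is a simplified in-memory stand-in and is not an implementation of the OAUTH standard; the class name `OneTimeTokenStore` and its methods are assumptions made for this example.

```python
import secrets
from typing import Dict, Optional


class OneTimeTokenStore:
    """Hypothetical store that releases a GUID at most once per issued token."""

    def __init__(self) -> None:
        # Maps each outstanding token to the GUID it unlocks.
        self._pending: Dict[str, str] = {}

    def issue(self, guid: str) -> str:
        """Create an unguessable token bound to the given GUID."""
        token = secrets.token_urlsafe(32)
        self._pending[token] = guid
        return token

    def redeem(self, token: str) -> Optional[str]:
        """Return the GUID for a valid token, then invalidate the token.

        pop() removes the entry, so a second redemption of the same token,
        or redemption of an unknown token, returns None.
        """
        return self._pending.pop(token, None)
```

A production system would also be expected to expire unredeemed tokens and to authenticate the redeeming party; both are omitted from this sketch.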

When a telecom subscriber visits front end module 134 and attempts to opt in to the digital assistant services, front end module 134 will authenticate the customer using an appropriate existing telecom authentication mechanism. Front end module 134 will then issue a one-time authorisation token that will be forwarded to user portal 121 via a user browser session. The one-time authorisation token will be used by API 123 (as described below) together with a predefined key to retrieve an authorisation token from carrier services module 136. The authorisation token can subsequently be used to get further information, such as the GUID, from carrier services module 136. The GUID may be accessed every time a subscriber makes a call, to allow for the identification of a user to provide customisation of the digital assistant services based on the user preferences, without supplying the user's personal information to the digital assistant infrastructure 120a. Carrier services module 136 may communicate with customer database 132 via the transmission control protocol (TCP). Carrier services module 136 may pass the GUID to SIP-AS 144 prior to and/or when a predetermined keyword or phrase is detected and digital assistant services are invoked, based on a request for the GUID from SIP-AS 144. Carrier services module 136 may also interface with existing services offered by telephone infrastructure 110, allowing the enrichment of the call based on information available to the network carrier. For example, carrier services module 136 may be configured to provide location-based services, credit checks, or supply information about the subscriber such as the subscriber's name, address, billing history or call history. Carrier services module 136 may supply this information to API 123 (described below) of digital assistant infrastructure 120, in order to allow the information to be used when responding to subscriber requests being processed by API 123.
According to some embodiments, mobile core 140 may comprise a mobile switching system (MSS) instead of IMS core 142. Mobile core 140 may further comprise a media and signalling gateway in some embodiments, which may be used for handling the Real-Time Transport Protocol (RTP) media of the call. A 2G/3G mobile core MSS may be used to control switching subsystems of the telephone network. According to some embodiments, the MSS may support other generations of the telephone network, such as the Global System for Mobile Communications (GSM). The media and signalling gateway allows for communication between different network types, such as the Public Switched Telephone Network (PSTN) or Public Land Mobile Network (PLMN), for example. In some embodiments, the media and signalling gateway may also allow for communication between at least one of the Plain Old Telephone Service (POTS), Signalling System 7 (SS7), Next Generation Networks (2G, 2.5G and 3G radio access networks) or private branch exchange (PBX) networks.

IMS core 142 may connect calls to and from non-customer telephone devices 180 and customer telephone devices 195. IMS core 142 may form part of current packet-switched telephone networks and may comprise a standardized framework for delivering IP multimedia services over an IP packet-switched network. SIP-AS 144 may receive MSISDN data from IMS core 142 via Session Initiation Protocol (SIP). The MSISDN data may be stored by IMS core 142 in a Home Subscriber Server (HSS) used by IMS core 142 to authorise a device to access the network and retrieve subscriber information including an initial filtering criteria (iFC) profile. In alternative embodiments, MSISDN data may be retrieved from customer database 132. Initial filtering criteria (iFC) module 143 may be stored in IMS core 142, and may hold control logic for routing calls to SIP-AS 144. iFC module 143 may be configured to retrieve subscriber information via customer database 132, and to use this information to determine whether a call should be routed through SIP-AS 144. According to some embodiments, some customer profile information retrieved from customer database 132 may be stored within memory of IMS core 142 via a cache mechanism to improve system performance. iFC module 143 may be a standard module of IMS core 142 as used in current packet-switched telephone networks. IMS core 142 may further communicate voice data generated during a call to the SIP-AS 144 via the Real-time Transport Protocol (RTP).
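
By way of illustration only, the routing decision and cache mechanism described above may be sketched as follows. The names `SubscriptionCache` and `route_via_app_server`, and the representation of the customer database as a lookup callable, are assumptions made for this example; actual iFC profiles are considerably richer than a single flag.

```python
from typing import Callable, Dict


class SubscriptionCache:
    """Hypothetical cache in front of a subscriber-status lookup."""

    def __init__(self, lookup: Callable[[str], bool]) -> None:
        self._lookup = lookup              # stands in for a customer database query
        self._cache: Dict[str, bool] = {}  # MSISDN -> subscribed?
        self.misses = 0                    # number of lookups that reached the database

    def is_subscriber(self, msisdn: str) -> bool:
        if msisdn not in self._cache:
            self.misses += 1
            self._cache[msisdn] = self._lookup(msisdn)
        return self._cache[msisdn]


def route_via_app_server(cache: SubscriptionCache, caller: str, callee: str) -> bool:
    """Route through the application server if at least one party is subscribed."""
    return cache.is_subscriber(caller) or cache.is_subscriber(callee)
```

The sketch never invalidates cached entries; a deployed cache would need to refresh an entry when a customer subscribes or unsubscribes.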

SIP-AS 144 may be configured to handle call establishment on the network and to interpret voice content of subscribers. In order to handle call establishment, SIP-AS 144 may comprise a SIP switching system, which may include the open source FreeSWITCH switching system in some embodiments. SIP-AS 144 may also include a set of SIP user agents. The SIP user agents may handle voice compression encoding in the call, and may scale with the number of voice streams being handled. Compressed and encoded voice signals or data may be passed to the keyword spotting module 146, described below. In some embodiments, the SIP user agents may be implemented as proxy agents, where the signalling and RTP media are only handled by the SIP user agent as they pass through the SIP user agent, enabling modification of the media. In other embodiments, the SIP user agents may be implemented as back-to-back user agents, which may handle signalling and RTP media as an endpoint, enabling control of the phone call and allowing functions such as call conferencing. SIP-AS 144 may receive MSISDN and GUID data from carrier services module 136 in order to allow for subscriber data to remain anonymous during interactions with digital assistant services. SIP-AS 144 may communicate with carrier services module 136 via Hypertext Transfer Protocol Secure (HTTPS). SIP-AS 144 may include a router, such as an audio router 145, for routing calls to digital assistant infrastructure 120a. According to some embodiments, audio router 145 may include a keyword spotting (KWS) module 146, to allow SIP-AS 144 to determine whether a subscriber spoke a keyword during an established phone call. KWS module 146 may use voice recognition and analysis software to enable keyword spotting to be performed, and may generate a trigger if a predetermined keyword or phrase is detected. For example, the open source PocketSphinx software may be used by KWS module 146 for voice recognition and analysis.
If a predetermined keyword or phrase is recognised, audio router 145 may be configured to forward just the voice command of the subscriber of the established call through digital assistant infrastructure 120 as audio data. Voice data or samples of voice data generated by the subscriber may be forwarded to digital assistant infrastructure 120 in response to the trigger generated by KWS module 146, and digital assistant infrastructure 120 may be configured to process the subscriber's command and generate a response. The voice data or samples may be taken beginning at a time when the keyword or phrase is detected, and continuing for a predetermined amount of time after the keyword or phrase was first spoken.

Audio router 145 may use phoneme-based routing and/or rules-based routing to route voice data. In phoneme-based routing, the voice data is routed based on whether or not a particular keyword, phrase or phoneme was detected in the processed audio received from IMS core 142. According to some embodiments, the routing location may be dependent on the detection of a particular phoneme. For example, the detection of one invocation phrase or keyword may cause audio router 145 to forward the subscriber's voice data from the call to digital assistant infrastructure 120a, while the detection of a second, different invocation phrase or keyword may cause audio router 145 to forward the subscriber's voice data from the call to digital assistant infrastructure 120b. KWS module 146 of audio router 145 may use a phoneme sequence detector to detect phonemes, which uses existing acoustic and phoneme models to determine if the audio data contains a phoneme.

In rules-based routing, a call is routed based on a routing rule. For example, the rule may be established within a subscriber's user preferences. According to some embodiments, the rule may be to route only specific calls through digital assistant infrastructure 120. For example, only calls made to numbers specified by the subscriber, such as calls to specific family members, might be routed. According to some embodiments, a combination of rules-based and phoneme-based routing may be used.
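
By way of illustration only, a combination of rules-based and phoneme-based routing may be sketched as follows. The `RoutingRule` structure and the `should_route` function are assumptions made for this example, and the rule shown (a list of numbers specified by the subscriber) is only one of the rules contemplated above.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class RoutingRule:
    # Hypothetical subscriber preference: the numbers for which calls
    # may be routed through the digital assistant infrastructure.
    allowed_numbers: Set[str] = field(default_factory=set)


def should_route(rule: RoutingRule, dialled_number: str, keyword_detected: bool) -> bool:
    """Combine both checks: the rule must permit the call AND a keyword must be detected."""
    return keyword_detected and dialled_number in rule.allowed_numbers
```

Under this sketch a keyword spoken during a call to a number outside the subscriber's list does not cause any voice data to be forwarded.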

KWS module 146 of audio router 145 may be used to match an invocation phrase or keyword spoken by a subscriber to a particular digital assistant infrastructure 120a or 120b, and thereafter route the voice data of the opted-in subscriber who has spoken the keyword or phrase to the destination digital assistant infrastructure 120a or 120b. Audio router 145 does not route the whole call to digital assistant infrastructure 120, but only a copy of the audio data generated at the subscriber's device forming a voice command.

According to some embodiments, the voice command may comprise any audio data generated by the subscriber for a predetermined time period after invoking the digital assistant by speaking the predetermined keyword or phrase. For example, the voice command may include any audio generated for 15, 30, 45 or 60 seconds after the keyword or phrase is spoken. Audio router 145 of SIP-AS 144 may be configured to transmit the voice data of the subscriber and the GUID corresponding to the subscriber to a speech module 125 of the digital assistant infrastructure 120a, described in further detail below. SIP-AS 144 may communicate with speech module 125 via a full duplex secure communication channel. According to some embodiments, SIP-AS 144 may communicate with speech module 125 via a secure web socket (WSS). According to some embodiments, no subscriber information is passed to speech module 125 aside from the GUID, to provide anonymity to the subscriber.
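
By way of illustration only, the selection of a voice command covering a predetermined time period after the keyword may be sketched as follows. The function name `extract_voice_command`, the representation of audio as a sequence of integer samples and the explicit sample-rate parameter are assumptions made for this example.

```python
from typing import List, Sequence


def extract_voice_command(
    samples: Sequence[int],
    sample_rate_hz: int,
    keyword_index: int,
    window_seconds: float,
) -> List[int]:
    """Copy the samples from the keyword onwards for a fixed time window.

    keyword_index is the sample position at which the keyword was detected.
    If the call audio ends before the window does, the copy is simply shorter.
    """
    if sample_rate_hz <= 0 or window_seconds < 0:
        raise ValueError("sample rate must be positive and window non-negative")
    start = max(0, keyword_index)
    end = start + int(sample_rate_hz * window_seconds)
    # Slicing returns a copy, so the original call audio is left untouched.
    return list(samples[start:end])
```

Because the function returns a copy, it is consistent with forwarding only a copy of the subscriber's audio rather than the call itself.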

SIP-AS 144 may further include an audio mixer 147 configured to mix any audio response generated by digital assistant infrastructure 120a into the call for the subscriber to hear. Audio mixer 147 may be configured to combine multiple sounds into one or more channels. Audio mixer 147 may be configured to manipulate and/or enhance one or more of a source's volume level, frequency content, dynamics, and panoramic position. Audio mixer 147 may be configured to mix the sound generated by the digital assistant infrastructure 120a into one or more audio legs or channels of the call.
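
By way of illustration only, mixing a response into one audio leg of a call may be sketched as follows. The function name `mix_pcm16`, the assumption of signed 16-bit PCM samples and the simple additive mix with clipping are choices made for this example; the described embodiments do not specify a sample format or a mixing algorithm.

```python
from typing import List, Sequence

PCM16_MIN = -32768
PCM16_MAX = 32767


def mix_pcm16(
    call_leg: Sequence[int],
    response: Sequence[int],
    response_gain: float = 1.0,
) -> List[int]:
    """Add a response signal to a call leg, sample by sample.

    The shorter input is treated as silence once it runs out, and each
    result is clipped to the signed 16-bit range to avoid wrap-around.
    """
    length = max(len(call_leg), len(response))
    mixed = []
    for i in range(length):
        a = call_leg[i] if i < len(call_leg) else 0
        b = response[i] if i < len(response) else 0
        value = int(round(a + response_gain * b))
        mixed.append(max(PCM16_MIN, min(PCM16_MAX, value)))
    return mixed
```

The `response_gain` parameter corresponds loosely to adjusting a source's volume level; calling the function on only one leg of the call would let only that party hear the response.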

Digital assistant infrastructure 120a includes a user portal 121, a developer portal 122, an application programming interface (API) 123 and a digital assistant database 124. Digital assistant infrastructure 120a further includes a speech module 125, an intent module 126, and a dispatcher module 127. Digital assistant infrastructure 120a may be configured to process speech received from telephone infrastructure 110 and to deliver the required services back to the subscriber by inserting audio responses corresponding to the subscriber's request back into the subscriber's telephone conversation. According to some embodiments, the services may also include physical services such as ordering transportation, food, or other products to be delivered to the subscriber or to an address specified by the subscriber or stored in the subscriber's preferences.

User portal 121 may be a portal accessible to a user via a web-based interface hosted on a server on Internet 199 and accessible via a computing device 198 to allow the user to enable or disable digital assistant services and manage preferences of the digital assistant infrastructure 120a. According to some embodiments, user portal 121 may alternatively be accessed via a telephone application or mobile application. User portal 121 may be configured to manage preferences relating to the delivery of third party services to an existing subscriber. This is different to front end module 134, which may be configured to manage preferences relating to the subscription status of a customer, and customer preferences relating to the access to the digital assistant services. For example, through user portal 121 a subscriber may set their preferences so that if they ask for a pizza to be delivered to their address the pizza is ordered through a particular pizza delivery company. When accessing the digital assistant services, the services will be prioritised based on the preferences set by the user.

Dispatcher module 127 makes use of API 123 to retrieve the user preferences from digital assistant database 124. Dispatcher module 127 applies this retrieved information to a service provider determining function that outputs which service provider should be used to fulfil the subscriber's request. According to some embodiments, the service provider determining function may operate by calculating a sum of priority integers for each input parameter of user preferences, and comparing the calculated sum with a calculated priority integer of each developer application 170 to determine which developer application 170 is most relevant for the request. The service provider determining function may take into account multiple parameters in addition to user preferences, such as the physical distance from the subscriber to the service provider, the opening hours of the service provider, and other parameters. Dispatcher module 127 then routes the user command to the selected developer application 170. User portal 121 may communicate with API 123 in order to allow for the subscriber preferences to be stored within digital assistant database 124. Developer portal 122 may include a portal accessible by third party developers, in order to allow developers to create interfaces that allow their services to be accessible through the digital assistant. Developer portal 122 allows each third party developer to create a unique and personalised interface for their services. According to some embodiments, developer portal 122 may be accessible via a website hosted on a server on Internet 199 and accessible via a computing device 198. According to some embodiments, developer portal 122 may alternatively be accessed via a telephone application or mobile application. The portal may be used by third party service providers to set up their services for access via digital assistant infrastructure 120a.
For example, a cab company or fast food delivery company may use developer portal 122 to create and configure services to be available to a user during a call, such that a user can order a taxi or a fast food delivery through natural speech after invoking the digital assistant services. API 123 may include a set of services configured to securely facilitate operations between systems such as user portal 121 and developer portal 122. For example, API 123 may receive user application selections from user portal 121 when a subscriber uses user portal 121 to enter which third party applications they wish to have access to. API 123 may also receive developer profile information from developer portal 122. API 123 may communicate with digital assistant database 124 to store the data received from user portal 121 and developer portal 122, and may use the GUID to identify the subscriber to which particular preferences received from user portal 121 belong. API 123 may also communicate with carrier services module 136 to retrieve a GUID and other subscriber information from customer database 132 based on a provided authorisation token generated by authorisation server 137 and communicated to API 123 by carrier services module 136. The authorisation token allows for the retrieval of customer information from customer database 132 in a secure way, by ensuring only authorised processes that have the appropriate token are allowed access to the information. API 123 may also provide additional data such as service usage statistics and user profile information to carrier services module 136. API 123 may communicate with user portal 121, developer portal 122, and carrier services module 136 via HTTPS. API 123 may communicate with digital assistant database 124 via TCP.

Digital assistant database 124 may be a database storing subscriber preferences received from user portal 121 via API 123. Digital assistant database 124 may also store a list of user applications, being applications that a given subscriber has access to, and developer profiles, being the details of a particular developer that is building developer application 170 for access through the digital assistant. Developer details may include the first name, last name, company, ABN, contact number, contact email, and/or password of a registered developer. Information may be retrieved from digital assistant database 124 by API 123 during a call to ensure that the subscriber's speech and commands are processed and interpreted in accordance with the stored subscriber preferences. For example, a subscriber may be able to set their preferred service provider for a particular service request, such as a particular pizza store for when they order pizza, or a particular source for the retrieval of information, such as a particular meteorology website for when they ask for information about the weather. Digital assistant database 124 may be hosted on a cloud based system, or on one or more physical servers or server systems. For example, digital assistant database 124 may be a PostgreSQL database hosted on the Amazon Relational Database Service (RDS) in some embodiments.

Speech module 125 may provide real time speech-to-text and text-to-speech services. Speech module 125 may receive audio data from audio router 145 of SIP-AS 144 if KWS module 146 determines that the digital assistant services have been invoked, based on whether a predetermined keyword or phrase is recognised. Speech module 125 may convert the audio received to a text transcript, and pass on the text to the intent module 126. Speech module 125 may be configured to transmit the text generated based on the voice data of the subscriber and the GUID corresponding to the subscriber to intent module 126 via a full duplex secure communication channel. In some embodiments, the received audio may be processed by intent module 126 without first converting it to text. According to some embodiments, no subscriber information is passed to intent module 126 aside from the GUID, to provide anonymity to the subscriber.

Speech module 125 may communicate with a speech service 150 in order to process the speech. Speech service 150 may be an existing speech service, such as the Bing Speech API, for example. Speech module 125 may be configured to transmit the voice data of the subscriber and the GUID corresponding to the subscriber to speech service 150 via a full duplex secure communication channel. According to some embodiments, no subscriber information is passed to speech service 150 aside from the GUID, to preserve the anonymity of the subscriber. Intent module 126 may be configured to determine an intent of the processed speech of the subscriber. Intent module 126 may use a real time natural language processor to recognise intent and entities from the text commands or audio data received from speech module 125. For example, intent module 126 may recognise that the subscriber wished to order food from a particular entity or food delivery service, or that the subscriber asked about the weather in Sydney today. Recognised intents may refer to the subject of the question, while entities may refer to the specific parameters for which the question is being asked. For example, where a subscriber asks about the weather in Sydney today, the subject of the weather is the intent, while the location Sydney and the date and time of today are entities that are used to limit the parameters of the question. Intent module 126 may pass on the commands determined based on the text commands or audio data to the dispatcher module 127. Intent module 126 may be configured to transmit the commands and the GUID corresponding to the subscriber to dispatcher module 127 via a full duplex secure communication channel. According to some embodiments, no subscriber information is passed to dispatcher module 127 aside from the GUID, to provide anonymity to the subscriber.
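
By way of illustration only, a command carrying an intent, its entities and the GUID may be represented as follows. The `Command` structure, its field names, the `to_payload` helper and the GUID value shown are assumptions made for this example; the described embodiments do not specify a message format.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Command:
    guid: str      # the only subscriber identifier carried in the command
    intent: str    # the subject of the request, e.g. "weather"
    entities: Dict[str, str] = field(default_factory=dict)  # parameters limiting the request


def to_payload(command: Command) -> dict:
    """Convert the command to a plain dictionary suitable for transmission."""
    return asdict(command)


# The weather example from the text, with a made-up GUID.
weather_query = Command(
    guid="guid-0001",
    intent="weather",
    entities={"location": "Sydney", "date": "today"},
)
```

The payload contains no MSISDN, name or address, which mirrors the statement that no subscriber information other than the GUID is passed on.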

According to some embodiments, where no intent can be determined (such as if no words were detected, or a clear sentence could not be reconstructed), intent module 126 may communicate with speech module 125 to cause an audio message to be relayed back to the subscriber. For example, the message may state that the system did not understand what was said, and ask the subscriber to repeat their request.

Intent module 126 may communicate with a natural language processor (NLP) service 160 in order to process the text or audio. NLP service 160 may be an existing NLP service, such as the LUIS NLP API, for example. Intent module 126 may be configured to transmit the text or audio to be processed and the GUID corresponding to the subscriber to NLP service 160 via HTTPS. According to some embodiments, no subscriber information is passed to NLP service 160 aside from the GUID, to provide anonymity to the subscriber.

Dispatcher module 127 may be configured to dispatch commands received from intent module 126 to the relevant developer application 170. Developer application 170 may be configured to then respond to the subscriber's commands by providing information, services, or both. According to some embodiments, developer application 170 may be configured to communicate information to API 123, in order to cause API 123 to write to or retrieve data from digital assistant database 124. Dispatcher module 127 may also communicate with API 123 in some embodiments, to retrieve a list of services from digital assistant database 124 which are available to the particular subscriber. Where a speech response is to be provided to the subscriber, developer application 170 may be configured to produce the response in the form of text. Developer application 170 may transmit the generated text and the GUID corresponding to the subscriber to speech module 125, which may convert the text to speech. According to some alternative embodiments, developer application 170 may be configured to produce the response in the form of speech automatically. This speech may then be passed back through to audio mixer 147 to be mixed with the subscriber's call. According to some embodiments, no subscriber information is passed to API 123 aside from the GUID, to provide anonymity to the subscriber. The speech received from speech module 125 may be mixed into the audio of both parties of the call, so that both parties can hear the subscriber's conversation with the digital assistant. According to some alternative embodiments, the response may be mixed only into the audio received by the subscriber, such that a non-subscriber does not hear the digital assistant's response. According to some embodiments, a subscriber may be able to alter their preferences through user portal 121 to specify which digital assistant responses they wish to be heard by the other party in the call.
For example, a customer may authorise responses to requests for services such as fast food delivery to be heard by the other party in a call, but may wish to prevent the other party from hearing responses to queries about events in the subscriber's calendar.

Figure 5 shows an alternative system 500 for the provision of subscription services within a telephone call.

System 500 may be configured to provide similar or identical services to those described above with reference to system 100 of Figure 1, such as digital assistant services, translation services, voicemail services, call forwarding services, and other subscription services within a phone call.

System 500 includes mobile telephone infrastructure 110, as well as a digital assistant infrastructure 120. In the illustrated embodiment, digital assistant infrastructure 120 is identical to that described above with reference to Figure 1. Mobile telephone infrastructure 110 may comprise one or more servers, server systems, databases, data stores, memory devices, computing devices, computer networks, telephone exchanges and other network hardware. Telephone infrastructure 110 may include existing telephone infrastructure used to offer telephone services to customers, as well as infrastructure specifically designed for the provision of digital assistant services.

Telephone infrastructure 110 includes an information technology subsystem 130 as described above with reference to Figure 1. Telephone infrastructure 110 further includes a mobile core 140, which may be a telephone network device or system. The telephone network device or mobile core 140 may form part of a mobile telephony core network, and may have an IP multimedia subsystem (IMS) core 142, and an application server 510 having a session initiation protocol application server (SIP-AS) 144, an RTP engine 520, and a keyword spotting (KWS) module 146. RTP engine 520 may comprise an audio router 145 and an audio mixer 147. Telephone infrastructure 110 may be accessed by customer telephone devices 195 and non-customer devices 180 through an access network 190, or other existing infrastructure used to connect telephone calls between subscribers, where the customer device 195 is registered to a customer of the telephone network of telephone infrastructure 110. Telephone devices 195 and 180 may be mobile phones, landline phones, or other cell or internet enabled communication devices.

During a phone call established over IMS core 142, the voice of one or more phone subscribers that have previously opted-in (or subscribed) to receive digital assistant services can be processed via application server 510. According to some embodiments, a phone call may be between two or more parties. Where more than one party to the call is an opted-in subscriber, the voices of more than one party to the call may be processed via application server 510. Each subscriber's voice may be processed separately. According to some embodiments, only the voice of subscribers is processed, in order to comply with privacy regulations and concerns. If application server 510 detects one or more triggers generated based on digital assistant invocation phrases or keywords detected in the processed voice data of the subscriber, application server 510 may route the call to the appropriate digital assistant services hosted on digital assistant infrastructure 120a or 120b, which can handle subsequent commands spoken by the subscriber. Invocation phrases or keywords may be stored in a lookup table, database or other data structure within a memory device associated with application server 510. According to some embodiments, each digital assistant infrastructure 120 may be associated with one or more invocation phrases or keywords, application server 510 may be configured to detect and differentiate between at least two predetermined keywords or phrases, and the call may be routed to a different digital assistant infrastructure 120a or 120b depending on which keywords or phrases are detected by application server 510.
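By way of illustration only, the lookup of invocation phrases described above might be sketched as follows. The phrases, the table name `INVOCATION_TABLE` and the function `match_invocation` are hypothetical and do not appear in the embodiments, and the sketch assumes the keyword spotter has already produced a text transcript of the subscriber's speech.

```python
from typing import Optional

# Hypothetical invocation phrases mapped to the infrastructure that serves them.
INVOCATION_TABLE = {
    "hey assistant": "120a",
    "ok translator": "120b",
}


def match_invocation(transcript: str) -> Optional[str]:
    """Return the target for the earliest invocation phrase found, else None."""
    text = transcript.lower()
    hits = [(text.find(phrase), target)
            for phrase, target in INVOCATION_TABLE.items()
            if phrase in text]
    # min() picks the phrase with the smallest starting position.
    return min(hits)[1] if hits else None
```

A transcript containing no stored phrase yields `None`, in which case the call would simply continue without being routed to any digital assistant infrastructure.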

IMS core 142 may connect calls to and from non-customer telephone devices 180 and customer telephone devices 195. IMS core 142 may form part of current packet-switched telephone networks and may comprise a standardized framework for delivering IP multimedia services over an IP packet-switched network. Application server 510 may receive MSISDN data from IMS core 142 via Session Initiation Protocol (SIP). The MSISDN data may be stored by IMS core 142 in a Home Subscriber Server (HSS) used by IMS core 142 to authorise a device to access the network and retrieve subscriber information including an initial filtering criteria (iFC) profile. iFC module 143 may be stored in IMS core 142, and may hold control logic for routing calls to application server 510. iFC module 143 may be configured to retrieve subscriber information via customer database 132, and to use this information to determine whether a call should be routed through application server 510. According to some embodiments, some customer profile information retrieved from customer database 132 may be stored within memory of IMS core 142 via a cache mechanism to improve system performance. iFC module 143 may be a standard module of IMS core 142 as used in current packet-switched telephone networks. IMS core 142 may further communicate voice data generated during a call to the application server 510 via the Real-time Transport Protocol (RTP).

Application server 510 may be configured to handle call establishment on the network and to interpret voice content of subscribers. In order to handle call establishment, application server 510 may comprise SIP-AS 144, audio router 145, keyword spotting module 146 and an audio mixer 147. Compressed and encoded voice signals or data may be passed to KWS module 146 via audio router 145, described in further detail below. In some embodiments, the SIP user agents may be implemented as proxy agents, where the signalling and RTP media are only handled by the SIP user agent as they pass through the SIP user agent, enabling modification of the media. In other embodiments, the SIP user agents may be implemented as back-to-back user agents, which may handle signalling and RTP media as an endpoint, enabling control of the phone call and allowing functions such as call conferencing. Application server 510 may receive MSISDN and GUID data from carrier services module 136 in order to allow for subscriber data to remain anonymous during interactions with digital assistant services. Application server 510 may communicate with carrier services module 136 via the Hypertext Transfer Protocol Secure (HTTPS).

Audio router 145 may be configured to route calls to digital assistant infrastructure 120a. According to some embodiments, audio router 145 may communicate with KWS module 146, to allow application server 510 to determine whether a subscriber spoke a keyword during an established phone call. KWS module 146 may use voice recognition and analysis software to enable keyword spotting to be performed, and may generate a trigger if a predetermined keyword or phrase is detected. For example, the open source PocketSphinx software may be used by KWS module 146 for voice recognition and analysis. If a predetermined keyword or phrase is recognised, audio router 145 may be configured to forward just the voice command of the subscriber of the established call through digital assistant infrastructure 120 as audio data. Voice data or samples of voice data generated by the subscriber may be forwarded to digital assistant infrastructure 120 in response to the trigger generated by KWS module 146, and digital assistant infrastructure 120 may be configured to process the subscriber's command and generate a response. The voice data or samples may be taken beginning at a time when the keyword or phrase is detected, and continuing for a predetermined amount of time after the keyword or phrase was first spoken.

Audio router 145 may use phoneme-based routing and/or rules-based routing to route voice data. In phoneme-based routing, the voice data is routed based on whether or not a particular keyword, phrase or phoneme was detected in the processed audio received from IMS core 142. According to some embodiments, the routing location may be dependent on the detection of a particular phoneme. For example, the detection of one invocation phrase or keyword may cause audio router 145 to forward the subscriber's voice data from the call to digital assistant infrastructure 120a, while the detection of a second, different invocation phrase or keyword may cause audio router 145 to forward the subscriber's voice data from the call to digital assistant infrastructure 120b. KWS module 146 of audio router 145 may use a phoneme sequence detector to detect phonemes, which uses existing acoustic and phoneme models to determine if the audio data contained a phoneme. In rules-based routing, a call is routed based on a routing rule. For example, the rule may be established within a subscriber's user preferences. According to some embodiments, the rule may be to route only specific calls through digital assistant infrastructure 120. For example, only calls made to numbers specified by the subscriber, such as calls to specific family members, might be routed. According to some embodiments, a combination of rules-based and phoneme-based routing may be used.
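As a non-limiting sketch, a combination of rules-based and phoneme-based routing might be expressed as below. The class `RoutingPrefs`, its field `allowed_numbers` and the function `route_target` are hypothetical names introduced for illustration, and the sketch assumes that an empty set of allowed numbers means no rule has been set by the subscriber.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class RoutingPrefs:
    # Hypothetical preference: if non-empty, only calls to these numbers are eligible.
    allowed_numbers: Set[str] = field(default_factory=set)


def route_target(prefs: RoutingPrefs,
                 dialled_number: str,
                 detected_target: Optional[str]) -> Optional[str]:
    """Apply the rule first, then defer to the keyword-based target."""
    if prefs.allowed_numbers and dialled_number not in prefs.allowed_numbers:
        return None  # rule excludes this call, regardless of any keyword
    return detected_target  # None when no keyword or phrase was detected
```

Here the rule acts as a gate: voice data is forwarded only if the call passes the subscriber's rule and a keyword or phrase has been matched to an infrastructure.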

According to some embodiments, RTP engine 520 may be a media relay controlled by SIP-AS 144. RTP engine 520 may be configured to decode the audio from the voice of the opted-in subscriber via audio router 145, and stream this to digital assistant infrastructure 120a. When RTP engine 520 receives audio from digital assistant infrastructure 120a, RTP engine 520 may further be configured to re-encode each leg of an ongoing call with the mixed audio including any audio data produced by digital assistant infrastructure 120a. If RTP engine 520 does not support a particular codec, it may act as a simple media relay.

RTP engine 520 may be configured to monitor for control messages from the SIP-AS 144. According to some embodiments, RTP engine 520 may be configured to listen for control messages on a User Datagram Protocol (UDP) port, and may listen through port numbers 12200 to 12220. For each media stream, RTP engine 520 may be configured to open two pairs of UDP ports on the public interface, using port numbers in the range of 40000 to 60000. RTP engine 520 may open one pair of UDP ports on odd port numbers for the media data, and one pair of UDP ports on the next even port numbers for metadata (such as RTP Control Protocol (RTCP) in the case of RTP streams, for example).
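The port layout described above may be read in more than one way. The following sketch shows one possible reading, in which the two media ports are consecutive odd numbers and each metadata port is the even number immediately following its media port. The function name `allocate_stream_ports` and this pairing are assumptions for illustration only.

```python
PORT_MIN, PORT_MAX = 40000, 60000  # range given for the public interface


def allocate_stream_ports(first_odd: int) -> dict:
    """Return media and metadata port pairs for one media stream."""
    if first_odd % 2 == 0:
        raise ValueError("media ports are expected on odd port numbers")
    media = (first_odd, first_odd + 2)     # one pair on odd numbers
    meta = (first_odd + 1, first_odd + 3)  # next even numbers, e.g. for RTCP
    if first_odd < PORT_MIN or meta[1] > PORT_MAX:
        raise ValueError("ports fall outside the 40000 to 60000 range")
    return {"media": media, "meta": meta}
```

Under this reading, a stream starting at port 40001 would occupy ports 40001 to 40004 inclusive.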

For each media stream, RTP engine 520 may be configured to establish a WebSocket connection with digital assistant infrastructure 120a via audio router 145, and to send a "call ID" within the first data packet sent as JavaScript Object Notation (JSON) prior to sending the voice stream. The voice stream itself may be sent as pulse-code modulation (PCM) audio in chunks of 640 bytes.
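The framing described above might, for example, be sketched as follows. The sketch only prepares the sequence of messages and leaves the WebSocket transport itself out; the JSON key `call_id` and the function name `frame_stream` are hypothetical, as the exact message schema is not specified.

```python
import json
from typing import Iterator, Union

CHUNK_BYTES = 640  # chunk size stated for the PCM voice stream


def frame_stream(call_id: str, pcm: bytes) -> Iterator[Union[str, bytes]]:
    """Yield a JSON header carrying the call ID, then PCM in 640-byte chunks."""
    yield json.dumps({"call_id": call_id})  # first message, sent as text
    for offset in range(0, len(pcm), CHUNK_BYTES):
        yield pcm[offset:offset + CHUNK_BYTES]  # final chunk may be shorter
```

If the audio were 16-bit mono PCM sampled at 16 kHz, which is an assumption not stated above, each 640-byte chunk would carry 20 milliseconds of speech.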

According to some embodiments, to improve the quality of the call, the particular RTP engine 520 used to route the call may be selected by SIP-AS 144 to be geographically close to the customer. For example, the audio packets of a subscriber call originating in Western Australia would be handled by an RTP engine 520 located in Western Australia. In the case where there are no RTP engines 520 deployed near the location of the subscriber, or in the case that the status of the deployed RTP engine 520 is unavailable, the audio packets will not be handled by any RTP engine 520 to avoid voice call quality impact. Instead, the audio packets may be handled by an independent audio router 145.
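A minimal sketch of this selection, assuming each deployed engine advertises a region and an availability flag, is given below. The `RtpEngine` class, its fields and the region labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RtpEngine:
    name: str
    region: str      # hypothetical region label, e.g. "WA"
    available: bool  # status as reported when queried


def select_engine(engines: List[RtpEngine],
                  subscriber_region: str) -> Optional[RtpEngine]:
    """Return an available engine in the subscriber's region, else None."""
    for engine in engines:
        if engine.region == subscriber_region and engine.available:
            return engine
    return None  # caller falls back to an independent audio router
```

Returning `None` rather than a distant engine reflects the stated preference to avoid any impact on voice call quality.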

SIP-AS 144 determines the geographical location of the subscriber device based on a "cell ID" received by SIP-AS 144 from a "REGISTER" message issued by iFC module 143. SIP-AS 144 uses the cell ID to resolve the geographical location of the subscriber, and therefore to select an appropriate RTP engine 520. SIP-AS 144 queries the availability status of the given RTP engine 520, and uses the Session Description Protocol (SDP) to route the audio to the selected RTP engine 520. KWS module 146 may be used to match an invocation phrase or keyword spoken by a subscriber to a particular digital assistant infrastructure 120a or 120b, and thereafter route the voice data of the opted-in subscriber who has spoken the keyword or phrase to the destination digital assistant infrastructure 120a or 120b via audio router 145. Audio router 145 does not route the whole call to digital assistant infrastructure 120, but only a copy of the audio data generated at the subscriber's device forming a voice command.

According to some embodiments, the voice command may comprise any audio data generated by the subscriber for a predetermined time period after invoking the digital assistant by speaking the predetermined keyword or phrase. For example, the voice command may include any audio generated for 15, 30, 45 or 60 seconds after the keyword or phrase is spoken. Audio router 145 of application server 510 may be configured to transmit the voice data of the subscriber and the GUID corresponding to the subscriber to a speech module 125 of the digital assistant infrastructure 120a. Application server 510 may communicate with speech module 125 via a full duplex secure communication channel. According to some embodiments, no subscriber information is passed to speech module 125 aside from the GUID, to provide anonymity to the subscriber.
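The predetermined capture window described above might be sketched as follows. The sketch assumes 16-bit mono PCM at 16 kHz, which is not stated in the embodiments, and the function name `command_window` is hypothetical.

```python
def command_window(pcm: bytes,
                   keyword_offset_s: float,
                   duration_s: float,
                   sample_rate: int = 16000,
                   sample_width: int = 2) -> bytes:
    """Return the audio from the keyword onwards, for at most duration_s seconds."""
    bytes_per_second = sample_rate * sample_width
    start = int(keyword_offset_s * bytes_per_second)
    start -= start % sample_width  # keep the slice aligned to whole samples
    end = start + int(duration_s * bytes_per_second)
    return pcm[start:end]  # shorter if the call audio ends first
```

Only this slice of the subscriber's audio, together with the GUID, would then be passed on to the speech module.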

Audio mixer 147 may be configured to mix any audio response generated by digital assistant infrastructure 120a into the call for the subscriber to hear. Audio mixer 147 may be configured to combine multiple sounds into one or more channels. Audio mixer 147 may be configured to manipulate and/or enhance one or more of a source's volume level, frequency content, dynamics, and panoramic position. Audio mixer 147 may be configured to mix the sound generated by the digital assistant infrastructure 120a into one or more audio legs or channels of the call.
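As one illustrative possibility, mixing a response into a single call leg could be done by summing samples and clipping. The sketch assumes both inputs are 16-bit signed PCM in the machine's native byte order at the same sample rate; the function name `mix_pcm16` and the `response_gain` parameter are hypothetical.

```python
import array


def mix_pcm16(call_leg: bytes, response: bytes,
              response_gain: float = 1.0) -> bytes:
    """Add the response onto the call leg, clipping to the 16-bit range."""
    leg = array.array("h")
    leg.frombytes(call_leg)
    resp = array.array("h")
    resp.frombytes(response)
    mixed = array.array("h", leg)  # copy, so call audio beyond the response is kept
    for i in range(min(len(leg), len(resp))):
        total = leg[i] + int(resp[i] * response_gain)
        mixed[i] = max(-32768, min(32767, total))
    return mixed.tobytes()
```

Applying the function to only the subscriber's leg, or to every leg of the call, would correspond to the alternatives in which the response is heard by the subscriber alone or by all parties.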

Figure 2 shows a method 200 of routing calls made by a customer of the telephone network of telephone infrastructure 110. At step 201, a customer makes a phone call using telephone device 195, telephone device 195 being the call originating device. At step 202, the call is connected to access network 190. At step 203, the call is routed to the IMS core 142 via a communications interface such as the Gm interface for processing. The Gm interface may be the communication interface used to exchange messages between user equipment (UE) or voice over IP (VoIP) gateway and Proxy Call Session Control Function (P-CSCF) equipment. It may use SIP for communication. IMS core 142 handles the call via SIP and RTP.

At step 204, initial filter criteria (iFC) module 143 stored on a Serving Call Session Control Function (S-CSCF) is used to determine whether the call should be looped via the SIP-AS 144 and/or application server 510 for processing, based on whether or not the caller is a subscriber who has opted-in to the digital assistant services. According to some embodiments, initial filter criteria (iFC) storing subscriber information relating to an account associated with the telephone device 195 is retrieved by iFC module 143. The iFC may be obtained from a local subscriber database stored on the S-CSCF. If it is determined that the caller is not a subscriber of the digital assistant services based on application of the iFC rules by iFC module 143, then IMS core 142 may route the call to the telephone device 180 at step 205, telephone device 180 being the call receiving device. According to some alternative embodiments, the call may be routed to another device depending on the routing rules specified by iFC module 143. For example, according to some embodiments, IMS core 142 may be configured to route the call to a different application server instead of SIP-AS 144 and/or application server 510, based on routing rules stored within iFC module 143.

If it is determined that the caller is a subscriber of the digital assistant services based on iFC module 143, then telephone device 195 becomes a subscription accessing device, and audio data generated by telephone device 195 is forwarded through SIP-AS 144 and/or application server 510 at step 206. At step 207, IMS core 142 then routes the call to the destination device 180. At step 208, once the call is connected, keyword spotting (KWS) module 146 is initiated to monitor the audio data generated by the subscriber through telephone device 195. If a keyword or phrase is detected, the audio data can be forwarded to the digital assistant infrastructure 120 as described below with respect to Figure 4.

According to some embodiments, a subscriber may be able to access the digital assistant services without making a call to another party such as a person or business. Instead, the subscriber may be able to use the digital assistant services by calling a specific digital assistant or service provider number, which may allow them to make commands and requests to the digital assistant as if they were in a call with another person. In this scenario, a call between a subscriber and a service provider device would be routed through SIP-AS 144 and/or application server 510 as in Figure 2, but step 207 would not be performed. According to some alternative embodiments, the user's call may be terminated by an application instead of telephone device 180. According to some alternative embodiments, users may be able to access interactive voice response (IVR) digital assistants via a SIP trunk.

Figure 3 shows a method 300 of routing calls to a customer of the telephone network of telephone infrastructure 110. At step 301, a person makes a phone call via telephone device 180, being the call originating device. At step 302, the call is routed to the IMS core 142 via the Gm interface. At step 303, initial filter criteria (iFC) module 143 is used to determine whether the call should be looped via the SIP-AS 144 and/or application server 510 for processing. Specifically, subscriber information relating to an account associated with the telephone device 180 is retrieved by iFC module 143. If it is determined that the caller is not a subscriber of the digital assistant services based on filter criteria applied by iFC module 143, then IMS core 142 routes the call to the telephone device 195, being the call receiving device, at step 304. If it is determined that the receiver of the call is a subscriber of the digital assistant services based on filter criteria applied by iFC module 143, then telephone device 180 is considered to be a subscription accessing device, and audio data generated by telephone device 180 is forwarded through SIP-AS 144 and/or application server 510 at step 305. At step 306, IMS core 142 routes the call to access network 190, which terminates the call at destination device 195 at step 307 through access network 190 via the Gm interface. Alternatively, IMS core 142 may route the call directly to the destination device. At step 308, once the call is connected, keyword spotting (KWS) module 146 is initiated to monitor the audio data generated by the subscriber through telephone device 180. If a keyword or phrase is detected, the audio data can be forwarded to the digital assistant infrastructure 120 as described below with respect to Figure 4.

Figure 4 shows a method 400 of accessing digital assistant services during an established call. At step 401, KWS module 146 detects a keyword or phrase in the voice of a subscriber during the phone call, based on audio data generated by a subscription accessing device 180 or 195. Based on the particular keyword or phrase spoken, at step 402, the user's voice command or request in the form of audio or voice data is forwarded to the digital assistant infrastructure 120. In particular, the voice data is communicated to speech module 125. The voice data forwarded to digital assistant infrastructure 120 may be voice data recorded starting from when the keyword or phrase was detected, and continuing for a predetermined length of time, or until a pause in speech or silence of a predetermined length is detected. According to some embodiments, the predetermined length of time may be configurable by a subscriber and stored in subscriber preferences in digital assistant database 124. At step 403, the voice data is converted to text by speech services 150, and passed to intent module 126, to identify at least one request in the audio data, as described above with reference to Figure 1. At step 404, the text is processed by natural language services 160 to determine the intent in the subscriber's request, as further described above with reference to Figure 1. At step 405, the determined intent is passed to dispatcher module 127, which communicates with API 123 to determine whether or not the request should be forwarded to a third party application. This may be determined by accessing digital assistant database 124 to determine whether or not the subscriber has opted in to the particular services. If the request is not to be forwarded to a third party application, dispatcher module 127 may generate an appropriate response at step 406. At step 407, the response may be passed on to speech module 125 and converted to voice by speech services 150, to generate response audio data. 
At step 408, the generated voice data is mixed back into the subscriber's call by audio mixer 147. According to some embodiments, the generated voice data is only mixed into the subscriber's call. Alternatively, voice data may be mixed both into the subscriber's call and the non-subscriber's call. If the request is to be forwarded to a third party application, developer application 170 processes the request at step 409. At step 410, developer application 170 generates an appropriate response in the form of text. At step 411, the text response is passed on to speech module 125 and converted to voice data by speech services 150. At step 412, the generated voice data is mixed back into the subscriber's call by audio mixer 147.
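Purely as an illustrative sketch, steps 403 to 411 of method 400 could be chained as below, with each external service represented by a callable supplied by the caller. The function `handle_command` and all parameter names are hypothetical, and no real speech or natural language API is implied.

```python
from typing import Callable, Dict


def handle_command(voice: bytes,
                   guid: str,
                   speech_to_text: Callable[[bytes], str],
                   detect_intent: Callable[[str, str], str],
                   third_party_apps: Dict[str, Callable[[str], str]],
                   built_in_response: Callable[[str], str],
                   text_to_speech: Callable[[str], bytes]) -> bytes:
    """Turn a captured voice command into response audio ready for mixing."""
    text = speech_to_text(voice)         # step 403
    intent = detect_intent(text, guid)   # step 404, only the GUID identifies the subscriber
    app = third_party_apps.get(intent)   # step 405, services the subscriber has opted in to
    if app is not None:
        reply = app(text)                # steps 409 and 410
    else:
        reply = built_in_response(intent)  # step 406
    return text_to_speech(reply)         # step 407 or 411
```

The returned audio would then be handed to the audio mixer for step 408 or step 412.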

It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the above-described embodiments, without departing from the broad general scope of the present disclosure. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.