Title:
EFFICIENT ENCODING FOR SPEECH RECOGNITION
Document Type and Number:
WIPO Patent Application WO/2020/081544
Kind Code:
A1
Abstract:
A system may comprise a machine configured to display a code sequence related to an action that can be taken by the machine, a voice terminal, a database containing mappings between code sequences and actions to be taken by machines, and a computing device communicatively coupled to the machine and the voice terminal. The computing device may be configured to receive, from the voice terminal, a spoken audio representation of the code sequence, parse, by way of a speech recognition module, the spoken audio representation to determine a variety of codes and sub-codes, validate that the codes and sub-codes comprise various redundancies within the codes and sub-codes, look up, in the mappings of the database, the code sequence to determine the action to be taken by the machine, and transmit, to the machine, instructions for the machine to take the action.

Inventors:
RAMATCHANDIRANE NADARADJANE (US)
UPADHYAY VANDANA (US)
Application Number:
PCT/US2019/056292
Publication Date:
April 23, 2020
Filing Date:
October 15, 2019
Assignee:
CONFIA SYSTEMS INC (US)
International Classes:
G10L15/00
Foreign References:
US20180277123A1 (2018-09-27)
US20100268792A1 (2010-10-21)
US20180300981A1 (2018-10-18)
US5642519A (1997-06-24)
US20150012275A1 (2015-01-08)
US20090209345A1 (2009-08-20)
Attorney, Agent or Firm:
BORELLA, Michael S. (US)
Claims:
CLAIMS

What is claimed is:

1. A system comprising:

a machine configured to display a code sequence related to an action that can be taken by the machine;

a voice terminal;

a database containing mappings between code sequences and actions to be taken by machines; and

a computing device communicatively coupled to the machine and the voice terminal and configured to:

receive, from the voice terminal, a spoken audio representation of the code sequence;

parse, by way of a speech recognition module, the spoken audio representation to determine a first code, a second code, a third code, and a fourth code of the code sequence, wherein the first code comprises three sub-codes, and wherein the three sub-codes comprise any combination of alphanumeric or special characters;

validate that the second code comprises words that are a pronounceable expansion of a first sub-code of the three sub-codes, that the third code comprises words that are a pronounceable expansion of a second sub-code of the three sub-codes, and that the fourth code comprises a numeric checksum of the first code;

look up, in the mappings of the database, at least one of the first code, the second code, or the third code to determine that the code sequence relates to the action; and

transmit, to the machine, instructions for the machine to take the action.

2. The system of claim 1, wherein the first sub-code comprises at least a first alphanumeric character and a second alphanumeric character, and wherein the second sub-code comprises at least a third alphanumeric character and a fourth alphanumeric character.

3. The system of claim 2, wherein the second code comprises a first acrostic of the first alphanumeric character and the second alphanumeric character within the first sub-code, and wherein the third code comprises an acrostic of the third alphanumeric character and the fourth alphanumeric character within the second sub-code.

4. The system of claim 2, wherein a third sub-code of the three sub-codes comprises one or more numeric characters.

5. The system of claim 4, wherein the one or more numeric characters comprise a hexadecimal representation of the first alphanumeric character, the second alphanumeric character, the third alphanumeric character, and the fourth alphanumeric character.

6. The system of claim 1, wherein the voice terminal comprises at least part of a mobile phone, and the computing device receives the spoken audio representation of the code sequence by way of a voice call from the mobile phone.

7. The system of claim 1, wherein the computing device is further configured to: receive, from the voice terminal, a location of the voice terminal;

determine that the location of the voice terminal is within a threshold distance of a location of the machine; and

wherein transmitting instructions for the machine to take the action is in response to the location of the voice terminal being within the threshold distance of the location of the machine.

8. The system of claim 1, wherein the machine is a vending machine, wherein the code sequence is digitally displayed on the machine, and wherein the action is the machine dispensing an item.

9. The system of claim 1, wherein the voice terminal is associated with a unique identifier, wherein the unique identifier is associated with pre-established credentials, and wherein the computing device is further configured to:

validate that the pre-established credentials permit the machine to take the action.

10. The system of claim 1, wherein the computing device is further configured to: validate, in the mappings of the database, that the second code is associated with the third code.

11. A computer-implemented method comprising: receiving, by a computing device and from a voice terminal, a spoken audio representation of a code sequence displayed on a machine, wherein the computing device has access to a database containing mappings between code sequences and actions to be taken by machines;

parsing, by way of a speech recognition module of the computing device, the spoken audio representation to determine a first code, a second code, a third code, and a fourth code of the code sequence, wherein the first code comprises three sub-codes, and wherein the three sub-codes comprise any combination of alphanumeric or special characters;

validating, by the computing device, that the second code comprises words that are a pronounceable expansion of a first sub-code of the three sub-codes, that the third code comprises words that are a pronounceable expansion of a second sub-code of the three sub-codes, and that the fourth code comprises a numeric checksum of the first code;

looking up, by the computing device and in the mappings of the database, at least one of the first code, the second code, or the third code to determine that the code sequence relates to an action to be taken by the machine; and

transmitting, by the computing device and to the machine, instructions for the machine to take the action.

12. The method of claim 11, wherein validating that the second code comprises words that are the pronounceable expansion of a first sub-code of the three sub-codes further comprises:

validating that the first sub-code comprises at least a first alphanumeric character and a second alphanumeric character.

13. The method of claim 12, wherein validating that the first sub-code comprises the first alphanumeric character and the second alphanumeric character further comprises: validating that the second code comprises a first acrostic of the first alphanumeric character and the second alphanumeric character within the first sub-code.

14. The method of claim 11, wherein validating that the third code comprises words that are the pronounceable expansion of the second sub-code of the three sub-codes further comprises:

validating that the second sub-code comprises at least a third alphanumeric character and a fourth alphanumeric character.

15. The method of claim 14, wherein validating that the second sub-code comprises the third alphanumeric character and the fourth alphanumeric character further comprises:

validating that the third code comprises an acrostic of the third alphanumeric character and the fourth alphanumeric character within the second sub-code.

16. The method of claim 11, wherein validating that the fourth code comprises the numeric checksum of the first code further comprises:

validating that the numeric checksum comprises a hexadecimal representation of the first sub-code and the second sub-code.

17. The method of claim 11, wherein the voice terminal comprises at least part of a mobile phone, and the computing device receives the spoken audio representation of the code sequence by way of a voice call from the mobile phone.

18. The method of claim 11, wherein receiving, from the voice terminal, the spoken audio representation of the code sequence further comprises:

receiving, from the voice terminal, a location of the voice terminal; and

determining that the location of the voice terminal is within a threshold distance of a location of the machine.

19. A non-transitory computer-readable medium, having stored thereon program instructions that, upon execution by a computing device, cause the computing device to perform operations comprising:

receiving, from a voice terminal, a spoken audio representation of a code sequence displayed on a machine, wherein the computing device has access to a database containing mappings between code sequences and actions to be taken by machines;

parsing, by way of a speech recognition module of the computing device, the spoken audio representation to determine a first code, a second code, a third code, and a fourth code of the code sequence, wherein the first code comprises three sub-codes, and wherein the three sub-codes comprise any combination of alphanumeric or special characters;

validating that the second code comprises words that are a pronounceable expansion of a first sub-code of the three sub-codes, that the third code comprises words that are a pronounceable expansion of a second sub-code of the three sub-codes, and that the fourth code comprises a numeric checksum of the first code;

looking up, in the mappings of the database, at least one of the first code, the second code, or the third code to determine that the code sequence relates to an action to be taken by the machine; and

transmitting, to the machine, instructions for the machine to take the action.

20. The non-transitory computer-readable medium of claim 19, wherein validating that the second code comprises words that are the pronounceable expansion of a first sub-code of the three sub-codes further comprises:

validating that the first sub-code comprises at least a first alphanumeric character and a second alphanumeric character.

21. The non-transitory computer-readable medium of claim 20, wherein validating that the first sub-code comprises the first alphanumeric character and the second alphanumeric character further comprises:

validating that the second code comprises a first acrostic of the first alphanumeric character and the second alphanumeric character within the first sub-code.

22. The non-transitory computer-readable medium of claim 19, wherein validating that the third code comprises words that are the pronounceable expansion of the second sub-code of the three sub-codes further comprises:

validating that the second sub-code comprises at least a third alphanumeric character and a fourth alphanumeric character.

23. The non-transitory computer-readable medium of claim 22, wherein validating that the second sub-code comprises the third alphanumeric character and the fourth alphanumeric character further comprises:

validating that the third code comprises an acrostic of the third alphanumeric character and the fourth alphanumeric character within the second sub-code.

24. The non-transitory computer-readable medium of claim 19, wherein validating that the fourth code comprises the numeric checksum of the first code further comprises: validating that the numeric checksum comprises a hexadecimal representation of the first sub-code and the second sub-code.

25. The non-transitory computer-readable medium of claim 19, wherein the voice terminal comprises at least part of a mobile phone, and the computing device receives the spoken audio representation of the code sequence by way of a voice call from the mobile phone.

26. The non-transitory computer-readable medium of claim 19, wherein receiving, from the voice terminal, the spoken audio representation of the code sequence further comprises:

receiving, from the voice terminal, a location of the voice terminal; and

determining that the location of the voice terminal is within a threshold distance of a location of the machine.

Description:
EFFICIENT ENCODING FOR SPEECH RECOGNITION

CROSS-REFERENCE TO RELATED APPLICATION

[1] This application claims priority to U.S. provisional patent application no. 62/747,978, filed October 19, 2018, which is hereby incorporated by reference in its entirety.

BACKGROUND

[2] Recent years have seen explosive growth of network-connected devices. Network connectivity, once limited to personal computers and servers, is now supported by doors and door locks, digital signage, network-enabled kiosks (e.g., parking meters, vending machines), and so on. Many of these devices interact with a user by way of the user’s client device (e.g., a smartphone) so that the devices can perform tasks on behalf of the user. Given that billions of low-cost network-connected devices are expected to be deployed in the future, it is desirable to provide simple and efficient mechanisms with which a client device can interact with these devices.

SUMMARY

[3] The embodiments described herein provide an improved technique for efficiently and accurately capturing voice command data, particularly when related to machine transactions.

[4] Machines, including so-called “Internet of Things” (IOT) machines, that can be used in transactions described herein can include any device capable of presenting, providing, and/or dispensing information, products, or services in association with a user transaction. A machine may display information related to a product or service, and/or the machine may provide the product or service. Thus, machines may be user-facing devices that are the initiating point for users to complete transactions. These machines can be connected using a network (e.g., one that includes the Internet) and can also be connected to other networks, e.g., wide area networks (WANs), local area networks (LANs), and/or other types of networks. For example, a vending machine might not be connected directly to the Internet, but it may be connected to a control station using a LAN, and/or to a network of vending machines connected with a WAN.

[5] A variety of client devices can be used to convey, send, and/or communicate a request during a transaction. A client device can be, for example, a personal computer, a mobile phone, a smartphone, a tablet, a wearable device, a smart speaker, or a smart watch. Client devices may be directly operated by the user and can communicate with the Internet via a connection, e.g., cellular wireless, Wifi, BLUETOOTH®, near-field communication (NFC), radio-frequency identification (RFID), or some other proximity network. While client devices used in transactions are typically mobile devices, cellular machine-to-machine (M2M) systems in a connected automobile (car, truck, or bus) and non-mobile devices can also be used.

[6] Users can include consumers, technicians, attendants, students, or anyone or anything else capable of initiating a transaction involving products or services provided by IOT machines.

[7] Transactions can include user requests, covering a variety of user engagement situations such as information requests, action requests, access requests, registration requests, and other types of transactions. These transactions can involve, for example, automated retail machines such as vending machines, machines that deliver tickets for movies or transit, parking meters, cafeteria machines, or connected vehicles. Other automated retail machines may include unattended self-service devices such as kiosks, electric vehicle (EV) charging stations, vehicle rental stations or devices (e.g., bike share or car share programs), automated beverage fountains (e.g., machines allowing “mixing” of product to fit a user’s taste at the place of consumption), and machines that deliver physical or digital goods. Transactions can also include information requests, option selections, pre-paid use, voting, surveys, reservations, registrations, authorization, access, and control or monitoring operations (e.g., opening a gate or accessing a physical door).

[8] A transaction with a registered vending machine, for example, can be initiated when the user provides voice information corresponding to a code sequence (e.g., an alphanumeric code) representative of an action for the machine to take. For example, a person or user may want to purchase an item from a vending machine. The vending machine may display a variety of code sequences which correspond to various actions to be taken by the vending machine. By way of example, the code “AB/NJ/72, Alpha Bravo, November Juliet, 72” may correspond to dispensing a candy bar. The vending machine may also display a telephone number for users to call.

[9] To initiate the transaction, the user can call the phone number listed on the vending machine and verbally recite the code sequence. The phone number may be associated with a remote network. The remote network may include voice recognition technology and mappings between the code sequences and the actions. The remote network may then instruct the machine to carry out the corresponding action. For example, the remote network may instruct the machine to dispense the candy bar. In another example, the remote network may instruct the machine to open a gate or provide access to a door.
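
For concreteness, the following is a minimal sketch of the mapping step just described: resolving a recognized code sequence to a machine action and forming the instruction to transmit. The names (CODE_ACTIONS, dispatch, the machine identifier) and the mapping contents are illustrative assumptions, not part of this application.

```python
# Hypothetical sketch of the remote network's mapping step: a recognized
# code sequence is looked up and resolved to an instruction for a machine.
# All names and mapping contents here are illustrative assumptions.

CODE_ACTIONS = {
    "AB/NJ/72": {"machine_id": "vend-001", "action": "dispense_candy_bar"},
    "CD/KX/41": {"machine_id": "vend-001", "action": "dispense_soda"},
}

def dispatch(code: str) -> dict:
    """Resolve a recognized code sequence to the instruction to transmit."""
    entry = CODE_ACTIONS.get(code)
    if entry is None:
        raise KeyError(f"unknown code sequence: {code}")
    # In deployment, this instruction would be transmitted over the
    # network to the machine (e.g., the vending machine's controller).
    return {"to": entry["machine_id"], "command": entry["action"]}

print(dispatch("AB/NJ/72"))
# {'to': 'vend-001', 'command': 'dispense_candy_bar'}
```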

[10] Particular implementations may realize the following advantage. The code sequences comprise various redundancies such that parsing the spoken audio representation of the code sequence can be performed quickly and accurately. Specifically, the code sequences are self-verifying, which eliminates the need for more sophisticated voice recognition technology and allows for much simpler and faster voice recognition services.

[11] Accordingly, an example embodiment may involve a system comprising a machine, a voice terminal, a database containing mappings between code sequences and actions to be taken by machines, and a computing device. In this example, the computing device may be configured to receive, from the voice terminal, a spoken audio representation of a code sequence. The voice terminal may, for example, include a microphone on a client device (e.g., a smartphone or a smart watch).

[12] Additionally, the example embodiment may include parsing, by way of a speech recognition module, the spoken audio representation of the code sequence to determine a first code, a second code, a third code, and a fourth code of the sequence of codes, where the first code may comprise three sub-codes. The three sub-codes may include any combination of alphanumeric or special characters (e.g., !, @, #, /, $, etc.). Each code within the sequence of codes may, for example, be related to one or more of the other codes within the sequence, such that there is redundancy among the codes. Many other variations of code sequences are possible, including any number of codes and sub-codes.

[13] Furthermore, the example embodiment may include validating that the second code comprises words that are a pronounceable expansion of a first sub-code of the three sub-codes and that the third code comprises words that are a pronounceable expansion of a second sub-code of the three sub-codes. Further, this embodiment may include validating that the fourth code comprises a numeric checksum of the first code or an alternate numeric representation of the code sequence.
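
As a rough illustration of this validation step, the sketch below checks the redundancies for a sequence such as “AB/NJ/72, Alpha Bravo, November Juliet, 72”. It is a sketch under assumptions inferred from the examples herein (the “/”-delimited layout of the first code and an equality test for the checksum), not the application’s actual implementation.

```python
# Illustrative validation of the redundancies described above for a
# recognized sequence such as "AB/NJ/72, Alpha Bravo, November Juliet, 72".
# The "/"-delimited layout of the first code is assumed from the examples.

def acrostic_matches(words: str, sub_code: str) -> bool:
    """True if the first letters of the words spell out the sub-code."""
    initials = "".join(word[0] for word in words.split())
    return initials.upper() == sub_code.upper()

def validate_sequence(first: str, second: str, third: str, fourth: str) -> bool:
    """Check that codes 2 and 3 expand the first code's sub-codes and that
    code 4 repeats the first code's numeric checksum."""
    sub_1, sub_2, checksum = first.split("/")
    return (acrostic_matches(second, sub_1)     # "Alpha Bravo" -> "AB"
            and acrostic_matches(third, sub_2)  # "November Juliet" -> "NJ"
            and fourth == checksum)             # spoken digits match "72"

assert validate_sequence("AB/NJ/72", "Alpha Bravo", "November Juliet", "72")
assert not validate_sequence("AB/NJ/72", "Alpha Bravo", "New York", "72")
```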

[14] Moreover, the example embodiment may include looking up, in the mappings of the database, at least one of the first code, the second code, or the third code to determine an action to be taken by the machine. The computing device may then transmit, to the machine, instructions for the machine to take the action. The action may, for example, include dispensing a selected product.

[15] A second example embodiment may involve a method of receiving a spoken audio representation of a code sequence displayed on a machine, parsing the spoken audio representation to determine a number of codes and sub-codes, validating that the codes and sub-codes include a variation of the other codes within the sequence, looking up the code sequence in a database to determine an action to be taken by the machine, and transmitting instructions for the machine to take the action.

[16] In a third example embodiment, a non-transitory computer-readable medium on a computing device may cause the computing device to perform operations comprising transmitting a spoken audio representation of a code sequence. The code sequence may be associated with an instruction for a machine. Additionally, the code sequence may comprise a number of codes and sub-codes.

[17] These as well as other embodiments, aspects, advantages, and alternatives will become apparent to those of ordinary skill in the art by reading the following detailed description, with reference where appropriate to the accompanying drawings. Further, it should be understood that this summary and other descriptions and figures provided herein are intended to illustrate embodiments by way of example only and, as such, that numerous variations are possible. For instance, structural elements and process steps can be rearranged, combined, distributed, eliminated, or otherwise changed, while remaining within the scope of the embodiments as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

[18] Figure 1 illustrates a schematic drawing of a computing device, in accordance with example embodiments.

[19] Figure 2 illustrates a schematic drawing of a server device cluster, in accordance with example embodiments.

[20] Figure 3A illustrates a block diagram of an environment for machine transactions using a voice recognition database, in accordance with example embodiments.

[21] Figure 3B illustrates a schematic of an environment for machine transactions using a voice recognition database, in accordance with example embodiments.

[22] Figure 4 is a composition diagram of a code sequence, in accordance with example embodiments.

[23] Figure 5 illustrates a flow chart, in accordance with example embodiments.

DETAILED DESCRIPTION

[24] Example methods, devices, and systems are described herein. It should be understood that the words “example” and “exemplary” are used herein to mean “serving as an example, instance, or illustration.” Any embodiment or feature described herein as being an “example” or “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or features unless stated as such. Thus, other embodiments can be utilized and other changes can be made without departing from the scope of the subject matter presented herein.

[25] Accordingly, the example embodiments described herein are not meant to be limiting. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations. For example, the separation of features into “client” and “server” components may occur in a number of ways.

[26] Further, unless context suggests otherwise, the features illustrated in each of the figures may be used in combination with one another. Thus, the figures should be generally viewed as component aspects of one or more overall embodiments, with the understanding that not all illustrated features are necessary for each embodiment.

[27] Additionally, any enumeration of elements, blocks, or steps in this specification or the claims is for purposes of clarity. Thus, such enumeration should not be interpreted to require or imply that these elements, blocks, or steps adhere to a particular arrangement or are carried out in a particular order.

I. Example Computing Devices and Cloud-Based Computing Environments

[28] The following embodiments describe architectural and operational aspects of example computing devices and systems that may employ the disclosed implementations, as well as the features and advantages thereof.

[29] Figure 1 is a simplified block diagram exemplifying a computing device 100, illustrating some of the components that could be included in a computing device arranged to operate in accordance with the embodiments herein. Computing device 100 could be a client device (e.g., a device actively operated by a user), a server device (e.g., a device that provides computational services to client devices), or some other type of computational platform. Some server devices may operate as client devices from time to time in order to perform particular operations, and some client devices may incorporate server features.

[30] In this example, computing device 100 includes processor 102, memory 104, network interface 106, and an input / output unit 108, all of which may be coupled by a system bus 110 or a similar mechanism. In some embodiments, computing device 100 may include other components and/or peripheral devices (e.g., detachable storage, printers, and so on).

[31] Processor 102 may be one or more of any type of computer processing element, such as a central processing unit (CPU), a co-processor (e.g., a mathematics, graphics, or encryption co-processor), a digital signal processor (DSP), a network processor, and/or a form of integrated circuit or controller that performs processor operations. In some cases, processor 102 may be one or more single-core processors. In other cases, processor 102 may be one or more multi-core processors with multiple independent processing units. Processor 102 may also include register memory for temporarily storing instructions being executed and related data, as well as cache memory for temporarily storing recently-used instructions and data.

[32] Memory 104 may be any form of computer-usable memory, including but not limited to random access memory (RAM), read-only memory (ROM), and non-volatile memory. This may include flash memory, hard disk drives, solid state drives, re-writable compact discs (CDs), re-writable digital video discs (DVDs), and/or tape storage, as just a few examples. Computing device 100 may include fixed memory as well as one or more removable memory units, the latter including but not limited to various types of secure digital (SD) cards. Thus, memory 104 represents both main memory units, as well as long-term storage. Other types of memory may include biological memory.

[33] Memory 104 may store program instructions and/or data on which program instructions may operate. By way of example, memory 104 may store these program instructions on a non-transitory, computer-readable medium, such that the instructions are executable by processor 102 to carry out any of the methods, processes, or operations disclosed in this specification or the accompanying drawings.

[34] As shown in Figure 1, memory 104 may include firmware 104A, kernel 104B, and/or applications 104C. Firmware 104A may be program code used to boot or otherwise initiate some or all of computing device 100. Kernel 104B may be an operating system, including modules for memory management, scheduling and management of processes, input / output, and communication. Kernel 104B may also include device drivers that allow the operating system to communicate with the hardware modules (e.g., memory units, networking interfaces, ports, and busses) of computing device 100. Applications 104C may be one or more user-space software programs, such as web browsers or email clients, as well as any software libraries used by these programs. Memory 104 may also store data used by these and other programs and applications.

[35] Network interface 106 may take the form of one or more wireline interfaces, such as Ethernet (e.g., Fast Ethernet, Gigabit Ethernet, and so on). Network interface 106 may also support communication over one or more non-Ethernet media, such as coaxial cables or power lines, or over wide-area media, such as Synchronous Optical Networking (SONET) or digital subscriber line (DSL) technologies. Network interface 106 may additionally take the form of one or more wireless interfaces, such as IEEE 802.11 (Wifi), BLUETOOTH®, global positioning system (GPS), or a wide-area wireless interface. However, other forms of physical layer interfaces and other types of standard or proprietary communication protocols may be used over network interface 106. Furthermore, network interface 106 may comprise multiple physical interfaces. For instance, some embodiments of computing device 100 may include Ethernet, BLUETOOTH®, and Wifi interfaces.

[36] Input / output unit 108 may facilitate user and peripheral device interaction with example computing device 100. Input / output unit 108 may include one or more types of input devices, such as a keyboard, a mouse, a touch screen, and so on. Similarly, input / output unit 108 may include one or more types of output devices, such as a screen, monitor, printer, and/or one or more light emitting diodes (LEDs). Additionally or alternatively, computing device 100 may communicate with other devices using a universal serial bus (USB) or high-definition multimedia interface (HDMI) port interface, for example.

[37] In some embodiments, one or more instances of computing device 100 may be deployed to support a clustered architecture. The exact physical location, connectivity, and configuration of these computing devices may be unknown and/or unimportant to client devices. Accordingly, the computing devices may be referred to as “cloud-based” devices that may be housed at various remote data center locations.

[38] Figure 2 depicts a cloud-based server cluster 200 in accordance with example embodiments. In Figure 2, operations of a computing device (e.g., computing device 100) may be distributed between server devices 202, data storage 204, and routers 206, all of which may be connected by local cluster network 208. The number of server devices 202, data storages 204, and routers 206 in server cluster 200 may depend on the computing task(s) and/or applications assigned to server cluster 200.

[39] For example, server devices 202 can be configured to perform various computing tasks of computing device 100. Thus, computing tasks can be distributed among one or more of server devices 202. To the extent that these computing tasks can be performed in parallel, such a distribution of tasks may reduce the total time to complete these tasks and return a result. For purpose of simplicity, both server cluster 200 and individual server devices 202 may be referred to as a “server device.” This nomenclature should be understood to imply that one or more distinct server devices, data storage devices, and cluster routers may be involved in server device operations.

[40] Data storage 204 may be data storage arrays that include drive array controllers configured to manage read and write access to groups of hard disk drives and/or solid state drives. The drive array controllers, alone or in conjunction with server devices 202, may also be configured to manage backup or redundant copies of the data stored in data storage 204 to protect against drive failures or other types of failures that prevent one or more of server devices 202 from accessing units of cluster data storage 204. Other types of memory aside from drives may be used.

[41] Routers 206 may include networking equipment configured to provide internal and external communications for server cluster 200. For example, routers 206 may include one or more packet-switching and/or routing devices (including switches and/or gateways) configured to provide (i) network communications between server devices 202 and data storage 204 via cluster network 208, and/or (ii) network communications between the server cluster 200 and other devices via communication link 210 to network 212.

[42] Additionally, the configuration of cluster routers 206 can be based at least in part on the data communication requirements of server devices 202 and data storage 204, the latency and throughput of the local cluster network 208, the latency, throughput, and cost of communication link 210, and/or other factors that may contribute to the cost, speed, fault-tolerance, resiliency, efficiency, and/or other design goals of the system architecture.

[43] As a possible example, data storage 204 may include any form of database, such as a structured query language (SQL) database. Various types of data structures may store the information in such a database, including but not limited to tables, arrays, lists, trees, and tuples. Furthermore, any databases in data storage 204 may be monolithic or distributed across multiple physical devices.

[44] Server devices 202 may be configured to transmit data to and receive data from cluster data storage 204. This transmission and retrieval may take the form of SQL queries or other types of database queries, and the output of such queries, respectively. Additional text, images, video, and/or audio may be included as well. Furthermore, server devices 202 may organize the received data into web page representations. Such a representation may take the form of a markup language, such as the hypertext markup language (HTML), the extensible markup language (XML), or some other standardized or proprietary format. Moreover, server devices 202 may have the capability of executing various types of computerized scripting languages, such as but not limited to Python, PHP Hypertext Preprocessor (PHP), Active Server Pages (ASP), JavaScript, and/or other languages such as C++, C#, or Java. Computer program code written in these languages may facilitate the providing of web pages to client devices, as well as client device interaction with the web pages.

II. Code Sequence Voice Recognition Process

[45] Figure 3A illustrates a block diagram of an example environment 300 for machine transactions using voice recognition. For example, a user 310 using a client device 304 (e.g., a smartphone), which includes a voice terminal 312, may be near a machine 302. The machine 302 may be configured to display one or more code sequences 308. In one example, the code sequences 308 may be displayed by way of a digital screen on the machine 302. In another example, the code sequences 308 may be displayed physically by way of written text. In yet another example, the machine 302 may be configured to audibly recite the code sequences 308 to accommodate users who are visually impaired (e.g., a spoken representation via a machine-generated voice). Similarly, the machine 302 may provide the code sequences 308 in braille to accommodate users who are visually impaired. Other examples are possible.

[46] These code sequences 308 may correspond to actions 314 to be taken by the machine 302, such as dispensing an item stored in the machine. The user 310 may initiate a transaction with the machine 302 by verbally reciting a selected code sequence into the voice terminal 312 of the client device 304, thus creating a spoken audio representation of the selected code sequence. The client device 304 may transmit the spoken audio representation by way of a network 212 which is configured to communicate with a server cluster 200. A database contained within the data storage 204 may contain mappings between code sequences 308 and actions 314 to be taken by the machine 302. Software disposed upon the server cluster 200 may parse the spoken audio representation of the code sequence by way of a speech recognition module to determine the code sequence. The code sequence may then be matched, by way of the mappings in the database, to an action to be taken by the machine 302. The server cluster 200 may then transmit instructions to the machine 302 to take the corresponding action.

[47] In an example embodiment, the machine 302 may display a phone number 316. The user 310 may call the phone number 316 via the client device 304 and verbally recite a code sequence. Additionally, by calling the phone number 316, the client device 304 may access the network 212. In a different embodiment, the client device 304 may access the network 212 by way of an application. In this example, the user 310 may verbally recite the code sequence while using the application on the client device 304. Additionally, the machine 302 may display, for example, a barcode or a Quick Response (QR) code. The user 310 may scan the barcode or QR code with the client device 304, which may then open the application on the client device 304.

[48] The client device 304 may obtain an estimate of its location by way of the Global Positioning System (GPS) or other mechanisms. In an example embodiment, the client device 304 may provide this location estimate to the server cluster 200, or the server cluster 200 may obtain this information by other means. The server cluster 200 may then validate that the location of the client device 304 is within a threshold distance of the location of the machine 302 or a machine 302 within a network of machines. In one example, the threshold distance may be 3 meters. In another example, the threshold distance may be 5 meters. Many other examples are possible. The network 212 may then transmit instructions for the machine 302 to take the action in response to determining that the location of the client device 304 is within the threshold distance of the location of the machine 302.
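
A minimal sketch of this distance check follows, assuming GPS coordinates in degrees; the haversine approximation and the 5-meter default threshold are illustrative choices, not mandated by the application.

```python
import math

# Sketch of the threshold-distance check, assuming GPS coordinates in
# degrees; the haversine approximation and 5 m default are illustrative.

def within_threshold(dev_lat, dev_lon, mach_lat, mach_lon, threshold_m=5.0):
    """True if the client device is within threshold_m meters of the machine."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(dev_lat), math.radians(mach_lat)
    d_phi = math.radians(mach_lat - dev_lat)
    d_lam = math.radians(mach_lon - dev_lon)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2) ** 2)
    distance_m = 2 * earth_radius_m * math.asin(math.sqrt(a))
    return distance_m <= threshold_m
```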

[49] In an example embodiment, the client device 304 may be associated with a unique identifier. The unique identifier may be associated with pre-established credentials and/or payment information. The computing device 100 may validate that the pre-established credentials permit the machine 302 to take the action.

[50] In an example, the user may establish a unique identifier by way of creating a user account with identifying information associated with the user, which may include credit card or payment information, among other information. As such, every time the user seeks to purchase an item from the vending machine(s), the transaction can be authorized readily through the user’s account. Further, the machine 302 or the network 212 may readily recognize the user 310 or the client device 304 upon initiation of the transaction.

[51] As noted above, the client device 304 may for example be a smartphone. In an example embodiment the voice terminal 312 may be a microphone on the smartphone. In a different embodiment, the client device 304 may be a smart watch. In this example, the voice terminal 312 may be a microphone on the smart watch. Other examples are possible.

[52] As noted above, the machine 302 may be within a network of machines. The machine 302 may be, for example, a vending machine which dispenses any number of a variety of products (e.g., candy, soda, snacks, toys, etc.). In another embodiment, the machine 302 may dispense tickets to, for example, a movie theater, concert, or sporting event. In yet another example, the machine may function as a parking meter. In this embodiment, the code sequences may correspond to an amount of time and/or location for the user’s vehicle to be parked. In another example embodiment, the machine may play back music or media content. In this example, the code sequences may correspond to songs or other media to be played by the machine. Many other examples are possible.

[53] Figure 3B illustrates another schematic of an example environment 320 for machine transactions. The machine 302 and the client device 304 communicate by way of the server cluster 200. The machine 302 may display code sequences 308 associated, for example, with actions 314 to be taken by the machine 302. Different code sequences 308 may be provided for each unique type of item or service. In an example, the action 314 may be dispensing a selected candy bar from a vending machine. A user 310 may select a desired code from the code sequences 308 and verbally recite an audio representation of a code sequence into the voice terminal 312 (e.g. microphone on a smartphone) on the client device 304.

[54] Figure 4 is a composition diagram of a code sequence 400, in accordance with an example embodiment. Each code sequence of the plurality of code sequences 308 may comprise redundancies such that the voice recognition module may parse the spoken audio representation of the selected code sequence quickly and accurately. The code sequences 308 may, for example, comprise a variety of codes and sub-codes which may include similar identifiers.

[55] An example code sequence 400 may include numerous individual codes in a predetermined order. The example illustrated in Figure 4 provides four individual codes 402-408; however, code sequences may comprise any number of individual codes. The code sequence 400 may be developed with redundancies, as described below. The redundancies within the code sequence 400 allow for faster and more accurate identification by way of a speech recognition module.

[56] Code 402 may, for example, include a number of sub-codes 410-414. The sub-codes 410-414 may take the form of alphanumeric characters, special characters, or a combination thereof. In an example embodiment with three sub-codes, sub-code 410 and sub-code 412 may be alphabetic letters. By way of example, sub-code 410 may be “AB” and sub-code 412 may be “NJ”. Sub-code 414 may take the form of alphanumeric characters. The alphanumeric characters may be a numeric checksum of the alphabetic letters in sub-code 410 and sub-code 412. For example, sub-code 414 may be a hexadecimal representation of a hash function computed over sub-code 410 and sub-code 412.
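
One possible construction of such a checksum is sketched below. The application does not fix a particular hash function or digest length; SHA-256 and the two-hex-digit truncation are assumptions made for illustration.

```python
import hashlib

# Hypothetical construction of sub-code 414: a short hexadecimal digest
# of a hash computed over sub-codes 410 and 412. SHA-256 and the
# two-character truncation are assumptions; the application does not
# specify a particular hash function or digest length.

def make_checksum(sub_410: str, sub_412: str, digits: int = 2) -> str:
    digest = hashlib.sha256((sub_410 + sub_412).encode("utf-8")).hexdigest()
    return digest[:digits].upper()

print(make_checksum("AB", "NJ"))  # prints a two-hex-digit checksum of "ABNJ"
```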

[57] Code 404 may, for example, include a pronounceable expansion of sub-code 410. The expansion may be an acrostic, in which the first letters of the words of code 404 correspond to the letters of sub-code 410 in sequential order. By way of example, if sub-code 410 is “AB”, code 404 may be “Alpha Bravo”, as shown in Figure 4. In another example, code 404 may be “Alice Brown”. Many other examples are possible.

[58] Code 406 may include a pronounceable expansion of sub-code 412. The expansion may also be an acrostic, in which the first letters of the words of code 406 correspond to the letters of sub-code 412 in sequential order. By way of example, if sub-code 412 is “NJ”, code 406 may be “November Juliet”, as shown in Figure 4. In another example, code 406 may be “New Jersey”. Many other examples are possible.
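
For illustration, such an expansion could be generated mechanically from a fixed word list; the abbreviated NATO phonetic alphabet table below is an assumption for the sketch, and other word lists (e.g., given names) would work equally well.

```python
# Sketch of generating a pronounceable acrostic expansion for a sub-code
# from a fixed word list (here, a few NATO phonetic alphabet entries).

NATO = {"A": "Alpha", "B": "Bravo", "J": "Juliet", "N": "November"}
# ... remaining letters omitted for brevity

def expand(sub_code: str) -> str:
    """Map each letter of the sub-code to a word beginning with that letter."""
    return " ".join(NATO[letter] for letter in sub_code.upper())

print(expand("AB"))  # Alpha Bravo
print(expand("NJ"))  # November Juliet
```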

[59] Code 408 may, for example, comprise numbers in a decimal or hexadecimal base. Furthermore, the code 408 may be the spoken version of said numbers. The numbers in code 408 may be the same as sub-code 414. Alternatively, in a different embodiment, the numbers in code 408 may be a different representation of sub-code 414. In yet another example, code 408 may be a representation of the sub-codes 410-414 or the other codes 404-406. Many variations are possible.

[60] In an example embodiment, the voice recognition module may parse the spoken audio representation of the code sequence 400 to determine code 402, code 404, code 406, and code 408. Further, the voice recognition module may determine that code 402 includes various sub-codes 410-414. The speech recognition module may then validate that code 404 includes words that are the pronounceable expansion of sub-code 410 and that code 406 includes words that are the pronounceable expansion of sub-code 412. Additionally, the speech recognition module may validate that code 408 is equivalent to sub-code 414. In a different example embodiment, the speech recognition module may validate that code 408 is an alternate representation of the code sequence 400.

[61] In another example embodiment, the voice recognition module may additionally validate that code 404 and code 406 are associated with each other in the mappings within the database 204. In a different example, the voice recognition module may validate that codes 404-408 are associated with code 402 in the mappings within the database 204. In yet another example, the code sequence 400 may include five or more individual codes. In a different example, code 402 may include four or more sub-codes. Many other variations of code sequences 400 and validation processes are possible.
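
The association check described in the preceding paragraph could be as simple as a membership test against the database mappings. The in-memory set below is a stand-in assumed for the sketch; a deployment would query data storage 204 instead.

```python
# Illustrative association check: are two recognized codes linked to one
# another in the mappings? The in-memory set is a stand-in for the
# database; a deployment would query data storage 204 instead.

ASSOCIATED_PAIRS = {
    ("Alpha Bravo", "November Juliet"),
}

def codes_associated(code_404: str, code_406: str) -> bool:
    return (code_404, code_406) in ASSOCIATED_PAIRS

assert codes_associated("Alpha Bravo", "November Juliet")
```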

[62] In an example where the machine 302 or the network 212 recognizes the user 310 or client device 304, the voice recognition module may only need to parse code 402 and code 404 to validate that they are associated with each other in the database 204. For example, if the user 310 frequently purchases the same item from a vending machine and code 402 and code 404 are associated with the code sequence corresponding to the action of dispensing this item, the network 212 may then transmit instructions for the machine 302 to dispense the item.

[63] In a different example, the quality of the audio transmission may be poor. The network 212 may prompt the user 310 to repeat the code sequence. Alternatively, the network 212 may prompt the user 310 to recite a word used for measuring the sound quality of the transmission. In a similar example, a word may be included in the code sequence for purposes of sound quality measurement.

[64] Because the code sequences comprise numerous redundancies, parsing the spoken audio representation of the code sequence can be performed quickly and accurately. Namely, the code sequences are self-verifying, which allows detection and correction of audio transmission issues, such as perturbing noise and mistaken pronunciation. Furthermore, the self-verification reduces the need for repetition of the vocal command.

III. Example Operations

[65] Figure 5 illustrates a flow chart, in accordance with an example embodiment. The process depicted in Figure 5 could be carried out by a computing system, such as a computing device 100 or server cluster 200.

[66] Block 500 may involve receiving a spoken audio representation of the code sequence 400 from the voice terminal 312.

[67] In an example, the voice terminal comprises at least part of a mobile phone, and the computing device receives the spoken audio representation of the code sequence by way of a voice call from the mobile phone. Further, the computing device may receive, from the client device, a location of the client device, determine that the location of the client device is within a threshold distance of a location of the machine, and transmit instructions for the machine to take the action in response to the location of the voice terminal being within the threshold distance of the location of the machine.

[68] In another example, the voice terminal is associated with a unique identifier, wherein the unique identifier is associated with pre-established credentials, and wherein the computing device is further configured to validate that the pre-established credentials permit the machine to take the action.

[69] Block 502 may involve parsing, by way of a speech recognition module, the spoken audio representation to determine code 402, code 404, code 406, and code 408, and three sub-codes 410-414 within code 402.

[70] In an example embodiment, sub-code 410 comprises at least a first alphanumeric character and a second alphanumeric character, and sub-code 412 comprises at least a third alphanumeric character and a fourth alphanumeric character. In this embodiment, sub-code 414 comprises two numeric characters.

[71] Block 504 may involve validating that code 404 comprises words that are a pronounceable expansion of sub-code 410, validating that code 406 comprises words that are a pronounceable expansion of sub-code 412, and validating that code 408 is a numeric checksum of code 402.

[72] In an example embodiment, code 404 comprises a first acrostic of the first alphanumeric character and the second alphanumeric character within sub-code 410 and code 406 comprises an acrostic of the third alphanumeric character and the fourth alphanumeric character within sub-code 412.

[73] In another example embodiment, the two numeric characters of sub-code 414 comprise a hexadecimal representation of the first alphanumeric character, the second alphanumeric character, the third alphanumeric character, and the fourth alphanumeric character.

[74] Block 506 may involve looking up, in the mappings of the database, at least one of code 402, code 404, code 406, or code 408 to determine that the code sequence relates to the action. In an example embodiment, this may involve validating, in the mappings of the database, that code 404 is associated with the code 406.

[75] Block 508 may involve transmitting, to the machine 302, instructions for the machine to take the action. In an example embodiment, the machine is a vending machine, wherein the code sequence is digitally displayed on the machine, and wherein the action is the machine dispensing an item.
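
Putting blocks 500-508 together, a self-contained sketch of the whole flow follows. The recognizer, the mappings, and the machine interface are simplified stand-ins: recognize, MAPPINGS, and send_instruction are hypothetical names, and the comma-separated transcript format is assumed from the example in Figure 4.

```python
# End-to-end sketch of blocks 500-508. The recognizer, the mappings, and
# the machine interface are simplified stand-ins; names and the
# comma-separated transcript format are assumptions for illustration.

MAPPINGS = {"AB/NJ/72": {"machine_id": "vend-001", "command": "dispense_item"}}

def recognize(audio: bytes) -> str:
    """Stand-in for the speech recognition module (blocks 500-502)."""
    return "AB/NJ/72, Alpha Bravo, November Juliet, 72"

def acrostic_matches(words: str, sub_code: str) -> bool:
    """True if the first letters of the words spell out the sub-code."""
    return "".join(w[0] for w in words.split()).upper() == sub_code.upper()

def send_instruction(machine_id: str, command: str) -> None:
    """Stand-in for transmitting instructions to the machine (block 508)."""
    print(f"-> {machine_id}: {command}")

def handle_transaction(audio: bytes) -> None:
    first, second, third, fourth = [
        part.strip() for part in recognize(audio).split(",")]  # blocks 500-502
    sub_410, sub_412, checksum = first.split("/")
    if not (acrostic_matches(second, sub_410)                  # block 504
            and acrostic_matches(third, sub_412)
            and fourth == checksum):
        raise ValueError("code sequence failed self-verification")
    action = MAPPINGS[first]                                   # block 506
    send_instruction(action["machine_id"], action["command"])  # block 508

handle_transaction(b"")  # -> vend-001: dispense_item
```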

IV. Conclusion

[76] The present disclosure is not to be limited in terms of the particular embodiments described in this application, which are intended as illustrations of various aspects. Many modifications and variations can be made without departing from its scope, as will be apparent to those skilled in the art. Functionally equivalent methods and apparatuses within the scope of the disclosure, in addition to those described herein, will be apparent to those skilled in the art from the foregoing descriptions. Such modifications and variations are intended to fall within the scope of the appended claims.

[77] The above detailed description describes various features and operations of the disclosed systems, devices, and methods with reference to the accompanying figures. The example embodiments described herein and in the figures are not meant to be limiting. Other embodiments can be utilized, and other changes can be made, without departing from the scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations.

[78] With respect to any or all of the message flow diagrams, scenarios, and flow charts in the figures and as discussed herein, each step, block, and/or communication can represent a processing of information and/or a transmission of information in accordance with example embodiments. Alternative embodiments are included within the scope of these example embodiments. In these alternative embodiments, for example, operations described as steps, blocks, transmissions, communications, requests, responses, and/or messages can be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved. Further, more or fewer blocks and/or operations can be used with any of the message flow diagrams, scenarios, and flow charts discussed herein, and these message flow diagrams, scenarios, and flow charts can be combined with one another, in part or in whole.

[79] A step or block that represents a processing of information can correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique. Alternatively or additionally, a step or block that represents a processing of information can correspond to a module, a segment, or a portion of program code (including related data). The program code can include one or more instructions executable by a processor for implementing specific logical operations or actions in the method or technique. The program code and/or related data can be stored on any type of computer readable medium such as a storage device including RAM, a disk drive, a solid state drive, or another storage medium.

[80] The computer readable medium can also include non-transitory computer readable media such as computer readable media that store data for short periods of time like register memory and processor cache. The computer readable media can further include non-transitory computer readable media that store program code and/or data for longer periods of time. Thus, the computer readable media may include secondary or persistent long term storage, like ROM, optical or magnetic disks, solid state drives, compact-disc read only memory (CD-ROM), for example. The computer readable media can also be any other volatile or non-volatile storage systems. A computer readable medium can be considered a computer readable storage medium, for example, or a tangible storage device.

[81] Moreover, a step or block that represents one or more information transmissions can correspond to information transmissions between software and/or hardware modules in the same physical device. However, other information transmissions can be between software modules and/or hardware modules in different physical devices.

[82] The particular arrangements shown in the figures should not be viewed as limiting. It should be understood that other embodiments can include more or less of each element shown in a given figure. Further, some of the illustrated elements can be combined or omitted. Yet further, an example embodiment can include elements that are not illustrated in the figures.

[83] While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purpose of illustration and are not intended to be limiting, with the true scope being indicated by the following claims.