Title:
METHOD OF SELECTING THE OPTIMAL VOICEPRINT
Document Type and Number:
WIPO Patent Application WO/2024/049311
Kind Code:
A1
Abstract:
The method of selecting the optimal voiceprint is characterized in that, using known methods, the voice data of the communicating person is identified in real time using a telephone or computer, preferably via telephone telecommunications or a connection via Internet cloud data processing technologies, in such a way that the communicating person connects via remote means of communication, by telephone or computer, with a customer service system; the call is automatically directed to the Interactive Voice Response (IVR) system, where the communicating person speaks a text of a minimum length, preferably one word, and, through semantic information, the customer is automatically verified in the system against the customer ID number previously assigned to him in the database. Simultaneously, during customer verification and during the operation of the IVR system, the communication is processed by the stable field detection method to obtain a noise-SNR pair. After such verification of the customer and determination of his ID in the system, the database of voiceprints assigned to the given client is automatically searched. When the voiceprint is consistent with the identified client's voiceprint, the conversation with the client is processed according to voiceprints matched from the client's database; when the voiceprint is not consistent with the identified client's voiceprint, communication with the client is processed according to the second database of standard voiceprints, or new voiceprints are created in this client's database.

Inventors:
MICHOCKI PAWEŁ (PL)
TYMECKI ANDRZEJ (PL)
Application Number:
PCT/PL2022/050053
Publication Date:
March 07, 2024
Filing Date:
August 30, 2022
Assignee:
BIOMETRIQ SP Z O O (PL)
International Classes:
G10L15/22; G10L15/20; G10L17/22
Domestic Patent References:
WO2019136801A1, 2019-07-18
Foreign References:
US20200312337A1, 2020-10-01
US20180144742A1, 2018-05-24
Other References:
NIKONOWICZ JAKUB: "METODA DETEKCJI STABILNEGO POLA JAKO ROZWINIĘCIE METODY DETEKCJI ENERGII NIEZNANYCH SYGNAŁÓW", DISSERTATION, 1 January 2018 (2018-01-01), pages 1 - 104, XP093146996
Attorney, Agent or Firm:
FILIPEK-MARZEC, Magdalena (PL)
Claims:
Patent claims.

1. A method for selecting the optimal voiceprint, characterized in that, by known methods, in real time using a telephone or computer, preferably via telephone telecommunications or a connection via Internet cloud data processing technologies, the voice data of the communicating person is identified in such a way that the communicating person connects via remote means of communication, by telephone or computer, with a customer service system, the call is automatically directed to the Interactive Voice Response (IVR) system, where the communicating person speaks a text of a minimum length, preferably one word, and, through semantic information, the customer is automatically verified in the system against the customer's ID number previously assigned in the database, simultaneously, during customer verification and during the operation of the IVR system, the communication is processed by the stable field detection method to obtain a noise-SNR pair, and after such verification of the customer and determination of his ID in the system, the database of voiceprints assigned to the given client is automatically searched, and when the voiceprint is consistent with the identified client's voiceprint, the conversation with the client is processed according to voiceprints matched from the client's database, and when the voiceprint is not consistent with the identified client's voiceprint, the communication with the client is processed according to the second database of standard voiceprints or the process of creating new voiceprints in the database of this client takes place.

Description:
Method of selecting the optimal voiceprint

The subject of the invention is a method for selecting the optimal voiceprint, especially in customer service systems using remote communication means.

A quality dimension-based voiceprint recognition algorithm evaluation method is known from CN110335611B, including a target-related voiceprint recognition evaluation method and a target-unrelated voiceprint recognition evaluation method. For a single voice recognition algorithm, multidimensional evaluation can be used. The sensitivity of the algorithm to different parameters is obtained to optimize the algorithm for different parameters; for different voice recognition algorithms, more detailed comparison results can be provided, and the optimal recognition algorithm can be provided in conjunction with the application environment.

A method for voice interaction is known from US2020211545 A1, which includes receiving a voice signal to be detected in a predetermined period of time, performing voice identification on the voice signal to be detected to obtain a text to be detected, performing a first detection on the text to be detected, and providing a response according to the text to be detected once it is determined that the first detection was successful. In embodiments, the rate of misrecognition of a voice signal during voice interaction is reduced, thereby improving the user experience.

A method and device for speech synthesis are known from CN 111916052B. The method includes the following steps: obtaining the voice of at least one user; performing language recognition on the voice of the at least one user and determining the language corresponding to each user's voice; performing voiceprint recognition on the voice of the corresponding user and determining the characteristics of each user's voiceprint; if the general language of the current region is among the determined languages, designating the general language as the target language, where the current region is the region in which the user is currently located; if the general language is not among the determined languages, designating as the target language a language whose proportion is greater than an established proportion; and, based on each user's voiceprint characteristics, outputting the target synthesized voice in the target language. According to the method, the synthesized voice is obtained using a specific target language and target voiceprint features obtained through similarity, thereby improving the voice synthesis quality and user experience of the voice interaction system.

An information processing method is known from JP6908461 B2. The method obtains first voice information indicating the user's voice and refers to a first database in which character string information and semantic information are associated. When the first character string generated from the first voice information does not match any character string information in the first database, the method sends the first character string information to a server over the network and obtains from the server at least one of first semantic information and a control command, namely the one associated, in a second database on the server, with the character string information matching the first character string information. The first semantic information instructs the device to perform a predetermined operation. The method then sends to the speaker second voice information generated from second character string information, the second character string information being associated with the first semantic information in the first database.

The purpose of the invention is to develop a method of voice recognition, and in particular a method of selecting the optimal voiceprint from among the many available in the database, each characterized by different parameters of the generic source from which the voiceprint was created, in order to increase the effectiveness of voice biometrics.

The essence of the invention is a method for selecting the optimal voiceprint, which consists in identifying, in real time and using known methods, the voice data of the communicating person using a telephone or computer, i.e. via telephone telecommunications or connections via Internet technologies for data processing in the cloud, i.e. instant messengers, in such a way that: the communicating person connects via remote means of communication, by telephone or computer, to the customer service system; the call is automatically directed to the Interactive Voice Response (IVR) system, where the communicating person speaks a text of a minimum length, preferably one word; the customer is automatically verified in the system against the customer ID number previously assigned to him in the database; simultaneously, during customer verification and during the operation of the IVR system, the call is passed to the stable field detection method to obtain a noise-SNR pair; after such identification of the customer and determination of his ID, the system automatically searches the database of voiceprints assigned to the given client; when the voiceprint is compatible, the conversation with the client is processed according to voiceprints matched from the client's database; and when the voiceprint is not compatible, the conversation with the client is processed according to the second database of standard voiceprints, or new voiceprints are created in this client's database.

The invention is disclosed in the drawing and in an embodiment.

Example. The invention is aimed at improving the effectiveness of voice biometrics subsystems used in customer service systems using remote communication means. In the developed solution, a call from the client is directed to the IVR system, where, in the interactive process, the client is required to speak a text of a minimum duration predefined in the system, of at least one word. After verifying the client and determining his ID, the system searches the database of voiceprints assigned to the given client. At the same time, the voice stream with the customer's statement is directed to a signal quality classifier, which determines the signal-to-noise ratio of that voice stream and the type of interfering noise present in the speaker's background. Based on the information determined by the classifier, the database is searched for voiceprints that have been assigned the identified type of interfering noise together with the measured signal-to-noise interval.
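The search described above can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the `Voiceprint` record, the function names, and the fixed 5 dB interval width are all assumptions introduced here for illustration only, since the description does not specify how signal-to-noise intervals are defined or how the database is organised.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Voiceprint:
    """Hypothetical record; field names are illustrative, not from the patent."""
    client_id: str
    noise_type: str                 # label produced by the signal quality classifier
    snr_interval: Tuple[int, int]   # [low, high) in dB
    template: bytes                 # opaque, engine-specific voiceprint data


def snr_interval(snr_db: float, width: int = 5) -> Tuple[int, int]:
    """Map a measured SNR to a fixed-width interval [low, low + width).

    The 5 dB default width is an assumption, not a value from the source.
    """
    low = int(snr_db // width) * width
    return (low, low + width)


def find_voiceprints(db: Iterable[Voiceprint], client_id: str,
                     noise_type: str, snr_db: float) -> List[Voiceprint]:
    """Return the client's voiceprints tagged with the detected noise type
    and the interval that contains the measured SNR."""
    interval = snr_interval(snr_db)
    return [vp for vp in db
            if vp.client_id == client_id
            and vp.noise_type == noise_type
            and vp.snr_interval == interval]
```

An empty result corresponds to the case in which no voiceprint matching the detected noise type and signal-to-noise interval has been stored for the client.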

It is possible that a specific biometric engine will internally handle specific types of noise, and in such a situation, a voiceprint based on this specific type of noise will not be created for a given engine. For such a case, the most effective voiceprint will be used, appropriately marked in the voiceprint database.
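The fallback described in the preceding paragraph might be sketched as follows, again in Python and under stated assumptions: voiceprints are represented as plain dictionaries, and the keys `noise_type`, `snr_interval` and `most_effective`, as well as the `engine_handles_noise` flag, are hypothetical names chosen for this illustration.

```python
from typing import Iterable, Optional, Tuple


def choose_voiceprint(voiceprints: Iterable[dict], noise_type: str,
                      snr_interval: Tuple[int, int],
                      engine_handles_noise: bool) -> Optional[dict]:
    """Pick a voiceprint for one biometric engine.

    If the engine handles the detected noise type internally, no
    noise-specific voiceprint exists for it, so the voiceprint flagged
    as most effective is used. Otherwise the voiceprint tagged with the
    detected noise type and SNR interval is looked up.
    """
    voiceprints = list(voiceprints)
    if engine_handles_noise:
        return next((vp for vp in voiceprints if vp.get("most_effective")), None)
    return next((vp for vp in voiceprints
                 if vp.get("noise_type") == noise_type
                 and vp.get("snr_interval") == snr_interval), None)
```

A return value of `None` leaves the decision to the caller, for instance to fall back on the database of standard voiceprints or to create a new voiceprint.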

In the next step, after matching the best voiceprint to a given scenario, i.e. the combination of interference type and signal-to-noise ratio, authentication of the person served via the voice channel will be performed. The proposed solution will also be equipped with an additional post-processing function, which will be responsible for expanding the voiceprint database. Voiceprints will be created, as part of post-processing, for all supported engines based on the acquired voice stream if the following conditions occur simultaneously:

1. the voiceprint assigned to the biometric engine N and to the detected parameter pair, interference type and signal-to-noise ratio, is not found in the voiceprint database,

2. the biometric engine N does not natively handle the detected type of noise.

If the subsystem does not have explicit information about whether the engine N natively supports the detected type of noise, the built-in algorithm will create a voiceprint and then verify whether the voiceprint created for the detected type of noise, present at a specific signal-to-noise interval, improves the quality of biometric detection for the given engine. The method of completing voiceprints will use a different fragment of the recorded voice stream to create the voiceprint than to verify its effectiveness. If the increased effectiveness of the voiceprint is confirmed, it will be registered in the database along with data describing at least the engine to which the voiceprint was assigned, the noise type and the signal-to-noise ratio parameter.
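The post-processing step can be sketched in Python as below. This is a sketch under assumptions, not the described subsystem: `enroll` and `score` stand in for engine-specific operations that the description does not define, the 50/50 split of the voice stream is arbitrary, and the registry is a plain list. The only points taken from the description are that different fragments are used for creation and for verification, and that the voiceprint is registered, together with the engine, noise type and signal-to-noise ratio, only if it improves detection.

```python
from typing import Any, Callable, List, Sequence


def try_register_voiceprint(
    samples: Sequence[float],
    enroll: Callable[[Sequence[float]], Any],        # hypothetical: builds a voiceprint
    score: Callable[[Any, Sequence[float]], float],  # hypothetical: higher is better
    baseline: Any,                                   # voiceprint currently in use
    registry: List[dict],
    *,
    engine: str,
    noise_type: str,
    snr_db: float,
    split: float = 0.5,                              # assumed split point
) -> bool:
    """Create a candidate voiceprint from one fragment of the stream,
    verify it on a different fragment, and register it only if it
    scores better than the baseline voiceprint."""
    cut = int(len(samples) * split)
    enroll_part, verify_part = samples[:cut], samples[cut:]
    if not enroll_part or not verify_part:
        return False  # stream too short to keep the two fragments separate

    candidate = enroll(enroll_part)
    if score(candidate, verify_part) <= score(baseline, verify_part):
        return False  # no confirmed improvement, nothing is stored

    registry.append({
        "engine": engine,
        "noise_type": noise_type,
        "snr_db": snr_db,
        "voiceprint": candidate,
    })
    return True
```

Passing the engine operations in as callables keeps the sketch independent of any particular biometric engine; the same function could be called once per supported engine.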