SYSTEMS AND METHODS FOR GENERATING A CONTEXTUALLY AND CONVERSATIONALLY CORRECT RESPONSE TO A QUERY - THOMSON REUTERS GLOBAL RESOURCES UNLIMITED CO

Title:

SYSTEMS AND METHODS FOR GENERATING A CONTEXTUALLY AND CONVERSATIONALLY CORRECT RESPONSE TO A QUERY

Document Type and Number:

WIPO Patent Application WO/2019/211817

Abstract:

The present disclosure relates to systems and methods for generating contextually, grammatically, and conversationally correct answers to input questions. Embodiments provide for linguistic and syntactic structure analysis of a submitted question in order to determine whether the submitted question may be answered by at least one headnote. The question is then further analyzed to determine more details about the intent and context of the question. A federated search process, based on the linguistic and syntactic structure analysis and on the additional analysis of the question, is used to identify candidate question-answer pairs from a corpus of previously created headnotes. Machine learning models are used to analyze the candidate question-answer pairs, additional rules are applied to rank the candidate answers, and dynamic thresholds are applied to identify the best potential answers to provide to a user as a response to the submitted question.

Inventors:

CUSTIS TONYA (US)
SURPRENANT MATTHEW A (US)
LINDBERG ERIK (US)
MCELVAIN GAYLE (US)

Application Number:

PCT/IB2019/053658

Publication Date:

November 07, 2019

Filing Date:

May 03, 2019

Assignee:

THOMSON REUTERS GLOBAL RESOURCES UNLIMITED CO (CH)
CUSTIS TONYA (US)

International Classes:

G06F16/20; G06F16/14; G06F16/332; G06F16/338; G06F17/24; G06F17/27; G06Q50/18; G06Q50/20

Foreign References:

US20110125734A1	2011-05-26
US20070260472A1	2007-11-08
US20100030749A1	2010-02-04
US20110153601A1	2011-06-23
US20170177715A1	2017-06-22
US20170264806A1	2017-09-14

Other References:

See also references of EP 3769229A4

Attorney, Agent or Firm:

REES, Nathan (US)

Claims:

CLAIMS

1. A method comprising:

receiving a query from a user terminal, the query including a question having a linguistic and syntactic structure;

analyzing the linguistic and syntactic structure of the question to determine at least a context of the question;

generating at least one search query based on the analyzing the linguistic and syntactic structure of the question;

causing the at least one search query to be executed on at least one data store;

obtaining a plurality of candidate answers in response to the execution of the search query;

obtaining a linguistic and syntactic analysis of each candidate answer of the plurality of candidate answers, wherein the question is paired with each candidate answer of the plurality of candidate answers to form a plurality of question-answer pairs;

extracting at least one feature for each question-answer pair of the plurality of question-answer pairs;

feeding the extracted at least one feature for each question-answer pair into a ranking model;

scoring, by the ranking model, for each feature of the at least one feature, each question-answer pair, wherein a score for a particular question-answer pair resulting from the scoring represents a probability that the particular candidate answer of the particular question-answer pair is a correct answer to the question;

ranking the candidate answers of the plurality of candidate answers based on the score of each candidate answer; and

providing at least one of the ranked candidate answers as answer to the question in the query.
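
The scoring and ranking steps of claim 1 can be illustrated with a minimal sketch. It assumes each question-answer pair already carries a numeric feature vector and that the ranking model is a simple logistic function; the claim does not specify the model type, and the names QAPair, WEIGHTS, BIAS, score, and rank are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class QAPair:
    question: str
    answer: str
    features: list  # hypothetical numeric features extracted for this pair


# Hypothetical model parameters; a real ranking model would be trained.
WEIGHTS = [2.0, 1.0]
BIAS = -1.5


def score(pair):
    """Return a probability-like score that the answer is correct."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, pair.features))
    return 1.0 / (1.0 + math.exp(-z))


def rank(pairs):
    """Order question-answer pairs from most to least probable."""
    return sorted(pairs, key=score, reverse=True)


question = "What is the statute of limitations for fraud?"
pairs = [
    QAPair(question, "headnote A", [0.0, 0.0]),
    QAPair(question, "headnote B", [1.0, 0.5]),
    QAPair(question, "headnote C", [0.5, 0.5]),
]
ranked = rank(pairs)
best_answer = ranked[0].answer
```

In this sketch the top-ranked pair supplies the answer returned to the user; a deployed system might return several ranked answers instead.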

2. The method of claim 1, wherein the analyzing the linguistic and syntactic structure of the question includes one of: natural language processing, entity recognition, frame classification, key number classification, and embeddings analysis.

3. The method of claim 2, wherein the context of the question includes the intent of the question, and wherein the frame classification includes classifying the question into at least one frame category associated with the intent of the question.

4. The method of claim 2, wherein the at least one frame category is one of: Admissibility, Availability of Damages or Remedy, Burden of Proof, Construction of Instruments, Court Authority, Elements, Factors, and Tests, Accrual of Statute of Limitations, Tolling of Statute of Limitations, Duration of Statute of Limitations, Standard of Review, Enforceability of Contracts, and Others.

5. The method of claim 1, further comprising pre-processing the received question prior to the analyzing, wherein the pre-processing includes determining whether the question is of a type that can be answered by a headnote, and wherein each candidate answer of the plurality of candidate answers is a headnote.

6. The method of claim 1, wherein the obtaining the linguistic and syntactic analysis of each candidate answer of the plurality of candidate answers includes analyzing the linguistic and syntactic structure of each candidate answer subsequent to the obtaining the plurality of candidate answers in response to the execution of the search query.

7. The method of claim 1, wherein the obtaining the linguistic and syntactic analysis of each candidate answer of the plurality of candidate answers includes analyzing the linguistic and syntactic structure of each candidate answer prior to the obtaining the plurality of candidate answers in response to the execution of the search query, wherein the linguistic and syntactic analysis of each candidate answer is obtained from a database.

8. The method of claim 1, further comprising post-processing the ranked candidate answers, wherein the post-processing includes one of:

applying constraint rules to the ranked candidate answers, to ensure that the candidate answers contain required elements, wherein a particular ranked candidate answer is eliminated as a candidate answer based on a constraint rule being met; and

applying weighting rules to one of penalize and boost a ranked candidate answer.
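
The two kinds of post-processing rule in claim 8 might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: candidates are represented as (answer, score) tuples, a constraint rule is a function that returns True when it is met (which eliminates the candidate), and a weighting rule is a function that returns a multiplier. All function names and the example rules are hypothetical.

```python
def apply_constraint_rules(ranked, rules):
    """Drop every candidate for which at least one constraint rule is met."""
    return [(a, s) for a, s in ranked if not any(rule(a) for rule in rules)]


def apply_weighting_rules(ranked, rules):
    """Multiply each score by every rule's factor, then re-sort by score."""
    adjusted = []
    for answer, s in ranked:
        for rule in rules:
            s *= rule(answer)  # factor > 1 boosts, factor < 1 penalizes
        adjusted.append((answer, s))
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


# Hypothetical constraint: met when a required element is missing.
def missing_limitations_term(answer):
    return "limitations" not in answer.lower()


# Hypothetical weighting rule: boost answers that mention tolling.
def boost_tolling(answer):
    return 1.5 if "tolled" in answer.lower() else 1.0


ranked = [
    ("Two years.", 0.9),
    ("The statute of limitations is two years.", 0.6),
    ("The limitations period for fraud is tolled until discovery.", 0.5),
]
kept = apply_constraint_rules(ranked, [missing_limitations_term])
final = apply_weighting_rules(kept, [boost_tolling])
```

Here the highest-scoring candidate is eliminated because it lacks the required element, and the boost reorders the two that remain.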

9. The method of claim 1, wherein the providing at least one of the ranked candidate answers as answer to the question in the query includes performing a threshold determination to the at least one of the ranked candidate answers to determine whether the at least one of the ranked candidate answers will be provided as an answer to the question based on a threshold.

10. The method of claim 9, wherein the threshold is a threshold value of a probability that the at least one of the ranked candidate answers is a correct answer, wherein the at least one of the ranked candidate answers is determined to be provided as an answer to the question when a probability score of the at least one of the ranked candidate answers exceeds the threshold value.
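
The threshold determination of claims 9 and 10 reduces to a strict comparison of each probability score against a threshold value. The sketch below assumes candidates are (answer, probability) tuples; the function name and the threshold of 0.5 are hypothetical.

```python
def answers_above_threshold(ranked, threshold):
    """Keep only candidates whose probability score exceeds the threshold."""
    return [(answer, p) for answer, p in ranked if p > threshold]


ranked = [("headnote A", 0.8), ("headnote B", 0.5), ("headnote C", 0.2)]
provided = answers_above_threshold(ranked, threshold=0.5)
none_provided = answers_above_threshold(ranked, threshold=0.95)
```

Because the claim says the score must exceed the threshold, a score equal to the threshold is not provided. An empty result means no candidate is returned as an answer.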

11. The method of claim 1, wherein the extracted at least one feature includes at least one of: a linguistic similarity feature, a concept coordination feature, a topicality feature, an abstract/concrete classification feature, and a key number scoring function feature.

12. The method of claim 1, wherein the search query includes one of: a natural language search query, a more-like-this search query, and a semantic search query.

13. The method of claim 12, wherein the analyzing the linguistic and syntactic structure of the question includes determining a frame into which the question is classified, wherein the semantic search query includes:

obtaining a frame-specific search template, the frame-specific search template associated with the frame of the question, and wherein the frame-specific search template defines at least one placeholder corresponding to an element of the frame of the question;

identifying an entity in the question corresponding to the element of the frame corresponding to the at least one placeholder; and

replacing the placeholder in the frame-specific search template with the identified entity to generate a complete frame-specific search template, wherein the generating at least one search query is based on the complete frame-specific search template.
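
The template-filling steps of claim 13 can be sketched with Python's standard string.Template. The frame name is taken from the list in claim 4, but the template text, the placeholder name cause_of_action, and the entity dictionary are hypothetical; entity identification is assumed to have been done by an earlier step.

```python
from string import Template

# Hypothetical frame-specific search templates, keyed by frame category.
FRAME_TEMPLATES = {
    "Duration of Statute of Limitations": Template(
        "statute of limitations period for $cause_of_action"
    ),
}


def build_semantic_query(frame, entities):
    """Fill the frame's template placeholders with entities from the question.

    Raises KeyError if the frame is unknown or a placeholder has no entity.
    """
    template = FRAME_TEMPLATES[frame]
    return template.substitute(entities)


# Entities assumed to come from an upstream entity recognition step.
entities = {"cause_of_action": "breach of contract"}
query = build_semantic_query("Duration of Statute of Limitations", entities)
```

Using substitute rather than safe_substitute makes an unfilled placeholder fail loudly, so an incomplete template is never sent as a search query.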

14. A system comprising:

a question/answer processor configured to:

receive a query from a user terminal, the query including a question having a linguistic and syntactic structure; and

analyze the linguistic and syntactic structure of the question to determine at least a context of the question;

a query generator configured to:

generate at least one search query based on the analyzing the linguistic and syntactic structure of the question;

cause the at least one search query to be executed on at least one data store; and

obtain a plurality of candidate answers in response to the execution of the search query, wherein the question/answer processor is further configured to obtain a linguistic and syntactic analysis of each candidate answer of the plurality of candidate answers, and to pair the question with each candidate answer of the plurality of candidate answers to form a plurality of question-answer pairs;

a feature extractor configured to:

extract at least one feature for each question-answer pair of the plurality of question-answer pairs; and

feed the extracted at least one feature for each question-answer pair into a candidate ranker;

the candidate ranker configured to:

score, using a ranking model, for each feature of the at least one feature, each question-answer pair, wherein a score for a particular question-answer pair resulting from the scoring represents a probability that the particular candidate answer of the particular question-answer pair is a correct answer to the question; and

rank the candidate answers of the plurality of candidate answers based on the score of each candidate answer; and

an answer detector configured to provide at least one of the ranked candidate answers as answer to the question in the query.

15. The system of claim 14, wherein the configuration of the question/answer processor to analyze the linguistic and syntactic structure of the question includes configuration of the question/answer processor to perform one of: natural language processing, entity recognition, frame classification, key number classification, and embeddings analysis.

16. The system of claim 14, wherein the configuration of the question/answer processor to obtain the linguistic and syntactic analysis of each candidate answer of the plurality of candidate answers includes configuration of the question/answer processor to one of:

analyze the linguistic and syntactic structure of each candidate answer subsequent to obtaining the plurality of candidate answers in response to the execution of the search query; and

analyze the linguistic and syntactic structure of each candidate answer prior to obtaining the plurality of candidate answers in response to the execution of the search query, wherein the linguistic and syntactic analysis of each candidate answer is obtained from a database.

17. The system of claim 14, wherein the configuration of the answer detector to provide at least one of the ranked candidate answers as answer to the question in the query includes configuration of the answer detector to perform a threshold determination to the at least one of the ranked candidate answers to determine whether the at least one of the ranked candidate answers will be provided as an answer to the question based on a threshold, wherein the threshold is a threshold value of a probability that the at least one of the ranked candidate answers is a correct answer, wherein the at least one of the ranked candidate answers is determined to be provided as an answer to the question when a probability score of the at least one of the ranked candidate answers exceeds the threshold value.

18. The system of claim 1, wherein the search query includes one of: a natural language search query, a more-like-this search query, and a semantic search query.

19. The system of claim 18, wherein the configuration of the question/answer processor to analyze the linguistic and syntactic structure of the question includes configuration to determine a frame into which the question is classified, wherein the semantic search query includes:

20. A computer-based tool including non-transitory computer readable media having stored thereon computer code which, when executed by a processor, causes a computing device to perform operations comprising: