
Title:
CAPTURING DETAILED STRUCTURE FROM PATIENT-DOCTOR CONVERSATIONS FOR USE IN CLINICAL DOCUMENTATION
Document Type and Number:
WIPO Patent Application WO/2019/078887
Kind Code:
A1
Abstract:
A method and system is provided for assisting a user to assign a label to words or spans of text in a transcript of a conversation between a patient and a medical professional and form groupings of such labelled words or spans of text in the transcript. The transcript is displayed on an interface of a workstation. A tool is provided for highlighting spans of text in the transcript consisting of one or more words. Another tool is provided for assigning a label to the highlighted spans of text. This tool includes a feature enabling searching through a set of predefined labels available for assignment to the highlighted span of text. The predefined labels encode medical entities and attributes of the medical entities. The interface further includes a tool for creating groupings of related highlighted spans of texts. The tools can consist of mouse action or key strokes or a combination thereof.

Inventors:
CO CHRISTOPHER (US)
LI GANG (US)
CHUNG PHILIP (US)
PAUL JUSTIN (US)
TSE DANIEL (US)
CHOU KATHERINE (US)
JAUNZEIKARE DIANA (US)
RAJKOMAR ALVIN (US)
Application Number:
PCT/US2017/057640
Publication Date:
April 25, 2019
Filing Date:
October 20, 2017
Assignee:
GOOGLE LLC (US)
International Classes:
G16H10/60; G16H15/00; G16H70/20
Foreign References:
US 201762538112 P, 2017-07-28
Other References:
SEID MUHIE YIMAM ET AL: "WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations", PROCEEDINGS OF THE 51ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 4 September 2013 (2013-09-04), pages 1 - 6, XP055478864
PONTUS STENETORP ET AL: "BRAT: a Web-based Tool for NLP-Assisted Text Annotation", PROCEEDINGS OF THE 13TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 23 April 2012 (2012-04-23), pages 102 - 107, XP055478849, Retrieved from the Internet [retrieved on 20180528]
DAIN KAPLAN ET AL: "Slate - a tool for creating and maintaining annotated corpora", J. LANG. TECHNOL. COMPUT. LINGUISTI., 31 October 2012 (2012-10-31), pages 91 - 103, XP055479314, Retrieved from the Internet [retrieved on 20180529]
SEID MUHIE YIMAM ET AL: "An adaptive annotation approach for biomedical entity and relation recognition", BRAIN INFORMATICS, vol. 3, no. 3, 27 February 2016 (2016-02-27), pages 157 - 168, XP055478856, ISSN: 2198-4018, DOI: 10.1007/s40708-016-0036-4
Attorney, Agent or Firm:
FAIRHALL, Thomas, A. (US)
Claims:

What is claimed is:

1. A method of facilitating annotation of a recording of a medical practitioner-patient conversation, comprising the steps of:

a) generating a display of a transcript of the recording;

b) providing a tool for highlighting spans of text in the transcript consisting of one or more words;

c) providing a tool for assigning a label to the highlighted spans of text, wherein the tool includes a feature for searching through a set of predefined labels available for assignment to the highlighted span of text, and wherein the labels encode medical entities and attributes of the medical entities; and

d) providing a tool for creating groupings of related highlighted spans of texts.

2. The method of claim 1, wherein the transcribed recording is indexed to time segment information.

3. The method of claim 1 or claim 2, wherein the tool b) permits only words or groups of words to be highlighted, and not individual characters.

4. The method of any of claims 1-3, wherein the medical entities are selected from the list of medical entities consisting of medications, procedures, symptoms, vitals, conditions, social history, medical conditions, surgery, imaging, provider, vaccine, reproductive history, examination, and medical equipment.

5. The method of claim 4, wherein at least one of the medical entities is arranged in a hierarchical manner.

6. The method of claim 5, wherein at least one of the medical entities includes a symptom medical entity and different parts of the body within the symptom medical entity.

7. The method of claim 4, wherein one of the medical entities consists of a symptom medical entity and wherein the symptom medical entity includes attributes of at least severity, frequency, onset, location.

8. The method of any of claims 1-7, further comprising supplying the transcript to a pre-labeling system and receiving from the pre-labeling system a pre-annotated transcript containing suggested labels for spans of text in the transcript.

9. The method of claim 8, wherein the tool c) further comprises a display of a suggested label from the pre-annotated transcript and a tool to either reject or accept the suggested label.

10. The method of claim 8, wherein the pre-labeling system includes a named entity recognition model trained on at least one of medical textbooks, a lexicon of clinical terms, clinical documentation in electronic health records, and annotated transcripts of doctor-patient conversations.

11. The method of any of claims 1-10, wherein the tool b) and the tool d) comprise key stroke(s), mouse action or a combination of both.

12. The method of any of claims 1-11, wherein the feature for searching in tool c) comprises a display of a scrollable list of available labels and a search box for entering a search term for searching through the list of available labels, and wherein tool c) further comprises key stroke(s), mouse action or a combination of both to assign a label.

13. A system for facilitating annotation of a recording of a medical practitioner-patient conversation, comprising:

a) an interface displaying a transcript of the recording;

b) a tool for highlighting spans of text in the transcript consisting of one or more words;

c) a tool for assigning a label to the highlighted spans of text, wherein the tool includes a feature enabling searching through a set of predefined labels available for assignment to the highlighted span of text, and wherein the predefined labels encode medical entities and attributes of the medical entities; and

d) a tool for creating groupings of related highlighted spans of texts.

14. The system of claim 13, wherein the transcribed recording is indexed to time segment information.

15. The system of claim 13 or claim 14, wherein the tool b) permits only words or groups of words to be highlighted, and not individual characters.

16. The system of any of claims 13-15, wherein the medical entities are selected from the list of medical entities consisting of medications, procedures, symptoms, vitals, conditions, social history, medical conditions, surgery, imaging, provider, vaccine, reproductive history, examination, and medical equipment.

17. The system of claim 16, wherein at least one of the medical entities is predefined in a hierarchical manner.

18. The system of claim 17, wherein at least one of the medical entities includes a symptom medical entity and different parts of the body within the symptom medical entity.

19. The system of claim 16, wherein one of the medical entities consists of a symptom medical entity and wherein the symptom medical entity includes attributes of at least severity, frequency, onset, location.

20. The system of any of claims 13-19, further comprising a pre-labeling system generating a pre-annotated transcript containing suggested labels for spans of text in the transcript.

21. The system of claim 20, wherein the tool c) further comprises a display of a suggested label from the pre-annotated transcript and a tool to either reject or accept the suggested label.

22. The system of claim 20, wherein the pre-labeling system includes a named entity recognition model trained on at least one of medical textbooks, a lexicon of clinical terms, clinical documentation in electronic health records, and annotated transcripts of doctor-patient conversations.

23. The system of any of claims 13-22, further comprising a system for generating a machine learning model configured to automatically generate annotated transcribed audio recordings.

24. The system of any of claims 13-22, further comprising a system for generating a machine learning model configured to generate health predictions.

25. The system of any of claims 13-24, wherein the tool b) and the tool d) comprise key stroke(s), mouse action or a combination of both.

26. The system of any of claims 13-25, wherein the feature for searching in tool c) comprises a display of a scrollable list of available labels and a search box for entering a search term for searching through the list of available labels, and wherein tool c) further comprises key stroke(s), mouse action or a combination of both to assign a label.

27. A method of facilitating annotation of a recording of a conversation, comprising the steps of:

a) generating a display of a transcript of the recording;

b) providing a tool for highlighting spans of text in the transcript consisting of one or more words;

c) providing a tool for assigning a label to the highlighted spans of text, wherein the tool includes a feature for searching through predefined labels available for assignment to the highlighted span of text, and wherein the labels encode entities and attributes of the entities; and

d) providing a tool for creating groupings of related highlighted spans of texts.

28. The method of claim 27, wherein the recording consists of a recording between a patient and medical professional.

29. The method of claim 27 or claim 28, further comprising supplying the transcript to a pre-labeling system and receiving from the pre-labeling system a pre-annotated transcript containing suggested labels for spans of text in the transcript.

30. The method of any of claims 27-29, wherein the transcribed recording is indexed to time segment information.

31. The method of any of claims 27-30, wherein the tool b) and the tool d) comprise key stroke(s), mouse action or a combination of both.

32. The method of any of claims 27-31, wherein the feature for searching in tool c) comprises a display of a scrollable list of available labels and a search box for entering a search term for searching through the list of available labels, and wherein tool c) further comprises key stroke(s), mouse action or a combination of both to assign a label.

33. The method of claim 29, wherein the tool c) further comprises a display of a suggested label from the pre-annotated transcript and tools to either reject or accept the suggested label.

34. The method of any of claims 27-33, wherein at least one of the entities is defined in a hierarchical manner.

Description:
Capturing detailed structure from patient-doctor conversations for use in clinical documentation

Background

This disclosure is directed to a method and system for facilitating the annotation of transcribed audio or audio-visual recordings of medical encounters.

Conversations between patients and medical practitioners such as doctors and nurses are often recorded. The recording, and a transcript of it, become part of the patient's medical record. The transcript can be created by a speech-to-text converter or by a trained (human) medical transcriptionist listening to the recording.

A transcript without any annotation is of limited usefulness when it is reviewed by the physician, as they have to pore over many lines or pages of the transcript to find relevant information or understand the relatedness of different comments in the transcript.

Additionally, a collection of transcripts of medical encounters can be used to train machine learning models. Training a machine learning model requires a large amount of high quality training examples, i.e., labelled data. There is a need in the art for methods for facilitating the generation of transcripts of medical encounters that are annotated, that is, relevant words or phrases are highlighted and associated with medical concepts and grouped as being related to each other. This disclosure meets that need.

Summary

In a first aspect, a method of facilitating annotation of a recording of a medical practitioner-patient conversation is disclosed. The method includes a step of generating a display of the transcribed audio recording (i.e., transcript), for example on the display of a workstation used by a human ("scribe labeler") who is performing the annotation. A tool is provided for highlighting spans of text in the transcript consisting of one or more words. The tool can be a simple mouse or keyboard shortcut for selecting or highlighting one or more words.

The method further includes a step of providing a tool for assigning a label to the highlighted spans of text. The tool includes a feature for searching through a set of predefined labels available for assignment to the highlighted span of text. For example, when the scribe labeler highlights a word such as "stomachache" in the transcript, a window pops up where the user can search through available labels, e.g., by scrolling or using a search tool. The labels encode medical entities (such as symptoms, medications, lab results, etc.) and attributes of the medical entities (e.g., severity, location, frequency, time of onset of a symptom entity).

In this document, the term "medical entities" is intended to refer to categories of discrete medical topics, such as symptoms, medications, lab results, vital signs, chief complaint, medical imaging, conditions, medical equipment, and so forth. The medical entities are predefined to be relevant to the context of the labelling task, and so in this case in one embodiment they could consist of the following list: medications, procedures, symptoms, vitals, conditions, social history, medical conditions, surgery, imaging, provider, vaccine, reproductive history, examination, and medical equipment. The medical entities could be structured in a hierarchical manner; for example, the medical entity "medication" could take the form "medication:allergy", where "allergy" is a type or subclass of the overall class "medication." As another example, the medical entity "symptom" could be structured in a hierarchical manner of symptoms for different parts of the body, such as "symptom:eyes", "symptom:neurological", etc.
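The hierarchical entity scheme described here can be sketched as a flat set of path-like label strings; a minimal illustration in Python (the encoding and names are hypothetical, not the patent's actual format):

```python
# Hypothetical encoding of the hierarchical label set described above:
# each label is an entity name, optionally refined by a subclass after ":".
PREDEFINED_LABELS = [
    "medication",
    "medication:allergy",
    "symptom",
    "symptom:eyes",
    "symptom:neurological",
    "symptom:GI",
    "procedure",
    "condition",
    "medical equipment",
]

def subclasses_of(entity: str) -> list[str]:
    """Return the labels that refine the given top-level entity."""
    prefix = entity + ":"
    return [label for label in PREDEFINED_LABELS if label.startswith(prefix)]
```

A flat list of colon-separated paths is one of several plausible representations; a nested dictionary would serve equally well.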

The term "attributes of the medical entities" simply means some descriptive property or characteristic of the medical entity; for example, the medical entity "medical equipment" may have an attribute of "patient's actual use," meaning that the patient is currently using a piece of medical equipment. As another example, a symptom medical entity may have an attribute of "onset." A label of "symptom/onset" would be used as an annotation when there is a word or phrase in the transcript indicating when the patient first started experiencing the symptom. As another example, a label of "medical equipment/regularly" would be used as an annotation when there is a word or phrase in the transcript indicating the patient used some piece of medical equipment regularly, with "regularly" being the attribute of the medical entity "medical equipment."

The method further includes a step of providing a tool for grouping related highlighted spans of texts. The tool could be, for example, a combination of mouse clicks or keyboard shortcuts to establish the grouping. The groupings allow medical entities associated with labels assigned to the highlighted spans of text to be associated as a group. For example, in a conversation in which a patient describes a sharp chest pain that started last week, the text "sharp", "chest pain" and "last week" would be highlighted and labeled with symptom labels and attributes of severity, location, and time of onset, respectively, and grouped together as all being related to each other.
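The chest-pain example suggests a simple data model for labelled spans and their groupings; a hedged sketch (field names are illustrative only, not specified in the disclosure):

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """One labelled span of text in the transcript."""
    span: str        # the highlighted text
    entity: str      # medical entity, e.g. "symptom"
    attribute: str   # attribute of the entity, e.g. "severity"

@dataclass
class Group:
    """Related annotations grouped together by the scribe labeler."""
    members: list = field(default_factory=list)

# The "sharp chest pain that started last week" example yields three
# labelled spans, grouped as all relating to one symptom:
a1 = Annotation("sharp", "symptom", "severity")
a2 = Annotation("chest pain", "symptom", "location")
a3 = Annotation("last week", "symptom", "time of onset")
group = Group(members=[a1, a2, a3])
```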

In another aspect, a system is disclosed for facilitating annotation of a recording of a medical practitioner-patient conversation. The system includes a) an interface displaying a transcript of the recording; b) a tool for highlighting spans of text in the transcript consisting of one or more words; c) a tool for assigning a label to the highlighted spans of text, wherein the tool includes a feature enabling searching through predetermined labels available for assignment to the highlighted span of text, and wherein the labels encode medical entities and attributes of the medical entities; and d) a tool for creating groupings of related highlighted spans of texts.

The methods and systems are applicable to other types of transcripts as well, in which a set of predefined labels relevant to the annotation task at hand is created, e.g., by an operator, and the labels are associated with entities and attributes relevant to the transcript. The tools of this disclosure are used in the same manner in these other possible implementations, such as transcripts of legal proceedings (a deposition or trial) or transcripts of hearings before administrative bodies such as a city council, Congress, or a state legislature.

Brief Description of the Drawings

Figure 1 is a flow chart showing an environment in which the method can be performed.

Figure 2 is an illustration of a workstation having a display and user interface for use by a human ("scribe labeler") to annotate a transcript of a medical encounter. The user interface includes the tools described in conjunction with Figures 4-6. The term "user interface" is intended to refer to the combination of the display on the workstation and associated devices for providing user input, such as the mouse and keyboard.

Figure 3 is an illustration of the user interface of Figure 2 showing a list of transcripts which are ready for annotation.

Figure 4 is an illustration of a transcript of a medical encounter in which the scribe labeler is annotating certain words or phrases in the text. Figure 4 shows a search box that pops up and permits the scribe labeler to search for medical entities and associated attributes. Spans of text can be highlighted by use of a tool, such as by clicking on a word or using drag techniques with a mouse.

Figure 5 is an illustration of the transcript of Figure 4 in which the scribe labeler is annotating the text "upper left" and a search box which pops up. Additionally, a proposed label for the phrase "upper left" for the medical entity "symptom" and attribute "location (on body)" is also displayed. The proposed label is generated by a pre-labelling system shown in Figure 1.

Figure 6 is an illustration of the transcript of Figures 4 and 5 when the scribe labeler forms a grouping of the two highlighted spans of text "stomachache" and "three days". The tool for forming the grouping consists of highlighting the two spans of text and then a keyboard shortcut of holding down the "G" key, clicking on the highlighted spans of text, and releasing the "G" key. Figure 6 also shows the formation of the group in the Groups tab listing all the groups in the transcript at the bottom of the display.

Figure 7 is a more detailed illustration of the pre-labeler of Figure 1.

Figure 8 is an illustration of a machine learning model training system which receives as input a multitude of annotated transcripts in accordance with the features of Figure 1.

Detailed Description

This disclosure is directed to methods and systems for facilitating annotations of recordings of medical encounters, i.e., conversations between patients and medical practitioners such as doctors or nurses. The recordings could be audio or audio-visual recordings. The recordings are transcribed into written form. The transcripts could be generated by trained medical transcriptionists, that is, by hand, or by the use of speech-to-text converters, which are known in the art. The output of the system is an annotated version of the transcript in which relevant medical information (i.e., spans of text, such as individual words or groups of words) in the text is labeled (i.e., tagged as being associated with medical entities and attributes of such entities), and grouped to express relatedness between the labelled text.

Figure 1 is a flow chart showing the environment in which the methods and systems of this disclosure are practiced. Patient consent for recording the encounter with the doctor or nurse is obtained at 102. Additionally, the patient is advised of the use of a transcript of the recording to be placed into the electronic health record and consent is obtained. The patient is further advised that the recording may be annotated and used for generating or training machine learning models and consent is obtained as well. In all cases where the transcripts are annotated or used for machine learning model training the transcript data is patient de-identified and used in compliance with all requirements for disclosure and use of a limited data set under HIPAA. Ethics review and institutional review board exemption is obtained from each institution. Patient data is not linked to any Google user data.

Furthermore, the system 116 using annotated transcripts for machine learning model training includes a sandboxing infrastructure that keeps each electronic health record (or transcript) dataset separated from the others, in accordance with regulation, data license and/or data use agreements. The data in each sandbox is encrypted; all data access is controlled on an individual level, logged, and audited. At step 104, after the required patient consents are obtained, the patient consults with the medical practitioner and a recording, either audio or audio-visual, is obtained and stored in digital format.

At step 106, a written transcript of the recording is obtained, either by a trained transcriptionist or by use of a speech-to-text converter. The transcript is preferably accompanied by a time indexing, in which the words spoken in the transcript, or lines of text, are associated with elapsed time of the recording, as will be illustrated subsequently.

At step 108, an annotation of the transcript is performed by the scribe labeler in the manner described and explained in the subsequent figures. The annotations include the assignment of labels to spans of text in the transcript and groupings of spans of text to indicate their relatedness. In step 108 a display of the transcribed audio recording is generated, for example on the display of a workstation used by the scribe labeler. See Figures 2 and 4-6. A tool is provided for highlighting spans of text in the transcribed audio recording consisting of one or more words. The tool can be a simple mouse or keyboard shortcut for selecting or highlighting one or more words. A tool is also provided for assigning a label to the highlighted spans of text. The tool includes a feature for searching through predetermined labels available for assignment to the highlighted span of text. For example, when the scribe labeler highlights a word such as "stomachache" in the transcript, a list pops up where the user can search through available labels, and a search tool is provided for performing a word search through the list of labels. The labels encode medical entities (such as symptoms, medications, lab results, etc.) and attributes of the medical entities (e.g., severity, location, frequency, time of onset of a symptom entity).

A tool is also provided for grouping related highlighted spans of texts. The groupings allow medical entities associated with labels to be grouped together. For example, in a conversation in which a patient describes a sharp chest pain that started last week, the text "sharp", "chest pain" and "last week" would be highlighted and labeled with symptom labels and attributes of severity, location, and time of onset, and grouped together, as they are all related to a single medical condition of the patient. This tool can consist of keyboard and/or mouse action, as explained below.

The system may include a pre-labeler 110, shown in more detail in Figure 7. The pre-labeler is a computer system implementing a learned, automated word recognition model which identifies words or spans of text in the transcript which are likely to be the subject of labelling or grouping. The pre-labeler 110 provides input into annotation step 108 by providing suggested labels for highlighted spans of text when the scribe labeler performs the annotation of the transcript. This is shown in more detail in Figure 5. As a result of the annotation step 108 an annotated transcript file 112 is created, which consists of the transcript as well as annotations in the form of labelled or tagged spans of text (words or phrases) and groupings of the tagged spans of text. The annotated transcript file is in digital form, with the annotations and groupings in the file as metadata or otherwise. The annotated transcript file 112 is then added to the patient's electronic health record (EHR) 114 or supplied to a machine learning model training system 116. The machine learning model training system 116 may, for example, be a system for training a machine learning model to automatically annotate transcripts of medical encounters.
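The annotated transcript file, carrying annotations and groupings as metadata, might plausibly be serialized along these lines (a JSON sketch under assumed field names; the disclosure does not specify a file format):

```python
import json

# Hypothetical on-disk shape for an annotated transcript file:
# transcript lines plus label and grouping metadata.
annotated_transcript = {
    "transcript": [
        {"line": 4, "speaker": "patient",
         "text": "I've had a stomachache for three days."},
    ],
    "annotations": [
        {"line": 4, "span": "stomachache",
         "entity": "Symptom:GI", "attribute": "abdominal pain"},
        {"line": 4, "span": "three days",
         "entity": "SymAttr", "attribute": "duration"},
    ],
    "groups": [
        {"members": [0, 1]},  # indices into "annotations"
    ],
}

# Round-trip through JSON, as a downstream training system might consume it.
serialized = json.dumps(annotated_transcript)
restored = json.loads(serialized)
```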

Alternatively, the machine learning model may use the annotated transcript as well as other data in the patient health record, for not only the individual patient, but also a multitude of other patients, to generate predictions of future medical events, for example as described in the pending U.S. provisional application serial no. 62/538,112 filed July 28, 2017, the content of which is incorporated by reference herein. The EHR 114 may be provided to the system 116 as indicated by the dashed line 114.

The annotated transcript file 112 may be fed back into the pre-labeler to enable further training of the machine learning pre-labeler 110, as indicated by the dashed line 120. This aspect will be described in further detail later.

Figure 2 is an illustration of a workstation 200 which is used by a scribe labeler during the annotation step 108 of Figure 1. The workstation includes a central processing unit (general purpose computer 210) executing an application which provides for display of the transcript of the medical encounter, and tools by which the user interface, consisting of a keyboard 212, a mouse 214 and a monitor 216, allows for the highlighting of spans of text (words or phrases 230), the assigning of labels to the spans of text, and the grouping of the highlighted spans of text, as will be discussed below. The monitor 216 includes a display 218 of a transcript 222, and a scroll bar 224 for allowing the user to navigate to various portions of the transcript. A time index 220 of the transcript is shown at the top of the display 218. The time index includes a slider 221 which, when moved horizontally back and forth, allows the portion of the transcript associated with a particular elapsed time to be displayed at the top of the display 218. In this case the time index 220 indicates that the transcript is 13 minutes 24 seconds in duration and the slider 221 is all the way to the left; therefore the beginning of the transcript is shown at the top of the display. The transcript is in the form of numbered lines, each followed by an identification of who was speaking (doctor or patient), followed by a text transcript of what was said.

Figure 3 shows the display of a "to-do" list of transcripts in need of annotation which is provided on the user interface of Figure 2 when the scribe labeler logs on to the workstation of Figure 2. The individual transcripts are patient de-identified (that is, identified only by patient number in column 302 and not by name). Column 304 shows the elapsed time, column 306 shows the number of lines of text in the transcript, column 308 shows the patient's chief complaint associated with the medical encounter, and column 310 shows the nature or type of the medical encounter. When one of the transcripts is selected in Figure 3 (e.g., by clicking on the number in the column 302) the display of Figure 2 is generated.

Figure 4 is an illustration of the display 218 of the user interface along with a transcript 222, and time index 220. Time segment information for each utterance (sentence or word) is provided in the transcript and the time index 220 provides a slider tool 221 which moves right and left to jump to different portions of the transcript.

The interface provides a tool for text highlighting. In particular, mouse and keyboard shortcuts make highlighting spans of text easy. For example, a user can double click on a given word and the word is automatically highlighted on the display. Only words can be highlighted, not individual characters, reducing errors and increasing annotation speed. Other tools could be used for highlighting, such as by click and drag techniques with a mouse, a keyboard stroke (such as by putting the cursor over the word and hitting a particular key such as H, or CTRL-H), or a combination keyboard stroke and mouse action.
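The rule that only whole words, never individual characters, can be highlighted amounts to snapping a raw selection to word boundaries. One possible implementation sketch (the disclosure does not specify how the tool enforces this):

```python
def snap_to_words(text: str, start: int, end: int) -> str:
    """Expand a raw character selection so it covers whole words only,
    as the highlighting tool permits words but not partial characters."""
    # Walk left from `start` to the previous word boundary.
    while start > 0 and text[start - 1].isalnum():
        start -= 1
    # Walk right from `end` to the next word boundary.
    while end < len(text) and text[end].isalnum():
        end += 1
    return text[start:end]
```

A double click selecting characters 10-14 of "a sharp stomachache" would thus still highlight the whole word "stomachache".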

In the example of Figure 4, the user has highlighted the word "stomachache" (see 400). The user interface provides a tool for text tagging, i.e., labelling the highlighted term. Labels are applied to the highlighted spans of text, essentially allowing the scribe labeler to inject information into the transcript, for example to indicate that the highlighted text "stomachache" is a symptom, or a gastrointestinal symptom. In particular, when the user has highlighted the term "stomachache", a box (tool) 402 pops up which shows a list 404 of medical entities and associated attributes, a search term entry field 405 by which they can search the list 404, and a scroll bar 406 allowing the scribe labeler to scroll through the list and select a medical entity and associated attribute which is appropriate for the highlighted text. In the example of Figure 4, the medical entity "Symptom:GI" and associated attribute "abdominal pain" was found in the list 404 and the user clicked on that combination of medical entity and attribute. The display includes a Table tab 410 at the bottom of the display which lists the labelled spans of text, including medical entity, attribute, location in the transcript (line 4) and the associated text span ("stomachache").

The scribe labeler follows the same process and uses the same tools to highlight the span of text "three days", assigning a label of medical entity "SymAttr" and attribute "duration" ("SymAttr/duration") to the highlighted span of text "three days", and this additional annotation shows up in the Table of annotations 410. The scribe labeler then proceeds to highlight the span of text "upper left", 412. The scribe labeler again uses the tool 402 to ascribe a label to the span of text "upper left." Again this could be done using the tools described in Figure 4. As shown in Figure 5, in one embodiment where there is pre-labelling of the transcript, when the user highlights the span of text "upper left" a suggested label is shown in the box 502. This suggested label was assigned to the span of text "upper left" by the pre-labeler of Figure 1. The user can accept this suggestion by clicking on the box 502, or reject the suggestion by clicking on the X icon 504. In the situation of Figure 5 the scribe labeler accepted the suggestion by a mouse click (or any other suitable user interface action, such as a keyboard shortcut), and the annotation is added to the Table 410 as shown in Figure 5 at 506. If the scribe labeler rejects the suggestion they can use the pop-up search tool 402 or scroll through the list of labels to find a suitable label.

It will be appreciated that the search tool 402 could pop up when the scribe labeler is taking action to highlight a span of text, and disappear after the label has been assigned, or alternatively it could be a persistent feature of the user interface during annotating.

As noted previously, the user interface of Figures 2 and 4-6 includes a tool for permitting the scribe labeler to group together highlighted and labelled spans of text which are conceptually or causally related to each other. For example, in Figure 6 the spans of text "stomachache" and "three days" are related to a gastrointestinal symptom, namely the type of symptom and the duration of the symptom. To make this grouping, the interface provides a tool in the form of a combination of key strokes and mouse actions in the illustrated embodiment. In particular, the scribe labeler holds down the "G" key, clicks on the two highlighted spans of text, and then releases the "G" key. Of course, variations from this specific example of the tool for forming a grouping are possible and within the scope of this disclosure, such as combinations of mouse actions alone (e.g., selecting spans of text with a left click and then a right click to form the group), key strokes alone (e.g., ALT-G to select the highlighted spans of text and then ENTER to form the group), or other combinations of mouse actions and key strokes. In Figure 6, the "2" icon 602 indicates the number of elements in the grouping (here two). The "X" icon 604 is a click target to delete the grouping. The user has toggled the Groups tab 606 and the group of "stomachache" and "three days" is shown as indicated at 608, along with the location in the transcript (line 4, the location of the first element in the group, in this example).
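The grouping action described above (hold "G", click the spans, release) ultimately just collects selected annotations into a group record. A minimal sketch, with all names and the record layout assumed for illustration:

```python
def make_group(annotations, indices):
    """Collect the selected annotations into one group record, as when
    the scribe holds "G" and clicks two highlighted spans (Figure 6)."""
    members = [annotations[i] for i in indices]
    return {
        "size": len(members),        # shown as the "2" icon 602
        "members": members,
        "line": members[0]["line"],  # location of the first element (608)
    }

anns = [
    {"span": "stomachache", "label": "Symptom:GI/abdominal pain", "line": 4},
    {"span": "three days", "label": "SymAttr/duration", "line": 4},
]
g = make_group(anns, [0, 1])
print(g["size"], g["line"])   # 2 4
```

Deleting the grouping (the "X" icon 604) would simply discard this record while leaving the individual annotations intact.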

The search tool 402 of Figure 4 makes the process of locating the relevant label easy to navigate. In the example of medical transcripts, there may be many hundreds of possible labels to choose from. For example, there may be ten or twenty different predefined medical entities and ten or twenty or more different attributes for each of the medical entities. The medical entities may be customized and organized in a hierarchical manner, as explained previously. These labels encode a medical ontology that is designed specifically for medical documentation. They encode medical entity information, such as medications, procedures, symptoms, conditions, etc., as well as attributes of the entities, such as the onset, severity, or frequency of a symptom, and whether the patient declined or refused (attributes) a medical procedure (entity).
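A label search over a predefined list of this size can be as simple as a case-insensitive substring filter. The sketch below assumes such a filter; the function name and sample labels are illustrative:

```python
def search_labels(labels, query):
    """Case-insensitive substring search over the predefined label list,
    in the spirit of the search term entry field 405 of tool 402."""
    q = query.lower()
    return [label for label in labels if q in label.lower()]

labels = ["Symptom:GI", "SymAttr:duration", "Meds:Dosage:", "Labs:Result:Normal"]
print(search_labels(labels, "dur"))   # ['SymAttr:duration']
```

With a few hundred labels, a linear scan like this is instantaneous; no index structure is needed for interactive use.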

The text grouping as shown in Figure 6 allows the scribe labeler to inject additional information into the transcript and, in particular, to identify relationships or relatedness between concepts. For example, the system and method of this disclosure allow scribe labelers to specify groups of highlighted text such that entities can be associated with their attributes as a group.

The pre-labelling system 110 of Figure 1 is shown in more detail in Figure 7. The input to the system 110 is a text transcript 702 generated at step 108 of Figure 1. The system 110 uses a machine learning medical named entity recognition (NER) model 703 which identifies candidate information (words or phrases) in the transcript and suggests labels for such words or phrases based on supervised learning from trained examples, in the form of a pre-annotated transcript 704. Named entity recognition models are well known in the field of machine learning and are described extensively in the scientific literature. The NER model 703 needs its own labelled training data. For this training data we use a large corpus of medical text books (over 120,000 medical text books) using deep learning word embeddings, in conjunction with a large lexicon of existing medical ontologies, e.g., UMLS (Unified Medical Language System) and SNOMED (Systematized Nomenclature of Medicine). Additionally, the NER model can be trained from annotated medical encounter transcripts. A NER model can also be trained from a hybrid of data sources, which may include medical and clinical text books, annotated transcripts from doctor-patient conversations, and clinical documentation contained in anonymized electronic health records of a multitude of patients. The NER model may further be trained from feedback of the annotation of the transcript as performed in Figure 1 and Figure 7. For example, after the pre-labeling system generates the pre-annotated transcript 704 and the scribe labeler has proceeded to complete the annotation at step 108, corrections between the suggested annotations in the pre-annotated transcript 704 and the annotated transcript 112 can be fed back into the NER model.
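The feedback step described above amounts to diffing the pre-labeler's suggestions against the scribe's final annotations and keeping the disagreements as new training examples. A minimal sketch under assumed data shapes (span text mapped to a label string); none of these names come from the disclosure:

```python
def correction_examples(pre_annotations, final_annotations):
    """Spans whose label changed between the pre-annotated transcript
    (704) and the scribe-corrected annotated transcript (112) become
    candidate training examples to feed back into the NER model."""
    corrections = []
    for span, suggested in pre_annotations.items():
        final = final_annotations.get(span)
        if final is not None and final != suggested:
            corrections.append((span, suggested, final))
    return corrections

pre = {"upper left": "Symptom:GI/location", "three days": "Meds:Dosage"}
final = {"upper left": "Symptom:GI/location", "three days": "SymAttr/duration"}
print(correction_examples(pre, final))
```

Spans where the scribe accepted the suggestion unchanged confirm the model's behavior and need no correction record.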

As shown in Figure 8, the annotated transcripts 112 can be supplied to a machine learning model training system. In one form, the model training system 116 uses the transcripts, along with other patient data from a multitude of patients, to generate machine learning models to make health predictions. Alternatively, the annotated transcripts could be used in the system 116 to develop deep learning models for automating the process of generating annotated transcripts of medical encounters.

The system and method of this disclosure have several advantages. In many natural language processing text annotation tools, relationships between entities must be identified in an explicit and cumbersome manner. In contrast, in this disclosure the labels (including predefined labels relevant to the annotation task) and the labelling and grouping tools permit such relationships to be readily specified. The user can quickly search for labels by means of the search tools as shown in the Figures and select them with a simple user interface action such as a mouse click. Moreover, groupings of conceptually or causally related highlighted spans of text can be created very quickly with simple user interface actions using a keyboard, mouse, or combination thereof, as explained above.

While the illustrated embodiment has described an interface and tools for assisting in labeling transcripts of medical encounters, the principles of this disclosure could be applied to other situations. In particular, a predefined list of labels is generated for entities and attributes of those entities, e.g., listing all the possible categories or classes of words of interest in a transcript and the attributes associated with each of the categories or classes, analogous to the attributes of medical entities. The user interface actions described above would generally be performed in the same way: the scribe labeler would read the transcript and highlight words or other spans of text relevant to the annotation task, using simple user interface tools, and then tools would be enabled by which the scribe labeler could search through the available labels and assign them to the highlighted spans of text. Additionally, grouping tools are provided to form groups of related highlighted spans of text. The result is an annotated transcript. The methods could be useful for other types of transcripts, such as deposition or trial transcripts in the legal profession, hearing transcripts of testimony before governmental bodies, etc.
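As one purely illustrative example of such a predefined label set outside the medical domain, a schema for deposition transcripts might mirror the Entity:Attribute:Value structure of the medical labels. Every label below is hypothetical, invented only to show the shape of such a list:

```python
# Hypothetical entity/attribute schema for legal deposition transcripts,
# structured like the medical labels ("Entity1:Entity2:Entity3").
LEGAL_LABELS = [
    "Testimony:Topic:",
    "Testimony:Certainty:Confirmed",
    "Testimony:Certainty:Speculative",
    "Exhibit:Status:Admitted",
    "Exhibit:Status:Objected",
    "Objection:Ground:Hearsay",
    "Objection:Ground:Relevance",
]
print(len(LEGAL_LABELS))
```

The same highlight, search-and-assign, and grouping tools would then operate over this list unchanged.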

An example of a list of labels for use in annotation of medical transcripts is set forth below in Table 1. It will be understood of course that variation from the list is possible and that in other contexts other labels will be defined. In the list, Entity 1 is a medical entity, Entity 2 is either a subcategory of the medical entity of Entity 1 or an attribute of the medical entity, and Entity 3 is either an attribute of the medical entity or a further subcategory of the medical entity of Entity 1 in a hierarchical schema.

Table 1

Meds:Benefit:No
Meds:Dosage:
Meds:Quantity:
Meds:Frequency/Duration:
Meds:Instructions/Directions:
Meds:Route of Administration:
Meds:Indication:
Meds:Allergy:
Meds:Allergy:Yes
Meds:Allergy:No
Meds:Allergy:Reaction
Medical Equipment::
Medical Equipment:Physician's Intended Status:
Medical Equipment:Physician's Intended Status:Active, Continued
Medical Equipment:Physician's Intended Status:Active, Modified
Medical Equipment:Physician's Intended Status:Recommended / To Start
Medical Equipment:Physician's Intended Status:Completed/Finished/Stopped
Medical Equipment:Patient's actual use:
Medical Equipment:Patient's actual use:Yes, Regularly
Medical Equipment:Patient's actual use:Yes, Intermittently
Medical Equipment:Patient's actual use:Yes, as Needed
Medical Equipment:Patient's actual use:Stopped
Medical Equipment:Patient's actual use:No
Medical Equipment:Side Effect:
Medical Equipment:Side Effect:Experienced
Medical Equipment:Side Effect:No
Medical Equipment:Benefit:
Medical Equipment:Benefit:Experienced
Medical Equipment:Benefit:No
Medical Equipment:Dosage:
Medical Equipment:Quantity:
Medical Equipment:Frequency/Duration:
Medical Equipment:Instructions/Directions:
Medical Equipment:Indication:
Condition::
Condition:Status:
Condition:Status:Active
Condition:Status:Recurrence
Condition:Status:Inactive
Condition:Status:Remission
Condition:Status:Resolved
Condition:Time of Onset / Duration:
Condition:Physician certainty:
Condition:Physician certainty:Provisional / Differential
Condition:Physician certainty:Confirmed
Condition:Physician certainty:Refuted
Condition:Severity:New
Condition:Severity:Stable
Condition:Severity:Improved
Condition:Severity:Worsening
Condition:Family:
Condition:Family:History of (First Degree)
Condition:Family:History of (Non-first-degree)
Surgery::
Surgery:Status:
Surgery:Status:Completed
Surgery:Status:Planned / Anticipated
Procedures / Other Tests::
Procedures / Other Tests:Status:
Procedures / Other Tests:Status:Scheduled/Upcoming
Procedures / Other Tests:Status:Completed
Procedures / Other Tests:Status:Not done
Procedures / Other Tests:Status:Declined/Refused
Procedures / Other Tests:Status:Recommended
Procedures / Other Tests:Result:
Procedures / Other Tests:Result:Value/Result/Finding
Procedures / Other Tests:Result:Normal
Procedures / Other Tests:Result:Abnormal
Labs::
Labs:Status:
Labs:Status:Scheduled/Upcoming
Labs:Status:Completed
Labs:Status:Declined/Refused
Labs:Status:Recommended
Labs:Result:Value/Result/Finding
Labs:Result:Normal
Labs:Result:Abnormal
Imaging::
Imaging:Status:
Imaging:Status:Scheduled/Upcoming
Imaging:Status:Completed
Imaging:Status:Declined/Refused
Imaging:Status:Recommended
Imaging:Result:Value/Result/Finding
Imaging:Result:Normal
Imaging:Result:Abnormal
Vaccine::
Vaccine:Status:
Vaccine:Status:Scheduled/Upcoming
Vaccine:Status:Completed
Vaccine:Status:Declined/Refused
Vaccine:Status:Recommended
Provider::
Provider:Type:
Provider:Type:Physician/Practitioner
Provider:Type:Other Health Professional
Provider:Status of Referral:
Provider:Status of Referral:Recommended / To Start
Provider:Status of Referral:On-going
Provider:Status of Referral:Discontinued/Stopped
Provider:Status of Referral:Requested
Provider:Urgent/Emergency Care:
Provider:Hospital:
Provider:Follow-up Visit:
Patient instructions/education/recommendation::
Social Hx::
Social Hx:Lifestyle/Wellness Habits:
Social Hx:Tobacco:
Social Hx:Tobacco:Active
Social Hx:Tobacco:Second Hand Smoking
Social Hx:Tobacco:Former
Social Hx:Tobacco:Never
Social Hx:Tobacco:Current Quantity/Freq
Social Hx:Tobacco:Former Quantity/Freq
Social Hx:Tobacco:Counseling
Social Hx:Alcohol:
Social Hx:Alcohol:Active
Social Hx:Alcohol:Denies
Social Hx:Alcohol:Former
Social Hx:Alcohol:Never
Social Hx:Alcohol:Current Quantity/Freq
Social Hx:Alcohol:Former Quantity/Freq
Social Hx:Alcohol:Counseling
Social Hx:Marijuana or Drug Use:
Social Hx:Marijuana or Drug Use:Active
Social Hx:Marijuana or Drug Use:Former
Social Hx:Marijuana or Drug Use:Never
Social Hx:Marijuana or Drug Use:Current Quantity/Freq
Social Hx:Marijuana or Drug Use:Former Quantity
Social Hx:Marijuana or Drug Use:Counseling
Social Hx:Socio Economic Status:
Social Hx:Socio Economic Status:Home
Social Hx:Socio Economic Status:Occupation
Social Hx:Socio Economic Status:Insurance
Social Hx:Logistics:
Social Hx:Logistics:Transportation
Social Hx:Sexual History:
Social Hx:Sexual History:Active
Social Hx:Sexual History:Inactive
Social Hx:Sexual History:Never
Social Hx:Sexual History:Quantity of Partners
Social Hx:Travel History:
Code Status / End of Life:Code Status / End of Life:
Reproductive Hx::
Reproductive Hx:Gravida (Number of Pregnancies):
Reproductive Hx:Parity (Number of Births Carried to a Viable Gestational Age):
Reproductive Hx:Number of Premature Births:
Reproductive Hx:Number of Natural Abortions / Miscarriages:
Reproductive Hx:Number of Living Children:
Reproductive Hx:Currently Pregnant:
Reproductive Hx:Current Gestational Age:
Reproductive Hx:Anticipating Planned or Unplanned Pregnancy:
Reproductive Hx:Infertility Issue:
Reproductive Hx:IVF:
Reproductive Hx:Last Menstrual Period:
Reproductive Hx:Menarche (Time of First Period):
Vitals:Ht:
Vitals:Ht:Value/Result/Finding
Vitals:Ht:Normal
Vitals:Ht:Abnormal
Vitals:Wt:
Vitals:Wt:Value/Result/Finding
Vitals:Wt:Normal
Vitals:Wt:Abnormal
Vitals:BMI:
Vitals:BMI:Value/Result/Finding
Vitals:BMI:Normal
Vitals:BMI:Abnormal
Vitals:Temp:
Vitals:Temp:Value/Result/Finding
Vitals:Temp:Normal
Vitals:Temp:Abnormal
Vitals:HR:
Vitals:HR:Value/Result/Finding
Vitals:HR:Normal
Vitals:HR:Abnormal
Vitals:BP:
Vitals:BP:Value/Result/Finding
Vitals:BP:Normal
Vitals:BP:Abnormal
Vitals:Resp Rate:
Vitals:Resp Rate:Value/Result/Finding
Vitals:Resp Rate:Normal
Vitals:Resp Rate:Abnormal
Vitals:O2:
Vitals:O2:Value/Result/Finding
Vitals:O2:Normal
Vitals:O2:Abnormal
Exam:General:
Exam:General:Value/Result/Finding
Exam:Const:
Exam:Const:Value/Result/Finding
Exam:Eyes:
Exam:Eyes:Value/Result/Finding
Exam:ENMT:
Exam:ENMT:Value/Result/Finding
Exam:Dental:
Exam:Dental:Value/Result/Finding
Exam:Neck:
Exam:Neck:Value/Result/Finding
Exam:Resp/Pulm:
Exam:Resp/Pulm:Value/Result/Finding
Exam:CV:
Exam:CV:Value/Result/Finding
Exam:Lymph:
Exam:Lymph:Value/Result/Finding
Exam:GU:
Exam:GU:Value/Result/Finding
Exam:MSK:
Exam:MSK:Value/Result/Finding
Exam:Derm:
Exam:Derm:Value/Result/Finding
Exam:Neuro:
Exam:Neuro:Value/Result/Finding
Exam:Abd:
Exam:Abd:Value/Result/Finding
Exam:Breast:
Exam:Breast:Value/Result/Finding
Exam:Rectal:
Exam:Rectal:Value/Result/Finding
Exam:Prostate:
Sym:GU:Vaginal Discharge:
Sym:Suggest Entity::
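Since each label in Table 1 is a colon-delimited "Entity1:Entity2:Entity3" string, downstream tooling can recover the hierarchy with a simple split. The function below is an illustrative sketch, not part of the disclosure:

```python
def parse_label(label):
    """Split a Table 1 label of the form "Entity1:Entity2:Entity3" into
    its hierarchy levels. Empty trailing fields indicate that the label
    names a category rather than a leaf value (e.g. "Condition:Status:")."""
    entity1, entity2, entity3 = label.split(":", 2)
    return {"entity1": entity1, "entity2": entity2, "entity3": entity3}

print(parse_label("Meds:Allergy:Yes"))
print(parse_label("Condition:Status:"))
```

The `maxsplit=2` argument keeps any further colons (as in the occasional four-level label) inside the third field rather than raising an error.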