PATIENT HEALTH RECORD SIMILARITY MEASURE

Title:

PATIENT HEALTH RECORD SIMILARITY MEASURE

Document Type and Number:

WIPO Patent Application WO/2014/052921

Kind Code:

Abstract:

Computer-implemented methods for determining optimal treatments for a patient can include identifying successful treatments used in cohorts of persons considered similar to the patient. Tools implementing such methods can use biological sequence analysis techniques to identify practices best suited for the patient.

Inventors:

FREY LEWIS (US)
LENERT LESLIE (US)

Application Number:

PCT/US2013/062460

Publication Date:

April 03, 2014

Filing Date:

September 27, 2013

Assignee:

UNIV UTAH RES FOUND (US)

International Classes:

G16H10/60

Foreign References:

US20100312798A1	2010-12-09
US20120016206A1	2012-01-19
US20120010867A1	2012-01-12
US20090203533A1	2009-08-13
US20110041080A1	2011-02-17
US20070106536A1	2007-05-10

Attorney, Agent or Firm:

HILL, James, W. (Suite 1700, Irvine, CA, US)

Claims:

WHAT IS CLAIMED IS:

1. A system for identifying treatment for a patient, comprising:

a patient data file input module configured to receive, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention; and a processing module, wherein the processing module is configured to:

annotate each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session; based on the indicators, create a first time-sequential record of the patient, comprising each patient session; compare the first sequential record to other time-sequential records, of other patients; identify a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and identify, by a processor, a health care intervention that was most effective for the cohort.

2. The system of claim 1, further comprising an output module configured to output the identified intervention with an indication of a degree of effectiveness of the intervention.

3. The system of claim 1, wherein the processor is configured to use a natural language processing technique to identify the respective intervention.

4. The system of claim 1, wherein the processor is configured to use a dynamic programming algorithm to obtain the cohort of similar sequential records.

5. The system of claim 1, wherein the processor is configured to receive data from a clinician prioritizing the significance of the respective intervention.

6. The system of claim 1, wherein the processor is configured to identify health care interventions that were effective in the cohort of patients having similar sequential records for patients.

7. The system of claim 1, wherein the treatment is for a cancer patient.

8. The system of claim 1, further comprising an output module configured to output the identified treatment with an indication of a degree of effectiveness of the intervention.

9. The system of claim 1, wherein the processor is configured to identify the intervention using a natural language processing technique.

10. The system of claim 1, wherein the processor is configured to use a dynamic programming algorithm to obtain the cohort of similar sequential records.

11. The system of claim 1, wherein the processor is configured to receive data from a clinician prioritizing the significance of the respective cancer treatment.

12. The system of claim 1, wherein the other time-sequential records comprise:

(a) responsive scores corresponding to a first patient being responsive to a health care intervention;

(b) unresponsive scores corresponding to a second patient being unresponsive to the health care intervention;

(c) improving scores corresponding to a third patient transitioning from being unresponsive to being responsive to the health care intervention; and

(d) degrading scores corresponding to a fourth patient transitioning from being responsive to being unresponsive to the health care intervention.

13. The system of claim 1, wherein the processing module is further configured to categorize each entry of the first time-sequential record into one of a plurality of level indicators.
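For illustration only, and not as part of the claims: one way to picture the categorization in claim 13 is to bin each numeric entry into a small alphabet of level symbols, so that a record becomes a string that sequence-comparison tools can handle. The cut points, the symbols `L`, `N`, `H`, and the function name below are assumptions made for this sketch; the application does not specify them.

```python
from bisect import bisect_right

# Hypothetical cut points and symbols (not taken from the application).
CUTS = [70.0, 100.0]        # boundaries between adjacent levels
LEVELS = ["L", "N", "H"]    # low, normal, high

def to_level_indicators(values):
    """Map each numeric entry to one level symbol.

    A value equal to a cut point falls into the higher level.
    """
    return "".join(LEVELS[bisect_right(CUTS, v)] for v in values)

record = to_level_indicators([65.0, 82.5, 100.0, 140.0])
print(record)
```

A record encoded this way can then be compared symbol by symbol against the encoded records of other patients.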

14. The system of claim 1, wherein the processing module is further configured to apply a substitution matrix to assign penalties for substituting an indicator of the first sequential record for an indicator of one of the other time-sequential records.

15. The system of claim 14, wherein the penalties comprise (1) a first penalty for starting a gap between the first sequential record and one of the other time-sequential records (2) a second penalty for continuing a gap between the first sequential record and the one of the other time-sequential records.
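For illustration only, and not as part of the claims: a substitution score combined with separate penalties for starting and for continuing a gap is the scoring scheme used in affine-gap global alignment from biological sequence analysis. The sketch below computes such a score with dynamic programming over three tables. The function names, the toy `match_score` function standing in for a substitution matrix, and the penalty values are assumptions for this example and are not the application's algorithm.

```python
NEG = float("-inf")

def match_score(x, y):
    # Toy stand-in for a substitution matrix (hypothetical values).
    return 2.0 if x == y else -1.0

def affine_align_score(a, b, sub, gap_open, gap_extend):
    """Global alignment score; a gap of length k costs gap_open + (k - 1) * gap_extend."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned with b[j-1]; X: a[i-1] aligned with a gap; Y: b[j-1] aligned with a gap.
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            M[i][j] = diag + sub(a[i - 1], b[j - 1])
            # Start a new gap (first penalty) or continue an existing one (second penalty).
            X[i][j] = max(M[i - 1][j] - gap_open, Y[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_extend)
            Y[i][j] = max(M[i][j - 1] - gap_open, X[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])

same = affine_align_score("LNHH", "LNHH", match_score, gap_open=3.0, gap_extend=1.0)
gapped = affine_align_score("LNHH", "LH", match_score, gap_open=3.0, gap_extend=1.0)
print(same, gapped)
```

With these toy values, one gap spanning two entries costs 4, while two separate single-entry gaps cost 6, so the scoring favors records that differ by one contiguous stretch of missing sessions over records with scattered omissions.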

16. The system of claim 1, wherein the first time-sequential record comprises (1) a sequential indicator of the patient and (2) a non-sequential indicator of the patient.

17. The system of claim 1, further comprising, determining a predictive feature of each of the indicators with respect to the degree of similarity between the first sequential record and the other time-sequential records.

18. The system of claim 17, further comprising, ranking the indicators according to the predictive features.

19. The system of claim 17, wherein the predictive feature is one of a positive predictive value, a negative predictive value, a sensitivity, and a specificity.
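For illustration only, and not as part of the claims: the four predictive features named in claim 19 have standard definitions in terms of true-positive, false-positive, true-negative, and false-negative counts. The sketch below computes them from such counts. How the counts would be derived for a given indicator is not stated here, so the example numbers and the function name are assumptions.

```python
def predictive_features(tp, fp, tn, fn):
    """Return PPV, NPV, sensitivity, and specificity from confusion counts."""
    def ratio(num, den):
        # Undefined ratios are reported as NaN rather than raising.
        return num / den if den else float("nan")
    return {
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }

# Hypothetical counts for a single indicator.
features = predictive_features(tp=40, fp=10, tn=45, fn=5)
print(features)
```

Indicators could then be ranked, as in claim 18, by sorting on whichever of these values is chosen as the predictive feature.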

20. A method of predicting an outcome for a patient, comprising:

(a) receiving, by a processor, data files, each of the files representing results of a diagnostic test;

(b) annotating each of the files with a respective indicator of a time associated with the respective test, to create a respective patient test result;

(c) based on the indicators, creating a first time-sequential record of the patient, comprising each patient test result;

(d) comparing the first sequential record to other time-sequential records, of other patients;

(e) identifying a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and

(f) predicting, by a processor and based on the cohort, a future test result of the patient, the future test result having a significant probability of being out of a predetermined range.

21. The method of claim 20, further comprising outputting, to an output device, the future test result.

22. The method of claim 20, wherein the respective test annotated with the respective indicator of time is identified using a natural language processing technique.

23. The method of claim 20, wherein in step (e) a dynamic programming algorithm is used to obtain the cohort of similar sequential records.

24. The method of claim 20, further comprising the step of predicting when the future diagnostic test will go out of predetermined range for the patient.

25. The method of claim 20, wherein step (e) further comprises prioritizing, by a clinician, the significance of the respective test.

26. The method of claim 20, wherein step (e) further comprises a step of identifying, by a processor diagnostic tests that were out of predetermined range in the cohort of patients having similar sequential records for patients.

27. The method of claim 20, wherein the patient is a cancer patient.

28. The method of claim 27, wherein the respective test result annotated with the respective indicator of time is identified using a natural language processing technique.

29. The method of claim 27, wherein in step (e) a dynamic programming algorithm is used to obtain the cohort of similar sequential records.

30. The method of claim 27, further comprising the step of predicting when the future diagnostic is predicted to go out of a predetermined range for the patient.

31. The method of claim 20, wherein the other time-sequential records comprise:

(a) normal scores corresponding to a first patient not having a disease state across a time period;

(b) unresponsive scores corresponding to a second patient having the disease state across the time period;

(d) degrading scores corresponding to a fourth patient succumbing to the disease state across the time period.

32. The method of claim 20, wherein creating the first time-sequential record comprises categorizing each entry of the first time-sequential record into one of a plurality of level indicators.

33. The method of claim 20, wherein comparing the first sequential record to the other time-sequential records comprises applying a substitution matrix to assign penalties for substituting an indicator of the first sequential record for an indicator of one of the other time-sequential records.

34. The method of claim 33, wherein the penalties comprise (1) a first penalty for starting a gap between the first sequential record and one of the other time-sequential records (2) a second penalty for continuing a gap between the first sequential record and the one of the other time-sequential records.

35. The method of claim 20, wherein the first time-sequential record comprises (1) a sequential indicator of the patient and (2) a non-sequential indicator of the patient.

36. The method of claim 20, further comprising, determining a predictive feature of each of the indicators with respect to the degree of similarity between the first sequential record and the other time-sequential records.

37. The method of claim 36, further comprising, ranking the indicators according to the predictive features.

38. The method of claim 36, wherein the predictive feature is one of a positive predictive value, a negative predictive value, a sensitivity, and a specificity.

39. A system for predicting patient test results for a patient, comprising: a patient data file input module configured to receive, by a processor, data files, each of the files representing a patient test result; and a processing module, wherein the processing module is configured to: annotate each of the files with a respective indicator of a time associated with the respective diagnostic test, to create a respective patient test record; based on the indicators, create a first time-sequential test record of the patient, comprising each patient diagnostic test; compare the first sequential test record to other time-sequential records, of other patients; identify a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and predict, by a processor, a future patient test result having a significant probability of being out of a predetermined range.

40. The system of claim 39, further comprising an output module configured to output the identified future patient test result.

41. The system of claim 39, wherein the processor is configured to use a dynamic programming algorithm to obtain the cohort of similar sequential records.

42. The system of claim 39, wherein the processor is configured to predict when the future diagnostic test will go out of predetermined range for the patient.

43. The system of claim 39, wherein the processor is configured to receive data from a clinician prioritizing the significance of the respective test result.

44. The system of claim 39, wherein the processor is configured to identify patient test results that were out of predetermined range for the cohort.

45. The system of claim 39, wherein the patient is a cancer patient.

46. The system of claim 45, wherein the processor is configured to use a dynamic programming algorithm to obtain the cohort of similar sequential records.

47. The system of claim 45, further comprising the step of predicting when the future diagnostic test will go out of predetermined range for the patient.

48. The system of claim 39, wherein the other time-sequential records comprise:

(a) normal scores corresponding to a first patient not having a disease state across a time period;

(b) unresponsive scores corresponding to a second patient having the disease state across the time period;

(d) degrading scores corresponding to a fourth patient succumbing to the disease state across the time period.

49. The system of claim 39, wherein the processing module is further configured to categorize each entry of the first time-sequential record into one of a plurality of level indicators.

50. The system of claim 39, wherein the processing module is further configured to apply a substitution matrix to assign penalties for substituting an indicator of the first sequential record for an indicator of one of the other time-sequential records.

51. The system of claim 50, wherein the penalties comprise (1) a first penalty for starting a gap between the first sequential record and one of the other time-sequential records (2) a second penalty for continuing a gap between the first sequential record and the one of the other time-sequential records.

52. The system of claim 39, wherein the first time-sequential record comprises (1) a sequential indicator of the patient and (2) a non-sequential indicator of the patient.

53. The system of claim 39, further comprising, determining a predictive feature of each of the indicators with respect to the degree of similarity between the first sequential record and the other time-sequential records.

54. The system of claim 53, further comprising, ranking the indicators according to the predictive features.

55. The system of claim 53, wherein the predictive feature is one of a positive predictive value, a negative predictive value, a sensitivity, and a specificity.

56. A method of identifying an event leading to a target selection by a user, comprising:

(a) receiving, by a processor, data files, each of the files representing an encounter between the user and a user interface;

(b) annotating each of the files with a respective indicator of a time associated with the encounter, to create a respective user session;

(d) comparing the user sequential record to other time-sequential records, of other users;

(e) identifying a cohort of users having similar sequential records by determining which of the other sequential records have a degree of similarity to the user sequential record; and

(f) identifying, by a processor, an event that most frequently precedes a target selection by the cohort.
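For illustration only, and not as part of the claims: step (f) can be pictured as a frequency count over the sessions of the cohort. The sketch below counts which event immediately precedes the target selection in each cohort sequence. Treating "precedes" as "immediately precedes", along with the event names and the function name, is an assumption made for this example.

```python
from collections import Counter

def most_frequent_preceding_event(cohort_sessions, target):
    """Return the event that most often comes directly before `target`, or None."""
    counts = Counter()
    for events in cohort_sessions:
        # Walk adjacent pairs (previous event, current event).
        for prev, cur in zip(events, events[1:]):
            if cur == target:
                counts[prev] += 1
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical event sequences for three users in a cohort.
cohort = [
    ["home", "search", "review", "purchase"],
    ["home", "review", "purchase"],
    ["search", "cart", "purchase"],
]
event = most_frequent_preceding_event(cohort, "purchase")
print(event)
```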

57. The method of claim 56, further comprising displaying the event to the user via the user interface.

58. The method of claim 56, wherein the event comprises a display provided to the user via the user interface.

59. The method of claim 56, wherein the event comprises an input provided by the user to the user interface.

60. The method of claim 56, wherein the target selection is a purchase executed by the user via the user interface.

61. The method of claim 56, wherein the user interface is a website.

62. The method of claim 56, wherein comparing the user sequential record to the other time-sequential records comprises applying a substitution matrix to assign penalties for substituting an indicator of the user sequential record for an indicator of one of the other time-sequential records.

63. The method of claim 62, wherein the penalties comprise (1) a first penalty for starting a gap between the user sequential record and one of the other time-sequential records (2) a second penalty for continuing a gap between the user sequential record and the one of the other time-sequential records.

64. The method of claim 56, wherein the user time-sequential record comprises (1) a sequential indicator of the user and (2) a non-sequential indicator of the user.

65. The method of claim 56, further comprising, determining a predictive feature of each of the indicators with respect to the degree of similarity between the user sequential record and the other time-sequential records.

66. The method of claim 65, further comprising, ranking the indicators according to the predictive feature.

67. The method of claim 65, wherein the predictive feature is one of a positive predictive value, a negative predictive value, a sensitivity, and a specificity.

68. A system for identifying an event leading to a target selection by a user, comprising: a user data file input module configured to receive, by a processor, data files, each of the files representing an encounter between the user and a user interface; and a processing module, wherein the processing module is configured to: annotate each of the files with a respective indicator of a time associated with the encounter, to create a respective user session; based on the indicators, create a user time-sequential record of the user, comprising each user session;

compare the user sequential record to other time-sequential records, of other users; identify a cohort of users having similar sequential records by determining which of the other sequential records have a degree of similarity to the user sequential record; and identify an event that most frequently precedes a target selection by the cohort.

69. The system of claim 68, further comprising a display module configured to display the event to the user via the user interface.

70. The system of claim 68, wherein the event comprises an input provided by the user to the user interface.

71. The system of claim 68, wherein the target selection is a purchase executed by the user via the user interface.

72. The system of claim 68, wherein the user interface is a website.

73. The system of claim 68, wherein the processing module is further configured to apply a substitution matrix to assign penalties for substituting an indicator of the user sequential record for an indicator of one of the other time-sequential records.

74. The system of claim 73, wherein the penalties comprise (1) a first penalty for starting a gap between the user sequential record and one of the other time-sequential records (2) a second penalty for continuing a gap between the user sequential record and the one of the other time-sequential records.

75. The system of claim 68, wherein the user time-sequential record comprises (1) a sequential indicator of the user and (2) a non-sequential indicator of the user.

76. The system of claim 68, wherein the processing module is further configured to determine a predictive feature of each of the indicators with respect to the degree of similarity between the user sequential record and the other time-sequential records.

77. The system of claim 76, wherein the processing module is further configured to rank the indicators according to the predictive feature.

78. The system of claim 76, wherein the predictive feature is one of a positive predictive value, a negative predictive value, a sensitivity, and a specificity.

79. A method of identifying a published event leading to a target market event, comprising:

(a) receiving, by a processor, data files, each of the files representing a sequence comprising a first published event and a first market event, occurring after the first event;

(b) annotating each of the files with a respective indicator of a time associated with the sequence, to create a respective event session;

(d) comparing the first sequential record to second time-sequential records, of other sequences comprising second published events and second market events, each occurring after a respective one of the second events;

(e) identifying a cohort of sequences having similar sequential records by determining which of the second sequential records have a degree of similarity to the first sequential record; and

(f) identifying, by a processor, an identified published event that most frequently precedes a target market event.

80. The method of claim 79, further comprising outputting, to an output device, the identified published event.

81. The method of claim 79, wherein the published event comprises publication of a news article.

82. The method of claim 79, wherein the target market event is a change of a value of an asset.

83. The method of claim 79, wherein comparing the first sequential record to the second time-sequential records comprises applying a substitution matrix to assign penalties for substituting an indicator of the first sequential record for an indicator of one of the second time-sequential records.

84. The method of claim 83, wherein the penalties comprise (1) a first penalty for starting a gap between the user sequential record and one of the other time-sequential records (2) a second penalty for continuing a gap between the first sequential record and the one of the second time-sequential records.

85. The method of claim 79, further comprising, determining a predictive feature of each of the indicators with respect to the degree of similarity between the first sequential record and the second time-sequential records.

86. The method of claim 85, further comprising, ranking the indicators according to the predictive feature.

87. The method of claim 85, wherein the predictive feature is one of a positive predictive value, a negative predictive value, a sensitivity, and a specificity.

88. A system for identifying a published event leading to a target market event, comprising:

a user data file input module configured to receive, by a processor, data files, each of the files representing sequence comprising a first published event and a first market event, occurring after the first event; and a processing module, wherein the processing module is configured to: annotate each of the files with a respective indicator of a time associated with the sequence, to create a respective event session; based on the indicators, create a first time-sequential record of the user, comprising each event session; compare the first sequential record to second time-sequential records, of other sequences comprising second published events and second market events, each occurring after a respective one of the second events; identify a cohort of sequences having similar sequential records by determining which of the second sequential records have a degree of similarity to the first sequential record; and identify an identified published event that most frequently precedes a target market event.

89. The system of claim 88, further comprising a display module configured to display the identified published event.

90. The system of claim 88, wherein the published event comprises publication of a news article.

91. The system of claim 88, wherein the target market event is a change of a value of an asset.

92. The system of claim 88, wherein the processing module is further configured to apply a substitution matrix to assign penalties for substituting an indicator of the user sequential record for an indicator of one of the other time-sequential records.

93. The system of claim 92, wherein the penalties comprise (1) a first penalty for starting a gap between the user sequential record and one of the other time-sequential records (2) a second penalty for continuing a gap between the first sequential record and the one of the second time-sequential records.

94. The system of claim 88, wherein the user time-sequential record comprises (1) a sequential indicator of the asset and (2) a non-sequential indicator of the asset.

95. The system of claim 88, wherein the processing module is further configured to determine a predictive feature of each of the indicators with respect to the degree of similarity between the first sequential record and the second time-sequential records.

96. The system of claim 95, wherein the processing module is further configured to rank the indicators according to the predictive feature.

97. The system of claim 95, wherein the predictive feature is one of a positive predictive value, a negative predictive value, a sensitivity, and a specificity.

98. A method of identifying treatment for a patient, comprising:

(a) receiving, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention;

(b) annotating each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session;

(d) comparing the first sequential record to other time-sequential records, of other patients;

(e) identifying a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and

(f) identifying, by a processor, a health care intervention that was most effective for the cohort.

99. The method of claim 98, further comprising outputting, to an output device, the identified intervention with an indication of a degree of effectiveness of the intervention.

100. The method of claim 98, wherein the respective intervention annotated with the respective indicator of time is identified using a natural language processing technique.

101. The method of claim 98, wherein in step (e) a dynamic programming algorithm is used to obtain the cohort of similar sequential records.

102. The method of claim 98, wherein step (e) further comprises prioritizing, by a clinician, the significance of the respective intervention.

103. The method of claim 98, wherein the treatment is for cancer.

104. The method of claim 103, further comprising outputting, to an output device, the identified treatment with an indication of a degree of effectiveness of the intervention.

105. The method of claim 103, wherein the respective intervention annotated with the respective indicator of time is identified using a natural language processing technique.

106. The method of claim 103, wherein in step (e) a dynamic programming algorithm is used to obtain the cohort of similar sequential records.

107. The method of claim 98, wherein the other time-sequential records comprise:

(a) responsive scores corresponding to a first patient being responsive to a health care intervention;

(b) unresponsive scores corresponding to a second patient being unresponsive to the health care intervention;

(c) improving scores corresponding to a third patient transitioning from being unresponsive to being responsive to the health care intervention; and

(d) degrading scores corresponding to a fourth patient transitioning from being responsive to being unresponsive to the health care intervention.

108. The method of claim 98, wherein creating the first time-sequential record comprises categorizing each entry of the first time-sequential record into one of a plurality of level indicators.

109. The method of claim 98, wherein comparing the first sequential record to the other time-sequential records comprises applying a substitution matrix to assign penalties for substituting an indicator of the first sequential record for an indicator of one of the other time-sequential records.

110. The method of claim 109, wherein the penalties comprise (1) a first penalty for starting a gap between the first sequential record and one of the other time-sequential records (2) a second penalty for continuing a gap between the first sequential record and the one of the other time-sequential records.

111. The method of claim 98, wherein the first time-sequential record comprises (1) a sequential indicator of the patient and (2) a non-sequential indicator of the patient.

112. The method of claim 98, further comprising, determining a predictive feature of each of the indicators with respect to the degree of similarity between the first sequential record and the other time-sequential records.

113. The method of claim 112, further comprising, ranking the indicators according to the predictive features.

114. The method of claim 112, wherein the predictive feature is one of a positive predictive value, a negative predictive value, a sensitivity, and a specificity.

Description:

PATIENT HEALTH RECORD SIMILARITY MEASURE

Related Applications

[0001] This application claims priority to U.S. App. Ser. No. 13/629,465, filed on September 27, 2012 and U.S. App. Ser. No. 13/629,473, filed on September 27, 2012, the entire contents of which are incorporated herein by reference.

Government License Rights

[0002] This invention was made with government support under 1 R01 GM 108346-01 awarded by the National Institute of General Medical Sciences. The Government has certain rights to this invention.

Field

[0003] The subject technology relates to methods of identifying effective treatments for patients using biological sequence analysis techniques.

Background

[0004] A patient's electronic medical record contains data that can be used by a clinician to evaluate and treat a patient. A collection of patient electronic medical records may be very large, especially when clinicians provide long-term care to patients or when clinicians provide care to many different patients. Processing this large amount of medical information can therefore be difficult.

Summary

[0005] The subject technology is illustrated, for example, according to various aspects described below. Various examples of aspects of the subject technology are described as clauses. These clauses are provided as examples and do not limit the subject technology. It is noted that any of the dependent clauses may be combined in any combination, and placed into a respective independent clause. The other clauses can be presented in a similar manner.

[0006] In some embodiments, disclosed is a method of identifying treatment for a patient comprising: receiving, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention; annotating each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session; based on the indicators, creating a first time-sequential record of the patient, comprising each patient session; comparing the first sequential record to other time-sequential records, of other patients; identifying a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and identifying, by a processor, at least one health care intervention that was most effective for the cohort.
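The steps of this method can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: sessions are represented as hypothetical `(date, intervention_code)` pairs, similarity is measured with the Python standard library's `difflib.SequenceMatcher` as a stand-in for the comparison described in this application, and "most effective" is taken to mean the highest success rate within the cohort. All names, codes, thresholds, and outcome data are invented for the example.

```python
from collections import defaultdict
from datetime import date
from difflib import SequenceMatcher

def build_record(sessions):
    """Order (date, intervention_code) sessions by time and keep the codes."""
    return [code for _, code in sorted(sessions)]

def similarity(rec_a, rec_b):
    # Stand-in similarity measure; not the algorithm of the application.
    return SequenceMatcher(None, rec_a, rec_b).ratio()

def find_cohort(record, others, threshold):
    """Return ids of other patients whose records meet the similarity threshold."""
    return [pid for pid, rec in others.items() if similarity(record, rec) >= threshold]

def most_effective(cohort, outcomes):
    """Pick the intervention with the highest success rate inside the cohort."""
    tally = defaultdict(lambda: [0, 0])  # intervention -> [successes, trials]
    for pid in cohort:
        for intervention, success in outcomes[pid]:
            tally[intervention][0] += int(success)
            tally[intervention][1] += 1
    return max(tally, key=lambda k: tally[k][0] / tally[k][1]) if tally else None

# Hypothetical data for one patient and three other patients.
patient = build_record([
    (date(2013, 3, 1), "lab"),
    (date(2013, 1, 5), "dx"),
    (date(2013, 2, 9), "chemoA"),
])
others = {
    "p1": ["dx", "chemoA", "lab"],
    "p2": ["dx", "surgery"],
    "p3": ["dx", "chemoA", "lab", "chemoB"],
}
outcomes = {
    "p1": [("chemoA", True)],
    "p2": [("surgery", True)],
    "p3": [("chemoA", False), ("chemoB", True)],
}
cohort = find_cohort(patient, others, threshold=0.8)
best = most_effective(cohort, outcomes)
print(patient, cohort, best)
```

In this toy data the cohort excludes the patient whose record diverges after diagnosis, and the intervention is chosen only from outcomes observed within the remaining cohort.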

[0007] In some embodiments, the method includes outputting, to an output device, the identified at least one intervention with an indication of a degree of effectiveness of the at least one intervention. In certain embodiments, the respective intervention annotated with the respective indicator of time is identified using a natural language processing technique. In some embodiments, in the step of identifying a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record, a dynamic programming algorithm is used to obtain the cohort of similar sequential records.

[0008] The algorithm comprises

[0009] In some embodiments, the most effective intervention is selected from the group consisting of: drug therapy, inpatient procedures, and outpatient procedures. In some embodiments, the other sequential records comprise over one million sequential records.

[0010] In certain embodiments, the method includes a step of prioritizing, by a clinician, the significance of the respective intervention. In some embodiments, the step of identifying a cohort includes identifying, by a processor, healthcare interventions that were effective in the cohort.

[0011] In some embodiments, the interventions that are annotated are selected from the group consisting of clinical terms, biological terms, genomic terms, and laboratory testing terms.

[0012] In some embodiments, disclosed is a non-transitory computer-readable medium encoded with a computer program comprising instructions executable by a processor for: receiving, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention; annotating each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session; based on the indicators, creating a first time-sequential record of the patient, comprising each patient session; comparing the first sequential record to other time-sequential records, of other patients; identifying a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and identifying, by a processor, at least one health care intervention that was most effective for the cohort.

[0013] In certain embodiments, the instructions further comprise code for outputting, to an output device, the identified at least one intervention with an indication of a degree of effectiveness of the at least one intervention. In some embodiments the instructions include code for annotating the files using a natural language processing technique. In some embodiments, the instructions comprise code for using a dynamic programming algorithm to obtain the cohort of similar sequential records. In some embodiments, the instructions comprise code for using the following algorithm

[0014] In some embodiments, the interventions are selected from the group consisting of: drug therapy, inpatient procedures, and outpatient procedures. In some embodiments, the instructions further comprise code for accessing over one million sequential records. In some embodiments, the instructions further comprise code for inputting, by a clinician, prioritizing data of the significance of the respective intervention. In some embodiments, the instructions further comprise code for identifying, by a processor, healthcare interventions that were effective in the cohort. In some embodiments, the interventions that are annotated are selected from the group consisting of clinical terms, biological terms, genomic terms, and laboratory testing terms. In some embodiments, the instructions further comprise code for processing by distributed computers.

[0015] In some embodiments, the instructions further comprise code for processing patient files in an electronic medical record. In some embodiments, disclosed is a computing machine comprising the machine-readable medium encoded with a computer program comprising instructions executable by a processor for: receiving, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention; annotating each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session; based on the indicators, creating a first time-sequential record of the patient, comprising each patient session; comparing the first sequential record to other time-sequential records, of other patients; identifying a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and identifying, by a processor, at least one health care intervention that was most effective for the cohort.

[0016] In some embodiments, disclosed is a system for identifying treatment for a patient comprising: a patient data file input module configured to receive, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention; and a processing module, wherein the processing module is configured to: annotate each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session; based on the indicators, create a first time-sequential record of the patient, comprising each patient session; compare the first sequential record to other time-sequential records, of other patients; identify a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and identify, by a processor, at least one health care intervention that was most effective for the cohort.

[0017] In some embodiments, the system comprises an output module configured to output the identified at least one intervention with an indication of a degree of effectiveness of the at least one intervention. The processor is configured to annotate the files using a natural language processing technique. The processor is configured to use a dynamic programming algorithm to obtain the cohort of similar sequential records. The dynamic programming algorithm comprises,

[0018] In certain embodiments, the processor is configured to annotate an intervention selected from the group consisting of: drug therapy, inpatient procedures, and outpatient procedures. In some embodiments, the processor is configured to access data files comprising over one million sequential records. In some embodiments, the processor is configured to receive data from a clinician prioritizing the significance of the respective intervention. In some embodiments, the processor is configured to identify health care interventions that were effective in the cohort of patients having similar sequential records for patients.

[0019] According to certain embodiments, the processor is configured to annotate the terms from the group consisting of clinical terms, biological terms, genomic terms, and laboratory testing terms. In some embodiments, the system comprises a plurality of distributed computers. In some embodiments, the processor is configured to process patient files in an electronic medical record.

[0020] In some embodiments, disclosed is a method of identifying cancer treatments for a patient, comprising: receiving, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention; annotating each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session; based on the indicators, creating a first time-sequential record of the patient, comprising each patient session; comparing the first sequential record to other time-sequential records, of other cancer patients; identifying a cohort of cancer patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and identifying, by a processor, at least one cancer treatment that was most effective for the cohort.

[0021] In some embodiments, the method includes outputting, to an output device, the identified at least one intervention with an indication of a degree of effectiveness of the at least one intervention. In some embodiments, terms annotated in the files are annotated using a natural language processing technique. In some embodiments, in the step of identifying a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record, a dynamic programming algorithm is used to obtain the cohort of similar sequential records.

[0023] In some embodiments, the algorithm comprises

[0022] In some embodiments, the most effective intervention is selected from the group consisting of: drug therapy, inpatient procedures, and outpatient procedures. In some embodiments, the other patient records comprise over one million sequential records.

[0023] In certain embodiments, the method includes a step of prioritizing, by a clinician, the significance of the respective intervention.

[0024] In some embodiments, the terms annotated are selected from the group consisting of clinical terms, biological terms, genomic terms, and laboratory testing terms.

[0025] According to some embodiments, a non-transitory computer-readable medium encoded with a computer program comprising instructions executable by a processor to perform a method for identifying a cancer treatment for a patient, the instructions comprising code for: receiving, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention; annotating terms in each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session; based on the indicators, creating a first time-sequential record of the patient, comprising each patient session; comparing the first sequential record to other time-sequential records, of other patients; identifying a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and identifying, by a processor, at least one health care intervention that was most effective for the cohort.

[0026] In some embodiments, the instructions further comprise code for outputting, to an output device, the identified at least one intervention with an indication of a degree of effectiveness of the at least one intervention. In some embodiments the instructions include code for annotating the files using a natural language processing technique. In some embodiments, the instructions comprise code for using a dynamic programming algorithm to obtain the cohort of similar sequential records. In some embodiments, the instructions comprise code for using the following algorithm

[0027] In some embodiments, the instructions further comprise code for annotating an intervention selected from the group consisting of: drug therapy, inpatient procedures, and outpatient procedures. In some embodiments, the instructions further comprise code for accessing over one million sequential records. In some embodiments, the instructions further comprise code for prioritizing, by a clinician, the significance of the respective intervention. In some embodiments, the instructions further comprise code for identifying, by a processor, cancer treatments that were effective in the cohort of patients having similar sequential records. In some embodiments, the instructions further comprise code for annotating terms selected from the group consisting of clinical terms, biological terms, genomic terms, and laboratory testing terms. In some embodiments, the instructions further comprise code for processing by distributed computers.

[0028] According to certain embodiments, the instructions further comprise code for processing patient files in an electronic medical record. In some embodiments, disclosed is a computing machine comprising the machine-readable medium encoded with a computer program comprising instructions executable by a processor for: receiving, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention; annotating each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session; based on the indicators, creating a first time-sequential record of the patient, comprising each patient session; comparing the first sequential record to other time-sequential records, of other patients; identifying a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and identifying, by a processor, at least one health care intervention that was most effective for the cohort.

[0029] In some embodiments, disclosed is a system for identifying cancer treatments for a patient, comprising: a patient data file input module configured to receive, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention; and a processing module, wherein the processing module is configured to: annotate each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session; based on the indicators, create a first time-sequential record of the patient, comprising each patient session; compare the first sequential record to other time-sequential records, of other patients; identify a cohort of cancer patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and identify, by a processor, at least one cancer treatment that was most effective for the cohort.

[0030] In some embodiments, the system comprises an output module configured to output the identified at least one intervention with an indication of a degree of effectiveness of the at least one intervention. In some embodiments, the processor is configured to annotate the files using a natural language processing technique. In some embodiments, the processor is configured to use a dynamic programming algorithm to obtain the cohort of similar sequential records. In some embodiments, the dynamic programming algorithm comprises,

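$$
H(i, j) = \max \begin{cases}
0 \\
H(i-1, j-1) + w(a_i, b_j) & \text{match/mismatch} \\
H(i-1, j) + w(a_i, -) & \text{deletion} \\
H(i, j-1) + w(-, b_j) & \text{insertion}
\end{cases}, \quad 1 \le i \le m, \; 1 \le j \le n
$$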

[0031] In some embodiments, the processor is configured to annotate an intervention selected from the group consisting of: radiation therapy, and drug therapy. In some embodiments, the processor is configured to access data files comprising over one million sequential records. In some embodiments, the processor is configured to receive data from a clinician prioritizing the significance of the respective intervention. In some embodiments, the processor is configured to identify health care interventions that were effective in the cohort of patients having similar sequential records for patients.

[0032] In some embodiments, the processor is configured to annotate the terms from the group consisting of clinical terms, biological terms, genomic terms, and laboratory testing terms. In some embodiments, the system comprises a plurality of distributed computers. In some embodiments, the processor is configured to process patient files in an electronic medical record.

[0033] In some embodiments, one or more of the other time-sequential records includes (a) normal scores corresponding to a first patient not having a disease state across a time period; (b) unresponsive scores corresponding to a second patient having the disease state across the time period; (c) improving scores corresponding to a third patient recovering from the disease state across the time period; and (d) degrading scores corresponding to a fourth patient succumbing to the disease state across the time period.

[0034] Creating the first time-sequential record may include categorizing each entry of the first time-sequential record into one of a plurality of level indicators. Comparing the first sequential record to the other time-sequential records may include applying a substitution matrix to assign penalties for substituting an indicator of the first sequential record for an indicator of one of the other time-sequential records. The penalties may include (1) a first penalty for starting a gap between the first sequential record and one of the other time-sequential records and (2) a second penalty for continuing a gap between the first sequential record and the one of the other time-sequential records. The first time-sequential record may include (1) a sequential indicator of the patient and (2) a non-sequential indicator of the patient.
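By way of non-limiting illustration, the categorizing and penalty scheme described above can be sketched as follows. The level labels (L, N, H), the thresholds, the penalty values, and all function names are assumptions chosen for the example and are not taken from the specification. For brevity, the sketch scores two sequences that have already been aligned; it does not search for the alignment.

```python
from typing import Dict, Sequence, Tuple

GAP = "-"  # hypothetical gap symbol used in already-aligned sequences


def categorize(value: float, low: float, high: float) -> str:
    """Map a numeric entry to a level indicator: L (low), N (normal), H (high)."""
    if value < low:
        return "L"
    if value > high:
        return "H"
    return "N"


def make_substitution_matrix(
    levels: Sequence[str], match: float, mismatch: float
) -> Dict[Tuple[str, str], float]:
    """Build a simple substitution matrix: one score for identical levels, one for all others."""
    return {(a, b): (match if a == b else mismatch) for a in levels for b in levels}


def score_alignment(
    seq_a: Sequence[str],
    seq_b: Sequence[str],
    matrix: Dict[Tuple[str, str], float],
    gap_open: float,
    gap_extend: float,
) -> float:
    """Score two equal-length aligned sequences with a gap-start and a gap-continue penalty.

    Simplification (assumption): a gap in one sequence directly followed by a gap
    in the other is treated as one continued gap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0.0
    in_gap = False
    for a, b in zip(seq_a, seq_b):
        if a == GAP or b == GAP:
            # First penalty for starting a gap, second penalty for continuing it.
            score += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += matrix[(a, b)]
            in_gap = False
    return score
```

A starting penalty that is larger in magnitude than the continuing penalty favors a few long gaps over many short ones, which may suit records in which a patient simply has a stretch of missing encounters.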

[0035] The methods and systems of the subject technology may determine a predictive feature of each of the indicators with respect to the degree of similarity between the first sequential record and the other time-sequential records. The indicators may be ranked according to the predictive features. The predictive feature may be one of a positive predictive value, a negative predictive value, a sensitivity, and a specificity.
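By way of non-limiting illustration, the four predictive features named above can be computed from counts of true and false positives and negatives, and the indicators ranked by any one of them. The function names and the dictionary layout are assumptions made for this sketch.

```python
from typing import Dict, List, Tuple


def predictive_features(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Return PPV, NPV, sensitivity and specificity from confusion-matrix counts."""

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined ratios are reported as NaN rather than raising an error.
        return numerator / denominator if denominator else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


def rank_indicators(
    counts: Dict[str, Tuple[int, int, int, int]], feature: str = "ppv"
) -> List[str]:
    """Rank indicator names from highest to lowest value of the chosen predictive feature.

    `counts` maps an indicator name to its (tp, fp, tn, fn) counts.
    """
    scored = {name: predictive_features(*c)[feature] for name, c in counts.items()}
    return sorted(scored, key=scored.get, reverse=True)
```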

[0036] In some embodiments is a method of identifying an event leading to a target selection by a user, comprising: (a) receiving, by a processor, data files, each of the files representing an encounter between the user and a user interface; (b) annotating each of the files with a respective indicator of a time associated with the encounter, to create a respective user session; (c) based on the indicators, creating a user time-sequential record of the user, comprising each user session; (d) comparing the user sequential record to other time-sequential records, of other users; (e) identifying a cohort of users having similar sequential records by determining which of the other sequential records have a degree of similarity to the user sequential record; and (f) identifying, by a processor, an event that most frequently precedes a target selection by the cohort.

[0037] The method may include displaying the event to the user via the user interface. The event may include a display provided to the user via the user interface. The event may include an input provided by the user to the user interface. The target selection may include a purchase executed by the user via the user interface. The user interface may be a website.
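By way of non-limiting illustration, step (f) can be sketched as a frequency count over the sessions of the cohort. The sketch assumes that "precedes" means "immediately precedes" and that each session is an ordered list of event names; both are assumptions made for the example, as are the event names themselves.

```python
from collections import Counter
from typing import List, Optional, Sequence


def most_frequent_preceding_event(
    cohort_sessions: Sequence[Sequence[str]], target: str
) -> Optional[str]:
    """Return the event that most often comes immediately before `target` in the cohort."""
    counts: Counter = Counter()
    for events in cohort_sessions:
        # Walk consecutive (previous, current) pairs in time order.
        for previous, current in zip(events, events[1:]):
            if current == target:
                counts[previous] += 1
    if not counts:
        return None  # the target never occurs after another event
    return counts.most_common(1)[0][0]


cohort: List[List[str]] = [
    ["search", "view_review", "purchase"],
    ["view_ad", "view_review", "purchase"],
    ["search", "view_ad", "purchase"],
    ["search", "exit"],
]
event = most_frequent_preceding_event(cohort, "purchase")
```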

[0038] In some embodiments is a system for identifying an event leading to a target selection by a user, comprising: a user data file input module configured to receive, by a processor, data files, each of the files representing an encounter between the user and a user interface; and a processing module, wherein the processing module is configured to annotate each of the files with a respective indicator of a time associated with the encounter, to create a respective user session; based on the indicators, create a user time-sequential record of the user, comprising each user session; compare the user sequential record to other time-sequential records, of other users; identify a cohort of users having similar sequential records by determining which of the other sequential records have a degree of similarity to the user sequential record; and identify an event that most frequently precedes a target selection by the cohort.

[0039] The system may include a display module configured to display the event to the user via the user interface. The event may include an input provided by the user to the user interface. The target selection may include a purchase executed by the user via the user interface. The user interface may be a website. The processing module may be further configured to apply a substitution matrix to assign penalties for substituting an indicator of the user sequential record for an indicator of one of the other time-sequential records.

[0040] In some embodiments is a method of identifying a published event leading to a target market event, comprising: (a) receiving, by a processor, data files, each of the files representing a sequence comprising a first published event and a first market event, occurring after the first event; (b) annotating each of the files with a respective indicator of a time associated with the sequence, to create a respective event session; (c) based on the indicators, creating a first time-sequential record of the user, comprising each event session; (d) comparing the first sequential record to second time-sequential records, of other sequences comprising second published events and second market events, each occurring after a respective one of the second events; (e) identifying a cohort of sequences having similar sequential records by determining which of the second sequential records have a degree of similarity to the first sequential record; and (f) identifying, by a processor, an identified published event that most frequently precedes a target market event.

[0041] The method may include outputting, to an output device, the identified published event. The published event may include publication of a news article. The target market event may include a change of a value of an asset. Comparing the first sequential record to the second time-sequential records may include applying a substitution matrix to assign penalties for substituting an indicator of the first sequential record for an indicator of one of the second time-sequential records.

[0042] identify an identified published event that most frequently precedes a target market event.

[0043] The system may include a display module configured to display the identified published event. The published event may include publication of a news article. The target market event may include a change of a value of an asset. The processing module may be further configured to apply a substitution matrix to assign penalties for substituting an indicator of the user sequential record for an indicator of one of the other time-sequential records.

[0044] Additional features and advantages of the subject technology will be set forth in the description below, and in part will be apparent from the description, or may be learned by practice of the subject technology. The advantages of the subject technology will be realized and attained by the written description and claims hereof as well as the appended drawings.

[0045] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the subject technology as claimed.

Brief Description of the Drawings

[0046] The accompanying drawings, which are included to provide further understanding of the subject technology and are incorporated in and constitute a part of this specification, illustrate aspects of the subject technology and together with the description serve to explain the principles of the subject technology.

[0047] FIG. 1 shows a flowchart of a method of identifying treatment for a patient, according to some embodiments of the subject technology.

[0048] FIG. 2 shows a flowchart of a method of identifying treatment for a patient, according to some embodiments of the subject technology.

[0049] FIG. 3 illustrates a simplified diagram of a system, in accordance with various embodiments of the subject technology.

[0050] FIG. 4 illustrates a simplified block diagram of a server, in accordance with various embodiments of the subject technology.

[0051] FIG. 5 is a conceptual block diagram illustrating an example of a system, in accordance with various embodiments of the subject technology.

Detailed Description

[0052] In the following detailed description, numerous specific details are set forth to provide a full understanding of the subject technology. It will be apparent, however, to one ordinarily skilled in the art that the subject technology may be practiced without some of these specific details. In other instances, well-known structures and techniques have not been shown in detail so as not to obscure the subject technology.

[0053] According to some embodiments, a method of improving patient outcomes is provided by identifying best practice treatment for cohorts of patients and applying them to new patients that are identified as similar.

[0054] According to some embodiments, a method of predicting laboratory test results in the near-term for patients is provided by identifying patients that have a statistically significant probability of going out of a predetermined range based on patterns or similar cohorts of patients. As used herein, the term "significant probability" means having a statistically significant probability as viewed by a clinician, for example with a p value of less than 0.05. As used herein, the term "predetermined range" means per clinical guidelines or other guidelines. As used herein, the term "test result" means the outcome of a diagnostic test and the term "future test result" means a test result obtained in the future.

[0055] Optimal patient treatment can be achieved by identifying best treatment practices for similar patients. It has been discovered that identifying cohorts of patients who are similar to the patient and applying the best treatment practices found for the cohort may achieve optimal treatment for the patient for whom treatment is sought. Examples of illnesses or conditions for which such a method of applying the best treatment practices found for a similar cohort may be used include cancers, autoimmune diseases, and neurodegenerative diseases. Current cancer treatments include radiation and chemotherapy, which have many serious negative side effects. It is therefore beneficial to determine a treatment or treatments that may be most effective in a particular patient, prior to commencing any treatment with such negative side effects.

[0056] In addition, prediction of future patient laboratory tests can be valuable in treating and preventing disease. It has been discovered that identifying cohorts of patients who are similar to the patient and analyzing the test results of the cohort of patients may achieve accurate prediction of patient test results, and thereby identify patients that have a high probability of going out of range for a particular test. Such test results may include: blood diagnostic tests (pressure, cholesterol levels, glucose levels, protein levels), urine analysis, blood platelet levels, tissue biopsies, protein levels, heart rate, and other tests.

[0057] Biological sequence analysis techniques have been used to process DNA, RNA and peptide sequences in order to better elucidate their structure, function, features and transformation. Such biological sequence analysis involves use of biological databases populated by the results of high-throughput production of gene and protein sequences. Comparing new sequences to those with known functions as stored in databases has increased understanding of the biology of an organism from which the new sequence comes. Sequence analysis has also been used to assign function to genes and proteins by the study of the similarities between the compared sequences.

[0058] Two main types of sequence alignment currently exist: pair-wise sequence alignment, which only compares two sequences at a time, and multiple sequence alignment, which compares many sequences at one time. Algorithms may be used to align pairs of sequences. Examples of such algorithms include the Needleman-Wunsch algorithm and the Smith-Waterman algorithm. Repeat matching alignment, in which repeating subsequence motifs are identified in the sequence, may also be used, as may overlapping alignments, in which overhanging ends are not penalized. Hybrid alignment techniques may also be used. These hybrid techniques modify the dynamic programming formula to favor specific structures in the sequences. Complex insertion and deletion penalties that are dependent on the initiation and length of the gap, or that use an affine gap cost structure, may also be used. In addition, heuristic alignment algorithms such as Basic Local Alignment Search Tool (BLAST) (Altschul et al. 1990) and alternate versions of BLAST and FASTA (Pearson & Lipman 1988) may also be used. BLAST uses highly matched short seed sequences from which to extend out the alignment. FASTA is a multistep approach that starts with exact matches, extends to ungapped matches and then identifies gapped alignments.

[0059] The Needleman-Wunsch algorithm (also referred to as the optimal matching algorithm) performs a global alignment on two sequences and may be used to align protein or nucleotide sequences. The Needleman-Wunsch algorithm is an example of dynamic programming, which simplifies a complicated problem by breaking it down into simpler sub-problems in a recursive manner. In this algorithm, scores for aligned characters are specified by a similarity matrix, which is a matrix of scores which express the similarity between two data points. Higher scores are given to more-similar characters, and lower or negative scores for dissimilar characters.
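By way of non-limiting illustration, a minimal sketch of Needleman-Wunsch global alignment scoring is given below. It assumes a single match score, a single mismatch score, and a linear gap penalty in place of a full similarity matrix; these values and the function name are assumptions made for the example.

```python
from typing import Sequence


def needleman_wunsch_score(
    a: Sequence[str],
    b: Sequence[str],
    match: int = 1,
    mismatch: int = -1,
    gap: int = -1,
) -> int:
    """Return the optimal global alignment score of sequences a and b."""
    m, n = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j].
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, n + 1):
        F[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                F[i - 1][j] + gap,    # a[i-1] aligned with a gap
                F[i][j - 1] + gap,    # b[j-1] aligned with a gap
            )
    return F[m][n]


# Sequences need not be characters; here they are hypothetical intervention codes.
score = needleman_wunsch_score(["lab", "chemo", "scan"], ["lab", "scan"])
```

Each cell depends only on its three already-computed neighbors, which is the recursive decomposition into sub-problems referred to above.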

[0060] The Smith-Waterman algorithm is also an example of dynamic programming and has been used for performing local sequence alignment in order to determine similar regions between two nucleotide or protein sequences.

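[0061]

$$
H(i, j) = \max \begin{cases}
0 \\
H(i-1, j-1) + w(a_i, b_j) & \text{match/mismatch} \\
H(i-1, j) + w(a_i, -) & \text{deletion} \\
H(i, j-1) + w(-, b_j) & \text{insertion}
\end{cases}, \quad 1 \le i \le m, \; 1 \le j \le n
$$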
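By way of non-limiting illustration, a minimal sketch of Smith-Waterman local alignment is given below. It assumes that the deletion and insertion weights w(a_i, -) and w(-, b_j) are a single constant gap penalty and that w(a_i, b_j) takes one of two values (match or mismatch); these simplifications, the default values, and the function name are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def smith_waterman(
    a: Sequence[str],
    b: Sequence[str],
    match: int = 2,
    mismatch: int = -1,
    gap: int = -1,
) -> Tuple[int, List[str], List[str]]:
    """Return (best local score, aligned part of a, aligned part of b)."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]  # first row and column stay zero
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # negative cells are set to zero
                H[i - 1][j - 1] + s,  # match / mismatch
                H[i - 1][j] + gap,    # deletion
                H[i][j - 1] + gap,    # insertion
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Backtrack from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a: List[str] = []
    out_b: List[str] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, out_a[::-1], out_b[::-1]


# Hypothetical intervention codes; only the shared run of three events aligns.
result = smith_waterman(
    ["x", "lab", "chemo", "scan", "y"],
    ["z", "z", "lab", "chemo", "scan"],
)
```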

[0062] The Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure. The Smith-Waterman algorithm finds the optimal local alignment with respect to the scoring system being used. The scoring system may include the substitution matrix scheme and the gap-scoring scheme. A substitution matrix describes the rate at which one character in a sequence changes to other character states over time. Substitution matrices have been used in the context of amino acid or DNA sequence alignments, where the similarity between sequences depends on their divergence time and the substitution rates as represented in the matrix. The primary difference between the Smith-Waterman and the Needleman-Wunsch algorithm is that negative scoring matrix cells are set to zero, which renders the (thus positively scoring) local alignments visible. Backtracking starts at the highest scoring matrix cell and proceeds until a cell with score zero is encountered, yielding the highest scoring local alignment. The subject technology uses the aforementioned sequence analysis techniques to identify best practice treatments for cohorts of patients and apply them to new patients that are identified as similar by physicians, and to identify patients that have a high probability of going out of range based on patterns or similar cohorts of patients. Contrary to previous research (see, for example, Lee et al., "Local Alignment Tool for Clinical History: Temporal Semantic Search of Clinical Databases" AMIA 2010 Symposium Proceedings p. 437-441), use of a substitution matrix has been found by Applicant to be successful in identifying best practice treatments for cohorts of patients and applying them to new patients that are identified as similar by physicians, and in identifying patients that have a high probability of going out of range based on patterns or similar cohorts of patients. The substitution matrix is initialized with the tunable parameters MatchWeight (the value set for an identical match across the diagonal (i,i) positions in the matrix) and MisMatchWeight (the value set for mismatches of variables (i,j), where i is not equal to j). The sequences are aligned with the initialized matrix. The predictive utility of the aligned sequences is evaluated with cross validation. Aligned sequences that correctly predict outcome will result in the substitution matrix MatchWeight and MisMatchWeight values incrementing, while alignments that incorrectly predict outcome will result in their weights decrementing. The adjustment of the substitution matrix stops when a cutoff is met for the predictive model.
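By way of non-limiting illustration, one possible reading of the matrix adjustment is sketched below. The specification does not state the step size, which cells are adjusted, or how cross validation is performed, so the per-cell symmetric update, the `step` parameter, and the `evaluate` callback (a stand-in for aligning the sequences and cross-validating the predictions) are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = Dict[Tuple[str, str], float]
AlignedPairs = Sequence[Tuple[str, str]]


def init_substitution_matrix(
    symbols: Sequence[str], match_weight: float, mismatch_weight: float
) -> Matrix:
    """MatchWeight on the diagonal (i, i); MisMatchWeight for (i, j) with i != j."""
    return {
        (a, b): (match_weight if a == b else mismatch_weight)
        for a in symbols
        for b in symbols
    }


def adjust_matrix(
    matrix: Matrix, aligned_pairs: AlignedPairs, predicted_correctly: bool, step: float
) -> None:
    """Increment the weights used by a correct alignment, decrement them otherwise."""
    delta = step if predicted_correctly else -step
    for a, b in aligned_pairs:
        if (a, b) not in matrix:
            continue  # columns containing a gap symbol are skipped
        matrix[(a, b)] += delta
        if a != b:
            matrix[(b, a)] += delta  # keep the matrix symmetric (assumption)


def tune(
    matrix: Matrix,
    evaluate: Callable[[Matrix], Tuple[float, List[Tuple[AlignedPairs, bool]]]],
    cutoff: float,
    step: float = 0.1,
    max_rounds: int = 100,
) -> int:
    """Adjust the matrix until the evaluated accuracy meets the cutoff.

    `evaluate` is hypothetical: it returns (accuracy, [(aligned_pairs, correct), ...]).
    Returns the number of adjustment rounds performed.
    """
    for rounds_done in range(max_rounds):
        accuracy, results = evaluate(matrix)
        if accuracy >= cutoff:
            return rounds_done
        for aligned_pairs, correct in results:
            adjust_matrix(matrix, aligned_pairs, correct, step)
    return max_rounds
```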

[0063] According to some embodiments, as shown in FIG. 1, a similarity matching method 10 may include accessing the electronic medical records (EMR) 20 of patients as stored in a non-transitory computer readable form, such as on a computer hard-drive. This EMR may be a systematic collection, database, or log of electronic medical information about individual patients in digital format that can be shared across different health care settings, i.e., accessed by different physicians at different healthcare facilities over a network (as shown in FIG. 3). The Veterans Affairs Informatics and Computing Infrastructure is an example of such a database. The EMR may be accessed via a network connection and may include a range of data, including medical history (e.g., tumor detected, heart attack, stroke, reduction in cognitive ability, onset of autoimmune disorder, anemia, etc.), medication, allergies, immunization status, laboratory test results, radiology images, vital signs, personal statistics such as age and weight, and billing information. The EMR may be updated in real time upon each encounter between the patient and a respective healthcare intervention. As used herein, the term "health care intervention" includes any of, and any combination of, lab tests, imaging (x-rays, CT, MRI, ultrasound), surgeries, inpatient and outpatient medical procedures, and physical, psychological and other interactions with any health care worker (doctor, nurse, pharmacist, therapist, etc.).

[0064] The EMR data may be retrieved and annotated with an indicator of time, thereby converting the EMR data into annotated sequences of health care interventions 25, and creating a sequential record made up of each health care intervention based on the time indicator. As used herein, the term "annotate" includes taking note, annotating, or otherwise supplying an indication. As used herein, the term "time" includes any of, and any combination of: day, date, week, month, year, minute, hour, second, or shorter or longer period of time.
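By way of non-limiting illustration, creating a sequential record from time-annotated interventions can be sketched as follows. The `PatientSession` class, the ISO 8601 timestamp format, and the intervention names are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PatientSession:
    """One health care intervention annotated with its indicator of time."""
    intervention: str
    time: datetime


def build_sequential_record(
    encounters: Iterable[Tuple[str, str]]
) -> List[PatientSession]:
    """Annotate each (intervention, ISO 8601 timestamp) pair and order the sessions by time."""
    sessions = [
        PatientSession(name, datetime.fromisoformat(stamp))
        for name, stamp in encounters
    ]
    return sorted(sessions, key=lambda session: session.time)


record = build_sequential_record([
    ("chemotherapy", "2013-03-10"),
    ("biopsy", "2013-01-15"),
    ("ct_scan", "2013-02-01T09:30:00"),
])
```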

[0065] The data may be annotated with the indicator of time by identifying an intervention using a natural language processing technique. Natural language processing techniques may use machine learning to identify an intervention in EMR data and annotate these events with an indicator. For example, the natural language processing technique may identify and annotate clinical terms, biological terms, genomic terms, and laboratory testing terms. The annotated term may have a value (discrete or continuous) and a time.

[0066] Exemplary machine learning techniques may include Weka (Waikato Environment for Knowledge Analysis) and ML-Flex. The annotates and annotated sequences of events for the patient may then be converted into a system of annotating, such as a markup language, and stored on a computer readable medium 30. An example of such a markup language is Extensible Markup Language (XML). This may be repeated for multiple patients to create a database of annotated patient sequences. The XML annotation or tag thus may have a tagged term, a value (discrete or continuous) and a time.
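By way of a non-limiting illustration, the conversion of time-annotated interventions into an XML sequential record may be sketched as follows. The element and attribute names (`patient`, `annotate`, `term`, `value`, `time`), the `Annotate` class, and the sample events are assumptions chosen for illustration; the source does not specify an XML schema.

```python
from dataclasses import dataclass
from datetime import datetime
import xml.etree.ElementTree as ET


@dataclass
class Annotate:
    """One annotated intervention: a tagged term, a value, and a time."""
    term: str        # tagged term, e.g. a clinical or laboratory term
    value: str       # discrete or continuous value, kept as text here
    time: datetime   # indicator of time for the intervention


def to_sequential_record(annotates):
    """Order the annotates by their time indicator."""
    return sorted(annotates, key=lambda a: a.time)


def to_xml(patient_id, annotates):
    """Serialize a time-ordered record; the schema is hypothetical."""
    root = ET.Element("patient", id=patient_id)
    for a in to_sequential_record(annotates):
        ET.SubElement(root, "annotate", term=a.term, value=a.value,
                      time=a.time.isoformat())
    return ET.tostring(root, encoding="unicode")


# Illustrative events supplied out of order on purpose.
events = [
    Annotate("medication", "A", datetime(2012, 1, 2)),
    Annotate("blood_pressure", "elevated", datetime(2012, 1, 1)),
]
xml_text = to_xml("p001", events)
```

Sorting on the time indicator before serialization keeps the stored record time-sequential regardless of the order in which the data files were received.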

[0067] A processor may be used to process the annotated sequences in the patient database. The processor may use statistical and machine learning techniques to rank the predictive utility of individual annotations at predicting outcome of a clinical question 35. Distributed computers may process the data using various software frameworks, such as Apache Hadoop, HBase and Accumulo to store and retrieve the sequential records. Feature selection may be performed on the XML tagged values in the record using subset selection techniques including but not limited to wrappers and filters that search through the space of possible features. Predictive utility rankings may be evaluated using methods including predictive classifiers and feature selection methods such as ReliefF to get a ranking of how well the features separate among the outcomes of the clinical question. ReliefF uses a nearest neighbor approach to numerically rank how well features distinguish between different outcomes.
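By way of a non-limiting illustration, a nearest neighbor feature ranking may be sketched as follows. This is the basic Relief procedure for two outcome classes, using a single nearest hit and a single nearest miss; it is a simplification and is not the ReliefF implementation of Weka. The feature names and data are hypothetical, and features are assumed to be scaled to the range 0 to 1.

```python
def relief_weights(X, y):
    """Basic Relief weights for two classes; features assumed in [0, 1]."""
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def dist(u, v):
        # Manhattan distance between two feature vectors
        return sum(abs(p - q) for p, q in zip(u, v))

    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # both neighbor types are needed to score an instance
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for f in range(d):
            # reward separation from the nearest miss,
            # penalize separation from the nearest hit
            weights[f] += (abs(X[i][f] - X[miss][f])
                           - abs(X[i][f] - X[hit][f])) / n
    return weights


def rank_features(names, X, y):
    """Return (name, weight) pairs, highest weight first."""
    return sorted(zip(names, relief_weights(X, y)),
                  key=lambda item: item[1], reverse=True)


# Hypothetical data: the first feature tracks the outcome, the second does not.
X = [[0.0, 0.5], [0.1, 0.1], [0.9, 0.45], [1.0, 0.15]]
y = [0, 0, 1, 1]
ranking = rank_features(["annotate_a", "annotate_b"], X, y)
```

A feature whose values differ more across outcome classes than within a class receives a higher weight, which is the sense in which the ranking reflects how well a feature separates the outcomes.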

[0068] N annotates are selected based on the threshold of the predictive ability starting with annotates ranked with the highest predictive utility 40. A substitution matrix is then set 45. The substitution matrix may be composed of N x N cells that represent the substitutability of two annotates in a sequence. The sequences may then be aligned 60 using DNA sequence algorithms such as dynamic programming, an example of which is a Smith and Waterman algorithm:

$$
H(i,j) = \max
\begin{cases}
0 \\
H(i-1,\,j-1) + w(a_i, b_j) & \text{match/mismatch} \\
H(i-1,\,j) + w(a_i, -) & \text{deletion} \\
H(i,\,j-1) + w(-, b_j) & \text{insertion}
\end{cases}
\qquad 1 \le i \le m,\; 1 \le j \le n
$$
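By way of a non-limiting illustration, the Smith and Waterman recurrence may be sketched as follows, where `a` and `b` are the two annotate sequences of lengths m and n, `w` scores the substitution of one annotate for another, and a single linear `gap` score stands in for both the deletion and insertion terms. The function name, the linear gap score, and the example scoring function are assumptions chosen for illustration.

```python
def smith_waterman(a, b, w, gap=-1):
    """Best local alignment score of sequences a and b.

    w(x, y) is the substitution score; gap is a linear gap score
    (an assumption; affine gap penalties are also common).
    """
    m, n = len(a), len(b)
    # H has an extra zero row and column for the empty prefix
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(
                0,                                        # start a new local alignment
                H[i - 1][j - 1] + w(a[i - 1], b[j - 1]),  # match or mismatch
                H[i - 1][j] + gap,                        # deletion
                H[i][j - 1] + gap,                        # insertion
            )
            best = max(best, H[i][j])
    return best


def simple_w(x, y):
    """Hypothetical scoring: +2 for a match, -1 for a mismatch."""
    return 2 if x == y else -1
```

For example, `smith_waterman("TTACGTT", "ACG", simple_w)` scores the shared subsequence `ACG` and ignores the flanking annotates, which is the property that allows partial care sequences to be matched between patients.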

[0069] New features may be constructed from identified subsequences with high coverage and predictive ability for clinical outcome of interest 65. Machine learning techniques may then be performed with cross validation to predict outcomes to clinical questions of interest 70.

[0070] The predictive performance of learned models may be assessed and predictive alignments are used to incrementally improve the substitution matrix 75. The threshold for improvement of substitution matrix and predictive model performance over the previous model may be set 50 and used to set the substitution matrix 45. The machine calculated substitution matrix, expert assessed substitution matrix, constructed subsequence features, predictive models and model parameters are stored on a non-transitory computer readable medium 55. In this manner, a cohort of similar sequential records may be obtained by determining which patient records are similar to or relevant to predicting the outcome of a clinical question. The sequences are aligned using DNA sequence alignment algorithms 80 and options and predicted outcomes are displayed 85 via an output device. As used herein, an output device includes any one of, and/or a combination of, displays, storage, print-outs, etc.

[0071] Healthcare interventions that were most effective for the similar patient cohort and most predictive of future test results for a new patient may be outputted (e.g., on a display or printout) by retrieving the new patient's EMR 90, converting the EMR data into an annotated sequence in order to answer clinical questions 95, and then aligning the sequences with DNA sequence alignment algorithms using a substitution matrix 80. In this manner, the most effective health care intervention options and predicted test results for the new patient may be outputted 85. The time at which the test results are predicted to go out of a predetermined range is also outputted. For example, the EMR for a new patient may be retrieved as data files of the patient's encounters with various physicians. Terms in the EMR may be identified using a natural language processing technique and annotated 95 with a time indicator to define a patient session. As used herein, the respective patient session is an intervention annotated with an indication of time. Thus, a sequential record may be created which includes each patient intervention based on the time indicators 95. The patient's sequential record may then be compared with other patients' sequential records that are similar to the patient's sequential record by aligning the sequences using DNA sequence algorithms using a substitution matrix 80. In this manner, a cohort of similar sequences may be obtained by determining which of the other patients' EMRs are similar to the patient's sequential record and then identifying healthcare interventions (e.g., drug therapy, physical therapy, radiation therapy) that were most effective for patients in the cohort of similar sequential records. Furthermore, a cohort of similar sequences may be obtained by determining which of the other patients' EMRs are similar to the patient's sequential record and then predicting outcomes based on patients in the cohort of similar sequential records.

[0072] According to some embodiments, as shown in FIG. 2, a similarity matching method with expert input 100 may include accessing a clinical guideline 110 and converting it into sequences of annotated events 115. The annotated sequences may be represented as XML computer readable code 120. An expert, such as a physician, may input data ranking the importance and relevance of clinical events to annotate in clinical care sequences. The clinical expert aligns a subset of patient sequences with archetype sequences 125. The annotates are stored, and sequences for patients and XML annotate sequences are annotated as archetypes for clinical care practices 130. Patient annotated sequences, archetype sequences and sequences for expert analysis, assessment and incremental improvement are displayed 135.

[0073] Following the storage of the annotates 130, a substitution matrix composed of N x N cells that represent the substitutability of two annotates in a sequence is set 140. The sequences are aligned with DNA sequence alignment algorithms 160 as in 60 of FIG. 1, using a substitution matrix. New features are constructed from identified subsequences with high coverage and predictive ability for clinical outcome of interest 165. A machine learning technique is performed with cross validation to predict outcomes to a clinical question of interest 170. The predictive performance of learned models may be assessed and predictive alignments may be used to incrementally improve the substitution matrix 175. The learned models and constructed subsequence features and alignments may be displayed 200. Clinical experts may then assess predictive models, select features of relevance, and evaluate alignments for improving predictive models 205. The threshold for improvement of substitution matrix and predictive model performance over the previous model may be set 150. The machine calculated substitution matrix, expert assessed substitution matrix, constructed subsequence features, predictive models and model parameters are stored on a computer readable memory 155. The sequences may then be aligned with DNA sequence alignment algorithms 180 using a substitution matrix. The treatment options and predicted outcomes may then be displayed 185.

[0074] Treatment options and predicted outcomes for a new patient may be displayed by retrieving the new patient's EMR 190, converting the EMR data into an annotated sequence in order to answer clinical questions 195, and then aligning the sequences with DNA sequence alignment algorithms using a substitution matrix 180. In this manner, the treatment options and predicted outcome for the new patient may be displayed 185.

Examples

[0075] Example 1: A physician wanting to identify the best treatment for lowering the blood pressure of a patient may retrieve the EMR of the patient and submit the EMR for processing by a computer readable program executable by a processor, such as in a computer. The program may convert the EMR data into annotated sequences of events for the patient by annotating each event with an indicator. For example, the annotated sequence may be that the patient first had elevated blood pressure, a day later the patient was prescribed blood pressure medication A, three months later the patient then suffered a heart attack, six months later a different blood pressure medication B was prescribed, two years later the patient then suffered a stroke, and the patient's blood pressure remains elevated. The annotated sequences may then be converted to XML annotates. The patient's sequence may then be compared to a cohort of patients having similar sequences in order to determine which treatment was successful for those other patients. The step of obtaining a cohort of patients having similar sequences may be achieved by converting EMR data of a large database of patients into annotated sequences of events for each patient by annotating each event with an indicator of time. Statistical and machine learning techniques, as implemented by one or many distributed computer processors, may be used to rank the predictive utility of individual annotates at predicting the outcome of treating the patient's high blood pressure. For example, the processors may identify a heart attack followed by stroke as the top two predictors in sequence having utility in the clinical question (how to lower the patient's blood pressure). The executable program may then select the annotates heart disease, stroke, and current use of medication B in a substitution matrix in order to determine those patients with a similar subsequence to the patient's (i.e., identify those patients who suffered a sequence in which a heart attack was followed by a stroke and who are currently taking medication B). Using the executable program, these subsequences of patients determined to be similar are used to construct new features having high coverage and predictive ability for lowering blood pressure. The substitution matrix is saved in a database and the predictive clinical treatment is evaluated for success in the patient. Based on the evaluation, the predictive performance is assessed and incrementally improved.

[0076] Example 2: Example 2 is the same as Example 1, except the relevance of clinical events to annotate in clinical care sequences may be ranked by an expert, such as a physician. The clinical expert may assign a subset of patient sequences with archetype sequences.

[0077] Example 3: A physician wanting to predict a patient's laboratory test value may retrieve the EMR of the patient and submit the EMR for processing by a computer readable program executable by a processor, such as in a computer. The program may convert the EMR data into annotated sequences of laboratory test results for the patient by annotating each lab test with an indicator of time. For example, the annotated sequence may be that the patient first had high cholesterol levels, followed by high blood pressure, and the physician would like to predict if and when the patient will have high blood glucose levels indicative of diabetes.

[0078] The annotated lab results sequences may then be converted to XML annotates. The patient's sequence may then be compared to a cohort of patients having similar lab test results followed by high glucose levels. The step of obtaining a cohort of patients having similar sequences may be achieved by converting EMR data of a large database of patients into annotated sequences of events for each patient by annotating each lab test result event with an indicator of time. Statistical and machine learning techniques, as implemented by one or many distributed computer processors, may be used to rank the predictive utility of individual annotates at predicting if and when the patient may have high blood glucose levels. For example, the processors may identify high blood pressure followed by high cholesterol levels as the top two lab test value predictors in sequence having utility in the clinical question (if and when the patient may have high glucose levels). The executable program may then select the annotates high blood pressure and high cholesterol levels in a substitution matrix in order to determine those patients with a similar subsequence to the patient's (i.e., identify those patients who suffered a sequence in which high blood pressure was followed by high cholesterol levels). Using the executable program, these subsequences of patients determined to be similar are used to construct new features having high coverage and predictive ability for high glucose levels. The substitution matrix is saved in a database and the predictive test is evaluated for success in the patient. Based on the evaluation, the predictive performance is assessed and incrementally improved.

[0079] Example 4: According to some embodiments, a Patient Health Record Similarity Measure (PHRSM) framework can use sequences of similar medical events to identify signal in patients' healthcare event data. The framework may be applied to data to predict, for example, intensive care unit (ICU) patient mortality. The EMR data can form inputs for algorithms to create models that predict in-hospital survival of patients. The challenge data set can include EMR with multiple variables for a plurality of anonymous patients with a death rate just under a given percentage.

[0080] In an exemplary implementation, challenge data from the PhysioNet 2012 challenge set included EMR with 37 variables for 4,000 anonymous patients with a death rate just under 14%. Although the challenge data set is limited by its size, it provides an event sequence for multiple patients in ICUs and allows us to compare the performance of the framework to other prediction methodologies such as Simplified Acute Physiology Scores (SAPS). SAPS are used widely to assess effectiveness of clinical care, medications and treatment in the context of severity of illness within hospitals. For the PhysioNet 2012 challenge, the SAPS I scoring system was used to predict in-hospital survival for ICU data.

[0081] In one embodiment, machine learning methods are used to reduce the number of variables. Alternatively or in combination, a predictive model for severity of illness scale may be applied. Such a model may use a variety of physiologic measurements, such as elective surgery, age, and prior length of stay. An example of such a model is the Oxford Acute Severity of Illness Score ("OASIS") (Johnson AE, Kramer AA, Clifford GD. A new severity of illness scale using a subset of Acute Physiology and Chronic Health Evaluation data elements shows comparable predictive accuracy. Crit Care Med. 41(7):1711-8. 2013). The scale can use a reduced set of a number of clinical variables (e.g., ten clinical variables) that result in a cumulative score that ranges between a lower bound (e.g., 0) and an upper bound (e.g., 75). Unlike SAPS I, the maximum contributions of the variables are not identical. Some variables can contribute a maximum of ten severity points in OASIS, while other variables can contribute a maximum of four severity points. Other contributions are contemplated. For example, a variable may contribute a maximum of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more severity points. Different variables can contribute different maximum severity points.

[0082] The data was used to demonstrate that sequence-based variables are informative for predicting patient mortality in four types of ICUs. The PHRSM approach may be based upon the flowchart of FIG. 1. Using the EMR data, the steps listed in the flowchart of FIG. 1 can be executed to generate predictive models that can inform care for critically ill and injured patients. The EMR data can be converted into tagged event sequences, which are used to create new sequential variables that are evaluated for their utility in predicting in-hospital survival. The specific method for constructing the sequence variables involves using a distance measure related to the alignment of the patient's sequence to archetypal sequences. The PHRSM framework can be used with a large range of machine learning algorithms. Alternatively, the PHRSM framework can be used with a rule generating classifier (e.g., partial decision tree classification or "PART"). One benefit of a rule generating classifier is that the rules can be examined for clinical relevance.

[0083] In an exemplary implementation, data were obtained from the PhysioNet 2012 challenge. The data set consisted of 4000 cases with 554 (13.9%) being in-hospital deaths and the remaining 3446 being survivors. The data were composed of results from multiple medical tests for each patient over a period of time (e.g., 48 hours). The data set included patient SAPS I scores. The SAPS I score uses maximum values for a number of variables (e.g., 14 variables) over a period of time (e.g., 24 hours) and translates them into a severity score within a range (e.g., 0 to 4), with higher values indicating more severe events. These values are added to produce a cumulative value within a range (e.g., 0 to 56) with cumulative scores greater than or equal to a threshold (e.g., 20) used to predict death.

[0084] In some embodiments, variable selection algorithms are not used to reduce the clinical variables (e.g., from 42 to 5). For example, a small number of clinical variables (e.g., four clinical variables) that were ranked as highly informative by a predictive model for severity (e.g., OASIS) can be chosen. These variables can be or include Age, Urine, Glasgow Coma Scale (GCS), and Mechanical Ventilator. Age and Urine are non-sequential variables and are used as informative variables in the classifier. While a non-sequential variable may vary over time (e.g., age increasing over time), the non-sequential variables remain constant across a sample time period for purposes of the comparison. An additional variable, ICU, can be used. For example, ICU can include (1) Coronary Care Unit ("ICU1"), (2) Cardiac Surgery Recovery Unit ("ICU2"), (3) Medical ICU ("ICU3"), and (4) Surgical ICU ("ICU4"). Informed by Johnson et al. (2013), the clinical Age variable can be used to create a new discrete variable that indicates whether the patient is above or below an age threshold (e.g., younger than 79, or 79 or older). GCS and Mechanical Ventilator can be used to create new sequential variables using the PHRSM framework.

[0085] The GCS and Mechanical Ventilator variables have multiple measurements over a period of time (e.g., 48 hours) that can be converted into sequences. To convert these into a sequence, the measurements are grouped and made into discrete events. For example, each measurement may be categorized into one of a plurality of levels. For example, a measurement may determine the presence or absence of a condition, therefore indicating a categorization into one of two levels. By further example, a measurement may determine the presence or absence of each of a plurality of conditions, the plurality of conditions being combinable to indicate a categorization into one of more than two levels (e.g., contiguous ranges). A similar method can be used for both variables. The below description focuses on the method of GCS sequence construction.

[0086] The GCS scores are composed of a scale of consciousness measurements based on eye response, verbal response and motor response. The GCS can be divided into four levels, for example, with assigned letter labels: 3-7 (D), 8-13 (R), 14 (N) and 15 (A). A score of 3 indicates no eye, verbal, or motor response to stimulus. A score of 15 means all three areas are functioning normally. Each level contributes a different number of severity points to the overall score, with D contributing 10 severity points with the patient being in a coma state, R contributing 4, N contributing 3, and A contributing 0 with the patient being fully responsive. For sequence analysis and sequential variable construction, these letters may be treated like amino acids in a protein sequence.
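By way of a non-limiting illustration, the mapping of a GCS score to one of the four letter labels and to its severity points may be sketched as follows. The function and dictionary names are assumptions chosen for illustration.

```python
# Severity points contributed by each GCS level, as stated in the text
SEVERITY = {"D": 10, "R": 4, "N": 3, "A": 0}


def gcs_letter(score):
    """Map a total GCS score (3 to 15) to its letter label."""
    if not 3 <= score <= 15:
        raise ValueError("GCS total score must be between 3 and 15")
    if score <= 7:
        return "D"   # 3-7
    if score <= 13:
        return "R"   # 8-13
    if score == 14:
        return "N"   # 14
    return "A"       # 15
```

For example, `gcs_letter(6)` yields `D`, and `SEVERITY[gcs_letter(6)]` yields 10 severity points.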

[0087] Assigning letters to the hourly GCS scores constructs a new PHRSM sequential variable over the period of time (e.g., 48 hours) for each patient. For missing hourly measures, the most recent previous GCS measure can be used. This results in one score per hour (e.g., a total of 48 scores). For example, a patient with GCS scores in the 3-7 range over an entire 48-hour time period would have a sequence of 48 letter D's (i.e., DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD). A substitution matrix is used to assign penalties for substituting one letter for another, as occurs when aligning translated protein sequences composed from an alphabet of twenty amino acids. In the above example, four letters are provided: D, R, N, and A. The substitution matrix can be constructed based on the differences between severity scores for GCS with a substitution from D to A being a -10 point penalty. Table 1, below, provides the substitution matrix for GCS sequences. The penalty for starting a gap was -10 and continuing a gap was -1, where a gap is a stretch of the sequence that does not align with the comparison sequence.
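By way of a non-limiting illustration, the construction of an hourly letter sequence and of a severity-difference substitution matrix may be sketched as follows. Several details are assumptions and are not taken from Table 1: the match score of 10 on the diagonal, the rule that hours before the first measurement take the first observed letter, and all function names.

```python
SEVERITY = {"D": 10, "R": 4, "N": 3, "A": 0}
MATCH_SCORE = 10  # assumption: diagonal value is not given in the text


def hourly_sequence(observed, hours=48):
    """Build one letter per hour from a dict of hour index -> letter.

    Missing hours reuse the most recent previous letter. Hours before
    the first measurement take the first observed letter (assumption).
    """
    if not observed:
        raise ValueError("at least one measurement is required")
    last = observed[min(observed)]
    letters = []
    for hour in range(hours):
        last = observed.get(hour, last)
        letters.append(last)
    return "".join(letters)


def substitution_matrix(severity, match=MATCH_SCORE):
    """Penalty for x -> y is minus the severity difference; match on the diagonal."""
    return {
        (x, y): match if x == y else -abs(severity[x] - severity[y])
        for x in severity
        for y in severity
    }


matrix = substitution_matrix(SEVERITY)
```

Under these assumptions, `matrix[("D", "A")]` is -10, which agrees with the D to A penalty stated in the text, while closer levels such as R and N incur a smaller penalty.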

[0088] The sequence for Mechanical Ventilator can be generated in the same fashion with patients being on or off the ventilator over the period of time (e.g., 48 hours). A different substitution matrix was used with an alphabet of two letters: D for on ventilator and A for off ventilator. Table 2 shows the substitution matrix for the Ventilator sequence. The gap penalties can be the same as for GCS sequences.

[0089] Using the discretized clinical variable (e.g., GCS), each patient's sequence can be compared to four archetypal sequences:

(1) completely normal (e.g., responsive) scores across the time period (i.e., 48 A's across 48 hours)

(2) completely unresponsive scores across the time period (i.e., 48 D's across 48 hours)

(3) improving scores that are transitioning from unresponsive to responsive

(4) degrading scores that are transitioning from responsive to unresponsive

[0090] These four archetype sequences can be compared against all patients to get match distance scores using the substitution matrix. A sequence alignment system may be utilized, for example, the open source system UGENE (Okonechnikov K, Golosova O, and Fursov M. Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics. 28:1166-1167. 2012). Matches 5% or lower can be assigned zero. Since sequences can have multiple subsequence alignments, the best match for an archetype can be selected and then stored as a sequence-based variable. The sequence alignment results in sequence-based variables that are within a range (e.g., between 0 and 480), with the upper bound of the range (e.g., 480) representing a match on all of a number of measurements (e.g., 48 measurements) in the sequence. This results in the generation of four new sequential variables, one for each archetype, related to the clinical variable (e.g., GCS).
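By way of a non-limiting illustration, the four sequence-based variables may be sketched as follows. This sketch scores each hour position by position and ignores gaps, which is a simplification of the alignment performed by a tool such as UGENE. The match score of 10 per hour is inferred from the stated upper bound of 480 for 48 measurements, the midpoint transition used for the improving and degrading archetypes is hypothetical, and the reading of the 5% rule as a fraction of the upper bound is an assumption.

```python
HOURS = 48
SEVERITY = {"D": 10, "R": 4, "N": 3, "A": 0}
MATCH = 10  # assumption: 48 matches give the stated upper bound of 480

# Archetypal sequences; the midpoint transition is a hypothetical shape
ARCHETYPES = {
    "GCS_Normal": "A" * HOURS,
    "GCS_Coma": "D" * HOURS,
    "GCS_Improving": "D" * (HOURS // 2) + "A" * (HOURS // 2),
    "GCS_Degrading": "A" * (HOURS // 2) + "D" * (HOURS // 2),
}


def pair_score(x, y):
    """Match score on agreement, else minus the severity difference."""
    return MATCH if x == y else -abs(SEVERITY[x] - SEVERITY[y])


def ungapped_score(seq, archetype):
    """Position-by-position score, floored at zero (no gaps modeled)."""
    if len(seq) != len(archetype):
        raise ValueError("sequences must have equal length")
    return max(0, sum(pair_score(x, y) for x, y in zip(seq, archetype)))


def sequence_variables(seq, floor_fraction=0.05):
    """One variable per archetype; low matches are set to zero."""
    upper = MATCH * HOURS
    variables = {}
    for name, archetype in ARCHETYPES.items():
        score = ungapped_score(seq, archetype)
        variables[name] = 0 if score <= floor_fraction * upper else score
    return variables
```

Under these assumptions, a patient who is unresponsive for all 48 hours receives 480 for the coma archetype and zero for the other three, so each patient is summarized by how closely the observed trajectory follows each archetypal course.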

[0091] The sequence alignment for Mechanical Ventilator was generated in the same fashion. However, only the archetypal sequence corresponding to the patient being off the ventilator across the time period was used. Thus, a total of five new sequential variables can be generated; four being generated relating to GCS and one being generated relating to the Mechanical Ventilator clinical measurement.

[0092] A classifier may be used to execute machine learning algorithms. One exemplary classifier is a partial decision tree rule generator named PART that is available through Weka, an open source machine-learning environment. PART uses decision trees to construct a hierarchical set of rules called a PART Decision List. The rules are applied in order with the first rule applying to the full set and the last rule applying to the remaining set. Default parameters can be used when executing the machine learning algorithms.
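By way of a non-limiting illustration, the ordered application of a decision list may be sketched as follows. This sketch only evaluates a list of rules that has already been learned; it does not implement the PART learning algorithm of Weka. The rules, thresholds, and variable names shown are hypothetical and are not the rules of Table 6.

```python
def classify(patient, rules, default):
    """Return the label of the first rule whose condition holds.

    rules is an ordered list of (condition, label) pairs; default is
    the label applied to patients that no rule covers.
    """
    for condition, label in rules:
        if condition(patient):
            return label
    return default


# Hypothetical decision list: "L" predicts life, "D" predicts death.
example_rules = [
    (lambda p: p["GCS_Coma"] <= 100 and p["Age_Discrete"] == "Younger", "L"),
    (lambda p: p["GCS_Degrading"] > 200 and p["Age_Discrete"] == "Older", "D"),
]

patient_a = {"GCS_Coma": 0, "GCS_Degrading": 0, "Age_Discrete": "Younger"}
patient_b = {"GCS_Coma": 300, "GCS_Degrading": 400, "Age_Discrete": "Older"}
patient_c = {"GCS_Coma": 300, "GCS_Degrading": 0, "Age_Discrete": "Older"}
```

Because the rules are tried in order, each later rule effectively applies only to patients that earlier rules did not cover, and the default label plays the role of the final rule applying to the remaining set.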

[0093] A separate classifier can be constructed for each ICU using nine variables: four GCS sequence-based variables, one mechanical ventilator sequence-based variable, three clinical variables (ICU, Age, and Urine), and a new discrete informative variable that indicates whether the patient is younger than 79 or is 79 or older. The predictive ability of each classifier can be assessed using cross validation and a hold out data set to estimate the generalization performance (Kohavi, Ron. "A study of cross-validation and bootstrap for accuracy estimation and model selection." IJCAI. Vol. 14. No. 2. 1995). In view of a small mortality sample size within the ICUs, a subsample selection method can be used so that the classifiers are cross-validated on subsets with equal numbers of patients that survived or died in-hospital. The remaining patients can be used as hold out test sets (3210 out of 4000 patients). The generated classification models can be stored along with the substitution matrices for the sequence variables, constructed variables and model parameters.
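By way of a non-limiting illustration, a subsample selection yielding equal numbers of patients that survived or died, with the remaining patients held out, may be sketched as follows. The function name, the `per_class` parameter, the random seed, and the default of using every patient of the smaller class are assumptions; the source does not state how the subsets were drawn.

```python
import random


def balanced_subsample(labels, per_class=None, seed=0):
    """Split patient indices into a class-balanced subset and a hold out set.

    labels holds 1 for in-hospital death and 0 for survival. per_class
    defaults to the size of the smaller class (assumption).
    """
    rng = random.Random(seed)
    died = [i for i, y in enumerate(labels) if y == 1]
    lived = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(died), len(lived)) if per_class is None else per_class
    if k > min(len(died), len(lived)):
        raise ValueError("per_class exceeds the size of the smaller class")
    chosen = set(rng.sample(died, k)) | set(rng.sample(lived, k))
    balanced = sorted(chosen)
    hold_out = [i for i in range(len(labels)) if i not in chosen]
    return balanced, hold_out


# Illustrative labels: 3 deaths and 7 survivors
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
balanced, hold_out = balanced_subsample(labels)
```

The balanced subset may then be passed to a cross validation routine, while the hold out indices are reserved for estimating generalization performance.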

[0094] In an exemplary implementation, classifiers were evaluated for performance using only two of the original clinical variables along with the five new sequential variables. Classifiers were compared with SAPS I, which uses fourteen of the clinical variables. In addition, it was determined how informative the new sequence based variables were.

[0095] Table 3, below, shows confusion matrices for SAPS I, the sequence feature classifier (SFC) and all four ICUs using SFCs. Since the PhysioNet 2012 challenge involved examining how well algorithms predict mortality, ability to predict in-hospital death or survival was evaluated. Looking at the SAPS I columns, SAPS I correctly predicted that 176 patients would die and 2868 patients would live. SAPS I incorrectly predicted 578 patients who actually lived as ones who would die. Similarly, SAPS I predicted 378 patients who actually died as ones who would live.

[0096] To compare performance across the algorithms, the PhysioNet 2012 challenge used the minimum of sensitivity and positive predictivity. Any predictive feature may be used, including positive predictive value, negative predictive value, sensitivity, specificity, combinations thereof, and the like. Higher values of the minimum indicate better performance. As seen in Table 4, below, the minimum for SFC is greater than the SAPS I minimum. Sensitivity corresponds to a classifier's ability to identify true positives compared with false negatives. Using the data from Table 3, the SAPS I sensitivity is 176/(176+378) = 0.32. Positive predictivity corresponds to the classifier's ability to predict correctly the class of interest, in this case, death. Positive predictivity is calculated by dividing the number of true positives by the sum of both true and false positives. Using the data from Table 3, the SAPS I positive predictivity is 176/(176+578) = 0.23. The area under the ROC curve (AUC) (Cantor SB and Kattan MW. Determining the area under the ROC curve for a binary diagnostic test. Med Decis Making. 20(4):468-70. 2000) can be computed from Table 3, with SAPS I having an AUC of 0.58 and SFC having an AUC of 0.67. Table 4 also provides a summary of the sensitivity, positive predictivity, and AUC for the PhysioNet 2012 challenge data for SAPS I and the overall SFC, along with the measures for each ICU.
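By way of a non-limiting illustration, the performance measures may be computed from confusion matrix counts as sketched below, using the SAPS I counts quoted in the text (176 true positives, 378 false negatives, 578 false positives, 2868 true negatives). The function names are assumptions. The AUC shortcut shown, the mean of sensitivity and specificity for a single operating point, is one common approach for a binary test and is an assumption about how the reported AUC was obtained.

```python
def sensitivity(tp, fn):
    """True positives over all actual positives."""
    return tp / (tp + fn)


def positive_predictivity(tp, fp):
    """True positives over all predicted positives."""
    return tp / (tp + fp)


def specificity(tn, fp):
    """True negatives over all actual negatives."""
    return tn / (tn + fp)


def challenge_score(tp, fp, fn):
    """Minimum of sensitivity and positive predictivity."""
    return min(sensitivity(tp, fn), positive_predictivity(tp, fp))


def single_point_auc(tp, fp, tn, fn):
    """Area under a ROC curve drawn through one operating point."""
    return (sensitivity(tp, fn) + specificity(tn, fp)) / 2


# SAPS I counts quoted in the text
TP, FN, FP, TN = 176, 378, 578, 2868
saps_sensitivity = sensitivity(TP, FN)
saps_pos = positive_predictivity(TP, FP)
saps_min = challenge_score(TP, FP, FN)
saps_auc = single_point_auc(TP, FP, TN, FN)
```

With these counts the sensitivity rounds to 0.32 and the positive predictivity rounds to 0.23, so the minimum used for ranking is the positive predictivity.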

[0097] Table 4, below, shows Number of Observed Deaths, Number of Observed Survival, Sensitivity Score, Positive Predictivity (Pos) Score, Minimum (Min) Value and Area Under the ROC (AUC) for SAPS I, the overall SFC, and the SFC for the four Intensive Care Units (ICUs).

[0098] Because of the small number of deaths occurring in ICU1 and ICU2, the method for subsample selection results in higher sensitivity at the cost of positive predictivity.

[0099] Using the ReliefF algorithm implemented in Weka, single predictive feature rankings were calculated for the variables in each ICU classifier (see Table 5). ReliefF uses a nearest neighbor algorithm to calculate how well single variables discriminate between classes. The sequence variables rank highly across the four ICUs with GCS sequence variables consistently scoring the highest.

[0100] Table 5, below, shows ranked attribute (ReliefF scores) for the four intensive care units (ICUs). All the variables, except Urine, Age, and Age_Discrete, are newly created sequential variables.

Rank | ICU1 | ICU2 | ICU3 | ICU4
5 | Off_Ventilator | GCS_Improving | GCS_Coma | Age_Discrete
6 | Age_Discrete | GCS_Degrading | GCS_Improving | GCS_Improving
7 | Urine | Age | GCS_Normal | Off_Ventilator
8 | Age | Age_Discrete | Age_Discrete | Urine

[0101] Each ICU classifier is described with respect to its rule set used to classify patients' in-hospital survival or death. Table 6, below, provides a list of the decision rules for each ICU. The high-ranking sequence variables are included in conjunctive rules that are ranked as the highest discriminating rules by the PART Decision List algorithm. The rules in the classifiers can be interpreted medically.

[0102] Table 6, below, shows PART Decision Lists for predicting life (L) and death (D) across the four Intensive Care Units (ICUs). All the variables, except Urine, Age, and Age_Discrete, are newly created sequential variables.

5 | Remaining: L | Remaining: L | Age_Discrete = Older AND GCS_Degrading > 91: L
6 | Age_Discrete = Older AND GCS_Degrading <= 72: L
7 | Age_Discrete = Younger: L
8 | Remaining: D

[0103] For example, Rule 1 for ICU1 can be read as: patients that are not consistently in a coma and are younger are predicted to live. Rule 2 can be read as: patients that do not have a consistent sequence of normal GCS scores, are degrading to lower GCS scores, are older and can be on a ventilator are predicted to die in-hospital. The use of sequence variables across the classifiers demonstrates their utility in predicting in-hospital survival.

[0104] The sequence-based variables are informative and occurred as part of the first rule in all four classifiers. They provide trending and time series information that is lacking in other models of mortality, which use minimum or maximum values over a range and thus lose information. Sequence-based variables supported the generation of compact sets of rules, composed of eight rules or fewer for the different units. This exemplary approach supports the use of medically meaningful checklists that clinicians can understand in relationship to their patients. Other classifiers such as support vector machines, decision trees, Bayesian classifiers and others could also use the sequence-based variables for predictive analytics.

[0105] The SFC outperformed SAPS I on the PhysioNet Challenge data with SFC having an AUC of 0.67 versus 0.58 for SAPS I. The AUC values were higher than SAPS I for each of the four ICUs. The sequence variables in the SFC were highly ranked as informative by the ReliefF algorithm and were used prominently across the PART Decision List classifiers. Therefore, sequence variables created through the PHRSM framework can be used on medical data to predict outcomes and identify medically meaningful signal in patients' EMR data.

[0106] According to some embodiments, as shown in FIG. 6, analytical methods and systems of the subject technology may be applied to predict events, selections, trends, and behaviors relating to a user on a user interface. An exemplary method 600 is shown in FIG. 6. A user may, during a user session, have an encounter with a user interface. The user interface may include or be connected to a computer-implemented system, such as a personal computer, an electronic device, a website, a server, combinations thereof, and the like, as further disclosed herein.

[0107] During the user session, one or more events may occur and be recorded in a data file (operation 602). For example, a user may make selections or otherwise provide inputs to the user interface. The user interface may provide displays or outputs to the user. Each of these events may be recorded with an indicator of a time associated with the encounter (operation 604). Events may be recorded using tracking techniques. For example, a user may have a user account associated with a user interface, such that information provided by the user is recorded in a data file associated with the user account. By further example, a unique identifier associated with the user (e.g., IP address) is used to record selections and other events associated with the user during a user session. Data (e.g., cookies) may be used to store data associated with the user and/or events of a user session. By further example, exemplary implementations of tracking techniques include web beacons, tracking bugs, tags, page tags, web bugs, tracking pixels, pixel tags, 1 x 1 gifs, clear gifs, and JavaScript tags. Such implementations may record or facilitate recording of events during a user session, such as selections made by a user, including websites visited before an event, websites visited after an event, advertisements selected, purchases executed, and the like. Such information is stored, for example, as clickstream data. The data is stored with time indicators to create a time-sequential record of the user (operation 606). The time-sequential records may span or include one or more user sessions. The time-sequential records may include one or more sequential indicators of the user and one or more non-sequential indicators of the user.
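
Operations 602 to 606 can be sketched as follows: timestamped events are grouped by user and ordered by time to form one time-sequential record per user. The tuple layout, the ISO 8601 timestamp format, and the event codes are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def build_records(events):
    """Group (user_id, iso_timestamp, event_code) tuples into
    time-ordered event sequences, one per user."""
    by_user = defaultdict(list)
    for user_id, timestamp, code in events:
        by_user[user_id].append((datetime.fromisoformat(timestamp), code))
    return {
        user_id: [code for _, code in sorted(items, key=lambda item: item[0])]
        for user_id, items in by_user.items()
    }


# Hypothetical clickstream events, deliberately out of order.
events = [
    ("u1", "2013-09-27T10:05:00", "ad_click"),
    ("u1", "2013-09-27T10:00:00", "page_view"),
    ("u2", "2013-09-27T09:00:00", "purchase"),
]
records = build_records(events)
```

Here `records["u1"]` is the sequence `["page_view", "ad_click"]`, regardless of the order in which the events were logged.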

[0108] According to some embodiments, a target selection by the user is determined. The target selection may be a purchase executed by the user, display of a website, or any other selection made by the user and/or input to the user interface. The target selection may be a desired outcome according to an operator (e.g., of the user interface).

[0109] According to some embodiments, a user sequential record is compared to other time-sequential records, of other users (operation 608). The other time-sequential records may include an indicator of whether or not the other users achieved the target selection. The comparison may include techniques disclosed herein. For example, a substitution matrix may be applied to assign penalties for substituting an indicator of the user sequential record for an indicator of one of the other time-sequential records, as disclosed herein. Gap penalties may be applied for starting a gap between the user sequential record and one of the other time-sequential records and/or continuing a gap between the user sequential record and the one of the other time-sequential records.
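
One way to combine a substitution score with separate penalties for starting and continuing a gap is a global alignment with affine gap costs. The sketch below uses a Gotoh-style dynamic program as a stand-in; it is not asserted to be the specific comparison used by the subject technology, and the penalty values and the `sub` scoring function are illustrative assumptions.

```python
NEG_INF = float("-inf")


def align_score(a, b, sub, gap_open=-3.0, gap_extend=-1.0):
    """Global alignment score with affine gap penalties.

    sub(x, y) returns the substitution score for aligning x with y.
    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned with b[j-1]; X: a[i-1] against a gap;
    # Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = sub(a[i - 1], b[j - 1])
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open,      # start a gap
                          X[i - 1][j] + gap_extend,    # continue a gap
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


def simple_sub(x, y):
    """Toy substitution scores: +2 for a match, -1 for a mismatch."""
    return 2.0 if x == y else -1.0
```

The sequences may be strings or lists of event codes. In practice the toy `simple_sub` function would be replaced by a lookup into a substitution matrix over the indicator alphabet.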

[0110] Among the other users, cohorts of users may be evaluated according to a degree of similarity with respect to the user sequential record (operation 610). A cohort of users having similar sequential records is identified by determining which of the other sequential records have a degree of similarity to the user sequential record. Accordingly, one or more events associated with the sequential records of the identified cohort are determined to be applicable to the user. An event preceding the occurrence of the target selection among the identified cohort is determined as being applicable to the user to potentially lead to the user making the target selection (operation 612). For example, an earlier purchase, website viewing, or other online activity by the identified cohort is determined as preceding a target purchase. Accordingly, the same or a similar event is provided or facilitated by the user interface with respect to the user. In an exemplary implementation, an advertisement displayed to the cohort prior to a target purchase is identified and/or displayed to the user to facilitate the same target purchase by the user. Conversely, events leading away from a target selection or not leading to a target selection can be identified in a cohort.
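
Operations 610 and 612 can be illustrated with a small sketch: a cohort is selected by thresholding similarity scores, and the events that precede the target selection in the cohort's records are tallied. The threshold rule, the data layout, and the event names are assumptions for illustration; other cohort definitions (for example, the top-k most similar records) would fit the same description.

```python
from collections import Counter


def select_cohort(similarity, threshold):
    """Return the ids whose similarity to the user record meets the threshold."""
    return [uid for uid, score in similarity.items() if score >= threshold]


def events_preceding_target(records, cohort, target):
    """Count, across cohort members who reached the target, how many
    members had each event somewhere before the target."""
    counts = Counter()
    for uid in cohort:
        sequence = records[uid]
        if target in sequence:
            first_target = sequence.index(target)
            counts.update(set(sequence[:first_target]))
    return counts


# Hypothetical records and similarity scores.
records = {
    "a": ["view", "ad_x", "buy"],
    "b": ["ad_x", "buy"],
    "c": ["view", "exit"],
}
similarity = {"a": 5.0, "b": 4.0, "c": 1.0}
cohort = select_cohort(similarity, threshold=3.0)
preceding = events_preceding_target(records, cohort, target="buy")
```

In this toy data, `ad_x` precedes the purchase for both cohort members, so it would be the candidate event to present to the user.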

[0111] As disclosed further herein, one or more indicators may be evaluated on the basis of one or more predictive features thereof. Predictive features include positive predictive value, negative predictive value, sensitivity, and specificity. For example, multiple indicators with respect to the user records and/or the other records may be separately evaluated and subsequently compared. Each indicator, or combinations thereof, may be separately correlated with the degree of similarity between the user sequential record and the other time-sequential records. Further, each indicator, or combinations thereof, may be separately correlated with the occurrence of the target selection according to the user records and/or the other records. The indicators may be ranked according to the predictive feature(s).
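
The four predictive features named above follow from the standard two-by-two table of indicator presence against outcome occurrence. The sketch below computes them for a binary indicator; the function name and the inputs are illustrative.

```python
def predictive_features(indicator, outcome):
    """Compute sensitivity, specificity, PPV, and NPV for a binary
    indicator against a binary outcome (sequences of 0/1 values)."""
    pairs = list(zip(indicator, outcome))
    tp = sum(1 for i, o in pairs if i and o)
    fp = sum(1 for i, o in pairs if i and not o)
    tn = sum(1 for i, o in pairs if not i and not o)
    fn = sum(1 for i, o in pairs if not i and o)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


features = predictive_features([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Indicators can then be ranked by sorting on whichever of these values is of interest, for example positive predictive value.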

[0112] According to some embodiments, as shown in FIG. 7, analytical methods and systems of the subject technology may be applied to predict events, selections, trends, and behaviors relating to a financial asset of interest. An exemplary method 700 is shown in FIG. 7. The asset of interest may be any property or financial instrument having economic value, including but not limited to stocks, bonds, options, precious metals, equity, contractual rights, real estate, cash, combinations thereof and the like.

[0113] Across an events session, one or more events may occur, including changes in value of a financial asset, published reports of events (including market and non-market related events), broadcasts, social media entries, combinations thereof, and the like (operation 702). During the events session, one or more events may occur and be recorded in a data file associated with an asset of interest (operation 704). For example, an asset may increase or decrease in value, a market related event may be reported in a publication, a non-market related event may be reported in a publication, etc. Each of these events may be recorded with an indicator of a time associated with the event. Events may be monitored, collected, and/or recorded using data aggregators, news aggregators, social network aggregators, internet search engines, combinations thereof, and the like. For example, reports, publications, and broadcasts may be retrieved and analyzed for information provided therein. Information may be extracted from sources such as news websites, blogs, podcasts, video blogs, social media websites (e.g., Twitter), and the like. The information may directly relate to or reference the asset of interest, or the information may be indirectly related to the asset of interest.

[0114] The information is stored as data with time indicators to create a time-sequential record of the events session (operation 706). The time-sequential records may span or include one or more events sessions. The time-sequential records may include one or more sequential indicators of the asset of interest and one or more non-sequential indicators of the asset of interest.
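
As a hedged illustration of operation 706, the sketch below orders timestamped asset events and encodes each as a symbol so that the result can be compared with other records as a sequence. The symbol alphabet (U, D, F for up, down, flat), the `flat_band` tolerance, and the event layout are assumptions introduced for this example and are not specified in the disclosure.

```python
def encode_asset_record(events, flat_band=0.005):
    """Turn timestamped asset events into a time-ordered list of symbols.

    events: iterable of (timestamp, kind, value), where kind is "price"
    (value is a fractional change in asset value) or "news" (value is a
    short label for the reported event).
    """
    symbols = []
    for _, kind, value in sorted(events, key=lambda event: event[0]):
        if kind == "price":
            if value > flat_band:
                symbols.append("U")   # increase in value
            elif value < -flat_band:
                symbols.append("D")   # decrease in value
            else:
                symbols.append("F")   # roughly unchanged
        elif kind == "news":
            symbols.append("N:" + str(value))
        else:
            raise ValueError("unknown event kind: " + repr(kind))
    return symbols


# Hypothetical events, deliberately out of order.
asset_events = [
    (3, "price", -0.02),
    (1, "price", 0.01),
    (2, "news", "earnings"),
]
asset_record = encode_asset_record(asset_events)
```

For the toy input, the encoded record is `["U", "N:earnings", "D"]`.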

[0115] According to some embodiments, an action to be taken with respect to the asset of interest may be identified. The action may be the purchase, sale, transfer, or trade of the asset of interest. According to some embodiments, a target event related to the asset of interest may be identified. The target event may be a desired event or an event in response to which an action is desirable. For example, the target event may be an increase or decrease in value of the asset of interest.

[0116] According to some embodiments, an asset sequential record is compared to other time-sequential records, of the same or other assets across the same or different time periods (operation 708). The other time-sequential records may include an indicator of whether or not the same or other assets achieved a target event. The comparison may include techniques disclosed herein. For example, a substitution matrix may be applied to assign penalties for substituting an indicator of the asset sequential record for an indicator of one of the other time-sequential records, as disclosed herein. Gap penalties may be applied for starting a gap between the asset sequential record and one of the other time-sequential records and/or continuing a gap between the asset sequential record and the one of the other time-sequential records.

[0117] Among the other assets and/or event sessions, cohorts may be evaluated according to a degree of similarity with respect to the asset sequential record (operation 710). A cohort of assets and/or event sessions having similar sequential records is identified by determining which of the other sequential records have a degree of similarity to the asset sequential record. Accordingly, one or more events associated with the sequential records of the identified cohort are determined to be applicable to the asset. An event preceding the occurrence of the target event among the identified cohort is determined as being applicable to the asset to potentially lead to the target event (operation 712). For example, an increase or decrease in value of the asset of interest or another asset, publication of a news article containing particular information, and/or posting of information on a social networking platform is determined as preceding a target event. Accordingly, the same or a similar event is predicted or forecast with respect to the asset of interest. In an exemplary implementation, an action to be taken with respect to the asset of interest is identified and/or executed to achieve a result identical or similar to a result of the identified cohort. Conversely, events leading away from a target event or not leading to a target event can be identified in a cohort.

[0118] As disclosed further herein, one or more indicators may be evaluated on the basis of one or more predictive features thereof. Predictive features include positive predictive value, negative predictive value, sensitivity, and specificity. For example, multiple indicators with respect to the asset records and/or the other records may be separately evaluated and subsequently compared. Each indicator, or combinations thereof, may be separately correlated with the degree of similarity between the asset sequential record and the other time-sequential records. Further, each indicator, or combinations thereof, may be separately correlated with the occurrence of the target event according to the asset records and/or the other records. The indicators may be ranked according to the predictive feature(s).

[0119] FIG. 3 illustrates a simplified diagram of a system 101, in accordance with various embodiments of the subject technology. The system 101 may include one or more remote client devices 102 (e.g., client devices 102a, 102b, 102c, and 102d) in communication with a server computing device 106 (server) via a network 104. In some embodiments, the server 106 is configured to run applications that may be accessed and controlled at the client devices 102. For example, a user at a client device 102 may use a web browser to access and control an application running on the server 106 over the network 104. In some embodiments, the server 106 is configured to allow remote sessions (e.g., remote desktop sessions) wherein users can access applications and files on the server 106 by logging onto the server 106 from a client device 102. Such a connection may be established using any of several well-known techniques such as the Remote Desktop Protocol (RDP) on a Windows-based server.

[0120] By way of illustration and not limitation, in one aspect of the disclosure, stated from a perspective of a server side (treating a server as a local device and treating a client device as a remote device), a server application is executed (or runs) at a server 106. While a remote client device 102 may receive and display a view of the server application on a display local to the remote client device 102, the remote client device 102 does not execute (or run) the server application at the remote client device 102. Stated in another way from a perspective of the client side (treating a server as remote device and treating a client device as a local device), a remote application is executed (or runs) at a remote server 106.

[0121] By way of illustration and not limitation, a client device 102 can represent a computer, a mobile phone, a laptop computer, a thin client device, a personal digital assistant (PDA), a portable computing device, or a suitable device with a processor. In one example, a client device 102 is a smartphone (e.g., iPhone, Android phone, Blackberry, etc.). In certain configurations, a client device 102 can represent an audio player, a game console, a camera, a camcorder, an audio device, a video device, a multimedia device, or a device capable of supporting a connection to a remote server. In one example, a client device 102 can be mobile. In another example, a client device 102 can be stationary. According to one aspect of the disclosure, a client device 102 may be a device having at least a processor and memory, where the total amount of memory of the client device 102 could be less than the total amount of memory in a server 106. In one example, a client device 102 does not have a hard disk. In one aspect, a client device 102 has a display smaller than a display supported by a server 106. In one aspect, a client device may include one or more client devices.

[0122] In some embodiments, a server 106 may represent a computer, a laptop computer, a computing device, a virtual machine (e.g., VMware® Virtual Machine), a desktop session (e.g., Microsoft Terminal Server), a published application (e.g., Microsoft Terminal Server) or a suitable device with a processor. In some embodiments, a server 106 can be stationary. In some embodiments, a server 106 can be mobile. In certain configurations, a server 106 may be any device that can represent a client device. In some embodiments, a server 106 may include one or more servers.

[0123] In one example, a first device is remote to a second device when the first device is not directly connected to the second device. In one example, a first remote device may be connected to a second device over a communication network such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or other network.

[0124] When a client device 102 and a server 106 are remote with respect to each other, a client device 102 may connect to a server 106 over a network 104, for example, via a modem connection, a LAN connection including Ethernet, or a broadband WAN connection including DSL, Cable, T1, T3, Fiber Optics, Wi-Fi, or a mobile network connection including GSM, GPRS, 3G, WiMax or other network connection. A network 104 can be a LAN network, a WAN network, a wireless network, the Internet, an intranet or other network. A network 104 may include one or more routers for routing data between client devices and/or servers. A remote device (e.g., client device, server) on a network may be addressed by a corresponding network address, such as, but not limited to, an Internet protocol (IP) address, an Internet name, a Windows Internet name service (WINS) name, a domain name or other system name. These illustrate some examples as to how one device may be remote to another device, but the subject technology is not limited to these examples.

[0125] According to certain embodiments of the subject technology, the terms "server" and "remote server" are generally used synonymously in relation to a client device, and the word "remote" may indicate that a server is in communication with other device(s), for example, over a network connection(s).

[0126] According to certain embodiments of the subject technology, the terms "client device" and "remote client device" are generally used synonymously in relation to a server, and the word "remote" may indicate that a client device is in communication with a server(s), for example, over a network connection(s).

[0127] In some embodiments, a "client device" may be sometimes referred to as a client or vice versa. Similarly, a "server" may be sometimes referred to as a server device or vice versa.

[0128] In some embodiments, the terms "local" and "remote" are relative terms, and a client device may be referred to as a local client device or a remote client device, depending on whether a client device is described from a client side or from a server side, respectively. Similarly, a server may be referred to as a local server or a remote server, depending on whether a server is described from a server side or from a client side, respectively. Furthermore, an application running on a server may be referred to as a local application, if described from a server side, and may be referred to as a remote application, if described from a client side.

[0129] In some embodiments, devices placed on a client side (e.g., devices connected directly to a client device(s) or to one another using wires or wirelessly) may be referred to as local devices with respect to a client device and remote devices with respect to a server. Similarly, devices placed on a server side (e.g., devices connected directly to a server(s) or to one another using wires or wirelessly) may be referred to as local devices with respect to a server and remote devices with respect to a client device.

[0130] FIG. 4 illustrates a simplified block diagram of a server 106, in accordance with various embodiments of the subject technology. The server 106 comprises a first display module 202, a user input module 204, a second display module 206, a patient input module 208, and an adjustment module 210. In some embodiments, the server 106 is communicatively coupled with the network 104 via a network interface. The modules can be implemented in software, hardware and/or a combination of both. Features and functions of these modules according to various aspects are further described in the subject technology.

[0131] FIG. 5 is a conceptual block diagram illustrating an example of a system, in accordance with various embodiments of the subject technology. A system 601 may be, for example, a client device (e.g., client device 102) or a server (e.g., server 106). The system 601 may include a processing system 602. The processing system 602 is capable of communication with a receiver 606 and a transmitter 609 through a bus 604 or other structures or devices. It should be understood that communication means other than busses can be utilized with the disclosed configurations. The processing system 602 can generate audio, video, multimedia, and/or other types of data to be provided to the transmitter 609 for communication. In addition, audio, video, multimedia, and/or other types of data can be received at the receiver 606, and processed by the processing system 602.

[0132] The processing system 602 may include a processor for executing instructions and may further include a machine-readable medium 619, such as a volatile or non-volatile memory, for storing data and/or instructions for software programs. The instructions, which may be stored in a machine-readable medium 610 and/or 619, may be executed by the processing system 602 to control and manage access to the various networks, as well as provide other communication and processing functions. The instructions may also include instructions executed by the processing system 602 for various user interface devices, such as a display 612 and a keypad 614. The processing system 602 may include an input port 622 and an output port 624. Each of the input port 622 and the output port 624 may include one or more ports. The input port 622 and the output port 624 may be the same port (e.g., a bi-directional port) or may be different ports.

[0133] The processing system 602 may be implemented using software, hardware, or a combination of both. By way of example, the processing system 602 may be implemented with one or more processors. A processor may be a general-purpose microprocessor, a microcontroller, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a state machine, gated logic, discrete hardware components, or any other suitable device that can perform calculations or other manipulations of information.

[0134] A machine-readable medium can be one or more machine-readable media. Software shall be construed broadly to mean instructions, data, or any combination thereof, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. Instructions may include code (e.g., in source code format, binary code format, executable code format, or any other suitable format of code).

[0135] Machine-readable media (e.g., 619) may include storage integrated into a processing system, such as might be the case with an ASIC. Machine-readable media (e.g., 610) may also include storage external to a processing system, such as a Random Access Memory (RAM), a flash memory, a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable PROM (EPROM), registers, a hard disk, a removable disk, a CD-ROM, a DVD, or any other suitable storage device. Those skilled in the art will recognize how best to implement the described functionality for the processing system 602. According to one aspect of the disclosure, a machine-readable medium is a computer-readable medium encoded or stored with instructions and is a computing element, which defines structural and functional interrelationships between the instructions and the rest of the system, which permit the instructions' functionality to be realized. In one aspect, a machine-readable medium is a non-transitory machine-readable medium, a machine-readable storage medium, or a non-transitory machine-readable storage medium. In one aspect, a computer-readable medium is a non-transitory computer-readable medium, a computer-readable storage medium, or a non-transitory computer-readable storage medium. Instructions may be executable, for example, by a client device or server or by a processing system of a client device or server. Instructions can be, for example, a computer program including code.

[0136] An interface 616 may be any type of interface and may reside between any of the components shown in FIG. 5. An interface 616 may also be, for example, an interface to the outside world (e.g., an Internet network interface). A transceiver block 607 may represent one or more transceivers, and each transceiver may include a receiver 606 and a transmitter 609. A functionality implemented in a processing system 602 may be implemented in a portion of a receiver 606, a portion of a transmitter 609, a portion of a machine-readable medium 610, a portion of a display 612, a portion of a keypad 614, or a portion of an interface 616, and vice versa.

[0137] As used herein, the word "module" refers to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example C++. A software module may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpretive language such as BASIC. It will be appreciated that software modules may be callable from other modules or from themselves, and/or may be invoked in response to detected events or interrupts. Software instructions may be embedded in firmware, such as an EPROM or EEPROM. It will be further appreciated that hardware modules may be comprised of connected logic units, such as gates and flip-flops, and/or may be comprised of programmable units, such as programmable gate arrays or processors. The modules described herein are preferably implemented as software modules, but may be represented in hardware or firmware.

[0138] It is contemplated that the modules may be integrated into a fewer number of modules. One module may also be separated into multiple modules. The described modules may be implemented as hardware, software, firmware or any combination thereof. Additionally, the described modules may reside at different locations connected through a wired or wireless network, or the Internet.

[0139] In general, it will be appreciated that the processors can include, by way of example, computers, program logic, or other substrate configurations representing data and instructions, which operate as described herein. In other embodiments, the processors can include controller circuitry, processor circuitry, processors, general purpose single-chip or multi-chip microprocessors, digital signal processors, embedded microprocessors, microcontrollers and the like.

[0140] Furthermore, it will be appreciated that in one embodiment, the program logic may advantageously be implemented as one or more components. The components may advantageously be configured to execute on one or more processors. The components include, but are not limited to, software or hardware components, modules such as software modules, object-oriented software components, class components and task components, processes, methods, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.

[0141] The foregoing description is provided to enable a person skilled in the art to practice the various configurations described herein. While the subject technology has been particularly described with reference to the various figures and configurations, it should be understood that these are for illustration purposes only and should not be taken as limiting the scope of the subject technology.

[0142] There may be many other ways to implement the subject technology. Various functions and elements described herein may be partitioned differently from those shown without departing from the scope of the subject technology. Various modifications to these configurations will be readily apparent to those skilled in the art, and generic principles defined herein may be applied to other configurations. Thus, many changes and modifications may be made to the subject technology, by one having ordinary skill in the art, without departing from the scope of the subject technology.

[0143] It is understood that the specific order or hierarchy of steps in the processes disclosed is an illustration of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged. Some of the steps may be performed simultaneously. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented.

[0144] Terms such as "top," "bottom," "right," "left" and the like as used in this disclosure should be understood as referring to an arbitrary frame of reference, rather than to the ordinary gravitational frame of reference. Thus, a top surface, a bottom surface, a front surface, and a rear surface may extend upwardly, downwardly, diagonally, or horizontally in a gravitational frame of reference.

[0145] A phrase such as "an aspect" does not imply that such aspect is essential to the subject technology or that such aspect applies to all configurations of the subject technology. A disclosure relating to an aspect may apply to all configurations, or one or more configurations. An aspect may provide one or more examples of the disclosure. A phrase such as "an aspect" may refer to one or more aspects and vice versa. A phrase such as "an embodiment" does not imply that such embodiment is essential to the subject technology or that such embodiment applies to all configurations of the subject technology. A disclosure relating to an embodiment may apply to all embodiments, or one or more embodiments. An embodiment may provide one or more examples of the disclosure. A phrase such as "an embodiment" may refer to one or more embodiments and vice versa. A phrase such as "a configuration" does not imply that such configuration is essential to the subject technology or that such configuration applies to all configurations of the subject technology. A disclosure relating to a configuration may apply to all configurations, or one or more configurations. A configuration may provide one or more examples of the disclosure. A phrase such as "a configuration" may refer to one or more configurations and vice versa.

[0146] Furthermore, to the extent that the term "include," "have," or the like is used in the description or the claims, such term is intended to be inclusive in a manner similar to the term "comprise" as "comprise" is interpreted when employed as a transitional word in a claim.

[0147] The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

[0148] A reference to an element in the singular is not intended to mean "one and only one" unless specifically stated, but rather "one or more." The term "some" refers to one or more. All structural and functional equivalents to the elements of the various configurations described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and intended to be encompassed by the subject technology. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the above description.
