Title:
DIGITAL HEALTH PROGNOSTIC ANALYZER FOR MULTIPLE MYELOMA MORTALITY PREDICTIONS
Document Type and Number:
WIPO Patent Application WO/2018/081696
Kind Code:
A1
Abstract:
Computer-implemented systems and methods are provided for constructing a numerical model to generate a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time. Reference data for a plurality of patients diagnosed with multiple myeloma is received. First variables selected from the reference data are deemed predictive of mortality for a first predetermined period of time, and second variables selected from the reference data are deemed predictive of mortality for a second predetermined period of time. A first computer model comprising a combination of variables of the first selected variables and first weighting factors is generated. A second computer model comprising a combination of variables of the second selected variables and second weighting factors is generated. The first computer model and the second computer model are trained using the reference data to determine numerical values for the respective first and second weighting factors.

Inventors:
SRINIVASAN SHANKAR (US)
ELION-MBOUSSA ALBERT (US)
Application Number:
PCT/US2017/059008
Publication Date:
May 03, 2018
Filing Date:
October 30, 2017
Assignee:
CELGENE CORP (US)
International Classes:
G01N33/48
Foreign References:
US20100057651A1    2010-03-04
US20120237488A1    2012-09-20
US20160053327A1    2016-02-25
US20100125462A1    2010-05-20
US20120107862A1    2012-05-03
Attorney, Agent or Firm:
PEARSON, Douglas et al. (US)
Claims:
WHAT IS CLAIMED IS:

1. A computer-implemented method of constructing a computer model to generate a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time, the method comprising:

receiving reference data for a plurality of patients diagnosed with multiple myeloma, the reference data comprising for respective patients of the plurality of patients (i) data for variables of a set of patient variables, and (ii) survival data indicative of an amount of time between the patient's cancer diagnosis and the patient's death or between the cancer diagnosis and a date at which the patient is last known to be alive;

generating multiple candidate computer models comprising different combinations of the variables of the set of patient variables, each of the candidate computer models including multiple weighting factors associated with the variables, each variable of each candidate computer model having an associated weighting factor;

conducting multiple computerized numerical regression analyses for the multiple candidate computer models based on the data for the variables and the survival data to determine first selected variables and second selected variables from the set of patient variables, the first selected variables satisfying one or more selection criteria to be deemed predictive of mortality for a first predetermined period of time for patients diagnosed with multiple myeloma and the second selected variables satisfying one or more selection criteria deemed to be predictive of mortality for a second predetermined period of time for patients diagnosed with multiple myeloma;

generating a first computer model comprising a combination of variables of the first selected variables and first weighting factors associated with the respective first selected variables;

generating a second computer model comprising a combination of variables of the second selected variables and second weighting factors associated with the respective selected second variables;

training the first computer model and the second computer model with a processing system using the reference data to determine numerical values for the respective first and second weighting factors; and

updating the first computer model and the second computer model to include the determined numerical values for the first weighting factors and the second weighting factors for each selected variable of the first and second selected variables such that the first computer model is configured to generate probability data that a patient satisfying certain first selectable criteria will die within the first predetermined period of time and such that the second computer model is configured to generate probability data that a patient satisfying certain second selectable criteria will die within the second predetermined period of time.

2. The computer-implemented method of claim 1, wherein the first selected variables comprise a first variable indicative of the patient's age, a second variable indicative of the patient's Eastern Cooperative Oncology Group (ECOG) performance status, a third variable indicative of the patient's history of hypertension, a fourth variable indicative of a stage of the patient's multiple myeloma disease, a fifth variable indicative of whether the patient has renal insufficiency, a sixth variable indicative of the patient's platelet count, and a seventh variable indicative of the patient's mobility.

3. The computer-implemented method of claim 1 or 2, wherein the second selected variables comprise a first variable indicative of the patient's age, a second variable indicative of the patient's mobility, a third variable indicative of the patient's Del(17P) from FISH and cytogenetic forms, a fourth variable indicative of a stage of the patient's multiple myeloma disease, a fifth variable indicative of the patient's platelet count, a sixth variable indicative of whether the patient has a history of solitary plasmacytoma, a seventh variable indicative of the patient's ECOG performance status, an eighth variable indicative of the patient's history of diabetes, a ninth variable indicative of whether the patient has renal insufficiency, and a tenth variable indicative of whether the patient has used triplet therapy.

4. The computer-implemented method of any one of the preceding claims, comprising validating the first and second computer models with testing using additional independent data not used in training the first and second computer models.

5. The computer-implemented method of any one of the preceding claims, comprising providing a graphical user interface with selectable input fields adapted to receive input information from a user, the processing system processing the input information and numerical data of at least one of the first computer model and the second computer model so as to render to the user a probability that the patient will die within at least one of the first predetermined time and the second predetermined time.

6. The computer-implemented method of any one of the preceding claims, wherein the determining of the first selected variables and the second selected variables comprises:

analyzing each variable of the set of patient variables independently of the other variables to determine variables that have a degree of univariate association with patient death within the first predetermined period of time that is above a threshold; and

analyzing each variable of the set of patient variables independently of the other variables to determine variables that have a degree of univariate association with patient death within the second predetermined period of time that is above the threshold.
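By way of non-limiting illustration, the univariate screening recited above can be sketched as follows. The claim does not name an association measure or a threshold value; the absolute Pearson correlation with a binary death indicator, the default threshold of 0.3, and the names `pearson` and `screen_variables` are all assumptions made for this sketch. The sketch also ignores censoring, so each patient is assumed to have a known outcome for the period in question.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no association
    return sxy / math.sqrt(sxx * syy)

def screen_variables(records, died_within_period, threshold=0.3):
    """Return names of variables whose univariate association exceeds the threshold.

    records: list of dicts mapping variable name -> numeric value (hypothetical layout).
    died_within_period: list of 0/1 flags, one per record, for ONE predetermined period.
    threshold: assumed cut-off on |correlation|; not taken from the source.
    """
    selected = []
    for name in records[0]:
        values = [r[name] for r in records]  # each variable is examined on its own
        if abs(pearson(values, died_within_period)) > threshold:
            selected.append(name)
    return selected
```

Calling `screen_variables` once with the death indicator for the first period and once with the indicator for the second period, using the same threshold, would yield the two candidate variable lists.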

7. The computer-implemented method of any one of the preceding claims, wherein the training of the first computer model and the second computer model comprises:

processing the reference data to determine, for patients represented in the reference data, numerical measures for respective variables of the first selected variables, and conducting a first computerized numerical regression analysis based on the determined numerical measures to determine the first weighting factors; and

processing the reference data to determine, for patients represented in the reference data, numerical measures for respective variables of the second selected variables, and conducting a second computerized numerical regression analysis based on the determined numerical measures to determine the second weighting factors.
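By way of non-limiting illustration, determining weighting factors by a numerical regression analysis might look like the sketch below. The claims do not state the form of the regression; a logistic regression fitted by batch gradient descent is assumed here, and the function names, learning rate, epoch count, and toy data are hypothetical. Features are assumed to be pre-scaled to roughly the range 0 to 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_weights(rows, outcomes, learning_rate=0.1, epochs=2000):
    """Fit an intercept and one weighting factor per variable.

    rows: list of feature lists (numerical measures per patient).
    outcomes: 0/1 flags, 1 meaning death within the period being modeled.
    Returns (intercept, weights). Assumed logistic form; not from the source.
    """
    n, n_features = len(rows), len(rows[0])
    intercept, weights = 0.0, [0.0] * n_features
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * n_features
        for x, y in zip(rows, outcomes):
            err = sigmoid(intercept + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        # Average-gradient step on the log-loss
        intercept -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return intercept, weights

# Toy data (fabricated for illustration only): one scaled variable, two periods.
rows = [[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]]
died_first_period = [0, 0, 0, 1, 0, 1]
died_second_period = [0, 0, 1, 1, 1, 1]

first_model = fit_weights(rows, died_first_period)    # first weighting factors
second_model = fit_weights(rows, died_second_period)  # second weighting factors
```

Fitting the two models separately, each against its own outcome indicator, mirrors the claim's separate first and second regression analyses.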

8. The computer-implemented method of any one of the preceding claims, further comprising:

determining variables of the first and second selected variables for which an amount of data missing from the reference data is above a predetermined amount; and prior to the training of the first and second computer models, performing an imputation procedure to impute data for the variables having the amount of data missing above the predetermined amount.
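By way of non-limiting illustration, the missing-data step can be sketched as below. The claim names neither the imputation procedure nor the predetermined amount; median imputation, the 10% default, the use of `None` to mark a missing value, and the name `impute_sparse_variables` are assumptions for this sketch.

```python
from statistics import median

def impute_sparse_variables(records, max_missing_fraction=0.1):
    """Impute variables whose share of missing values exceeds the given amount.

    records: list of dicts mapping variable name -> number or None (hypothetical layout).
    max_missing_fraction: assumed stand-in for the claim's "predetermined amount".
    Returns new records; the input list is left unmodified.
    """
    imputed = [dict(r) for r in records]
    for name in records[0]:
        observed = [r[name] for r in records if r[name] is not None]
        missing_fraction = 1 - len(observed) / len(records)
        # Only variables above the predetermined amount are imputed, per the claim.
        if missing_fraction > max_missing_fraction and observed:
            fill = median(observed)  # assumed procedure: median of observed values
            for r in imputed:
                if r[name] is None:
                    r[name] = fill
    return imputed
```

A model-based or multiple-imputation procedure could be substituted for the median without changing the surrounding flow; imputation runs before training in either case.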

9. A system for constructing a numerical model to generate a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time, the system comprising:

a processing system; and

computer-readable memory in communication with the processing system, encoded with instructions for commanding the processing system to execute steps comprising:

receiving reference data for a plurality of patients diagnosed with multiple myeloma, the reference data comprising for respective patients of the plurality of patients (i) data for variables of a set of patient variables, and (ii) survival data indicative of an amount of time between the patient's cancer diagnosis and the patient's death or between the cancer diagnosis and a date at which the patient is last known to be alive;

generating multiple candidate computer models comprising different combinations of the variables of the set of patient variables, each of the candidate computer models including multiple weighting factors associated with the variables, each variable of each candidate computer model having an associated weighting factor;

conducting multiple computerized numerical regression analyses for the multiple candidate computer models based on the data for the variables and the survival data to determine first selected variables and second selected variables from the set of patient variables, the first selected variables satisfying one or more selection criteria to be deemed predictive of mortality for a first predetermined period of time for patients diagnosed with multiple myeloma and the second selected variables satisfying one or more selection criteria deemed to be predictive of mortality for a second predetermined period of time for patients diagnosed with multiple myeloma;

generating a first computer model comprising a combination of variables of the first selected variables and first weighting factors associated with the respective first selected variables;

generating a second computer model comprising a combination of variables of the second selected variables and second weighting factors associated with the respective selected second variables;

training the first computer model and the second computer model using the reference data to determine numerical values for the respective first and second weighting factors; and

updating the first computer model and the second computer model to include the determined numerical values for the first weighting factors and the second weighting factors for each selected variable of the first and second selected variables such that the first computer model is configured to generate probability data that a patient satisfying certain first selectable criteria will die within the first predetermined period of time and such that the second computer model is configured to generate probability data that a patient satisfying certain second selectable criteria will die within the second predetermined period of time.

10. The computer-implemented system of claim 9, wherein the first selected variables comprise a first variable indicative of the patient's age, a second variable indicative of the patient's ECOG performance status, a third variable indicative of the patient's history of hypertension, a fourth variable indicative of a stage of the patient's multiple myeloma disease, a fifth variable indicative of whether the patient has renal insufficiency, a sixth variable indicative of the patient's platelet count, and a seventh variable indicative of the patient's mobility.

11. The computer-implemented system of claim 9 or 10, wherein the second selected variables comprise a first variable indicative of the patient's age, a second variable indicative of the patient's mobility, a third variable indicative of the patient's Del(17P) from FISH and cytogenetic forms, a fourth variable indicative of a stage of the patient's multiple myeloma disease, a fifth variable indicative of the patient's platelet count, a sixth variable indicative of whether the patient has a history of solitary plasmacytoma, a seventh variable indicative of the patient's ECOG performance status, an eighth variable indicative of the patient's history of diabetes, a ninth variable indicative of whether the patient has renal insufficiency, and a tenth variable indicative of whether the patient has used triplet therapy.

12. The computer-implemented system of any one of claims 9 to 11, wherein the steps comprise:

validating the first and second computer models with testing using additional independent data not used in training the first and second computer models.

13. The computer-implemented system of any one of claims 9 to 12, wherein the steps comprise:

providing a graphical user interface with selectable input fields adapted to receive input information from a user; and

processing the input information and numerical data of at least one of the first computer model and the second computer model so as to render to the user a probability that the patient will die within at least one of the first predetermined time and the second predetermined time.

14. The system of any one of claims 9 to 13, wherein the determining of the first selected variables and the second selected variables comprises:

analyzing each variable of the set of patient variables independently of the other variables to determine variables that have a degree of univariate association with patient death within the first predetermined period of time that is above a threshold; and

analyzing each variable of the set of patient variables independently of the other variables to determine variables that have a degree of univariate association with patient death within the second predetermined period of time that is above the threshold.

15. The system of any one of claims 9 to 14, wherein the training of the first computer model and the second computer model comprises:

processing the reference data to determine, for patients represented in the reference data, numerical measures for respective variables of the first selected variables, and conducting a first computerized numerical regression analysis based on the determined numerical measures to determine the first weighting factors; and

processing the reference data to determine, for patients represented in the reference data, numerical measures for respective variables of the second selected variables, and conducting a second computerized numerical regression analysis based on the determined numerical measures to determine the second weighting factors.

16. The system of any one of claims 9 to 15, wherein the steps further comprise:

determining variables of the first and second selected variables for which an amount of data missing from the reference data is above a predetermined amount; and prior to the training of the first and second computer models, performing an imputation procedure to impute data for the variables having the amount of data missing above the predetermined amount.

17. A non-transitory computer-readable storage medium for constructing a numerical model to generate a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time, the computer-readable storage medium comprising computer executable instructions which, when executed, cause a processing system to execute steps comprising:

receiving reference data for a plurality of patients diagnosed with multiple myeloma, the reference data comprising for respective patients of the plurality of patients (i) data for variables of a set of patient variables, and (ii) survival data indicative of an amount of time between the patient's cancer diagnosis and the patient's death or between the cancer diagnosis and a date at which the patient is last known to be alive;

generating multiple candidate computer models comprising different combinations of the variables of the set of patient variables, each of the candidate computer models including multiple weighting factors associated with the variables, each variable of each candidate computer model having an associated weighting factor;

conducting multiple computerized numerical regression analyses for the multiple candidate computer models based on the data for the variables and the survival data to determine first selected variables and second selected variables from the set of patient variables, the first selected variables satisfying one or more selection criteria to be deemed predictive of mortality for a first predetermined period of time for patients diagnosed with multiple myeloma and the second selected variables satisfying one or more selection criteria deemed to be predictive of mortality for a second predetermined period of time for patients diagnosed with multiple myeloma;

generating a first computer model comprising a combination of variables of the first selected variables and first weighting factors associated with the respective first selected variables;

generating a second computer model comprising a combination of variables of the second selected variables and second weighting factors associated with the respective selected second variables;

training the first computer model and the second computer model using the reference data to determine numerical values for the respective first and second weighting factors; and

updating the first computer model and the second computer model to include the determined numerical values for the first weighting factors and the second weighting factors for each selected variable of the first and second selected variables such that the first computer model is configured to generate probability data that a patient satisfying certain first selectable criteria will die within the first predetermined period of time and such that the second computer model is configured to generate probability data that a patient satisfying certain second selectable criteria will die within the second predetermined period of time.

18. The non-transitory computer-readable storage medium of claim 17, wherein the first selected variables comprise a first variable indicative of the patient's age, a second variable indicative of the patient's ECOG performance status, a third variable indicative of the patient's history of hypertension, a fourth variable indicative of a stage of the patient's multiple myeloma disease, a fifth variable indicative of whether the patient has renal insufficiency, a sixth variable indicative of the patient's platelet count, and a seventh variable indicative of the patient's mobility.

19. The non-transitory computer-readable storage medium of claim 17 or 18, wherein the second selected variables comprise a first variable indicative of the patient's age, a second variable indicative of the patient's mobility, a third variable indicative of the patient's Del(17P) from FISH and cytogenetic forms, a fourth variable indicative of a stage of the patient's multiple myeloma disease, a fifth variable indicative of the patient's platelet count, a sixth variable indicative of whether the patient has a history of solitary plasmacytoma, a seventh variable indicative of the patient's ECOG performance status, an eighth variable indicative of the patient's history of diabetes, a ninth variable indicative of whether the patient has renal insufficiency, and a tenth variable indicative of whether the patient has used triplet therapy.

20. The non-transitory computer-readable storage medium of any one of claims 17 to 19, wherein the steps comprise:

validating the first and second computer models with testing using additional independent data not used in training the first and second computer models.

21. The non-transitory computer-readable storage medium of any one of claims 17 to 20, wherein the steps comprise:

providing a graphical user interface with selectable input fields adapted to receive input information from a user; and

processing the input information and numerical data of at least one of the first computer model and the second computer model so as to render to the user a probability that the patient will die within at least one of the first predetermined time and the second predetermined time.

22. The non-transitory computer-readable storage medium of any one of claims 17 to 21, wherein the determining of the first selected variables and the second selected variables comprises:

analyzing each variable of the set of patient variables independently of the other variables to determine variables that have a degree of univariate association with patient death within the first predetermined period of time that is above a threshold; and

analyzing each variable of the set of patient variables independently of the other variables to determine variables that have a degree of univariate association with patient death within the second predetermined period of time that is above the threshold.

23. The non-transitory computer-readable storage medium of any one of claims 17 to 22, wherein the training of the first computer model and the second computer model comprises:

processing the reference data to determine, for patients represented in the reference data, numerical measures for respective variables of the first selected variables, and conducting a first computerized numerical regression analysis based on the determined numerical measures to determine the first weighting factors; and

processing the reference data to determine, for patients represented in the reference data, numerical measures for respective variables of the second selected variables, and conducting a second computerized numerical regression analysis based on the determined numerical measures to determine the second weighting factors.

24. The non-transitory computer-readable storage medium of any one of claims 17 to 23, wherein the steps further comprise:

determining variables of the first and second selected variables for which an amount of data missing from the reference data is above a predetermined amount; and prior to the training of the first and second computer models, performing an imputation procedure to impute data for the variables having the amount of data missing above the predetermined amount.

25. A computer-implemented method of generating a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time, the method comprising:

receiving input data for a patient diagnosed with multiple myeloma, the input data comprising data for multiple variables of a set of patient variables;

processing the input data with a processing system to determine a first numerical measure indicative of an age of the patient;

processing the input data with the processing system to determine a second numerical measure indicative of a stage of the patient's multiple myeloma disease; processing the input data with the processing system to determine a third numerical measure indicative of the patient's mobility; and applying a numerical model associated with a predetermined period of time to the first numerical measure, the second numerical measure, and the third numerical measure to determine a probability that the patient will die within the predetermined period of time, the numerical model including

a first variable and an associated first weighting factor, the first variable receiving a value of the first numerical measure,

a second variable and an associated second weighting factor, the first variable receiving a value of the second numerical measure, and

a third variable and an associated third weighting factor, the third variable receiving a value of the third numerical measure.
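By way of non-limiting illustration, applying a numerical model of this kind to the three numerical measures can be sketched as a weighted sum passed through a link function. The claim does not state the functional form or any numerical values; the logistic link, the intercept, the weight values, and the name `mortality_probability` are hypothetical and chosen only to make the sketch runnable.

```python
import math

# Hypothetical weighting factors and intercept; the source discloses no values.
WEIGHTS = {"age": 0.03, "stage": 0.5, "mobility": 0.4}
INTERCEPT = -4.0

def mortality_probability(measures, weights=WEIGHTS, intercept=INTERCEPT):
    """Map numerical measures to a probability of death within one period.

    measures: dict with a numeric value for every variable named in `weights`.
    Each variable receives its measure and is multiplied by its weighting factor;
    the logistic link is an assumption of this sketch.
    """
    score = intercept + sum(weights[name] * measures[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-score))

# Example patient (fabricated): age in years, disease stage, mobility score.
patient = {"age": 72, "stage": 3, "mobility": 1}
probability = mortality_probability(patient)
```

A separate set of weights would be held for each predetermined period, so the same function could serve a first and a second model by passing different `weights` and `intercept` arguments.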

26. The computer-implemented method of claim 25, the numerical model including additional variables that receive values of additional numerical measures determined from the input data including one or more numerical measures indicative of one or more of the patient's history of hypertension, ECOG performance status, renal sufficiency, platelet count, history of diabetes, Del(17P) from FISH and cytogenetic forms, solitary plasmacytoma, and triplet therapy use.

27. The computer-implemented method of claim 25 or 26, further comprising:

processing the input data with the processing system to determine a fourth numerical measure indicative of the patient's platelet count; processing the input data with the processing system to determine a fifth numerical measure indicative of whether the patient has renal insufficiency or a history of diabetes or hypertension; and

applying the numerical model to the fourth numerical measure and the fifth numerical measure to determine the probability, the numerical model including

a fourth variable and an associated fourth weighting factor, the fourth variable receiving a value of the fourth numerical measure, and

a fifth variable and an associated fifth weighting factor, the fifth variable receiving a value of the fifth numerical measure.

28. The computer-implemented method of any one of the claims 25 to 27, further comprising:

processing the input data with the processing system to determine a fourth numerical measure indicative of an ECOG performance status of the patient; and

applying the numerical model to the fourth numerical measure to determine the probability, the numerical model including a fourth variable and an associated fourth weighting factor, the fourth variable receiving a value of the fourth numerical measure.

29. A system for generating a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time, the system comprising:

a processing system; and computer-readable memory in communication with the processing system encoded with instructions for commanding the processing system to execute steps comprising:

receiving input data for a patient diagnosed with multiple myeloma, the input data comprising data for multiple variables of a set of patient variables;

processing the input data to determine a first numerical measure indicative of an age of the patient;

processing the input data to determine a second numerical measure indicative of a stage of the patient's multiple myeloma disease;

processing the input data to determine a third numerical measure indicative of the patient's mobility; and

applying a numerical model associated with a predetermined period of time to the first numerical measure, the second numerical measure, and the third numerical measure to determine a probability that the patient will die within the predetermined period of time, the numerical model including

a first variable and an associated first weighting factor, the first variable receiving a value of the first numerical measure,

a second variable and an associated second weighting factor, the first variable receiving a value of the second numerical measure, and

a third variable and an associated third weighting factor, the third variable receiving a value of the third numerical measure.

30. The system of claim 29, the numerical model including additional variables that receive values of additional numerical measures determined from the input data including one or more numerical measures indicative of one or more of the patient's history of hypertension, ECOG performance status, renal sufficiency, platelet count, history of diabetes, Del(17P) from FISH and cytogenetic forms, solitary plasmacytoma, and triplet therapy use.

31. The system of claim 29 or 30, wherein the steps further comprise:

processing the input data to determine a fourth numerical measure indicative of the patient's platelet count;

processing the input data to determine a fifth numerical measure indicative of whether the patient has renal insufficiency or a history of diabetes or hypertension; and applying the numerical model to the fourth numerical measure and the fifth numerical measure to determine the probability, the numerical model including

a fourth variable and an associated fourth weighting factor, the fourth variable receiving a value of the fourth numerical measure, and

a fifth variable and an associated fifth weighting factor, the fifth variable receiving a value of the fifth numerical measure.

32. The system of any one of the claims 29 to 31, wherein the steps further comprise: processing the input data to determine a fourth numerical measure indicative of an ECOG performance status of the patient; and applying the numerical model to the fourth numerical measure to determine the probability, the numerical model including a fourth variable and an associated fourth weighting factor, the fourth variable receiving a value of the fourth numerical measure.

33. A non-transitory computer-readable storage medium for generating a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time, the computer-readable storage medium comprising computer executable instructions which, when executed, cause a processing system to execute steps comprising:

receiving input data for a patient diagnosed with multiple myeloma, the input data comprising data for multiple variables of a set of patient variables;

processing the input data to determine a first numerical measure indicative of an age of the patient;

processing the input data to determine a second numerical measure indicative of a stage of the patient's multiple myeloma disease;

processing the input data to determine a third numerical measure indicative of the patient's mobility; and

applying a numerical model associated with a predetermined period of time to the first numerical measure, the second numerical measure, and the third numerical measure to determine a probability that the patient will die within the predetermined period of time, the numerical model including

a first variable and an associated first weighting factor, the first variable receiving a value of the first numerical measure, a second variable and an associated second weighting factor, the first variable receiving a value of the second numerical measure, and

a third variable and an associated third weighting factor, the third variable receiving a value of the third numerical measure.

34. The non-transitory computer-readable storage medium of claim 33, the numerical model including additional variables that receive values of additional numerical measures determined from the input data including one or more numerical measures indicative of one or more of the patient's history of hypertension, ECOG performance status, renal sufficiency, platelet count, history of diabetes, Del(17P) from FISH and cytogenetic forms, solitary plasmacytoma, and triplet therapy use.

35. The non-transitory computer-readable storage medium of claim 33 or 34, wherein the steps further comprise:

processing the input data to determine a fourth numerical measure indicative of the patient's platelet count;

processing the input data to determine a fifth numerical measure indicative of whether the patient has renal insufficiency or a history of diabetes or hypertension; and applying the numerical model to the fourth numerical measure and the fifth numerical measure to determine the probability, the numerical model including

a fourth variable and an associated fourth weighting factor, the fourth variable receiving a value of the fourth numerical measure, and a fifth variable and an associated fifth weighting factor, the fifth variable receiving a value of the fifth numerical measure.

36. The non-transitory computer-readable storage medium of any one of the claims 33 to 35, wherein the steps further comprise:

processing the input data to determine a fourth numerical measure indicative of an ECOG performance status of the patient; and

applying the numerical model to the fourth numerical measure to determine the probability, the numerical model including a fourth variable and an associated fourth weighting factor, the fourth variable receiving a value of the fourth numerical measure.

Description:
DIGITAL HEALTH PROGNOSTIC ANALYZER FOR MULTIPLE MYELOMA MORTALITY PREDICTIONS

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to U.S. Provisional Patent Application No. 62/414,975, filed October 31, 2016, entitled "Digital Health Prognostic Analyzer for Multiple Myeloma Mortality Predictions," which is incorporated herein by reference in its entirety.

FIELD

[0002] The present disclosure relates to computer-based systems and methods for predicting likelihoods of near-term and long-term mortality in medical patients and, more particularly, relates to technology involving computer models for predicting mortality in patients diagnosed with multiple myeloma.

BACKGROUND

[0003] Multiple myeloma is a cancer formed by malignant plasma cells. Healthy plasma cells help humans fight infections by making antibodies that recognize and attack germs. Multiple myeloma causes cancer cells to accumulate in the bone marrow, where the cancer cells crowd out healthy blood cells. Instead of producing the antibodies for attacking germs, the cancer cells produce abnormal proteins that can cause various problems (e.g., kidney problems). Present approaches for predicting mortality for multiple myeloma patients may involve the Revised International Staging System (ISS) based upon sophisticated numerical models, such as described in "Revised International Staging System for Multiple Myeloma: A Report From International Myeloma Working Group," A. Palumbo, et al., J Clin Oncol 2015, 33:2863-2869, which models are complex and require the use of computer processing, e.g., in carrying out a K-adaptive partitioning algorithm among other numerical approaches. However, the present inventors have observed that the numerical computer models of the technological approaches noted above are too constrained, suffer from limits in the numbers and types of numerical variables and predictors, and ultimately provide only a crude, qualitative prediction of patient mortality and not specific numerical predictions.

SUMMARY

[0004] Inventive computer models involving numerical algorithms described herein provide technical solutions that may overcome the technological problems mentioned above by, for example, providing a model that is not limited in the numbers and types of numerical variables and predictors and ultimately provides specific numerical predictions instead of qualitative predictions. The present disclosure provides computer-implemented systems and methods for constructing a numerical model to generate a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time. The present disclosure further provides computer-implemented systems and methods for generating a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time. The methodologies provided herein provide a technical effect and improved technical performance of enhanced precision and accuracy of final results and intermediate results over prior computational methods. Further, the methodologies provided herein, by narrowing the universe of variables to those that are screened as the most important or most predictive, provide the technical effect of improved technical performance by permitting the computational models to be trained more quickly, using less computational resources, less memory and less bandwidth, than would be required for significantly more variables, and permit the same technical enhancements when executing the finally trained model. These technical effects are explained in further detail below.

[0005] In an example, a computer-implemented method for constructing a numerical model to generate a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time is described. Reference data for a plurality of patients diagnosed with multiple myeloma is received. The reference data comprises for respective patients of the plurality of patients (i) data for variables of a set of patient variables, and (ii) survival data indicative of an amount of time between the patient's cancer diagnosis and the patient's death or between the cancer diagnosis and a date in a database when the patient is last known to be alive. Multiple candidate computer models comprising different combinations of the variables of the set of patient variables are generated. Each of the candidate computer models includes multiple weighting factors associated with the variables, and each variable of each candidate computer model has an associated weighting factor. Multiple computerized numerical regression analyses for the multiple candidate computer models are conducted based on the data for the variables and the survival data to determine first selected variables and second selected variables from the set of patient variables. The first selected variables satisfy one or more selection criteria to be deemed predictive of mortality for a first predetermined period of time for patients diagnosed with multiple myeloma, and the second selected variables satisfy one or more selection criteria deemed to be predictive of mortality for a second predetermined period of time for patients diagnosed with multiple myeloma. A first computer model comprising a combination of variables of the first selected variables and first weighting factors associated with the respective first selected variables is generated.
A second computer model comprising a combination of variables of the second selected variables and second weighting factors associated with the respective selected second variables is generated. The first computer model and the second computer model are trained using the reference data to determine numerical values for the respective first and second weighting factors. The first computer model and the second computer model are updated to include the determined numerical values for the first weighting factors and the second weighting factors for each selected variable of the first and second selected variables such that the first computer model is configured to generate probability data that a patient satisfying certain first selectable criteria will die within the first predetermined period of time and such that the second computer model is configured to generate probability data that a patient satisfying certain second selectable criteria will die within the second predetermined period of time. The conducting of the multiple computerized numerical regression analyses based on the data for the variables and the survival data to determine the first and second selected variables implements a more sophisticated variable selection than prior computational methods and thus provides a technical effect and improved technical performance of enhanced precision and accuracy of final results and intermediate results generated by the first and second computer models over the prior computational methods. This technical effect is achieved without further disadvantages (e.g., increase of computation time, need for additional computational resources, etc.).
Further, by conducting the multiple computerized numerical regression analyses to determine the first and second selected variables and generating the first and second computer models based on the first and second selected variables, respectively, the universe of variables is narrowed to those that are screened as most important or predictive, thus providing the technical effect of improved technical performance by permitting the first and second computer models to be trained more quickly, using less computational resources, less memory and less bandwidth, than would be required for significantly more variables, and permits the same technical enhancements when executing the finally trained first and second computer models.

[0006] In an example, the first selected variables include a first variable indicative of the patient's age, a second variable indicative of the patient's Eastern Cooperative Oncology Group (ECOG) performance status, a third variable indicative of the patient's history of hypertension, a fourth variable indicative of a stage of the patient's multiple myeloma disease, a fifth variable indicative of whether the patient has renal insufficiency, a sixth variable indicative of the patient's platelet count, and a seventh variable indicative of the patient's mobility. In an example, the second selected variables include a first variable indicative of the patient's age, a second variable indicative of the patient's mobility, a third variable indicative of the patient's Del(17P) from FISH and cytogenetic forms, a fourth variable indicative of a stage of the patient's multiple myeloma disease, a fifth variable indicative of the patient's platelet count, a sixth variable indicative of whether the patient has a history of solitary plasmacytoma, a seventh variable indicative of the patient's ECOG performance status, an eighth variable indicative of the patient's history of diabetes, a ninth variable indicative of whether the patient has renal insufficiency, and a tenth variable indicative of whether the patient has used triplet therapy. The use of the first and second computer models including these variables implements a more sophisticated set of variables than prior computational methods and thus provides a technical effect and improved technical performance of enhanced precision and accuracy of final results and intermediate results generated by the first and second computer models over the prior computational methods. This technical effect is achieved without further disadvantages (e.g., increase of computation time, need for additional computational resources, etc.).
Further, by conducting the multiple computerized numerical regression analyses to narrow the universe of variables to these particular variables, this provides the technical effect of improved technical performance by permitting the first and second computer models to be trained more quickly, using less computational resources, less memory and less bandwidth, than would be required for significantly more variables, and permits the same technical enhancements when executing the finally trained first and second computer models.

[0007] In an example, the computer-implemented method includes validating the first and second computer models with testing using additional independent data not used in training the first and second computer models. Further, in an example, the computer-implemented method includes providing a graphical user interface with selectable input fields adapted to receive input information from a user, the processing system processing the input information and numerical data of at least one of the first computer model and the second computer model so as to render to the user a probability that the patient will die within at least one of the first predetermined time and the second predetermined time. By conducting the multiple computerized numerical regression analyses to determine the first and second selected variables and generating the first and second computer models based on the first and second selected variables, respectively, this narrows the universe of variables and thus provides the technical effect of improved technical performance by requiring less input information from the user, such that a smaller amount of input data is processed to render the probability. Processing the smaller amount of input data enables the probability to be rendered more quickly, using less computational resources, less memory and less bandwidth, than would be required for a larger amount of input data. 
[0008] In an example, the determining of the first selected variables and the second selected variables in the computer-implemented method includes analyzing each variable of the set of patient variables independently of the other variables to determine variables that have a degree of univariate association with patient death within the first predetermined period of time that is above a threshold, and analyzing each variable of the set of patient variables independently of the other variables to determine variables that have a degree of univariate association with patient death within the second predetermined period of time that is above the threshold. The independent analyses of each variable of the set of patient variables to determine the first and second selected variables that have a degree of univariate association with patient death that is above a threshold implements a more sophisticated variable selection than prior computational methods and thus provides a technical effect and improved technical performance of enhanced precision and accuracy of final results and intermediate results generated by the first and second computer models over the prior computational methods.
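
For illustration only, the univariate screening described above may be sketched as follows. The sketch assumes that the "degree of univariate association" is measured as the absolute Pearson correlation between a variable and a 0/1 indicator of death within the period, and it ignores censoring; the disclosure does not prescribe this measure. The function names, variable names, data values, and threshold are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable carries no association
    return cov / math.sqrt(var_x * var_y)

def screen_variables(records, died_within_period, threshold):
    """Keep each variable whose association with death, analyzed
    independently of the other variables, is above the threshold."""
    selected = []
    for name in records[0]:
        values = [record[name] for record in records]
        if abs(pearson(values, died_within_period)) > threshold:
            selected.append(name)
    return selected

# Hypothetical reference data: one dict of numerical measures per patient.
records = [
    {"age": 50, "noise": 1},
    {"age": 60, "noise": 0},
    {"age": 70, "noise": 1},
    {"age": 80, "noise": 0},
]
died_within_period = [0, 0, 1, 1]  # 1 = death within the period (assumed coding)
selected = screen_variables(records, died_within_period, threshold=0.3)
```

The same function may be called once per predetermined period of time, each call with its own outcome indicator, to obtain the first and second selected variables.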

[0009] In an example, the training of the first computer model and the second computer model includes processing the reference data to determine, for patients represented in the reference data, numerical measures for respective variables of the first selected variables, and conducting a first computerized numerical regression analysis based on the determined numerical measures to determine the first weighting factors. The training of the first computer model and the second computer model further includes processing the reference data to determine, for patients represented in the reference data, numerical measures for respective variables of the second selected variables, and conducting a second computerized numerical regression analysis based on the determined numerical measures to determine the second weighting factors. In an example, the computer-implemented method further includes determining variables of the first and second selected variables for which an amount of data missing from the reference data is above a predetermined amount, and prior to the training of the first and second computer models, performing an imputation procedure to impute data for the variables having the amount of data missing above the predetermined amount.
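
For illustration only, the imputation step described above may be sketched as follows. The sketch assumes mean imputation and represents a missing value as None; the disclosure does not specify the imputation procedure at this point. The function name, variable names, data values, and the 30% cutoff are hypothetical.

```python
def impute_sparse_variables(records, max_missing_fraction):
    """Impute variables whose share of missing values is above the cutoff.

    Mean imputation is an assumption of this sketch. Variables at or
    below the cutoff are left untouched, as in the described example.
    """
    imputed = [dict(record) for record in records]  # leave the input unchanged
    for name in records[0]:
        observed = [r[name] for r in records if r[name] is not None]
        missing_fraction = 1 - len(observed) / len(records)
        if missing_fraction > max_missing_fraction and observed:
            fill_value = sum(observed) / len(observed)
            for record in imputed:
                if record[name] is None:
                    record[name] = fill_value
    return imputed

# Hypothetical reference data with gaps.
records = [
    {"platelets": 200, "age": 60},
    {"platelets": None, "age": 70},
    {"platelets": 100, "age": None},
    {"platelets": None, "age": 80},
]
imputed = impute_sparse_variables(records, max_missing_fraction=0.3)
```

In this sketch the platelet variable (half of its values missing) is filled before training, while the age variable (a quarter missing) falls below the cutoff and is not imputed.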

[0010] An exemplary system for constructing a numerical model to generate a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time includes a processing system and computer-readable memory in communication with the processing system encoded with instructions for commanding the processing system to execute steps. In executing the steps, reference data for a plurality of patients diagnosed with multiple myeloma is received. The reference data comprises for respective patients of the plurality of patients (i) data for variables of a set of patient variables, and (ii) survival data indicative of an amount of time between the patient's cancer diagnosis and the patient's death or between the cancer diagnosis and a date at which the patient is last known to be alive. Multiple candidate computer models comprising different combinations of the variables of the set of patient variables are generated. Each of the candidate computer models includes multiple weighting factors associated with the variables, and each variable of each candidate computer model has an associated weighting factor. Multiple computerized numerical regression analyses for the multiple candidate computer models are conducted based on the data for the variables and the survival data to determine first selected variables and second selected variables from the set of patient variables. The first selected variables satisfy one or more selection criteria to be deemed predictive of mortality for a first predetermined period of time for patients diagnosed with multiple myeloma, and the second selected variables satisfy one or more selection criteria deemed to be predictive of mortality for a second predetermined period of time for patients diagnosed with multiple myeloma. A first computer model comprising a combination of variables of the first selected variables and first weighting factors associated with the respective first selected variables is generated. A second computer model comprising a combination of variables of the second selected variables and second weighting factors associated with the respective selected second variables is generated. The first computer model and the second computer model are trained using the reference data to determine numerical values for the respective first and second weighting factors. The first computer model and the second computer model are updated to include the determined numerical values for the first weighting factors and the second weighting factors for each selected variable of the first and second selected variables such that the first computer model is configured to generate probability data that a patient satisfying certain first selectable criteria will die within the first predetermined period of time and such that the second computer model is configured to generate probability data that a patient satisfying certain second selectable criteria will die within the second predetermined period of time. The conducting of the multiple computerized numerical regression analyses based on the data for the variables and the survival data to determine the first and second selected variables implements a more sophisticated variable selection than prior systems and thus provides a technical effect and improved technical performance of enhanced precision and accuracy of final results and intermediate results generated by the first and second computer models over the prior systems. This technical effect is achieved without further disadvantages (e.g., increase of computation time, need for additional computational resources, etc.).
Further, by conducting the multiple computerized numerical regression analyses to determine the first and second selected variables and generating the first and second computer models based on the first and second selected variables, respectively, the universe of variables is narrowed to those that are screened as most important or predictive, thus providing the technical effect of improved technical performance by permitting the first and second computer models to be trained more quickly, using less computational resources, less memory and less bandwidth, than would be required for significantly more variables, and permits the same technical enhancements when executing the finally trained first and second computer models.

[0011] In an example system, the first selected variables include a first variable indicative of the patient's age, a second variable indicative of the patient's Eastern Cooperative Oncology Group (ECOG) performance status, a third variable indicative of the patient's history of hypertension, a fourth variable indicative of a stage of the patient's multiple myeloma disease, a fifth variable indicative of whether the patient has renal insufficiency, a sixth variable indicative of the patient's platelet count, and a seventh variable indicative of the patient's mobility. In an example, the second selected variables include a first variable indicative of the patient's age, a second variable indicative of the patient's mobility, a third variable indicative of the patient's Del(17P) from FISH and cytogenetic forms, a fourth variable indicative of a stage of the patient's multiple myeloma disease, a fifth variable indicative of the patient's platelet count, a sixth variable indicative of whether the patient has a history of solitary plasmacytoma, a seventh variable indicative of the patient's ECOG performance status, an eighth variable indicative of the patient's history of diabetes, a ninth variable indicative of whether the patient has renal insufficiency, and a tenth variable indicative of whether the patient has used triplet therapy. The use of the first and second computer models including these variables implements a more sophisticated set of variables than prior computational methods and thus provides a technical effect and improved technical performance of enhanced precision and accuracy of final results and intermediate results generated by the numerical model over the prior systems.
Further, by conducting the multiple computerized numerical regression analyses to narrow the universe of variables to these particular variables, this provides the technical effect of improved technical performance by permitting the first and second computer models to be trained more quickly, using less computational resources, less memory and less bandwidth, than would be required for significantly more variables, and permits the same technical enhancements when executing the finally trained first and second computer models.

[0012] In an example, the computer-readable memory of the system is encoded with instructions for commanding the processing system to execute the steps including validating the first and second computer models with testing using additional independent data not used in training the first and second computer models. In an example, the steps further include providing a graphical user interface with selectable input fields adapted to receive input information from a user, and processing the input information and numerical data of at least one of the first computer model and the second computer model so as to render to the user a probability that the patient will die within at least one of the first predetermined time and the second predetermined time. In an example of the system, the determining of the first selected variables and the second selected variables includes analyzing each variable of the set of patient variables independently of the other variables to determine variables that have a degree of univariate association with patient death within the first predetermined period of time that is above a threshold, and analyzing each variable of the set of patient variables independently of the other variables to determine variables that have a degree of univariate association with patient death within the second predetermined period of time that is above the threshold.
In an example of the system, the training of the first computer model and the second computer model includes processing the reference data to determine, for patients represented in the reference data, numerical measures for respective variables of the first selected variables, and conducting a first computerized numerical regression analysis based on the determined numerical measures to determine the first weighting factors, and processing the reference data to determine, for patients represented in the reference data, numerical measures for respective variables of the second selected variables, and conducting a second computerized numerical regression analysis based on the determined numerical measures to determine the second weighting factors. In an example, the computer-readable memory of the system is encoded with instructions for commanding the processing system to execute the steps including determining variables of the first and second selected variables for which an amount of data missing from the reference data is above a predetermined amount, and prior to the training of the first and second computer models, performing an imputation procedure to impute data for the variables having the amount of data missing above the predetermined amount. The above-described operations provide technical effects and improved technical performance for the reasons explained above.
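
For illustration only, determining weighting factors by a computerized numerical regression analysis may be sketched as follows. The sketch assumes a logistic regression fitted by batch gradient descent on a 0/1 outcome (death within the period); the passage above does not fix the regression type or the fitting procedure. The function name, learning rate, epoch count, and data are hypothetical.

```python
import math

def train_weights(rows, outcomes, learning_rate=0.1, epochs=2000):
    """Fit one weighting factor per variable, plus an intercept, by
    minimizing the logistic loss with batch gradient descent."""
    n_rows, n_vars = len(rows), len(rows[0])
    weights = [0.0] * n_vars
    intercept = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_vars
        grad_b = 0.0
        for measures, outcome in zip(rows, outcomes):
            score = intercept + sum(w * m for w, m in zip(weights, measures))
            predicted = 1.0 / (1.0 + math.exp(-score))
            error = predicted - outcome
            for j, m in enumerate(measures):
                grad_w[j] += error * m
            grad_b += error
        # Step against the average gradient.
        weights = [w - learning_rate * g / n_rows for w, g in zip(weights, grad_w)]
        intercept -= learning_rate * grad_b / n_rows
    return weights, intercept

# Hypothetical numerical measures (one scaled variable per patient) and outcomes.
rows = [[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]]
outcomes = [0, 0, 1, 0, 1, 1]
weights, intercept = train_weights(rows, outcomes)
```

Under these assumptions the procedure would be run twice, once on the numerical measures for the first selected variables and once on those for the second selected variables, yielding the first and second weighting factors respectively.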

[0013] An exemplary non-transitory computer-readable storage medium for constructing a numerical model to generate a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time comprises computer executable instructions which, when executed, cause a processing system to execute steps. In executing the steps, reference data for a plurality of patients diagnosed with multiple myeloma is received. The reference data comprises for respective patients of the plurality of patients (i) data for variables of a set of patient variables, and (ii) survival data indicative of an amount of time between the patient's cancer diagnosis and the patient's death or between the cancer diagnosis and a date at which the patient is last known to be alive. Multiple candidate computer models comprising different combinations of the variables of the set of patient variables are generated. Each of the candidate computer models includes multiple weighting factors associated with the variables, and each variable of each candidate computer model has an associated weighting factor. Multiple computerized numerical regression analyses for the multiple candidate computer models are conducted based on the data for the variables and the survival data to determine first selected variables and second selected variables from the set of patient variables. The first selected variables satisfy one or more selection criteria to be deemed predictive of mortality for a first predetermined period of time for patients diagnosed with multiple myeloma, and the second selected variables satisfy one or more selection criteria deemed to be predictive of mortality for a second predetermined period of time for patients diagnosed with multiple myeloma. A first computer model comprising a combination of variables of the first selected variables and first weighting factors associated with the respective first selected variables is generated.
A second computer model comprising a combination of variables of the second selected variables and second weighting factors associated with the respective selected second variables is generated. The first computer model and the second computer model are trained using the reference data to determine numerical values for the respective first and second weighting factors. The first computer model and the second computer model are updated to include the determined numerical values for the first weighting factors and the second weighting factors for each selected variable of the first and second selected variables such that the first computer model is configured to generate probability data that a patient satisfying certain first selectable criteria will die within the first predetermined period of time and such that the second computer model is configured to generate probability data that a patient satisfying certain second selectable criteria will die within the second predetermined period of time. The conducting of the multiple computerized numerical regression analyses based on the data for the variables and the survival data to determine the first and second selected variables implements a more sophisticated variable selection than prior non-transitory computer-readable storage mediums and thus provides a technical effect and improved technical performance of enhanced precision and accuracy of final results and intermediate results generated by the first and second computer models over the prior storage mediums. This technical effect is achieved without further disadvantages (e.g., increase of computation time, need for additional computational resources, etc.). Further, by conducting the multiple computerized numerical regression analyses to determine the first and second selected variables and generating the first and second computer models based on the first and second selected variables, respectively, the universe of variables is narrowed to those that are screened as most important or predictive, thus providing the technical effect of improved technical performance by permitting the first and second computer models to be trained more quickly, using less computational resources, less memory and less bandwidth, than would be required for significantly more variables, and permits the same technical enhancements when executing the finally trained first and second computer models.

[0014] In an example non-transitory computer-readable storage medium, the first selected variables include a first variable indicative of the patient's age, a second variable indicative of the patient's Eastern Cooperative Oncology Group (ECOG) performance status, a third variable indicative of the patient's history of hypertension, a fourth variable indicative of a stage of the patient's multiple myeloma disease, a fifth variable indicative of whether the patient has renal insufficiency, a sixth variable indicative of the patient's platelet count, and a seventh variable indicative of the patient's mobility. In an example, the second selected variables include a first variable indicative of the patient's age, a second variable indicative of the patient's mobility, a third variable indicative of the patient's Del(17P) from FISH and cytogenetic forms, a fourth variable indicative of a stage of the patient's multiple myeloma disease, a fifth variable indicative of the patient's platelet count, a sixth variable indicative of whether the patient has a history of solitary plasmacytoma, a seventh variable indicative of the patient's ECOG performance status, an eighth variable indicative of the patient's history of diabetes, a ninth variable indicative of whether the patient has renal insufficiency, and a tenth variable indicative of whether the patient has used triplet therapy. The use of the first and second computer models including these variables implements a more sophisticated set of variables than prior computational methods and thus provides a technical effect and improved technical performance of enhanced precision and accuracy of final results and intermediate results generated by the numerical model over the prior systems.
Further, by conducting the multiple computerized numerical regression analyses to narrow the universe of variables to these particular variables, this provides the technical effect of improved technical performance by permitting the first and second computer models to be trained more quickly, using less computational resources, less memory and less bandwidth, than would be required for significantly more variables, and permits the same technical enhancements when executing the finally trained first and second computer models.

[0015] In an example, the non-transitory computer-readable storage medium comprises computer executable instructions which, when executed, cause the processing system to execute the steps including validating the first and second computer models with testing using additional independent data not used in training the first and second computer models. In an example non-transitory computer-readable storage medium, the steps include providing a graphical user interface with selectable input fields adapted to receive input information from a user and processing the input information and numerical data of at least one of the first computer model and the second computer model so as to render to the user a probability that the patient will die within at least one of the first predetermined time and the second predetermined time. In an example non-transitory computer-readable storage medium, the determining of the first selected variables and the second selected variables includes analyzing each variable of the set of patient variables independently of the other variables to determine variables that have a degree of univariate association with patient death within the first predetermined period of time that is above a threshold, and analyzing each variable of the set of patient variables independently of the other variables to determine variables that have a degree of univariate association with patient death within the second predetermined period of time that is above the threshold. In an example non-transitory computer-readable storage medium, the training of the first computer model and the second computer model includes processing the reference data to determine, for patients represented in the reference data, numerical measures for respective variables of the first selected variables, and conducting a first computerized numerical regression analysis based on the determined numerical measures to determine the first weighting factors, and processing the reference data to determine, for patients represented in the reference data, numerical measures for respective variables of the second selected variables, and conducting a second computerized numerical regression analysis based on the determined numerical measures to determine the second weighting factors. In an example, the non-transitory computer-readable storage medium comprises computer executable instructions which, when executed, cause the processing system to execute the steps including determining variables of the first and second selected variables for which an amount of data missing from the reference data is above a predetermined amount, and prior to the training of the first and second computer models, performing an imputation procedure to impute data for the variables having the amount of data missing above the predetermined amount. These operations provide technical effects and improved technical performance for the reasons explained above.
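By way of non-limiting illustration only, the univariate screening and the missing-data check described above might be sketched as follows. The sketch assumes Boolean-coded variables and uses the absolute log odds ratio from a two-by-two table as the "degree of univariate association"; that measure, the 0.5 continuity correction, the function names, the variable names, the threshold of 1.5, and the missing fraction of 0.2 are all assumptions made for the example and are not taken from the disclosure.

```python
import math

def log_odds_ratio(values, died):
    """Log odds ratio of death for one Boolean variable.

    0.5 is added to every cell (an assumed continuity correction) so that
    an empty cell does not cause a division by zero.
    """
    a = b = c = d = 0.5
    for value, outcome in zip(values, died):
        if value is None:          # patients with a missing value are skipped
            continue
        if value and outcome:
            a += 1
        elif value and not outcome:
            b += 1
        elif not value and outcome:
            c += 1
        else:
            d += 1
    return math.log((a * d) / (b * c))

def screen_variables(reference, died, threshold):
    """Analyze each variable on its own and keep those above the threshold."""
    return [name for name, values in reference.items()
            if abs(log_odds_ratio(values, died)) > threshold]

def variables_needing_imputation(reference, max_missing_fraction):
    """List variables whose share of missing values exceeds the given fraction."""
    return [name for name, values in reference.items()
            if sum(v is None for v in values) / len(values) > max_missing_fraction]

# Hypothetical reference data for eight patients (None marks a missing value).
reference = {
    "age_75_or_older":     [1, 1, 1, 0, 0, 0, 1, 0],
    "history_of_diabetes": [1, 0, 1, 0, 1, 0, None, None],
}
died_within_period = [1, 1, 1, 0, 0, 0, 1, 0]

selected = screen_variables(reference, died_within_period, threshold=1.5)
to_impute = variables_needing_imputation(reference, max_missing_fraction=0.2)
```

The same screening would be run once per predetermined period of time, each time with the outcome defined for that period, so the two runs may select different variables.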

[0016] As noted above, the present disclosure also provides computer-implemented systems and methods for generating a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time. In an example computer-implemented method for generating a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time, input data for a patient diagnosed with multiple myeloma is received. The input data comprises data for multiple variables of a set of patient variables. The input data is processed to determine a first numerical measure indicative of an age of the patient. The input data is processed to determine a second numerical measure indicative of a stage of the patient's multiple myeloma disease. The input data is processed to determine a third numerical measure indicative of the patient's mobility. A numerical model associated with a predetermined period of time is applied to the first numerical measure, the second numerical measure, and the third numerical measure to determine a probability that the patient will die within the predetermined period of time. The numerical model includes a first variable and an associated first weighting factor, the first variable receiving a value of the first numerical measure. The numerical model also includes a second variable and an associated second weighting factor, the second variable receiving a value of the second numerical measure. The numerical model further includes a third variable and an associated third weighting factor, the third variable receiving a value of the third numerical measure.
The application of the numerical model including the first, second, and third variables, configured to receive the first, second, and third numerical measures, respectively, implements a more sophisticated set of variables than prior computational methods and thus provides a technical effect and improved technical performance of enhanced precision and accuracy of final results and intermediate results generated by the numerical model over the prior computational methods.

[0017] In examples, the numerical model may include additional variables that receive values of additional numerical measures determined from the input data including numerical measures indicative of the patient's history of hypertension, performance status, renal sufficiency, platelet count, history of diabetes, Del(17P) from FISH and cytogenetic forms, hyperdiploidy, extramedullary plasmacytoma, novel therapy use, triplet therapy use, and solitary plasmacytoma. The application of the numerical model including these additional variables implements a more sophisticated set of variables than prior computational methods and thus provides a technical effect and improved technical performance of enhanced precision and accuracy of final results and intermediate results generated by the numerical model over the prior computational methods.

[0018] In an example, the computer-implemented method includes processing the input data with the processing system to determine a fourth numerical measure indicative of the patient's platelet count, and processing the input data with the processing system to determine a fifth numerical measure indicative of whether the patient has renal insufficiency or a history of diabetes or hypertension. The numerical model is applied to the fourth numerical measure and the fifth numerical measure to determine the probability, where the numerical model includes a fourth variable and an associated fourth weighting factor, the fourth variable receiving a value of the fourth numerical measure and a fifth variable and an associated fifth weighting factor, the fifth variable receiving a value of the fifth numerical measure.
The application of the numerical model including the fourth and fifth variables and associated weighting factors implements a more sophisticated set of variables than prior computational methods and thus provides a technical effect and improved technical performance of enhanced precision and accuracy of final results and intermediate results generated by the numerical model over the prior computational methods.

[0019] In an example, the computer-implemented method includes processing the input data with the processing system to determine a fourth numerical measure indicative of a performance status of the patient, and applying the numerical model to the fourth numerical measure to determine the probability, the numerical model including a fourth variable and an associated fourth weighting factor, the fourth variable receiving a value of the fourth numerical measure. The application of the numerical model including the fourth variable and associated weighting factor implements a more sophisticated set of variables than prior computational methods and thus provides a technical effect and improved technical performance of enhanced precision and accuracy of final results and intermediate results generated by the numerical model over the prior computational methods.

[0020] An exemplary system for generating a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time includes a processing system and computer-readable memory in communication with the processing system encoded with instructions for commanding the processing system to execute steps. In executing the steps, input data for a patient diagnosed with multiple myeloma is received. The input data comprises data for multiple variables of a set of patient variables. The input data is processed to determine a first numerical measure indicative of an age of the patient. The input data is processed to determine a second numerical measure indicative of a stage of the patient's multiple myeloma disease. The input data is processed to determine a third numerical measure indicative of the patient's mobility. A numerical model associated with a predetermined period of time is applied to the first numerical measure, the second numerical measure, and the third numerical measure to determine a probability that the patient will die within the predetermined period of time. The numerical model includes a first variable and an associated first weighting factor, the first variable receiving a value of the first numerical measure. The numerical model also includes a second variable and an associated second weighting factor, the second variable receiving a value of the second numerical measure. The numerical model further includes a third variable and an associated third weighting factor, the third variable receiving a value of the third numerical measure.
The application of the numerical model including the first, second, and third variables, configured to receive the first, second, and third numerical measures, respectively, implements a more sophisticated set of variables than prior systems and thus provides a technical effect and improved technical performance of enhanced precision and accuracy of final results and intermediate results generated by the numerical model over the prior systems.

[0021] In examples, the numerical model may include additional variables that receive values of additional numerical measures determined from the input data including numerical measures indicative of the patient's history of hypertension, performance status, renal sufficiency, platelet count, history of diabetes, Del(17P) from FISH and cytogenetic forms, hyperdiploidy, extramedullary plasmacytoma, novel therapy use, triplet therapy use, and solitary plasmacytoma. In an example of the system, the computer-readable memory is encoded with the instructions for commanding the processing system to execute the steps including processing the input data to determine a fourth numerical measure indicative of the patient's platelet count, processing the input data to determine a fifth numerical measure indicative of whether the patient has renal insufficiency or a history of diabetes or hypertension, and applying the numerical model to the fourth numerical measure and the fifth numerical measure to determine the probability. The numerical model includes a fourth variable and an associated fourth weighting factor, the fourth variable receiving a value of the fourth numerical measure, and a fifth variable and an associated fifth weighting factor, the fifth variable receiving a value of the fifth numerical measure. In another example of the system, the computer-readable memory is encoded with the instructions for commanding the processing system to execute the steps including processing the input data to determine a fourth numerical measure indicative of a performance status of the patient, and applying the numerical model to the fourth numerical measure to determine the probability, the numerical model including a fourth variable and an associated fourth weighting factor, the fourth variable receiving a value of the fourth numerical measure. These operations provide technical effects and improved technical performance for the reasons explained above.

[0022] An exemplary non-transitory computer-readable storage medium for generating a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time comprises computer executable instructions which, when executed, cause a processing system to execute steps. In executing the steps, input data for a patient diagnosed with multiple myeloma is received. The input data comprises data for multiple variables of a set of patient variables. The input data is processed to determine a first numerical measure indicative of an age of the patient. The input data is processed to determine a second numerical measure indicative of a stage of the patient's multiple myeloma disease. The input data is processed to determine a third numerical measure indicative of the patient's mobility. A numerical model associated with a predetermined period of time is applied to the first numerical measure, the second numerical measure, and the third numerical measure to determine a probability that the patient will die within the predetermined period of time. The numerical model includes a first variable and an associated first weighting factor, the first variable receiving a value of the first numerical measure. The numerical model also includes a second variable and an associated second weighting factor, the second variable receiving a value of the second numerical measure. The numerical model further includes a third variable and an associated third weighting factor, the third variable receiving a value of the third numerical measure.
The application of the numerical model including the first, second, and third variables, configured to receive the first, second, and third numerical measures, respectively, implements a more sophisticated set of variables than prior systems and thus provides a technical effect and improved technical performance of enhanced precision and accuracy of final results and intermediate results generated by the numerical model over the prior systems.

[0023] In examples, the numerical model may include additional variables that receive values of additional numerical measures determined from the input data including numerical measures indicative of the patient's history of hypertension, performance status, renal sufficiency, platelet count, history of diabetes, Del(17P) from FISH and cytogenetic forms, hyperdiploidy, extramedullary plasmacytoma, novel therapy use, triplet therapy use, and solitary plasmacytoma. In an example, the non-transitory computer-readable storage medium comprises the computer executable instructions which, when executed, cause the processing system to execute the steps including processing the input data to determine a fourth numerical measure indicative of the patient's platelet count, processing the input data to determine a fifth numerical measure indicative of whether the patient has renal insufficiency or a history of diabetes or hypertension, and applying the numerical model to the fourth numerical measure and the fifth numerical measure to determine the probability. The numerical model includes a fourth variable and an associated fourth weighting factor, the fourth variable receiving a value of the fourth numerical measure, and a fifth variable and an associated fifth weighting factor, the fifth variable receiving a value of the fifth numerical measure. In another example, the non-transitory computer-readable storage medium comprises the computer executable instructions which, when executed, cause the processing system to execute the steps including processing the input data to determine a fourth numerical measure indicative of a performance status of the patient, and applying the numerical model to the fourth numerical measure to determine the probability, the numerical model including a fourth variable and an associated fourth weighting factor, the fourth variable receiving a value of the fourth numerical measure.
These operations provide technical effects and improved technical performance for the reasons explained above.

[0024] The subject matter described herein provides many technical advantages. As described below, the computer-based techniques of the present disclosure provide processes for constructing a numerical model for predicting death in patients diagnosed with multiple myeloma in an automated manner that requires no human intervention or minimal human intervention. In embodiments described below, the constructing of the numerical model includes (i) processing large amounts of reference data via multiple regression analyses to automatically determine predictors of death in patients diagnosed with multiple myeloma, (ii) performing an imputation process to automatically generate data for variables of the reference data determined to have missing data, and (iii) automatically building and training the numerical model, which includes the predictors of death and associated weighting factors that take into account the relative contributions of each of the predictors. After being generated, the numerical model is applied to new data for a patient diagnosed with multiple myeloma to generate a probability that the patient will die within a predetermined period of time. The processes described herein thus enable an accurate, multivariate analysis of a patient's prognosis to be performed in a relatively fast, automated manner that requires no human intervention or only minimal human intervention and provide technological refinements over existing technological approaches by virtue of more sophisticated variable selection and implementation, such that the approaches described herein may provide improved mortality predictions. These technical advantages and others are described in detail below.

[0025] The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

[0026] Fig. 1 is a block diagram illustrating an exemplary system for generating a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time according to an embodiment of the present disclosure.

[0027] Figs. 2A-2H illustrate screenshots of exemplary software utilizing the systems and methods described herein, according to embodiments of the present disclosure.

[0028] Fig. 3A is a flowchart depicting steps of an exemplary method for constructing a numerical computer model to generate a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time.

[0029] Figs. 3B-3D depict exemplary patient variables included in reference data for training the numerical computer model, according to embodiments of the present disclosure.

[0030] Fig. 3E is a table including a summary of inferences from a logistic model developed using multiple imputation.

[0031] Fig. 3F-1 is a table including an exemplary summary of inferences from a Cox model developed using multiple imputation.

[0032] Fig. 3F-2 is a table including another exemplary summary of inferences from a Cox model developed using multiple imputation.

[0033] Figs. 3G, 3H-1, 3H-2, 3H-3, 3I-1, 3I-2, 3I-3, 3J-1, 3J-2, 3J-3, 3K-1, 3K-2, and 3K-3 depict exemplary prediction matrices generated according to embodiments of the present disclosure.

[0034] Figs. 4A and 4B are flowcharts depicting steps of respective exemplary methods for generating a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time.

[0035] Fig. 5 is a flowchart depicting steps of an exemplary method for generating a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time.

[0036] Figs. 6A, 6B, and 6C depict exemplary systems for implementing the techniques described herein.

DETAILED DESCRIPTION

[0037] Fig. 1 is a block diagram 100 illustrating an exemplary system for generating a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time. To generate this probability, the exemplary system of Fig. 1 comprises a computer-based system for automatically processing patient data 102 for the patient, where the patient data 102 comprises data for multiple variables of a set of variables. For instance, the patient data 102 may include data for one or more of the following variables for the patient: age, performance status (e.g., Eastern Cooperative Oncology Group (ECOG) performance status), whether the patient has a history of diabetes, whether the patient has a history of hypertension, a stage of the patient's multiple myeloma disease (e.g., International Staging System (ISS) stage of the disease), platelet count, serum creatinine level, a measure of the patient's mobility, etc. The patient data 102 may be provided by a user (e.g., the patient, the patient's doctor, a nurse, other medical practitioner, etc.) via a graphical user interface (GUI) of a software application. Exemplary GUIs for receiving data from the user are described below with reference to Figs. 2A-2F.

[0038] The patient data 102 may be received at a data processing module 104 of the computer-based system. Processing performed on the patient data 102 at the data processing module 104 is used to generate one or more numerical measures 108. The processing module 104 may be implemented with a computer processing system comprising one or more central processing units (CPUs) in one computer or distributed among multiple computers in communication with suitable computer memory and programmed to carry out the processing as described herein. The one or more numerical measures 108 may include numerical measures or Boolean values that are representative of aspects of the patient data 102. In embodiments, a numerical measure comprises a value from the patient data 102 (e.g., a value indicative of the patient's age) or a value derived from the patient data 102 (e.g., a value indicative of whether the patient's age is greater than 75 years). Specifically, in embodiments, the one or more numerical measures 108 may include numerical measures that are related to the patient's multiple myeloma disease (e.g., a numerical measure related to a stage of the patient's multiple myeloma disease, etc.) and also numerical measures that are not related to the multiple myeloma disease (e.g., numerical measures relating to the patient's age, health history, etc.). The data processing module 104 generates exemplary numerical measures 110 from the patient data 102. The exemplary numerical measures 110 may include the first, second, and third numerical measures illustrated in Fig. 1, among others.

[0039] As shown in Fig. 1, the first exemplary numerical measure is indicative of an age of the patient, where the first numerical measure is determined by processing the patient data 102. For instance, the patient data 102 may include data indicating the patient's age in years (e.g., 77 years, in an example). In embodiments, the first numerical measure is a Boolean value indicative of whether the patient's age is 75 years or older. Thus, in the above example, the patient's age in years (i.e., 77 years) is processed to generate the first numerical measure indicating that the patient's age is 75 years or greater (e.g., a value of "1").

[0040] The second exemplary numerical measure of Fig. 1 is indicative of a stage of the patient's multiple myeloma disease, as determined based on the processing of the patient data 102 performed by the data processing module 104. For instance, the patient data 102 may include data indicating the ISS disease stage for the patient's multiple myeloma disease (e.g., an ISS stage of "III," in an example). In embodiments, the second numerical measure has a first Boolean value when the ISS disease stage is "III" and a second Boolean value when the ISS disease stage is "I" or "II." Thus, in the above example, the patient's ISS disease stage (i.e., "III") is processed to generate the second numerical measure indicating that the patient's ISS disease stage is "III," and not "I" or "II." In other embodiments, the second numerical measure may have a first numerical value when the ISS disease stage is "I" (e.g., "0"), a second numerical value when the ISS disease stage is "II" (e.g., "1"), and a third numerical value when the ISS disease stage is "III" (e.g., "2").

[0041] The third exemplary numerical measure of Fig. 1 is indicative of the patient's mobility, as determined based on the processing of the patient data 102 performed by the data processing module 104. For instance, the patient data 102 may include a EuroQol five dimensions questionnaire (EQ-5D) mobility score for the patient. In embodiments, the EQ-5D mobility score is processed to generate a mobility score of "0," "1," or "2" for the patient. The third exemplary numerical measure is equal to the generated mobility score.

[0042] Other or additional exemplary numerical measures 110 generated from the patient data 102 may include, for example, a numerical measure indicative of the patient's platelet count (e.g., a numerical measure indicative of whether the patient's platelet count is greater than 150×10^9/L), a numerical measure indicative of whether the patient has renal insufficiency (e.g., a numerical measure indicative of whether the patient's serum creatinine is greater than 2 mg/dL), a numerical measure indicative of a performance status of the patient (e.g., a numerical measure indicative of whether the patient's ECOG performance score is greater than or equal to 2), a numerical measure indicative of whether the patient has a history of hypertension, a numerical measure indicative of whether the patient has a history of diabetes, a numerical measure indicative of Del(17P) from FISH and cytogenetic forms, a numerical measure indicative of hyperdiploidy, a numerical measure indicative of extramedullary plasmacytoma, a numerical measure indicative of novel therapy use, a numerical measure indicative of triplet therapy use, and a numerical measure indicative of solitary plasmacytoma.
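Purely as an illustrative sketch, the derivation of numerical measures from patient data described in the preceding paragraphs might look as follows. The cut-offs (age of 75 years or older, platelet count above 150×10^9/L, serum creatinine above 2 mg/dL, ECOG score of 2 or greater) and the 0/1/2 coding of ISS stage come from the description above; the dictionary keys, the function name, and the assumption that the EQ-5D mobility item is recorded as a level of 1 to 3 and shifted down by one are hypothetical.

```python
# Numerical coding of ISS stage used in one of the embodiments described above.
ISS_STAGE_CODES = {"I": 0, "II": 1, "III": 2}

def derive_numerical_measures(patient):
    """Turn raw patient data into the numerical measures fed to a model."""
    return {
        # Boolean measure: is the patient 75 years or older?
        "age_75_or_older": int(patient["age_years"] >= 75),
        # Ordinal measure: ISS stage I, II, III coded as 0, 1, 2.
        "iss_stage": ISS_STAGE_CODES[patient["iss_stage"]],
        # Assumption: EQ-5D mobility level 1..3 is shifted to a 0..2 score.
        "mobility": patient["eq5d_mobility_level"] - 1,
        # Boolean measure: platelet count above 150 x 10^9/L.
        "platelets_above_150": int(patient["platelets_10e9_per_l"] > 150),
        # Boolean measure: serum creatinine above 2 mg/dL.
        "renal_insufficiency": int(patient["serum_creatinine_mg_dl"] > 2),
        # Boolean measure: ECOG performance score of 2 or greater.
        "ecog_2_or_higher": int(patient["ecog_score"] >= 2),
    }

# Hypothetical patient matching the running example (77 years, ISS stage III).
example_patient = {
    "age_years": 77,
    "iss_stage": "III",
    "eq5d_mobility_level": 2,
    "platelets_10e9_per_l": 140,
    "serum_creatinine_mg_dl": 2.4,
    "ecog_score": 1,
}
measures = derive_numerical_measures(example_patient)
```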

[0043] The one or more numerical measures 108 determined from the patient data 102 are received at a probability generating engine 112, which may be implemented with a computer processing system such as described above for module 104. The probability generating engine 112 is configured to determine a probability 118 that the patient will die within a predetermined period of time (e.g., a predetermined period of time starting from the date on which the patient was diagnosed with multiple myeloma). In embodiments, the probability 118 indicates whether the patient will die within 180 days (e.g., within 180 days of the patient's multiple myeloma diagnosis), 1 year, 2 years, 3 years, 4 years, or 5 years. In embodiments, the probability generating engine 112 is a computer-based system for automatically generating the probability 118 that requires no human intervention or minimal human intervention. The probability generating engine 112 may determine the probability 118 based on the numerical measures 108 and a numerical computer model. The numerical computer model includes weighting factors for each of the numerical measures 108, and the weighting factors are determined based on reference data 114.

[0044] The numerical computer model is applied to the numerical measures 108 to determine the probability 118. In embodiments where the first, second, and third numerical measures 110 are generated, the numerical computer model includes a first variable and an associated first weighting factor, a second variable and an associated second weighting factor, and a third variable and an associated third weighting factor. The first variable receives a value of the first numerical measure, the second variable receives a value of the second numerical measure, and the third variable receives a value of the third numerical measure. By applying the numerical computer model to the first, second, and third numerical measures in this manner, the probability 118 for the patient data 102 is determined. It is noted that the numerical computer model may include other or additional variables that receive values for other numerical measures. The other or additional numerical measures may include, for example, numerical measures indicative of the patient's platelet count, Del(17P) from FISH and cytogenetic forms, hyperdiploidy, extramedullary plasmacytoma, novel therapy use, triplet therapy use, solitary plasmacytoma, performance status, whether the patient has renal insufficiency, whether the patient has a history of hypertension, and whether the patient has a history of diabetes. In generating the probability 118, the numerical computer model may be applied to one or more of these other numerical measures including in combination with previously mentioned numerical measures.

[0045] To generate the numerical computer model used in the probability generating engine 112, a model generation module 106 may be used. The model generation module 106 receives the reference data 114 and uses the reference data 114 to determine the weighting factors for the model, e.g., using one or more regression analyses, imputation procedures used to add data that is missing from the reference data 114, and a model training procedure, all of which are discussed in further detail below. In embodiments, the reference data 114 is data for a plurality of patients diagnosed with multiple myeloma. Specifically, in embodiments, the reference data includes for respective patients of the plurality of patients (i) data for multiple variables of a set of patient variables, and (ii) survival data indicative of an amount of time between the patient's multiple myeloma diagnosis and the patient's death or between the multiple myeloma diagnosis and a date (e.g., a date in a database) at which the patient is last known to be alive. The survival data of the reference data 114 spans a range of different amounts of time, and the reference data 114 has been accepted as usable for training the numerical computer model, in embodiments.
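As a hedged illustration of how survival data of this kind could be turned into a training outcome for a fixed period of time, the sketch below labels each reference patient as having died within the period, survived past it, or having an unknown status because the patient was last known alive before the period ended. Treating the unknown case as `None` (to be excluded from a logistic analysis, whereas a Cox analysis can use censored follow-up directly) is an assumption of the example, as are the function and parameter names.

```python
def death_within(days_observed, died, horizon_days):
    """Outcome label for one reference patient and one period of time.

    days_observed: days from diagnosis to death, or to the date at which
                   the patient was last known to be alive.
    died:          True if days_observed ends in death.
    Returns 1 (died within the period), 0 (alive at the end of the period),
    or None (last known alive before the period ended, so status is unknown).
    """
    if died:
        return 1 if days_observed <= horizon_days else 0
    return 0 if days_observed >= horizon_days else None

# Hypothetical follow-up records: (days observed, died?)
records = [(90, True), (400, True), (400, False), (60, False)]
labels_180 = [death_within(days, died, 180) for days, died in records]
```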

[0046] In embodiments, the weighting factors of the numerical computer model are determined via a machine learning application trained based on the reference data 114. Specifically, the machine learning application may be a logistic regression classifier or a Cox regression classifier, in embodiments. The model generation module 106 performs various procedures (e.g., imputation procedures to add data that is missing from the reference data 114, etc.), in embodiments, in order to generate the weighting factors of the model. As illustrated in Fig. 1, the model generation module 106 provides the model to the probability generating engine 112, and the probability generating engine 112 uses the model to generate the probability 118, as explained above.
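For the logistic regression case, applying a trained model to the numerical measures can be sketched as a weighted sum passed through the logistic function. This is a minimal sketch only: the intercept, the weighting factors, and the variable names below are invented for illustration and are not values from the disclosure, and a Cox model would combine the weights differently (through a baseline hazard) than shown here.

```python
import math

def death_probability(measures, weights, intercept):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(w_i * x_i))))."""
    z = intercept + sum(weight * measures[name] for name, weight in weights.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weighting factors for the first, second, and third variables.
example_weights = {"age_75_or_older": 0.8, "iss_stage": 0.5, "mobility": 0.4}
example_intercept = -2.0

# Numerical measures for a hypothetical patient (age >= 75, ISS III, mobility 1).
example_measures = {"age_75_or_older": 1, "iss_stage": 2, "mobility": 1}

probability = death_probability(example_measures, example_weights, example_intercept)
```

Each variable receives the value of its numerical measure, and its weighting factor sets how strongly that measure moves the probability.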

[0047] Additionally, in some embodiments, the model generation module 106 performs variable selection procedures to determine the variables that are used in the numerical computer model. Specifically, in embodiments described in further detail below, the model generation module 106 is configured to process the reference data 114 via one or more regression analyses (e.g., univariate regression analyses, multivariate regression analyses, etc.) to automatically determine predictors of death in patients diagnosed with multiple myeloma. Both logistic regression analyses and Cox regression analyses may be used. Such predictors of death are utilized as variables in the numerical computer model. In embodiments, the model generation module 106 further performs imputation procedures to automatically generate data for variables of the reference data 114 determined to have missing data. Then, after determining the variables (e.g., predictors) for the model and imputing data as necessary, the model generation module 106 trains the numerical computer model to determine the weighting factors for the respective variables. The weighting factors thus take into account the relative contributions of each of the predictors.
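The training step, in which weighting factors are determined from reference data for already-selected variables, might be sketched as below. The sketch fits a logistic regression by plain batch gradient descent, which is a stand-in chosen to keep the example self-contained and is not asserted to be the regression procedure of the disclosure; the toy reference rows, the learning rate, and the number of passes are likewise assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, names, epochs=2000, learning_rate=0.1):
    """Fit an intercept and one weighting factor per variable by gradient descent."""
    weights = {name: 0.0 for name in names}
    intercept = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_intercept = 0.0
        grad_weights = {name: 0.0 for name in names}
        for row, label in zip(rows, labels):
            z = intercept + sum(weights[name] * row[name] for name in names)
            error = sigmoid(z) - label          # gradient of the log loss w.r.t. z
            grad_intercept += error
            for name in names:
                grad_weights[name] += error * row[name]
        intercept -= learning_rate * grad_intercept / n
        for name in names:
            weights[name] -= learning_rate * grad_weights[name] / n
    return weights, intercept

def predict(row, weights, intercept):
    return sigmoid(intercept + sum(weights[name] * row[name] for name in weights))

# Hypothetical reference data: (age >= 75, ISS stage code, died within period).
toy = [(1, 2, 1), (1, 1, 1), (1, 0, 0), (0, 2, 1), (0, 1, 0),
       (0, 0, 0), (1, 2, 1), (0, 2, 0), (1, 2, 0)]
names = ["age_75_or_older", "iss_stage"]
rows = [{"age_75_or_older": a, "iss_stage": s} for a, s, _ in toy]
labels = [y for _, _, y in toy]

weights, intercept = train_logistic(rows, labels, names)
high_risk = predict({"age_75_or_older": 1, "iss_stage": 2}, weights, intercept)
low_risk = predict({"age_75_or_older": 0, "iss_stage": 0}, weights, intercept)
```

In practice the fit would be run on imputed reference data and then checked against independent data that was not used for training.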

[0048] With the trained numerical computer model in place, the patient data 102 may be scored by applying the numerical computer model as described above. The probability 118 for the patient data 102 is a probability that the patient will die within a predetermined period of time. In embodiments, the probability generating engine 112 implements multiple models, where each model is associated with a particular period of time. For instance, in an embodiment, the probability generating engine 112 utilizes a first numerical computer model to generate a probability that a patient will die within 180 days. The first numerical computer model includes variables configured to receive a particular set of numerical measures. The probability generating engine 112 may further utilize a second numerical computer model to generate a probability that the patient will die within a longer amount of time (e.g., 1, 2, 3, 4, 5 years). The second numerical computer model may include variables that are configured to receive numerical measures that are different from those received by the first numerical computer model. The use of the different numerical computer models for the different periods of time reflects the fact that some predictors of death are more applicable when considering shorter amounts of time (e.g., death within 180 days) and less applicable when considering longer amounts of time (e.g., death within 1 year, 2 years, 3 years, 4 years, 5 years, etc.), and vice versa.
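One way such an arrangement of several period-specific models could be organized is sketched below: each model carries its own period of time, its own set of variables, and its own weighting factors, and the engine evaluates every model against the same numerical measures. The class name, the two variable sets, and all numeric values are hypothetical and serve only to show that different periods may draw on different predictors.

```python
import math
from dataclasses import dataclass

@dataclass
class HorizonModel:
    horizon_days: int
    intercept: float
    weights: dict  # variable name -> weighting factor (hypothetical values)

    def probability(self, measures):
        """Logistic probability using only the variables this model includes."""
        z = self.intercept + sum(w * measures[name] for name, w in self.weights.items())
        return 1.0 / (1.0 + math.exp(-z))

# Two illustrative models with different variable sets for different periods.
MODELS = {
    180: HorizonModel(180, -3.0, {"age_75_or_older": 0.7, "history_of_hypertension": 0.4}),
    365: HorizonModel(365, -2.0, {"age_75_or_older": 0.6, "history_of_diabetes": 0.5}),
}

def probabilities_by_horizon(measures):
    """Apply every period-specific model to one patient's numerical measures."""
    return {days: model.probability(measures) for days, model in MODELS.items()}

patient_measures = {
    "age_75_or_older": 1,
    "history_of_hypertension": 1,
    "history_of_diabetes": 0,
}
result = probabilities_by_horizon(patient_measures)
```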

[0049] Conventionally, technological solutions based on single, static predictors have been used to predict mortality in patients diagnosed with multiple myeloma. In some conventional approaches, patient mortality may be predicted based on a revised version of the International Staging System (ISS) for multiple myeloma from the International Myeloma Working Group. The revised ISS is a disease-specific predictor and considers only the stage of the patient's disease and not patient-specific characteristics (e.g., the patient's age, etc.). The use of the revised ISS is a technological solution for predicting death because it is based on a quantitative analysis of relevant data (e.g., data indicative of the stage of the patient's disease, etc.). In other conventional approaches, patient mortality is predicted based on a frailty score, whereby the patient is categorized into one of three groups (e.g., fit, intermediate fit, and frail) and the probability of death is assessed based on the frailty score. The frailty score is a patient-specific predictor and considers only characteristics of the patient and not disease-specific characteristics. The use of the frailty score is a technological solution for predicting mortality because it is based on a quantitative analysis of relevant data (e.g., data indicative of patients' age, functional status, comorbidities, etc.).

[0050] The techniques of the present disclosure provide solutions rooted in computer technology that improve on the conventional technological solutions described above. As described herein, rather than generating a probability of patient death based on a static, single predictor (e.g., a predictor that is predetermined and that does not change, such as the revised ISS or the frailty score described above), the techniques of the present disclosure provide an accurate, multivariate analysis of patient mortality that takes into account the relative contributions of multiple predictors that are determined automatically from reference data. Using the techniques of the present disclosure, large amounts of reference data are processed via regression analyses to automatically determine multiple predictors of death in patients diagnosed with multiple myeloma. In embodiments, the predictors of death determined via the regression analyses vary based on the time frame considered (e.g., some predictors are valid for predicting whether the patient will die within 180 days but not within 1, 2, 3, 4, or 5 years, and vice versa). It is thus noted that in these embodiments, the predictors of death are not static, "one-size-fits-all" predictors that are predetermined and applied in all instances, as in the conventional approaches. After imputing data for variables of the reference data determined to have missing data, as necessary, the numerical computer model is automatically built and trained, where the numerical computer model includes the predictors of death and associated weighting factors that take into account the relative contributions of each of the predictors. The numerical computer model is then applied to new data for a patient diagnosed with multiple myeloma to generate a probability that the patient will die within a predetermined period of time.
The processes described herein thus enable an accurate, multivariate analysis of patient data to be performed in a relatively fast, automated manner that requires no human intervention or only minimal human intervention. The conventional technological solutions cannot provide the automated, multivariate analysis described herein.

[0051] In embodiments of the present disclosure, input data for a patient diagnosed with multiple myeloma may be received via a GUI of a software application, and based on the computer-implemented systems and methods described herein, the software application generates a probability that the patient will die within a predetermined period of time. To illustrate exemplary GUIs for such a software application, reference is made to Figs. 2A-2F. As illustrated in Figs. 2A and 2B, in embodiments, a GUI prompts a user to provide data for various patient variables. In Fig. 2A, for instance, the GUI prompts the user to "Enter patient's age in years" and provides a text box for receiving an input from the user. In Fig. 2B, for instance, the GUI prompts the user to "Select stage of patient's multiple myeloma disease" and provides three buttons for receiving an input from the user. Based on these inputs and inputs for multiple other patient variables (e.g., ECOG performance score of patient, whether patient has history of hypertension, whether patient has renal insufficiency, patient's platelet count, patient's mobility score, whether patient has a history of diabetes, etc.) received from the user, the software application applies the trained numerical computer model and generates and displays a probability that the patient will die within a predetermined period of time. For instance, as shown in Fig. 2C, after receiving inputs from the user for multiple patient variables, the software application generates and displays the probability (e.g., "Probability of patient death within 180 days: 97%," in the example of Fig. 2C).

[0052] Fig. 2D illustrates another exemplary GUI for receiving input data representative of patient variables of a patient diagnosed with multiple myeloma. In this example, multiple patient variables are displayed, and for each variable, there is a corresponding drop-down menu with multiple selectable options. Although seven (7) variables are illustrated in the example of Fig. 2D, it is noted that these variables are examples only and that in other embodiments, a different set of variables may be presented to the user. Other or additional variables that may be used include variables indicative of the patient's history of diabetes, Del(17P) from FISH and cytogenetic forms, hyperdiploidy, extramedullary plasmacytoma, novel therapy use, triplet therapy use, and solitary plasmacytoma, among others. Based on input data received via the multiple drop-down menus, the software application generates and displays output data on predicted patient mortality. For instance, as shown in Fig. 2E, after receiving the input data, the software application generates a table with estimated probabilities for various amounts of time (e.g., 180 days, 1 year, 2 years, 3 years, 4 years, 5 years, etc.). Specifically, as shown in Fig. 2E, the table presents probabilities for "mortality within 180 days," "survival beyond 1 year," "survival beyond 2 years," "survival beyond 3 years," "survival beyond 4 years," and "survival beyond 5 years."

[0053] Fig. 2F illustrates another exemplary display generated according to the computer-implemented techniques of the present disclosure. In an upper portion of the exemplary display, multiple patient variables are displayed, and for each variable, there is a corresponding drop-down menu with multiple user-selectable options. Although six (6) variables are illustrated in the example of Fig. 2F, it is noted that these variables are examples only and that, in other embodiments, a different set of variables may be presented to the user. An exemplary use of a drop-down menu is illustrated in Fig. 2G. As shown in this figure, when a drop-down menu for the "Mobility" variable is accessed, the user can select one of multiple different values for the variable (e.g., "Confined to Bed," "Some Problem in Walking About," "No Problem in Walking About," etc.). The other drop-down menus shown in the embodiment of Fig. 2F may operate similarly by allowing the user to select values for each of the respective variables. Selectable options for each of the drop-down menus are reflected in the chart in the lower portion of Fig. 2F.

[0054] Based on input data received via the multiple drop-down menus, the software application highlights a probability value found in a prediction matrix. An exemplary prediction matrix is illustrated in a lower portion of the exemplary display of Fig. 2F, which is based on application of the trained numerical computer model and which may be displayed to a user (e.g., physician) after inputting values for variables as described above. The patient variables considered in the embodiment of Fig. 2F may be used to estimate the probability that the patient will die within a relatively short amount of time (e.g., 180 days). In this example, a probability value of "11%" is underscored, in boldface, and italicized to indicate that this is the probability value corresponding to the inputs received via the drop-down menus. Of course, other suitable ways of highlighting the relevant values may be used other than or in addition to underlining, boldface, and italics, such as color coding the block or text for the relevant value against a different color background. For instance, as can be seen from the prediction matrix, the underscored, boldface, and italicized 11% corresponds to the inputs "Some problem in walking about," "platelet count < 150 x 10^9/L," "ISS stage I or II," "Age < 75 years," "history of hypertension," and "history of diabetes." In other words, given these inputs received via the drop-down menus, the software application's probability generating engine (e.g., the probability generating engine 112 described above with reference to Fig. 1) determines that there is an 11% probability that the patient will die within the relatively short amount of time, which may be 180 days or other first predetermined period of time. To generate the prediction matrix, all possible combinations of input values are provided to the probability generating engine, which generates probability values corresponding to each of the different combinations.
The probability values are then put into a matrix form, such as that illustrated in Fig. 2F, and the applicable values thereof are coordinated with rule-based selection criteria so as to highlight the appropriate value(s) of the matrix based upon the selected variables. The generation of prediction matrices is described in further detail below.
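As a rough illustration of the enumeration step described above, the following Python sketch scores every combination of input values with a logistic model and stores the results in a lookup table from which the cell to highlight can be retrieved. The variable names, levels, intercept, and weights are hypothetical placeholders chosen for illustration and are not values from the present disclosure.

```python
import itertools
import math

# Hypothetical binary inputs and weights (illustrative only; not from the disclosure).
LEVELS = {
    "age_ge_75": (0, 1),
    "iss_stage_iii": (0, 1),
    "platelets_le_150": (0, 1),
}
INTERCEPT = -3.0
WEIGHTS = {"age_ge_75": 0.8, "iss_stage_iii": 0.9, "platelets_le_150": 0.6}


def death_probability(inputs):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = INTERCEPT + sum(WEIGHTS[name] * value for name, value in inputs.items())
    return 1.0 / (1.0 + math.exp(-linear))


def build_prediction_matrix():
    """Score every combination of input values, keyed by that combination."""
    names = list(LEVELS)
    matrix = {}
    for combo in itertools.product(*(LEVELS[n] for n in names)):
        matrix[combo] = death_probability(dict(zip(names, combo)))
    return names, matrix


def lookup(matrix, names, selected):
    """Return the matrix cell matching the values chosen in the drop-down menus."""
    return matrix[tuple(selected[n] for n in names)]
```

Precomputing the full matrix keeps the display step to a simple lookup, which is one plausible reason for enumerating all combinations up front.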

[0055] Fig. 2H illustrates another exemplary display generated according to the computer-implemented techniques of the present disclosure. Similar to the embodiment shown in Fig. 2F, in an upper portion of the exemplary display, multiple patient variables are displayed, and for each variable, there is a corresponding drop-down menu with multiple user-selectable options. Although seven (7) variables are illustrated in the example of Fig. 2H, it is noted that these variables are examples only and that in other embodiments, a different set of variables may be presented to the user. Based on input data received via the multiple drop-down menus, the software application highlights a probability value found in a prediction matrix.

[0056] An exemplary prediction matrix is illustrated in a lower portion of the exemplary display of Fig. 2H, which is based on application of the trained numerical computer model and which may be displayed to a user (e.g., physician) after inputting values for variables as described above. The patient variables considered in the embodiment of Fig. 2H may be used to estimate the probability that the patient will survive a relatively long amount of time (e.g., 3 years). In this example, a probability value of "81%" is underscored, in boldface, and italicized to indicate that this is the probability value corresponding to the inputs received via the drop-down menus. As can be seen from the prediction matrix, the underscored, boldface, and italicized 81% corresponds to the inputs "no problem in walking about," "age less than or equal to 75 years," "platelet count greater than 150 x 10^9/L," "serum creatinine greater than 2 mg/dL," "a history of diabetes," "an ISS stage of I or II," and "use of novel therapies that is less than or equal to 1." In other words, given these inputs received via the drop-down menus, the software application's probability generating engine (e.g., the probability generating engine 112 described above with reference to Fig. 1) determines that there is an 81% probability that the patient will survive the relatively long amount of time, which may be at least 3 years or other second predetermined period of time. To generate the prediction matrix, all possible combinations of input values are provided to the probability generating engine, which generates probability values corresponding to each of the different combinations. The probability values are then put into a matrix form, such as that illustrated in Fig. 2H, and the applicable values thereof are coordinated with rule-based selection criteria so as to highlight the appropriate value(s) of the matrix based upon the selected variables.

[0057] Fig. 3A is a flowchart 300 depicting operations of an exemplary method for constructing a numerical computer model to generate a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time. As described above with reference to Fig. 1, a model generation module 106 receives reference data 114 for a plurality of patients diagnosed with multiple myeloma and uses the reference data 114 to (i) determine a set of variables to be used in a numerical computer model (e.g., perform variable selection), where each variable has been determined to be predictive of death in patients diagnosed with multiple myeloma, and (ii) determine weighting factors for each of the variables of the numerical computer model. The exemplary operations depicted in the flowchart 300 of Fig. 3A provide further details on the variable selection and training of such a numerical computer model. The exemplary operations depicted in the flowchart 300 of Fig. 3A also provide details on the generation of (i) a first computer model configured to generate probability data that a patient satisfying certain first selectable criteria will die within a first predetermined period of time (e.g., 180 days), and (ii) a second, different computer model (e.g., the second computer model comprising different variables than the first computer model, etc.) configured to generate probability data that a patient satisfying certain second selectable criteria will die within a second predetermined period of time (e.g., 1, 2, 3, 4, or 5 years).

[0058] At 302, reference data for a plurality of patients diagnosed with multiple myeloma is received. The reference data comprises for respective patients of the plurality of patients (i) data for variables of a set of patient variables, and (ii) survival data indicative of an amount of time between the patient's cancer diagnosis and the patient's death or between the cancer diagnosis and a date at which the patient is last known to be alive. The reference data has been accepted as usable for training a numerical computer model, in embodiments.

[0059] To illustrate examples of the patient variables that may be represented in the reference data, reference is made to Figs. 3B, 3C, and 3D. Fig. 3B depicts examples of "patient-specific" variables (e.g., variables that are related to the patient and not directly related to multiple myeloma) that may be represented in the reference data. The patient-specific variables may include age (e.g., whether patient's age is 75 years or greater, whether patient's age is 70 years or greater), body mass index, ECOG performance status score, history of diabetes, history of hypertension, history of venous thromboembolism (VTE), del(17p) from FISH and cytogenetic forms, T(4.14) from FISH and cytogenetic forms, T(11.14) from FISH, T(14.16) from FISH, history of MGUS, history of smoldering myeloma, and hyperdiploidy.

[0060] Fig. 3C depicts examples of "disease-specific" variables (e.g., variables that are related to the multiple myeloma disease) that may be represented in the reference data. The disease-specific variables may include lactic acid dehydrogenase (e.g., whether lactic acid dehydrogenase is greater than 300 IU/L), history of solitary plasmacytoma, extramedullary plasmacytoma, immunoglobulin IgG class (e.g., whether immunoglobulin IgG class is 5 g/dL or greater), albumin (e.g., whether albumin is greater than 3.5 g/dL), ISS disease stage, myeloma bone involvement, hypercalcemia (e.g., whether serum calcium is greater than or equal to 11.5 mg/dL), renal insufficiency (e.g., whether serum creatinine is greater than 2 mg/dL), anemia (e.g., whether hemoglobin is less than 10 g/dL or whether it is more than 2 g/dL below LLN), clonal bone marrow plasma cells (e.g., whether clonal bone marrow plasma cells are 10% or greater), serum monoclonal protein (e.g., whether serum monoclonal protein is 3 g/dL or greater), serum free light chain abnormality, pathological fracture, platelet count (e.g., whether platelet count is greater than 150 x 10^9/L), IMWG risk, and beta 2 microglobulin (e.g., whether beta 2 microglobulin is greater than or equal to 5.5 mg/L).

[0061] Fig. 3D depicts examples of "HRQOL from EQ-5D" and "novel therapy" variables that may be represented in the reference data. The HRQOL from EQ-5D variables may include "self-care from EQ-5D" and "mobility from EQ-5D." The novel therapy variables may include a variable that takes into account a number of novel therapies used by the patient and variables that take into account whether the patient has used triplet therapy, IMiD-containing therapy, and PI-containing therapy, respectively. In embodiments, "triplet therapy" refers to a treatment regimen that uses at least three medications or drugs, and the triplet therapy variable may be used to store a Boolean value (e.g., "0" or "1," or "yes" or "no") indicative of whether the patient has used triplet therapy or not.

[0062] In some embodiments, the reference data comprises data from a non-interventional trial and/or registry. Non-interventional trials or registries allow some latitude in the reporting of observations and procedures by site investigators, leading to a larger degree of missing data than in controlled clinical trials. Accordingly, the computer-implemented procedures described herein for generating the numerical computer model address the issue of data incompleteness (e.g., via imputation procedures, as described herein). In embodiments, the registry used to generate the numerical computer model is the Connect® MM Registry (NCT01081028). This registry enrolled two cohorts. The first cohort has adequate follow-up (e.g., median 33.5 months, N = 1493) for analysis. By contrast, analysis for the second cohort is premature due to inadequate follow-up. The Connect MM Registry was designed as a prospective, observational, longitudinal, multicenter study of patients with newly diagnosed multiple myeloma.

[0063] There is no planned investigational agent, prescribed treatment regimen, or mandated intervention in the Connect MM Registry study. The treating physician determines the enrolled patient's therapy for newly diagnosed multiple myeloma according to his or her clinical judgment. Inclusion criteria are limited to patients who are newly diagnosed with symptomatic multiple myeloma within 2 months of enrollment, age greater than or equal to 18 years, willingness and ability to sign informed consent, and an agreement by the patient to complete patient questionnaires alone or with minimal assistance. No exclusion criteria are used. The data of this registry came largely from community sites (81.1%), while the prevalence of academic and government investigational sites was not insignificant (17.6% and 1.3%, respectively). An evaluation of the registry's baseline data in comparison to the National Comprehensive Cancer Network's suggested diagnostic work-up for multiple myeloma found that allowing physician discretion in diagnostic data to be collected, as is usually done for non-interventional registries, led to incomplete data. As noted above and described in further detail below, the computer-implemented processes used to build the numerical computer model take into account the issue of data incompleteness.

[0064] At 304, multiple candidate computer models comprising different combinations of the variables of the set of patient variables are generated. Each of the candidate computer models includes multiple weighting factors associated with the variables, and each variable of each candidate computer model has an associated weighting factor. At 306, multiple computerized numerical regression analyses for the multiple candidate computer models are conducted based on the data for the variables and the survival data to determine first selected variables and second selected variables from the set of patient variables. The first selected variables satisfy one or more selection criteria to be deemed predictive of mortality for a first predetermined period of time (e.g., mortality within 180 days from diagnosis) for patients diagnosed with multiple myeloma, and the second selected variables satisfy one or more selection criteria to be deemed predictive of mortality for a second predetermined period of time (e.g., mortality within 1, 2, 3, 4, or 5 years from diagnosis) for patients diagnosed with multiple myeloma.

[0065] In embodiments, performing the steps 304 and 306 begins with univariate screening to reduce the number of variables and then proceeds to a variable selection procedure. Specifically, in embodiments, univariate analyses are conducted with the intent of determining the degree of missingness on each variable and the statistical significance of the variable in predicting the dependent measure (e.g., death within a predetermined period of time). In some embodiments, variables significant at the p < 0.15 level and with less than 60% missing data are screened in.
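The screening rule above can be sketched in a few lines of Python. This is a minimal illustration that assumes the univariate p-values and missing-data fractions have already been computed elsewhere; the class name, field names, and example rows are hypothetical and do not come from the registry.

```python
from dataclasses import dataclass

P_VALUE_CUTOFF = 0.15   # significance level stated in the text
MISSING_CUTOFF = 0.60   # maximum tolerated fraction of missing data


@dataclass
class UnivariateResult:
    name: str
    p_value: float           # p-value from the univariate regression
    missing_fraction: float  # share of patients with no recorded value


def screen_in(results):
    """Keep variables that are significant at the cutoff and not too sparse."""
    return [
        r.name
        for r in results
        if r.p_value < P_VALUE_CUTOFF and r.missing_fraction < MISSING_CUTOFF
    ]


# Hypothetical results, for illustration only.
example = [
    UnivariateResult("age_ge_75", 0.001, 0.00),
    UnivariateResult("ldh_gt_300", 0.04, 0.72),       # significant but too sparse
    UnivariateResult("history_of_mgus", 0.48, 0.05),  # complete but not significant
]
```

With these made-up rows, only the first variable passes both conditions.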

[0066] In embodiments, in building the first computer model used to generate a probability that a patient diagnosed with multiple myeloma will die within 180 days, the univariate analyses are logistic regression analyses conducted for the discrete variable of mortality within 180 days. Exemplary SAS code for the logistic regression analyses follows, where d_180 is the discrete dependent variable:

run;

By contrast, in building the second computer model used to generate a probability that a patient diagnosed with multiple myeloma will die within 1 year, 2 years, 3 years, 4 years, or 5 years, the univariate analyses are Cox regression analyses, in embodiments. In embodiments, the Cox regression analyses are used to handle censored data. Data is censored when patients discontinue or are otherwise lost to follow-up. From such data, it cannot be determined if the patients are currently dead or alive, and the data merely indicates that after a certain duration of follow-up, the patient discontinued follow-up or was otherwise lost to follow-up. In embodiments, longer time frames involve more censoring of data and thus, in predicting late mortality (e.g., probability of death within 1 year, 2 years, 3 years, 4 years, 5 years, etc., as opposed to a shorter amount of time such as 180 days), Cox regression analyses are used instead of logistic regression. When considering the 180-day time frame, there is little censoring of data, and when there is censoring of data, it can be assumed that the patient is alive, thus leading to a dichotomous variable (i.e., alive or dead at day 180). The simpler logistic regression analyses are used for shorter time frames such as this.
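To make the dichotomous outcome concrete, the Python sketch below derives the 180-day indicator from a follow-up duration and a death flag, treating censored patients as alive as described above. The function and parameter names are hypothetical, and the sketch assumes follow-up time is measured in days from diagnosis.

```python
def died_within_180_days(follow_up_days, died):
    """Return 1 if death was observed within 180 days of diagnosis, else 0.

    Patients censored before day 180 are counted as alive, which is the
    simplifying assumption the text describes for the short time frame.
    """
    return 1 if died and follow_up_days <= 180 else 0
```

For the longer time frames, the (duration, event) pair would instead be passed unchanged to a survival method such as Cox regression, which handles censoring directly.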

[0067] The reference data from the Connect MM Registry includes data for thirty-seven (37) different patient variables (e.g., the 37 different variables shown in Figs. 3B, 3C, and 3D), in embodiments. In embodiments, using the univariate screening procedure described above, thirteen (13) variables are screened through the logistic regression analyses, and twenty-eight (28) variables are screened through Cox regression analyses. The average amount of missing data for the logistic-regression-screened variables is 9.23%, and the average amount of missing data for the Cox-regression-screened variables is 15.4%, in embodiments.

[0068] To address the issue of missing data in the reference data, a number of imputed data sets are created, in embodiments. The relative efficiency (RE) of multiple imputation is given by the following:

RE = (1 + λ/m)^(-1)

where λ is the fraction of missing information about the parameter being estimated, and m is the number of imputed datasets. The fraction of missing information is roughly proportional to the average amount of missing data. For three (3) imputations, the RE is 0.9375 and 0.8571 for missing fractions of 20% and 50%, respectively. For the intended ten (10) imputations, the RE increases to 0.9804 and 0.9524, respectively.
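The quoted efficiencies can be checked numerically. The short Python sketch below assumes the standard relative-efficiency expression RE = 1 / (1 + λ/m), which reproduces all four values given above; the function name is illustrative.

```python
def relative_efficiency(missing_fraction, n_imputations):
    """Relative efficiency of m imputations: RE = 1 / (1 + lambda / m)."""
    return 1.0 / (1.0 + missing_fraction / n_imputations)


# Reproduce the values quoted in the text for 3 and 10 imputations.
for m in (3, 10):
    for lam in (0.20, 0.50):
        print(m, lam, round(relative_efficiency(lam, m), 4))
```

The gain from three to ten imputations is modest at 20% missing information but more noticeable at 50%, which is consistent with the choice of ten imputations.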

[0069] In embodiments, Rubin's imputation framework may be used for the imputation analysis. This analysis involves (i) assuming an imputation model, (ii) obtaining the predictive distribution of the missing data conditional on observed data and distribution parameters, and (iii) producing multiple imputed datasets using the predictive distribution. Analysis under multiple imputation is robust under less restrictive assumptions of Missing at Random (MAR) compared to the case-wise deletion of data records with any data missing on any variable. Further, case-wise deletion of data missing on any variable leads to considerable loss of information on other collected variables. In embodiments, the imputation model utilized is the Fully Conditional Specification (FCS) as recommended in "Multiple Imputation of Discrete and Continuous Data by Fully Conditional Specification," van Buuren S., Statistical Methods in Medical Research, 2007; 16:219-242, which is incorporated herein by reference in its entirety. All variables (including those screened out) are used in the imputation model to extract all information on the missingness of the predictors contained in the dataset, and ten imputations are generated. Exemplary SAS code for this analysis is as follows:

run ;

[0070] In other embodiments, the imputation model utilized is the Markov Chain Monte Carlo (MCMC) method under the multivariate normal model. All variables (including those screened out) are used in the imputation model to extract all information on the missingness of the predictors contained in the dataset, and ten imputations are generated, in embodiments. Exemplary SAS code for performing this analysis is as follows:

[0071] In embodiments, following the univariate screening and imputation procedures described above, a computer-implemented variable selection procedure is performed. In the variable selection procedure, the imputed datasets are stacked on top of each other, and the multivariate logistic and Cox regressions are run using underweighted observations, with the underweighting being proportional to the number of imputed datasets and to the degree of missingness. The variables used are those screened in under the univariate regression analyses described above. The SAS code for the first computer model (e.g., the logistic model, as described herein) requesting all possible models follows. The weight is equal to (1 - f)/(number of imputations), where f is the average fraction of missing data.

The code "selection = score" provides the score statistic for all possible models. In embodiments, the difference in score statistics between models follows a chi-squared distribution with degrees of freedom given by the difference in the number of variables in the models. In embodiments, starting with the best 1-variable model, movement in one-variable increments to the best k-variable model is performed until the incremental score statistic is less than the critical value obtained as the 0.1-level Wald χ² (chi-square) value for one degree of freedom. In embodiments, several models with score statistics in the neighborhood of that for the best k-variable model are considered as candidate models, and an appropriate model is selected. In embodiments, for each candidate model, multivariate logistic/Cox regressions are fit on each of the 10 imputed datasets, and the average Bayesian Information Criterion (BIC) value is calculated. The final multivariate model is selected as the candidate model with the minimum average BIC amongst models judged to be clinically appropriate.
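The stepping rule and the BIC comparison described above can be sketched in Python as follows. This is an illustration rather than a translation of the SAS procedure: it assumes the score statistic of the best k-variable model and the per-dataset BIC values are already available, and all function names and inputs are hypothetical. The critical value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
from statistics import NormalDist, mean

# Upper 0.1 critical value of a chi-square with 1 degree of freedom,
# computed as the square of the 0.95 standard normal quantile (about 2.706).
CRITICAL_VALUE = NormalDist().inv_cdf(0.95) ** 2


def choose_model_size(best_scores):
    """Pick k by stepping up one variable at a time.

    best_scores[k - 1] is the score statistic of the best k-variable model.
    Stepping stops when adding one more variable improves the score
    statistic by less than the critical value.
    """
    k = 1
    while k < len(best_scores) and best_scores[k] - best_scores[k - 1] >= CRITICAL_VALUE:
        k += 1
    return k


def select_by_average_bic(bic_by_model):
    """Return the candidate whose BIC, averaged over imputed datasets, is lowest.

    bic_by_model maps a candidate model name to a list of BIC values,
    one per imputed dataset.
    """
    return min(bic_by_model, key=lambda name: mean(bic_by_model[name]))
```

The judgment of which candidates are clinically appropriate is made outside this sketch; only candidates that pass that review would be supplied to `select_by_average_bic`.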

[0072] In embodiments, in building the first computer model for generating a probability that a patient diagnosed with multiple myeloma will die within 180 days, the variable selection procedure described above may result in the selection of seven (7) variables. As described herein, these variables are selected using stacked, weighted logistic regression analyses. These variables are illustrated in Fig. 3E, which lists the seven variables under a column heading "Characteristic." In embodiments, in building the second computer model for generating a probability that a patient diagnosed with multiple myeloma will die within 1 year, 2 years, 3 years, 4 years, or 5 years, the variable selection procedure described above results in the selection of ten (10) variables. As described herein, these variables are selected using stacked, weighted Cox regression analyses. These variables are illustrated in Fig. 3F-1, which lists the ten variables under a column heading "Characteristic." In other embodiments, in building the second computer model for generating a probability that a patient diagnosed with multiple myeloma will die within 1 year, 2 years, 3 years, 4 years, or 5 years, the variable selection procedure described above results in the selection of eleven (11) variables. As described herein, these variables are selected using stacked, weighted Cox regression analyses. These eleven variables are illustrated in Fig. 3F-2.
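The stacked, weighted analyses mentioned above rely on the observation weight described earlier, namely one minus the average fraction of missing data, divided by the number of imputations. A minimal Python sketch of the stacking step follows; it represents each imputed dataset as a list of row dictionaries, which is an assumption made for illustration, and the function names are hypothetical.

```python
def observation_weight(avg_missing_fraction, n_imputations):
    """Weight (1 - f) / m applied to every row of the stacked data."""
    return (1.0 - avg_missing_fraction) / n_imputations


def stack_with_weights(imputed_datasets, avg_missing_fraction):
    """Concatenate imputed datasets and attach the common weight to each row."""
    weight = observation_weight(avg_missing_fraction, len(imputed_datasets))
    return [dict(row, weight=weight) for dataset in imputed_datasets for row in dataset]
```

Because each patient appears once per imputed dataset in the stacked data, dividing by the number of imputations keeps the stacked sample from being counted as that many independent patients, and the (1 - f) factor discounts further for the information lost to missingness.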

[0073] At the conclusion of steps 304 and 306 of Fig. 3A, first and second selected variables from the set of patient variables are determined, where the first selected variables are deemed predictive of mortality for the first predetermined period of time (e.g., death within 180 days) and the second selected variables are deemed predictive of mortality for the second predetermined period of time (e.g., death within 1, 2, 3, 4, or 5 years). At 308, the first computer model comprising a combination of variables of the first selected variables and first weighting factors associated with the respective first selected variables is generated, and at 310, the second computer model comprising a combination of variables of the second selected variables and second weighting factors associated with the respective second selected variables is generated. At 312, the first computer model and the second computer model are trained using the reference data to determine numerical values for the respective first and second weighting factors.

[0074] The training of the first computer model may include (i) processing the reference data to determine, for patients represented in the reference data, numerical measures for respective variables of the first selected variables, and (ii) conducting a first computerized numerical regression analysis based on the determined numerical measures to determine the first weighting factors. Likewise, the training of the second computer model may include (i) processing the reference data to determine, for patients represented in the reference data, numerical measures for respective variables of the second selected variables, and (ii) conducting a second computerized numerical regression analysis based on the determined numerical measures to determine the second weighting factors. For example, in an embodiment in which the first or second selected variables include a variable indicative of an age of the patient, the reference data is processed to determine, for respective patients represented in the reference data, numerical values corresponding to the patients' ages. Likewise, in an embodiment in which the first or second selected variables include a variable indicative of a stage of the patient's multiple myeloma disease, the reference data is processed to determine, for respective patients represented in the reference data, numerical values corresponding to disease stages. After determining the numerical measures, the aforementioned numerical regression analyses are conducted based on the numerical measures and survival data for the respective patients represented in the reference data to determine the weighting factors of the respective first and second computer models.

[0075] In embodiments, a machine learning approach is used to build and train the first and second computer models. Specifically, in embodiments, reference data for a plurality of patients diagnosed with multiple myeloma is used, and numerical measures are determined from the reference data. The determined numerical measures for the first computer model associated with early stage mortality (e.g., death within 180 days of multiple myeloma diagnosis) may include one or more of the numerical measures 422 described below with reference to Fig. 4A, among other numerical measures. The determined numerical measures for the second computer model associated with late stage mortality (e.g., death within 1, 2, 3, 4, or 5 years of multiple myeloma diagnosis) may include one or more of the numerical measures 472 described below with reference to Fig. 4B, among other numerical measures. In constructing the first computer model, the determined numerical measures may be combined in a logistic regression classifier, which uses the determined numerical measures and the survival data for the patients represented in the reference data to generate weighting factors for the numerical measures. In constructing the second computer model, the determined numerical measures may be combined in a Cox regression classifier, which uses the determined numerical measures and the survival data for the patients represented in the reference data to generate weighting factors for the numerical measures.

[0076] In embodiments, the training of the first and second computer models may include combining the inferences for the regressions applied to each imputed dataset. The training of the first computer model for generating a probability that a patient diagnosed with multiple myeloma will die within 180 days will now be described. By Rubin's imputation framework, the estimate of a parameter of interest is the average of estimates from each imputed dataset. Such an estimate is efficient and unbiased under MAR assumptions. As described above, in building the numerical computer model for mortality within 180 days, the variable selection procedure results in the selection of the seven (7) variables shown in Fig. 3E, in embodiments. The separate estimates and the combined inferences may be obtained using the following exemplary SAS code for the seven selected variables, in embodiments:

The output dataset est1 above contains the estimates of the intercept parameter α and the regression coefficients β_i for each predictor x_i in the logistic model given by

$$\log\frac{\pi(x)}{1-\pi(x)} = \alpha + \sum_i \beta_i x_i$$

where π(x) is the probability of the event at a vector of predictor values x. Exponentiation of the parameter estimates and confidence limits provides the odds ratios for a one-point increment in the predictor variable. In embodiments, all of the variables for the 180-day-mortality numerical computer model listed in Fig. 3E, with the exception of the mobility variable, are dummy coded as binary values 0 and 1 because they are dichotomized variables. In embodiments, the mobility variable is ordinal and takes three levels from 0 to 2, and its odds ratio represents, on average, the change in odds for every increase in the level of mobility. Fig. 3E provides a summary of inferences from the final logistic model using multiple imputation. The odds ratio of 1.70 implies that the odds of mortality within 180 days for those patients with age greater than 75 years is 1.7 times that for those patients with age less than or equal to 75 years. Similar interpretations apply for other characteristics in the table.
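The relationship between pooled coefficients, odds ratios, and predicted probabilities described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the odds ratios 1.70 (age) and 2.42 (mobility, per level) are taken from the text, while the intercept, the per-imputation estimates, and all function and key names are hypothetical.

```python
import math


def pool_estimates(per_imputation):
    """Rubin's point estimate: the average of the estimates from each imputed dataset."""
    return sum(per_imputation) / len(per_imputation)


def odds_ratio(beta):
    """Exponentiating a logistic coefficient gives the odds ratio per one-point increment."""
    return math.exp(beta)


def mortality_probability(alpha, betas, x):
    """pi(x) = 1 / (1 + exp(-(alpha + sum_i beta_i * x_i)))."""
    eta = alpha + sum(betas[name] * x[name] for name in betas)
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical estimates of one coefficient from four imputed datasets.
pooled_beta = pool_estimates([0.50, 0.55, 0.54, 0.53])

ALPHA = -3.0  # hypothetical intercept, not from the source
BETAS = {
    "age_gt_75": math.log(1.70),  # odds ratio 1.70 stated in the text
    "mobility": math.log(2.42),   # odds ratio 2.42 per mobility level stated in the text
}

p_younger = mortality_probability(ALPHA, BETAS, {"age_gt_75": 0, "mobility": 0})
p_older = mortality_probability(ALPHA, BETAS, {"age_gt_75": 1, "mobility": 0})
```

Holding mobility fixed, the ratio of the odds for the two patients recovers the age odds ratio, whatever intercept is assumed.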

[0077] The training of the second computer model for generating a probability that a patient diagnosed with multiple myeloma will die within 1 year, 2 years, 3 years, 4 years, or 5 years will now be described. As described above, in building this numerical model, the variable selection procedure results in the selection of the ten (10) variables listed in Fig. 3F-1, in embodiments. The separate estimates by imputation and the combined inferences may be obtained using the following exemplary SAS code for the ten selected variables:

[0078] In other embodiments, as described above, in building the numerical computer model, the variable selection procedure results in the selection of the eleven (11) variables listed in Fig. 3F-2. The separate estimates by imputation and the combined inferences may be obtained using the following exemplary SAS code for the eleven selected variables:

[0079] The output datasets est1 and est2 generated by the example code above contain the estimates of the regression coefficients β_i for each predictor x_i in the Cox model given by

$$h(t, x) = h_0(t)\,\exp\Big(\sum_i \beta_i x_i\Big)$$

where h(t, x) is the hazard function at time t defined at a vector of predictor values x and h_0(t) is the baseline hazard function. Exponentiation of the parameter estimates and confidence limits provides the hazard ratios and confidence limits for a one-point increment in the predictor variable. In embodiments, all of the variables for the 1/2/3/4/5-year-mortality numerical computer model listed in Figs. 3F-1 and 3F-2, with the exception of mobility and ISS stage, are dummy coded as binary values 0 and 1 because they are dichotomized variables. In embodiments, mobility and ISS are ordinal and take three levels, and the hazard ratio represents, on average, the change in hazard for every increase in level. Figs. 3F-1 and 3F-2 provide a summary of inferences from the final Cox model using multiple imputation. In Fig. 3F-1, the hazard ratio of 1.89 for age implies that the hazard of mortality for those patients with age greater than 75 years is 1.89 times that for patients with age less than or equal to 75 years. Similar interpretations apply for other characteristics in the table.
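Under the proportional hazards form of the Cox model, the survival probability at a fixed time follows from the baseline survival as S(t | x) = S_0(t)^exp(Σ β_i x_i). The Python sketch below illustrates this relationship. The hazard ratio 1.89 for age is taken from the text; the baseline three-year survival of 0.80 and the names used are hypothetical.

```python
import math


def hazard_ratio(beta):
    """Exponentiating a Cox coefficient gives the hazard ratio per one-point increment."""
    return math.exp(beta)


def survival_probability(baseline_survival, betas, x):
    """S(t | x) = S0(t) ** exp(sum_i beta_i * x_i) under proportional hazards."""
    linear_predictor = sum(betas[name] * x[name] for name in betas)
    return baseline_survival ** math.exp(linear_predictor)


BASELINE_3_YEAR_SURVIVAL = 0.80  # hypothetical S0(1095 days), not from the source
BETAS = {"age_gt_75": math.log(1.89)}  # hazard ratio 1.89 stated in the text

s_younger = survival_probability(BASELINE_3_YEAR_SURVIVAL, BETAS, {"age_gt_75": 0})
s_older = survival_probability(BASELINE_3_YEAR_SURVIVAL, BETAS, {"age_gt_75": 1})
```

A patient at the reference level of every predictor has exactly the baseline survival; each hazard ratio above 1 pushes the survival probability down.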

[0080] At 314, the first computer model and the second computer model are updated to include the determined numerical values for the first weighting factors and the second weighting factors for each selected variable of the first and second selected variables. Accordingly, the first computer model is configured to generate probability data that a patient satisfying certain first selectable criteria will die within the first predetermined period of time (e.g., 180 days), and the second computer model is configured to generate probability data that a patient satisfying certain second selectable criteria will die within the second predetermined period of time (e.g., 1, 2, 3, 4, or 5 years). The first and second computer models are then ready to be used for generating probabilities, i.e., to receive numerical measures corresponding to variables of the respective computer models, where the numerical measures are new data for a patient, so as to generate a probability that the patient will die within the first and second predetermined periods of time. In this manner, the numerical computer models are thereafter configured to perform automated determination of probabilities for new patient data.

[0081] As described above, in some embodiments, a prediction matrix is generated, and the prediction matrix includes probability values for all possible combinations of patient input data. The generation of an exemplary prediction matrix using the 180-day-mortality numerical computer model will now be described. In some embodiments, the prediction matrix is designed to show less favorable outcomes in the bottom left corner and more favorable outcomes towards the top right corner of the matrix. Further, in some embodiments, the variables are ordered by importance, which is assessed by multiplying the odds ratio by (# of predictor levels - 1). The odds ratios for the variables of interest are illustrated in the table of Fig. 3E. For instance, in embodiments, mobility is assessed to be most relevant to the matrix because 2.42 × (3 - 1) = 4.84 is the largest computed value, and accordingly, this variable is placed in a largest row header of the matrix. ECOG status is the next most important and is placed as the largest column header of the matrix, in embodiments. The third most relevant variable, platelet count, bifurcates the mobility header. The fourth most important variable, hypertension history, bifurcates the ECOG header. Alternating between rows and columns in a similar manner populates row and column headers with all variables of the numerical computer model. The row header predictors have the predictor level with the favorable outcome on top, and the column header predictors have the predictor level with the favorable outcome to the right, in embodiments. An exemplary prediction matrix with row and column headers created in this manner is illustrated in Fig. 3G.
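The header-ordering rule described above (importance = odds ratio × (number of levels - 1), then alternating between row and column headers) can be sketched in Python as follows. Only the odds ratios for mobility (2.42) and age (1.70) come from the text; the remaining odds ratios are placeholders chosen solely to reproduce the stated ordering of the first four variables, and the position of any variable after the fourth is an artifact of those placeholders.

```python
def importance(odds_ratio, n_levels):
    """Importance score: odds ratio multiplied by (number of predictor levels - 1)."""
    return odds_ratio * (n_levels - 1)


def assign_headers(variables):
    """Rank variables by importance and alternate them between row and column headers."""
    ranked = sorted(variables, key=lambda v: importance(v[1], v[2]), reverse=True)
    rows = [name for index, (name, _, _) in enumerate(ranked) if index % 2 == 0]
    columns = [name for index, (name, _, _) in enumerate(ranked) if index % 2 == 1]
    return rows, columns


# (name, odds ratio, number of levels)
VARIABLES = [
    ("age", 1.70, 2),           # odds ratio from the text
    ("ecog", 2.30, 2),          # placeholder odds ratio
    ("hypertension", 1.90, 2),  # placeholder odds ratio
    ("mobility", 2.42, 3),      # odds ratio from the text
    ("platelet", 2.10, 2),      # placeholder odds ratio
]

rows, columns = assign_headers(VARIABLES)
# rows start with mobility then platelet; columns start with ecog then hypertension
```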

[0082] The above steps are used to generate a blank matrix with column and row headers, in embodiments. To populate these blank cells with the appropriate probability values, the numerical computer model is used to compute the probabilities for every possible combination of patient input values. The probabilities are then inserted into the prediction matrix. Exemplary SAS code to generate data for insertion into the section of the matrix where mobility = 0 (No problem in walking about) is as follows:

[0083] In the exemplary prediction matrix of Fig. 3G, smaller blocks are used within the larger blocks with factors which have succeeding smaller effects. Numeric values in the matrix are the probabilities of mortality within 180 days, as generated using the numerical computer model described herein.

[0084] The generation of an exemplary prediction matrix corresponding to the 1/2/3/4/5-year-mortality numerical computer model will now be described. Steps similar to those described above for generating a blank matrix are used. To populate these blank cells with appropriate probability values, the numerical computer model is used to compute the probabilities for every possible combination of patient input values. Exemplary SAS code to implement this starts with SAS PROC PLAN code, and a dataset "covals" is generated. This dataset contains the combinations of the levels of the predictors along with the mapping to cells in the matrix. To generate the probabilities for filling the matrix, the exemplary code below uses the covals dataset in the baseline statement of the SAS PHREG procedure to generate survival probabilities at every event time in the registry along with confidence intervals. To obtain the survival probability beyond three years, the data records corresponding to event time closest to and less than the three-year time-point (1095 days) are retained. The prediction of survival beyond three years for each predictor combination is estimated as the average of the corresponding 3 year survivals from each of the imputations. In embodiments in which the variables shown in Fig. 3F-1 are considered, this is implemented in the exemplary SAS code below:
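Independently of the SAS listing, the retention-and-averaging step described in this paragraph can be sketched in Python. The sketch assumes each imputation yields a survival curve for a given predictor combination as a list of (event time in days, survival probability) pairs; the function names and the example curves are hypothetical.

```python
def survival_before(curve, cutoff_days=1095):
    """Return the survival probability at the latest event time strictly before the cutoff."""
    eligible = [(time, survival) for time, survival in curve if time < cutoff_days]
    if not eligible:
        raise ValueError("no event time before the cutoff")
    return max(eligible, key=lambda point: point[0])[1]


def pooled_survival(curves, cutoff_days=1095):
    """Average the retained survival probabilities over the imputed datasets."""
    values = [survival_before(curve, cutoff_days) for curve in curves]
    return sum(values) / len(values)


# Hypothetical survival curves for one predictor combination, one per imputation.
curves = [
    [(200, 0.95), (900, 0.80), (1090, 0.72), (1100, 0.70)],
    [(250, 0.94), (1000, 0.76), (1200, 0.69)],
]

three_year_survival = pooled_survival(curves)  # (0.72 + 0.76) / 2
```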

[0085] In embodiments in which the variables shown in Fig. 3F-2 are considered, generating the prediction of survival beyond three years for filling the matrix is implemented in the exemplary SAS code below:

[0086] To illustrate exemplary prediction matrices generated using the 1/2/3/4/5-year-mortality numerical computer model, reference is made to Figs. 3H-1, 3H-2, 3H-3, 3I-1, 3I-2, and 3I-3. Figs. 3H-1, 3H-2, and 3H-3 collectively illustrate an exemplary prediction matrix of survival probability beyond three years for patients having an age that is less than or equal to 75 years, and Figs. 3I-1, 3I-2, and 3I-3 collectively illustrate an exemplary prediction matrix of survival probability beyond three years for patients having an age that is greater than 75 years. As seen in these figures, the 1/2/3/4/5-year-mortality numerical computer model may consider the variables listed in Fig. 3F-1. In embodiments, these variables are based on the patient's mobility, ECOG performance status, ISS stage, Del 17P status, platelet count, triplet therapy use, renal insufficiency status (e.g., whether serum creatinine is greater than 2 mg/dL), age, diabetes history, and whether the patient has solitary plasmacytoma.

[0087] To illustrate additional exemplary prediction matrices generated using the 1/2/3/4/5-year-mortality numerical computer model, reference is made to Figs. 3J-1, 3J-2, 3J-3, 3K-1, 3K-2, and 3K-3. Figs. 3J-1, 3J-2, and 3J-3 collectively illustrate an exemplary prediction matrix of survival probability beyond three years for patients having an age that is greater than 75 years, and Figs. 3K-1, 3K-2, and 3K-3 collectively illustrate an exemplary prediction matrix of survival probability beyond three years for patients having an age that is less than or equal to 75 years. As seen in these figures, the 1/2/3/4/5-year-mortality numerical computer model may consider the variables listed in Fig. 3F-2. In embodiments, these variables are based on the patient's mobility, ECOG performance status, ISS stage, Del 17P status, platelet count, novel therapy use, renal insufficiency status (e.g., whether serum creatinine is greater than 2 mg/dL), age, diabetes history, and whether the patient has extramedullary plasmacytoma and hyperdiploidy.

[0088] With reference again to Fig. 3A, at 316, the first and second computer models are validated. Each of the first and second computer models may be validated with both an "internal" validation procedure and an "external" validation procedure. The validation of the first computer model used in generating a probability that a patient diagnosed with multiple myeloma will die within 180 days will now be described. In some embodiments, internal validation involves the splitting of the dataset into test and training samples, and the model obtained in the training sample is evaluated in the test sample. Better estimates of validation indices may be obtained when they are obtained through analysis of repeated random splits into test and training samples, a process referred to as bootstrap re-sampling. The validation index used in embodiments to measure the predictive ability of the computer model is Harrell's C-index. This index is interpretable as a concordance probability, i.e., the probability that a randomly selected pair of patients, one with a poorer survival outcome than the other, will be correctly differentially identified based on inputting the two patients' baseline prognostic characteristics in the fitted model. To compute the index, each of the 10 imputed datasets is imported into the R software, and the following R code is executed for each dataset for 100 bootstrap sample pairs:

[0089] The R script above provides the Somers' D statistic Dxy. The concordance probability for each imputation can be computed as C-index = 0.5 × |Dxy| + 0.5. Training datasets may have better predictive ability due to the possibility of overfitting the model to the data, and the training optimism adjusted concordance probability adjusts for this bias. In the multiple imputation context, the concordance probability is computed as the average of the adjusted concordance probabilities from each imputation. For the logistic model used in the generation of the first computer model (e.g., computer model used in predicting 180-day mortality), the concordance probability may be identical to the area under the receiver operating characteristic (ROC) curve for the model, and confidence intervals can therefore be computed using expressions developed for determining this area under the curve. The percent reduction in the concordance probability for the test samples compared to the training samples is 2.53% in some embodiments for the logistic model, indicating the unlikelihood of an overfitted model. The training optimism adjusted concordance probability of the fitted logistic model is estimated at 74.3% (95% CI: 68.7, 80.0), in embodiments. A concordance probability significantly greater than 50% is indicative of a good predictive model.

[0090] External validation may be a measure of how well a computer model (e.g., a computer model derived from data from a registry, as described above) works for an additional, independent external dataset. The external dataset may thus comprise additional, independent data not used in the training of the computer model. In embodiments, the external data is from the "FIRST" multiple myeloma clinical study (N=1623). This study was a phase III, randomized, open-label, 3-arm study to determine the efficacy and safety of lenalidomide (Revlimid) plus low-dose dexamethasone when given until progressive disease or for 18 four-week cycles versus the combination of Melphalan, Prednisone, and Thalidomide given for 12 six-week cycles in patients with previously untreated multiple myeloma who are either 65 years of age or older or not candidates for stem cell transplantation.

[0091] In performing the external validation for the first computer model used in generating a probability that a patient diagnosed with multiple myeloma will die within 180 days, the seven variables used in the logistic model may be collected in the FIRST study data. These variables, as well as mortality within 180 days, may be extracted from the FIRST database. Then, the probability of mortality within 180 days is computed for the FIRST data using the first computer model and compared against actual outcomes in the FIRST study. This may be achieved using the R package rms with the following code:

In embodiments, the concordance probability of the first computer model is 71.83% (95% CI: 66.2, 77.4), which compares favorably to the 74.3% determined in the internal validation. These results show that the first computer model may be relatively portable (e.g., the first computer model may work relatively well on a variety of different datasets). As is evident from the description above, the external validation procedure may include validating the first computer model with testing using additional independent data (e.g., data from the "FIRST" study) not used in the training of the first computer model.
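For a binary outcome such as death within 180 days, the concordance probability used in the validation above can be computed directly by comparing predicted probabilities across every pair of patients with different outcomes. The following Python sketch shows that calculation together with the stated conversion from Somers' Dxy. It is not the rms code of the embodiments; the function names and the five-patient example are hypothetical.

```python
def concordance_probability(predicted, died):
    """Fraction of (died, survived) patient pairs in which the patient who died
    received the higher predicted probability; ties count as one half."""
    concordant = 0.0
    pairs = 0
    for p_dead, dead in zip(predicted, died):
        if not dead:
            continue
        for p_alive, other_dead in zip(predicted, died):
            if other_dead:
                continue
            pairs += 1
            if p_dead > p_alive:
                concordant += 1.0
            elif p_dead == p_alive:
                concordant += 0.5
    if pairs == 0:
        raise ValueError("need at least one patient with each outcome")
    return concordant / pairs


def c_index_from_dxy(dxy):
    """C-index = 0.5 * |Dxy| + 0.5."""
    return 0.5 * abs(dxy) + 0.5


# Hypothetical predicted probabilities and observed 180-day outcomes (1 = died).
predicted = [0.9, 0.4, 0.3, 0.2, 0.6]
died = [1, 1, 0, 0, 0]

c_index = concordance_probability(predicted, died)  # 5 of 6 pairs are concordant
dxy = 2.0 * c_index - 1.0
```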

[0092] The validation of the second computer model used in generating a probability that a patient diagnosed with multiple myeloma will die within 1, 2, 3, 4, or 5 years will now be described. Internal validation for the second computer model may involve bootstrap re-sampling of 100 test and training datasets and the computation of concordance probabilities. In embodiments in which the ten variables shown in Fig. 3F-1 are considered, to compute this concordance index, each of the 10 imputed datasets is imported into the R software, and the following R code is executed:

[0093] In embodiments in which the eleven variables shown in Fig. 3F-2 are considered, to compute this concordance index, each of the 10 imputed datasets is imported into the R software, and the following R code is executed:

validate(f, B=100, dxy=TRUE)

[0094] In embodiments, the percent reduction in the concordance probability for the test samples compared to the training samples is 0.94% for the second computer model, indicating the unlikelihood of an over-fitted model. The training optimism adjusted concordance probability of the second computer model is estimated at 69.5% (95% CI: 66.6, 72.4), in embodiments. A concordance probability significantly greater than 50% may be indicative of a good predictive model.
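The arithmetic behind a percent reduction and a training optimism adjusted index can be sketched in Python as follows. The adjustment shown is the common optimism bootstrap (apparent index minus the mean training-minus-test difference across bootstrap pairs); that this matches the exact procedure of the embodiments is an assumption, and all numbers and names below are hypothetical.

```python
def percent_reduction(training_index, test_index):
    """Percent drop in the index when moving from training samples to test samples."""
    return 100.0 * (training_index - test_index) / training_index


def optimism_adjusted(apparent_index, training_indices, test_indices):
    """Subtract the mean training-minus-test difference (the optimism) from the apparent index."""
    differences = [train - test for train, test in zip(training_indices, test_indices)]
    optimism = sum(differences) / len(differences)
    return apparent_index - optimism


# Hypothetical concordance indices from three bootstrap sample pairs.
training = [0.71, 0.70, 0.72]
test = [0.70, 0.69, 0.71]

adjusted = optimism_adjusted(0.702, training, test)  # 0.702 - 0.01
reduction = percent_reduction(0.70, 0.693)           # 1.0 percent
```

A small percent reduction means the index barely degrades outside the training sample, which is the sense in which the text reads it as evidence against overfitting.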

[0095] External validation of the second computer model may be conducted to determine how well the second computer model works for data from the FIRST study. In embodiments in which the variables of Fig. 3F-1 are considered, the ten variables used in the second computer model may be collected in the FIRST study data. The ten variables, as well as the survival duration and censoring variables, may be extracted from the FIRST database. Then, the probability of survival beyond 3 years is computed for FIRST data using the second computer model and compared against actual outcomes in the FIRST study. To compute the probability of survival beyond 3 years, SAS code similar to that described above using the actual predictor combinations found in the FIRST study instead of the covals dataset may be utilized. To compare actual outcomes in FIRST to predicted outcomes generated by the model, the following R code may be utilized:

[0096] In embodiments, the first part of the above code computes the concordance index and 95% CI as 67.8% (66.1, 69.6). In embodiments, the concordance probability compares favorably to 69.5% in the internal validation, thus supporting the portability of the second computer model. As is evident from the description above, the external validation procedure may include validating the second computer model with testing using additional independent data (e.g., data from the "FIRST" study) not used in the training of the second computer model.

[0097] In embodiments in which the variables of Fig. 3F-2 are considered, the eleven variables used in the second computer model may be collected in the FIRST study data. The variable '# of novel therapies' has levels defined as >= 2 novel therapies or (0, 1) novel therapies as part of the induction regimen in first line. Novel therapies being administered in cohort 1 of the registry included the multiple myeloma drugs Revlimid, Pomalidomide, Velcade and Carfilzomib. In the FIRST study, patients were randomized to Revlimid + Dexamethasone continuous; Revlimid + Dexamethasone for 18 months; or Melphalan, Prednisone and Thalidomide for 18 months. The first of the three groups was most efficacious and was mapped to the >=2 level and the remaining groups to (0, 1) of the novel therapy variable. The eleven variables, as well as the survival duration and censoring variables, may be extracted from the FIRST database. Then, the probability of survival beyond 3 years is computed for FIRST data using the second computer model and compared against actual outcomes in the FIRST study. To compute the probability of survival beyond 3 years, SAS code similar to that described above using the actual predictor combinations found in the FIRST study instead of the covals dataset may be utilized. To compare actual outcomes in FIRST to predicted outcomes generated by the model, R code similar to that described above for the embodiment considering the variables listed in Fig. 3F-1 may be used.

[0098] The above description indicates that the first computer model is used in generating a probability that a patient diagnosed with multiple myeloma will die within a relatively short amount of time (e.g., 180 days), while the second computer model is used in generating a probability that a patient diagnosed with multiple myeloma will die within a longer amount of time (e.g., 1, 2, 3, 4, or 5 years). It is noted, however, that in other embodiments, the first and second computer models may be associated with different respective periods of time. Thus, in embodiments, the first computer model may be trained to predict mortality within 3 months of a multiple myeloma diagnosis, and the second computer model may be trained to predict mortality within 6 months of diagnosis. The first and second computer models are trained to predict mortality for various other periods of time, in embodiments.

[0099] Fig. 4A depicts a flowchart 400 including exemplary steps for generating a probability that a patient diagnosed with multiple myeloma will die within 180 days of being diagnosed. This figure further depicts exemplary numerical measures 422 determined from the patient's input data and used in generating the probability. At 402, input data for a patient diagnosed with multiple myeloma is received, where the input data comprises data for multiple variables of a set of patient variables.

[00100] At 404, one or more numerical measures are determined by processing the input data. The one or more numerical measures may include numerical measures from the exemplary numerical measures 422 of Fig. 4A. A first numerical measure, "age (< 75 years versus ≥ 75 years)" is indicative of the patient's age and specifically whether the patient's age is greater than or equal to 75. In embodiments, the input data may comprise the patient's age in years, and thus, determining the first numerical measure includes comparing the patient's age against "75 years" to determine whether the patient's age is greater than or equal to 75 years. A second numerical measure, "ECOG Performance Score (≥2 versus <2)" is indicative of the patient's performance status and specifically whether the patient's ECOG performance status is greater than or equal to 2. In embodiments, the input data comprises ECOG performance scores in a number format (e.g., 0, 1, 2, etc.), and determining the second numerical measure includes comparing the patient's ECOG performance score against "2" to determine whether the patient's ECOG performance score is greater than or equal to 2.

[00101] A third numerical measure, "history of hypertension" is indicative of whether the patient has a history of hypertension. In embodiments, the third numerical measure comprises a Boolean value (e.g., "0" if the patient has no history of hypertension, and "1" if the patient has a history of hypertension, etc.). A fourth numerical measure, "ISS disease stage (III versus I and II)" is indicative of a stage of the patient's multiple myeloma disease and specifically whether the ISS stage of the patient's disease is "III" or whether it is "I or II." A fifth numerical measure, "renal insufficiency (serum creatinine > 2 mg/dL)" is indicative of whether the patient has renal insufficiency and specifically whether the patient's serum creatinine is greater than 2 mg/dL. A sixth numerical measure, "platelet count (< 150×10^9/L versus > 150×10^9/L)" is indicative of the patient's platelet count and specifically whether the platelet count is greater than 150×10^9/L. A seventh numerical measure, "mobility from EQ-5D" is indicative of the patient's mobility and specifically the patient's EuroQol five dimensions questionnaire (EQ-5D) mobility score. In embodiments, the seventh numerical measure can take on values of "0," "1," or "2," corresponding to the possible EQ-5D mobility scores. Additional numerical measures not included in the numerical measures 422 of Fig. 4A may be used in other examples.
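The determination of the seven numerical measures 422 from raw patient input can be sketched in Python as follows. The thresholds follow the wording of the two preceding paragraphs (age and ECOG compared with "greater than or equal to", creatinine and platelet count with "greater than"); the record field names, the function name, and the example patient are hypothetical.

```python
def numerical_measures_180_day(record):
    """Derive the seven measures for the 180-day model from a raw patient record."""
    mobility = record["eq5d_mobility"]
    if mobility not in (0, 1, 2):
        raise ValueError("EQ-5D mobility score must be 0, 1, or 2")
    return {
        "age_ge_75": int(record["age_years"] >= 75),
        "ecog_ge_2": int(record["ecog_score"] >= 2),
        "hypertension_history": int(bool(record["hypertension_history"])),
        "iss_stage_iii": int(record["iss_stage"] == 3),  # III versus I and II
        "renal_insufficiency": int(record["serum_creatinine_mg_dl"] > 2.0),
        "platelet_gt_150": int(record["platelet_count_1e9_per_l"] > 150),
        "mobility": mobility,  # ordinal, kept as 0, 1, or 2
    }


# Hypothetical patient input data.
patient = {
    "age_years": 78,
    "ecog_score": 1,
    "hypertension_history": True,
    "iss_stage": 3,
    "serum_creatinine_mg_dl": 1.4,
    "platelet_count_1e9_per_l": 210,
    "eq5d_mobility": 1,
}

measures = numerical_measures_180_day(patient)
```

The resulting dictionary is what a fitted model would consume; keeping the dichotomization in one function makes the thresholds easy to audit against Fig. 4A.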

[00102] At 406, a probability that the patient will die within 180 days is determined by applying the numerical computer model to the determined numerical measures.

[00103] Fig. 4B depicts a flowchart 450 including exemplary steps for generating a probability that a patient diagnosed with multiple myeloma will die within 1 year, 2 years, 3 years, 4 years, or 5 years of being diagnosed. This figure further depicts exemplary numerical measures 472 determined from the patient's input data and used in generating the probability. At 452, input data for a patient diagnosed with multiple myeloma is received, where the input data comprises data for multiple variables of a set of patient variables.

[00104] At 454, one or more numerical measures are determined by processing the input data. The one or more numerical measures may include numerical measures from the exemplary numerical measures 472 of Fig. 4B. A first numerical measure, "age (< 75 years versus ≥ 75 years)" is indicative of the patient's age and specifically whether the patient's age is greater than or equal to 75 years. A second numerical measure, "ECOG Performance Score (≥2 versus <2)" is indicative of the patient's performance status and specifically whether the patient's ECOG performance status is greater than or equal to 2. A third numerical measure, "history of diabetes" is indicative of whether the patient has a history of diabetes. In embodiments, the third numerical measure comprises a Boolean value (e.g., "0" if the patient has no history of diabetes and "1" if the patient has a history of diabetes, etc.). A fourth numerical measure, "Del(17P) from FISH and cytogenetic forms" is indicative of whether the patient has a deleted chromosome 17. A fifth numerical measure, "hyperdiploidy" indicates whether the patient has hyperdiploidy. A sixth numerical measure, "extramedullary plasmacytoma" indicates whether the patient has extramedullary plasmacytoma.

[00105] A seventh numerical measure, "ISS disease stage (III versus II versus I)" is indicative of a stage of the patient's multiple myeloma disease and specifically whether the ISS stage of the patient's disease is "III," "II," or "I." An eighth numerical measure, "renal insufficiency (serum creatinine > 2 mg/dL)" is indicative of whether the patient has renal insufficiency and specifically whether the patient's serum creatinine is greater than 2 mg/dL. A ninth numerical measure, "platelet count (< 150×10^9/L versus > 150×10^9/L)" is indicative of the patient's platelet count and specifically whether the platelet count is greater than 150×10^9/L. A tenth numerical measure, "mobility from EQ-5D" is indicative of the patient's mobility and specifically the patient's EuroQol five dimensions questionnaire (EQ-5D) mobility score. In embodiments, the tenth numerical measure can take on values of "0," "1," or "2," corresponding to the possible EQ-5D mobility scores. An eleventh numerical measure, "novel therapy use (≥2 versus (0, 1))" is indicative of a number of novel therapies that the patient has used and specifically whether the number is greater than or equal to 2.

[00106] A twelfth numerical measure, "triplet therapy use" is indicative of whether the patient has used triplet therapy. A thirteenth numerical measure, "solitary plasmacytoma" indicates whether the patient has solitary plasmacytoma. Additional numerical measures not included in the numerical measures 472 of Fig. 4B may be used in other examples.

[00107] At 456, a probability that the patient will die within 1 year, 2 years, 3 years, 4 years, or 5 years is determined by applying the numerical computer model to the numerical measures. In comparing Figs. 4A and 4B, it can be seen that the exemplary numerical measures 422 used in generating the probability that the patient will die within 180 days differ from the exemplary numerical measures 472 used in generating the probability that the patient will die within 1 year, 2 years, 3 years, 4 years, or 5 years. As described herein, using the techniques of the present disclosure, reference data is processed to automatically determine a set of variables (e.g., predictors) to be used in a numerical computer model. In embodiments, the variables determined via the processing of the reference data vary based on the time frame considered (e.g., some predictors are valid for predicting whether the patient will die within 180 days but not 1 year, 2 years, 3 years, 4 years, or 5 years, and vice versa). This reflects that early mortality (e.g., mortality within 180 days of diagnosis) may have a different etiology than later mortality (e.g., mortality within 1 year, 2 years, 3 years, 4 years, or 5 years of diagnosis), with co-morbidities dominating the former and disease factors more relevant for the latter.

[00108] Fig. 5 is a flowchart depicting steps of an exemplary method for generating a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time. At 502, input data for a patient diagnosed with multiple myeloma is received. The input data comprises data for multiple variables of a set of patient variables. At 504, the input data is processed to determine a first numerical measure indicative of an age of the patient. At 506, the input data is processed to determine a second numerical measure indicative of a stage of the patient's multiple myeloma disease. At 508, the input data is processed to determine a third numerical measure indicative of the patient's mobility.

[00109] At 510, a numerical computer model associated with a predetermined period of time is applied to the first numerical measure, the second numerical measure, and the third numerical measure to determine a probability that the patient will die within the predetermined period of time. The numerical computer model includes a first variable and an associated first weighting factor, the first variable receiving a value of the first numerical measure. The numerical computer model also includes a second variable and an associated second weighting factor, the second variable receiving a value of the second numerical measure. The numerical computer model further includes a third variable and an associated third weighting factor, the third variable receiving a value of the third numerical measure. The application of the numerical computer model at this stage may involve the actual variable selection, training and configuration of the computer model. Alternatively, the application of the numerical computer model at this stage may involve accessing pre-calculated results of the numerical computer model and applying rule-based selection criteria based on the particular numerical measures to select the corresponding mortality value(s) applicable from pre-calculated data from the numerical computer model applicable to the particular numerical measures for the associated variables.
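The second alternative above, selecting a mortality value from pre-calculated results, can be sketched in Python as a lookup table built once over every combination of measure levels. The sketch assumes a logistic form for the model and uses the three measures of Fig. 5 (age, disease stage, mobility); the intercept, the weighting factors, and all names are hypothetical.

```python
import itertools
import math

# Allowed levels for each numerical measure (stage dichotomized for brevity).
LEVELS = {"age_ge_75": (0, 1), "iss_stage_iii": (0, 1), "mobility": (0, 1, 2)}

INTERCEPT = -3.0  # hypothetical
WEIGHTS = {"age_ge_75": 0.53, "iss_stage_iii": 0.40, "mobility": 0.88}  # hypothetical


def apply_model(measures):
    """Weighted sum of the measures passed through the logistic function."""
    eta = INTERCEPT + sum(WEIGHTS[name] * measures[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-eta))


def build_table():
    """Pre-calculate the probability for every combination of measure levels."""
    names = list(LEVELS)
    table = {}
    for combination in itertools.product(*(LEVELS[name] for name in names)):
        table[combination] = apply_model(dict(zip(names, combination)))
    return names, table


def lookup(names, table, measures):
    """Rule-based selection: pick the pre-calculated value matching the measures."""
    return table[tuple(measures[name] for name in names)]


names, table = build_table()
patient = {"age_ge_75": 1, "iss_stage_iii": 0, "mobility": 2}
probability = lookup(names, table, patient)
```

Because the measures take only a handful of discrete levels, the full table is small (12 cells here), which is what makes a printed prediction matrix or a static lookup practical.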

[00110] As described above, the present disclosure provides computer-based techniques for predicting likelihoods of near-term and long-term mortality in patients diagnosed with multiple myeloma. However, the computer-based techniques described herein may be applied to other cancers and other diseases beyond multiple myeloma through application of regression analysis for selecting suitable patient variables for multiple computer models, training the computer models with suitable patient reference data to determine weighting factors associated with the variables for the models for desired mortality time periods, updating the computer models with the appropriate weighting factors, and validating the computer models for use in making actual predictions.

[00111] Figs. 6A, 6B, and 6C depict exemplary systems for implementing the techniques described herein. For example, Fig. 6A depicts an exemplary system 600 that includes a standalone computer architecture where a processing system 602 (e.g., one or more computer processors located in a given computer or in multiple computers that may be separate and distinct from one another) includes a numerical computer model 604 being executed on the processing system 602. For instance, the processing system 602 represented in Fig. 6A may be that of a touchscreen smartphone, a touchscreen tablet, a laptop PC, a desktop PC, etc. Accordingly, the processing system 602 may communicate with a touchscreen display or GUI 603 to display outputs to the user and receive inputs from the user. The processing system 602 has access to a computer-readable memory 607 in addition to one or more data stores 608. The one or more data stores 608 may include variables 610 as well as weighting factors 612. The processing system 602 may be a distributed parallel computing environment, which may be used to handle very large-scale data sets.

[00112] Fig. 6B depicts a system 620 that includes a client-server architecture. One or more user PCs 622 access one or more servers 624 running a numerical computer model 604 on a processing system 627 via one or more networks 628. The one or more servers 624 may access a computer-readable memory 630 as well as one or more data stores 632. The one or more data stores 632 may include variables 634 as well as weighting factors 638.

[00113] Fig. 6C shows a block diagram of exemplary hardware for a standalone computer architecture 650, such as the architecture depicted in Fig. 6A, that may be used to include and/or implement the program instructions of system embodiments of the present disclosure. A bus 652 may serve as the information highway interconnecting the other illustrated components of the hardware. A processing system 654 labeled CPU (central processing unit) (e.g., one or more computer processors at a given computer or at multiple computers), may perform calculations and logic operations required to execute a program. A non-transitory processor-readable storage medium, such as read only memory (ROM) 658 and random access memory (RAM) 659, may be in communication with the processing system 654 and may include one or more programming instructions for performing methods (e.g., algorithms) for constructing a numerical computer model to generate a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time. Optionally, program instructions may be stored on a non-transitory computer-readable storage medium such as a magnetic disk, optical disk, recordable memory device, flash memory, or other physical storage medium.

[00114] In Figs. 6A, 6B, and 6C, computer readable memories 607, 630, 658, 659 or data stores 608, 632, 683, 684 may include one or more data structures for storing and associating various data used in the exemplary systems for constructing a numerical computer model to generate a probability that a patient diagnosed with multiple myeloma will die within a predetermined period of time. For example, a data structure stored in any of the aforementioned locations may be used to store data relating to variables and/or weighting factors. A disk controller 690 interfaces one or more optional disk drives to the system bus 652. These disk drives may be external or internal floppy disk drives such as 683, external or internal CD-ROM, CD-R, CD-RW or DVD drives such as 684, or external or internal hard drives 685. As indicated previously, these various disk drives and disk controllers are optional devices.
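As a non-limiting sketch, a data structure associating variables with weighting factors might be laid out and persisted as shown below. The record layout, the key names, the variable names, the numerical values, and the helper functions `validate`, `serialize`, and `deserialize` are hypothetical and are offered only to illustrate the association; the disclosure does not prescribe a storage format.

```python
import json

# Hypothetical layout: one record per predetermined period of time, pairing
# each selected variable with its weighting factor by list position.
# All names and numbers below are placeholders.
model_store = {
    "first_period": {
        "variables": ["age", "stage", "mobility"],
        "weighting_factors": [0.05, 0.6, 0.4],
    },
    "second_period": {
        "variables": ["age", "stage"],
        "weighting_factors": [0.04, 0.5],
    },
}


def validate(store: dict) -> None:
    """Raise ValueError if any record's variables and factors do not pair up."""
    for period, record in store.items():
        if len(record["variables"]) != len(record["weighting_factors"]):
            raise ValueError(f"{period}: variables and weighting factors differ in length")


def serialize(store: dict) -> str:
    """Encode the store as JSON text suitable for any data store or file."""
    validate(store)
    return json.dumps(store, indent=2, sort_keys=True)


def deserialize(text: str) -> dict:
    """Decode JSON text back into the store and re-check its consistency."""
    store = json.loads(text)
    validate(store)
    return store


restored = deserialize(serialize(model_store))
```

Keeping one record per period lets a processing system load only the weighting factors for the model it needs to apply.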

[00115] Each of the element managers, real-time data buffer, conveyors, file input processor, database index shared access memory loader, reference data buffer and data managers may include a software application stored in one or more of the disk drives connected to the disk controller 690, the ROM 658 and/or the RAM 659. The processor 654 may access one or more components as required.

[00116] A display interface 687 may permit information from the bus 652 to be displayed on a display 680 in audio, graphic, or alphanumeric format. Communication with external devices may optionally occur using various communication ports 682.

[00117] In addition to these computer-type components, the hardware may also include data input devices, such as a keyboard 679, or other input device 681, such as a microphone, remote control, pointer, mouse and/or joystick. Such data input devices communicate with the standalone computer architecture 650 via an interface 688, in some embodiments. The standalone computer architecture 650 further includes a network interface 699 that enables the architecture 650 to connect to a network, such as a network of the one or more networks 628.

[00118] Additionally, the methods and systems described herein may be implemented on many different types of processing devices by program code comprising program instructions that are executable by the device processing subsystem. The software program instructions may include source code, object code, machine code, or any other stored data that is operable to cause a processing system to perform the methods and operations described herein and may be provided in any suitable language such as, for example, C, C++, or JAVA, or any other suitable programming language. Other implementations may also be used, however, such as firmware or even appropriately designed hardware configured to carry out the methods and systems described herein.

[00119] The systems' and methods' data (e.g., associations, mappings, data input, data output, intermediate data results, final data results, etc.) may be stored and implemented in one or more different types of computer-implemented data stores, such as different types of storage devices and programming constructs (e.g., RAM, ROM, Flash memory, flat files, databases, programming data structures, programming variables, IF-THEN (or similar type) statement constructs, etc.). It is noted that data structures describe formats for use in organizing and storing data in databases, programs, memory, or other computer-readable media for use by a computer program.

[00120] The computer components, software modules, functions, data stores and data structures described herein may be connected directly or indirectly to each other in order to allow the flow of data needed for their operations. It is also noted that a module or processor includes but is not limited to a unit of code that performs a software operation, and can be implemented for example as a subroutine unit of code, or as a software function unit of code, or as an object (as in an object-oriented paradigm), or as an applet, or in a computer script language, or as another type of computer code. The software components and/or functionality may be located on a single computer or distributed across multiple computers depending upon the situation at hand.

[00121] One or more aspects or features of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof. These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. The programmable system or computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

[00122] These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural language, an object-oriented programming language, a functional programming language, a logical programming language, and/or in assembly/machine language. As used herein, the term "machine-readable medium" refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.

[00123] In the descriptions above and in the claims, phrases such as "at least one of" or "one or more of" may occur followed by a conjunctive list of elements or features. The term "and/or" may also occur in a list of two or more elements or features. Unless otherwise implicitly or explicitly contradicted by the context in which it is used, such a phrase is intended to mean any of the listed elements or features individually or any of the recited elements or features in combination with any of the other recited elements or features. For example, the phrases "at least one of A and B;" "one or more of A and B;" and "A and/or B" are each intended to mean "A alone, B alone, or A and B together." A similar interpretation is also intended for lists including three or more items. For example, the phrases "at least one of A, B, and C;" "one or more of A, B, and C;" and "A, B, and/or C" are each intended to mean "A alone, B alone, C alone, A and B together, A and C together, B and C together, or A and B and C together." In addition, use of the term "based on," above and in the claims is intended to mean, "based at least in part on," such that an unrecited feature or element is also permissible.
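For illustration, the interpretation given above amounts to enumerating every non-empty combination of the listed elements. The following Python sketch makes that enumeration explicit; the function name `permitted_combinations` is hypothetical and is used here only as an aid to reading.

```python
from itertools import combinations


def permitted_combinations(elements):
    """Return every non-empty combination of elements, smallest groups first."""
    return [
        combo
        for size in range(1, len(elements) + 1)
        for combo in combinations(elements, size)
    ]


# "At least one of A, B, and C" covers each of the combinations printed here.
for combo in permitted_combinations(["A", "B", "C"]):
    print(" and ".join(combo))
```

For a list of n elements the enumeration yields 2^n - 1 combinations: three for a two-item list and seven for a three-item list, matching the examples recited above.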

[00124] The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other implementations may be within the scope of the following claims.