METHOD AND SYSTEM FOR IDENTIFYING AT RISK PATIENTS DIAGNOSED WITH DIABETES

Title:

METHOD AND SYSTEM FOR IDENTIFYING AT RISK PATIENTS DIAGNOSED WITH DIABETES

Document Type and Number:

WIPO Patent Application WO/2000/004512

Abstract:

This invention is a technique which processes patient data (124) to find patients who are at risk of diabetes (134).

Inventors:

FRIEDMAN FELIX (US)
FU QUINGGONG (US)
GELLERT GEORGE (US)

Application Number:

PCT/US1999/016345

Publication Date:

January 27, 2000

Filing Date:

July 20, 1999

Assignee:

SMITHKLINE BEECHAM CORP (US)
FRIEDMAN FELIX (US)
FU QUINGGONG (US)
GELLERT GEORGE (US)

International Classes:

G06F19/00; (IPC1-7): G07F/

Foreign References:

US5976082A	1999-11-02
US5937387A	1999-08-10
US5835897A	1998-11-10

Attorney, Agent or Firm:

Kanagy, James M. (UW2220 709 Swedeland Road P.O. Box 1539 King of Prussia, PA, US)

Claims:

What is Claimed:

1. A computer-implemented method for identifying at risk patients diagnosed with diabetes, using information about patients existing in a claims database, said method comprising the steps of: processing, based on predetermined criteria, the patient information in the claims database to extract claims information for a group of diabetes patients; defining, using the information available in the claims database, a set of events relevant to diabetes; converting data representing the extracted claims information and the defined events into files containing event level information; defining a time window to provide a time frame from which to judge whether events should be considered in subsequent processing; defining a set of variables as potential predictors; processing the event level information, using the time window and the set of variables, to generate an analysis file; and performing statistical analysis on the analysis file to generate a prediction model for use in identifying at risk patients diagnosed with diabetes, said prediction model being a function of a subset of the set of variables; and applying the prediction model to current data in the claims database to identify at risk patients for diabetes.

3. The computer-implemented method of claim 1, wherein the step of processing extracts claims information for patients who have been diagnosed with diabetes or prescribed an anti-diabetes drug.

4. The computer-implemented method of claim 1, wherein the step of defining a set of variables includes defining both dependent and independent variables and a hospital (HL) indicator is defined as a dependent variable, where independent variables are representative of predictors and the dependent variable is representative of an adverse health outcome to be predicted.

5. The computer-implemented method of claim 1, wherein the step of defining a set of variables includes defining both dependent and independent variables and a high cost indicator is defined as a dependent variable, where independent variables are representative of predictors and the dependent variable is representative of an adverse health outcome to be predicted.

6. The computer-implemented method of claim 1, wherein the step of defining a set of variables includes defining both dependent and independent variables, individual data elements from the claims information as well as at least one combination of a plurality of data elements from the claims information are used as independent variables.

7. The computer-implemented method of claim 1, wherein the step of performing statistical analysis includes performing logistic regression.

8. An apparatus for identifying at risk patients with diabetes, information about patients existing in a claims database, said apparatus comprising: means for processing, using predetermined criteria, the patient information in the claims database to find and extract claims information for a group of diabetes patients; a predetermined set of events, derived from the claims information, said events being relevant to diabetes; means, using the extracted claim information and set of events, for creating files of event level information; means for defining a predetermined time window for providing a time frame from which to judge whether events should be considered in subsequent processing; a predetermined set of variables representing potential predictors; means, using the time window and the set of variables, for processing the event level information to generate an analysis file; means for performing statistical analysis on the analysis file to generate a prediction model used for identifying at risk patients diagnosed with diabetes, said prediction model being a function of a subset of the set of variables; and means for applying the prediction model to current data in the claims database to identify at risk patients for diabetes.

9.	The apparatus of claim 8, further comprising means for applying the prediction model to a processed claims database to identify and output a respective likelihood that each patient information in the claims database will have an adverse health outcome for diabetes.

10. A computer-readable medium containing a program for identifying at risk patients diagnosed with diabetes from a claims database which contains information about patients, said program on said medium comprising: means for causing a computer to process, based on predetermined criteria, the patient information in the claims database to extract claims information for a group of diabetes patients; means for causing the computer to input a set of predetermined events relevant to diabetes; means for causing the computer to create, using the extracted claims information and the defined events, files containing event level information; means for causing the computer to establish a time window for providing a time frame from which to judge whether events should be considered in subsequent processing; means for causing the computer to input a set of predetermined variables representative of potential predictors; means for causing the computer to process the event level information, using the time window and the input set of variables, to generate an analysis file; means for causing the computer to perform statistical analysis on the analysis file to generate a prediction model used for identifying at risk patients diagnosed with diabetes, said prediction model being a function of a subset of the set of variables; and means for causing the computer to apply the prediction model to current data in the claims database to identify diabetes patients who are at risk for an adverse health outcome.

11. A computer-implemented method for identifying at risk patients diagnosed with diabetes, using information about patients existing in a claims database, said method comprising the steps of: a) processing, based on predetermined criteria, the patient information in the claims database to extract claims information for a group of diabetes patients; b) defining, using the information available in the claims database, a set of events relevant to diabetes; c) converting the extracted claims information and the defined events into files containing event level information; d) defining a time window for providing a time frame from which to judge whether specific ones of the defined events should be considered in subsequent processing; e) defining a set of variables as potential predictors; f) processing the event level information, using the time window and the set of variables, to generate an analysis file; and g) performing statistical analysis on the analysis file to generate a prediction model to predict a relative change in health care costs among at risk patients diagnosed with diabetes, said prediction model using a subset of the set of the defined variables.

12. The computer-implemented method according to claim 11 wherein step g) includes predicting a relative change in health care costs among at risk patients diagnosed with diabetes who have a hospitalization during the time window.

13. The computer-implemented method according to claim 11 wherein step g) includes predicting a relative change in health care costs among at risk patients diagnosed with diabetes who have no hospitalizations during the time window.

14. A computer-implemented method for identifying at risk patients diagnosed with diabetes, using information about patients existing in a claims database, said method comprising the steps of: a) processing, based on predetermined criteria, the patient information in the claims database to extract claims information for a group of diabetes patients; b) defining, using the information available in the claims database, a set of events relevant to diabetes; c) converting the extracted claims information and the defined events into files containing event level information; d) defining a time window for providing a time frame from which to judge whether specific ones of the defined events should be considered in subsequent processing; e) defining a first set of variables as potential predictors; f) defining a second set of variables as potential predictors; g) processing the event level information, using the time window and the first set of variables, to generate a first analysis file; h) processing the event level information, using the time window and the second set of variables, to generate a second analysis file; i) performing statistical analysis on the first analysis file to generate a first prediction model to predict a relative change in health care costs among at risk patients diagnosed with diabetes, said first prediction model using a subset of the first set of variables; and j) performing statistical analysis on the second analysis file to generate a second prediction model to predict future hospitalization among at risk patients diagnosed with diabetes, said second prediction model using a subset of the second set of variables.

15. The computer-implemented method according to claim 14 further comprising: k) performing risk stratifications for the first prediction model and the second prediction model, respectively.

16. The computer-implemented method according to claim 15 further comprising: l) determining a combined risk stratification as the intersection of the risk stratification for the first prediction model and the risk stratification for the second prediction model.

17. The computer-implemented method according to claim 1 wherein step g) includes predicting future hospitalization among at risk patients diagnosed with diabetes who have a hospitalization during the time window.

18. The computer-implemented method according to claim 1 wherein step g) includes predicting future hospitalization among at risk patients diagnosed with diabetes who have no hospitalizations during the time window.

19. The computer-implemented method according to claim 1 wherein the set of variables defined in step e) includes variables selected from a group of variables consisting of: 1) number of hospital admissions for diabetes, 2) number of diabetes inpatient days per year, 3) presence and number of diabetes-related diagnoses, 4) presence and number of diabetes hospitalizations associated with a major diagnostic or surgical procedure, 5) number of hospital admissions not for diabetes, 6) number of emergency room visits for diabetes, 7) number of outpatient visits for diabetes, 8) number of prescriptions for insulin, 9) number of prescriptions for oral hypoglycemics, 12) frequency of emergency room visits, 13) frequency of outpatient visits, 14) number of emergency room visits not for diabetes, 15) number of physician visits not for diabetes, 16) number of prescriptions for drugs possibly related to diabetes, 17) number of prescriptions for non-diabetes-related drugs, 18) age, 19) sex, 20) number of visits by home health care services, 21) number of dietitian visits, and 22) number of physical therapy visits.

20. The computer-implemented method according to claim 1 wherein the set of variables defined in step e) includes variables selected from a group of variables consisting of: 1) previous diabetes hospitalization, 2) previous diabetes-related hospitalization, 3) ER visits during the time window, 4) visits to a cardiologist, 5) insulin usage during the time window, 6) number of months since first diabetes event, 7) age, 8) health plan membership, 9) use of neuropathy pharmacotherapy, 10) number of radiological procedures, and 11) non-use of oral hypoglycemic during the time window.

21. The computer-implemented method according to claim 18 wherein the set of variables defined in step e) includes variables selected from a group of variables consisting of: 1) visits to doctor for skin care, 2) visits to nephrologist, 3) number of emergency room visits, 4) age, 5) number of radiological procedures, 6) number of months since first diabetes event, 7) health plan membership, and 8) non-use of oral hypoglycemics.

22. The computer-implemented method according to claim 17 wherein the set of variables defined in step e) includes variables selected from a group of variables consisting of: 1) number of hospitalizations, 2) prescriptions for neuropathy pharmacotherapy, 3) insulin-only use during the time window, and 4) number of ER visits during the analysis period.

23. The computer-implemented method according to claim 11 wherein the set of variables defined in step e) includes variables selected from a group of variables consisting of: 1) age, 2) gender, 3) last diabetes event within 1 month, 4) visits to a nephrologist, 5) hospitalization during analysis period, 6) unrelated doctor visits, 7) number of months since any event, 8) visits to doctor for care of neuropathy, 9) emergency room visits during the analysis period, 10) hospitalization LOS for conditions contributory to diabetes, 11) use of oral hypoglycemics, 12) prescriptions of drugs for diabetes sequelae, and 13) ophthalmic procedures, wherein variables 5) through 13) are used as negative predictors.

24. The computer-implemented method according to claim 13 wherein the set of variables defined in step e) includes variables selected from a group of variables consisting of: 1) age, 2) gender, 3) number of months since an event, 4) use of oral hypoglycemics, 5) ophthalmic procedures, 6) radiological procedures, 7) unrelated doctor visits, and 8) unrelated emergency room visits, wherein variables 3) through 8) are used as negative predictors.

25. The computer-implemented method according to claim 12 wherein the set of variables defined in step e) includes variables selected from a group of variables consisting of: 1) pharmacotherapy for related conditions, 2) visits to nephrologists, 3) number of ER visits, 4) last event within several months, 5) cardiovascular hospitalization LOS, 6) unrelated hospitalization LOS, and 7) hospital LOS for conditions possibly contributing to diabetes, wherein variables 4) through 7) are used as negative predictors.

Description:

METHOD AND SYSTEM FOR IDENTIFYING AT RISK PATIENTS DIAGNOSED WITH DIABETES

FIELD OF THE INVENTION

This invention relates to database processing techniques and, more particularly, to the identification of diabetes patients having a high risk of adverse health outcomes using various database processing techniques.

BACKGROUND OF THE INVENTION

Diabetes generally refers to disease conditions leading to polyuria, or excessive urination, which causes electrolyte depletion and may lead to other physiological problems. In its most common usage, however, diabetes refers to Diabetes Mellitus (DM) type I or type II. Both types involve problems with the hormone insulin, which allows cells to extract nutrients from the blood.

Consequently, an excess of glucose accumulates in the blood (hyperglycemia), and cells virtually begin to starve because of their inability to obtain and use the nutrients. Because the blood-glucose concentration is elevated, the kidneys try to compensate by increasing urine production, leading to water and electrolyte loss. Also, the body's cells then begin metabolizing protein and fat, which causes "keto-acid" production and subsequent "keto-acidosis". Over time, keto-acidosis leads to retinopathy, neuropathy, nephropathy and vascular disease. These complications can occur years to decades after the disease develops.

In DM type I, known as insulin-dependent diabetes, the insulin-producing pancreatic beta-cells, found anatomically in the beta-islets of Langerhans of the pancreas, are destroyed or in some manner impaired, leading to an insufficiency of insulin secretion. Treatment for this form of the disease usually involves injections of cloned insulin hormone several times a day.

The most common form of diabetes is DM type II (or insulin-independent diabetes), commonly referred to as "adult-onset" diabetes because it normally appears clinically only later in life. In this form of the disease, cellular receptors for insulin are insensitive or have decreased sensitivity to insulin. The receptors may either not be sufficiently responsive, or may not bind well to insulin molecules. Thus, in this form, although there is usually enough insulin produced by the pancreas, the insulin cannot stimulate cells properly. Treatment for this form requires strict dietary management of sugar, mainly sucrose and glucose, and also sometimes hypoglycemic agents (drugs which reduce blood sugar concentration).

Other types of diabetes include Diabetes Insipidus and Gestational Diabetes. Diabetes Insipidus relates to a problem in the production or utilization of antidiuretic hormone (ADH). ADH causes the kidneys to retain water and not produce as much urine. A deficiency in this hormone leads to excessive urination, electrolyte loss, glucose depletion, and exhaustion on exertion. Patients usually present as children and are first noticed because they need to drink and urinate excessively. Patients can have virtually the same pathologies as those listed for Diabetes Mellitus. In Gestational Diabetes, a pregnant mother will normally become insensitive to insulin so that more blood glucose will be available to the fetus. However, if for some reason this hyperglycemia becomes too great, it can cause damage to the fetus such as birth defects and mental retardation.

Diabetes Mellitus (DM) affects a number of major organ systems and produces diverse morbidity. This morbidity, including large vessel disease, microvascular disease, neuropathy, hyperglycemia, and ketoacidosis, results in heavy outpatient and inpatient health care utilization and associated expenditures. In the U.S., 6.5 to 7 million individuals have been diagnosed with diabetes. It is the seventh leading cause of death, with approximately 160,000 persons dying with diabetes annually. The contribution of diabetes to death in the U.S. is greatly underestimated because of the lack of sensitivity of death certificates as indicators of mortality attributable to diabetes. Diabetes resulted in 13.2 million office visits in 1989 (where diabetes was the principal diagnosis). In 1988, diabetes was a primary diagnosis for approximately 750,000 hospital discharges and was among the top seven listed diagnoses in 2.8 million discharges. Diabetes is the single greatest contributor to blindness (8%), end-stage renal disease (30%), and non-traumatic amputations in the U.S. It is a lifelong disease associated with $105 billion per year of health care costs in the U.S. alone.

Quantifying the total burden of diabetic illness is complicated by difficulties in ascertaining cases. Insulin-dependent diabetes (IDDM) is easily recognized and virtually all cases are ascertained within the health care system because of catastrophic outcomes that occur if no treatment is provided. Recognition of non-insulin-dependent diabetes (NIDDM), however, depends on the severity of disease symptoms, diagnostic activities of the health care system, and the choice of diagnostic criteria. The incidence rate determined by criteria employed by the National Diabetes Data Group, for example, is lower than that determined by the World Health Organization because the latter uses criteria that are less stringent in case ascertainment.

The incidence of IDDM varies according to age, ranging from 7 to 27 individuals per 100,000 per year. Risk increases in the first and second decades of life, levels off after the third and fourth decades, and increases again thereafter (two incidence peaks). The incidence of NIDDM ranges from 8 to 613 individuals per 100,000 per year, increasing steadily (and over 100-fold) from early childhood to old age. The population point prevalence of both types of diabetes is 3.5-4.5% in the U.S.

Diabetes is essentially a syndrome that results from variable interaction of hereditary and environmental factors. DM has no distinct etiology, pathogenesis, fixed set of clinical findings, or curative therapy. The pathogenesis of diabetic disease manifests at the organ system level in a variable manner and over the life of the individual. Left untreated or inadequately treated, DM is immediately life-threatening and a progressive and often debilitating disease.

With respect to costs, 1992 per capita expenditures for confirmed diabetics (at $11,157) were more than four times greater than for nondiabetics. In 1992, while diabetics constituted only 4.5% of the U.S. population, they accounted for 14.6% of total U.S. health care expenditures ($105 billion). One study determined that health care expenditures for individuals with diabetes constituted about 1 in 7 health care dollars spent in 1992. Diabetes comprised 6.4% of employers' health care costs in 1993, comparable to costs associated with hypertension, heart failure, and arthritis. Inpatient costs dominate diabetes health care expenditures (63%).

A diabetic disease management program should be aimed at preventing adverse outcomes, such as microvascular and macrovascular disease, and the use of costly health care resources. Specifically, the tight control of blood glucose and early treatment of short-term and long-term complications can facilitate this objective in clinical care of diabetics. A diabetic has a risk of cardiovascular death 3.5 times that of a non-diabetic of the same age. Diabetics have a two- to six-fold greater risk of stroke than the general population. About 30% of all diabetics eventually develop peripheral vascular disease (PVD), and their risk of PVD is 3.8 (men) to 6.5 (women) times greater than that of the non-diabetic population. Leg and foot amputations are 5 to 11 times more frequent in diabetic than non-diabetic persons. Factors related to an increased risk of disease progression and severe sequelae include: age, obesity, insulin-dependence, and poor glucose control as indicated by glycosylated hemoglobin. It is estimated that up to 50% of non-insulin dependent diabetics remain undiagnosed. Therefore, in any plan population, the number of individuals at risk for manifest diabetes and pathogenetically in evolution toward diabetes is probably close to double the number found to have medical and pharmaceutical claims related to diabetic disease and insulin/oral hypoglycemic use. It is of course unknown whether the undiagnosed are at the same risk for severe complications of diabetes.

There has been a marked increase in the number of seniors enrolling in managed care plans (MCOs). In some plans, elderly persons can account for up to 50% of total enrollment. Growth in Medicare risk programs is expected to continue, due to the anticipated passage of the Medicare reform bill, which will encourage seniors to join health plans. Also, the current "50/50" provision that forces MCOs to enroll one non-Medicare patient for every Medicare patient enrolled may be overturned, which will make it easy for plans to concentrate on Medicare patients. Patients with diabetes are living longer as treatment options increase. This means that MCOs will need to be able to cost-effectively manage diabetes for an ever-growing number of patients.

An overall objective of any Diabetes Disease Management Program should be to improve the quality of treatment and outcomes for patients with diabetes while, at the same time, achieving cost savings. An important step in doing so is to identify patients who are at high risk of adverse outcomes and to assure "best practice" treatment of these patients.

SUMMARY OF THE INVENTION

The present invention involves a computer-implemented method for generating a model to identify at-risk patients diagnosed with diabetes, from information about patients existing in a database. According to an exemplary embodiment of the present invention, the method includes the steps of 1) processing, based on predetermined criteria, the patient information in the database to find and extract information for a group of diabetes patients; 2) defining, using the information available in the database, events relevant to diabetes; 3) processing the extracted information and the defined events to create files containing event level information; 4) defining a time window for providing a time frame from which to judge whether events should be considered in subsequent processing; 5) defining a set of variables as potential predictors; 6) processing the event level information, using the time window and the set of variables, to generate an analysis file; and 7) performing statistical analysis on the analysis file to generate a prediction model, the prediction model being a function of a subset of the set of variables.

Another aspect of the present invention involves a computer-implemented method for identifying, using the generated model, at risk patients diagnosed with diabetes. The method according to this aspect of the present invention includes the additional step of applying the prediction model to a processed claims database to identify and output a file listing the likelihood of each patient having an adverse health outcome.
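As a minimal sketch of this scoring step, and assuming the prediction model is a logistic regression (one form of statistical analysis named in the claims), the likelihood for each member could be computed as shown below. The coefficient values, variable names, and record layout are hypothetical placeholders and are not taken from this application.

```python
import math

# Hypothetical intercept and coefficients; a fitted model would supply real values.
INTERCEPT = -2.0
COEFFICIENTS = {
    "prior_hospitalizations": 0.8,
    "er_visits": 0.4,
    "insulin_use": 0.5,
}


def adverse_outcome_likelihood(member):
    """Return the logistic-model probability of an adverse outcome for one member."""
    # Missing predictors are treated as zero (an assumption of this sketch).
    z = INTERCEPT + sum(
        weight * member.get(name, 0) for name, weight in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def score_members(members):
    """Return (member_id, likelihood) pairs sorted from highest to lowest risk."""
    scored = [(m["member_id"], adverse_outcome_likelihood(m)) for m in members]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The sorted output corresponds loosely to the file listing each patient's likelihood; writing it to disk is omitted here.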

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is best understood from the following detailed description when read in connection with the accompanying drawing, in which:

Figure 1A is a high-level flowchart illustrating an exemplary overall process of the present invention.

Figure 1B is a high-level flowchart illustrating an exemplary process application for the present invention.

Figure 2 is a high-level block diagram illustrating three exemplary sources of information suitable for use with the present invention.

Figure 3 is a data structure diagram which shows an exemplary format in which the information from the sources of Figure 2 are stored in a research database.

Figure 4 is a data structure diagram which shows an exemplary format for an event level file generated by the process shown in Figure 1.

Figure 5 is a data structure diagram which shows an exemplary format for an analysis file generated, in part, from the event level file shown in Figure 4 and during the process shown in Figure 1.

Figure 6A is a simple bubble diagram illustrating the concept of analysis and prediction zones as used in the present invention.

Figure 6B is a time-line diagram which shows a first exemplary time window scheme suitable for use in processing the data, in part, from the event level files shown in Figure 4.

DETAILED DESCRIPTION OF THE INVENTION

Overview

The present invention is embodied in an apparatus and a method to identify, in a predetermined population of diabetes patients, those patients at high risk of adverse health outcomes. The identification of these high risk patients is an initial stage in attempts, e.g., targeted interventions, to prevent adverse health outcomes and/or improve the patients' health outcomes.

Initially, one or more sources of information are used which allow for the identification of an initial population of diabetes patients. Examples of sources include health care providers such as doctors, hospitals and pharmacies, which all keep records for their patients. The individual records for each of these providers, however, may be scattered, difficult to access, and/or have many different formats. On the other hand, a more comprehensive source containing this type of information exists in the health care records of any given benefits provider. One example is as follows: claims for doctors and hospitals are generally received by a managed healthcare organization or a pharmaceutical services provider for processing and reimbursement. These claims are entered into a relational database. Claims are subsequently downloaded to a second relational database (RDB) after data integrity checks have been performed, and stored in SAS format. The data to be included in the analysis preferably incorporate all available data within the RDB so that the maximum number of variables will be available for prediction. Outcomes to be risk stratified desirably include hospital admission and total cost of care (broken down into 5% and 10% increments).

Patients that are typically at high risk are those patients who have experienced a hospital or emergency room visit for diabetes, patients who have experienced a hospital admission for diabetes, and/or patients who consume in the top 5% and 10% of total costs for the treatment of diabetes in the 12 months after a diabetes diagnosis appears in the claims database or after there is a prescription for a serum glucose lowering agent.
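The cost-based criterion above can be illustrated with a short sketch that flags members whose total cost falls in the top 5% or top 10% of the population. The function name, the input layout (a mapping from member identifier to total cost over the 12-month period), and the rounding rule for the cutoff are assumptions made for illustration only.

```python
def high_cost_flags(total_costs, top_fraction):
    """Map each member_id to True if its cost ranks in the top fraction of members.

    total_costs: dict of member_id -> total cost over the observation period.
    top_fraction: e.g. 0.05 for the top 5%, 0.10 for the top 10%.
    """
    # Rank members from most to least costly; ties keep their input order.
    ranked = sorted(total_costs, key=total_costs.get, reverse=True)
    # Always flag at least one member so small populations are not left empty.
    cutoff = max(1, round(len(ranked) * top_fraction))
    top_members = set(ranked[:cutoff])
    return {member: (member in top_members) for member in total_costs}
```

A flag produced this way could serve as the high cost indicator used as a dependent variable, under the stated assumptions.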

Figure 1A is a high-level flowchart illustrating an exemplary overall process of the present invention. As illustrated in Figure 1A, the "raw" claims information is received and stored in a database (e.g., DB2 format) represented by block 110. In the world of claims processing, before this database of "raw" information can be useful, some pre-processing, step 112, is generally performed, which may include rejecting claims, reconciling multiple claims and so on. The output of this preprocessing step, represented by block 114, is a "cleaner" database now stored, in the exemplary embodiment, in SAS format.

SAS® is the name of a well known format and software package produced by SAS Institute, Inc. of Cary, North Carolina. It should be noted that other data processing and storage formats, as appreciated by those skilled in the art, could be used in the storage and processing of data.

It should also be noted that SAS formats, programming techniques and functions are more fully described in the SAS/STAT User's Guide, Version 6, Fourth Edition, Volumes 1 and 2, 1990 and the SAS Language: Reference, Version 6, First Edition, 1990, which are both herein incorporated by reference for their teachings regarding the SAS language, SAS programming, functions and formats.

Moreover, the SAS routines used for processing information as part of the present invention are used for computational operations, executed on a computer and stored on a storage medium such as magnetic tape, disk, CD ROM or other suitable medium for purposes of storage and/or transportability. The stored software can then be used with a computer such as a PC.

The claims records of the benefits provider, although containing information such as medical, hospital and pharmacy reimbursement claims, may not be organized in a manner for efficient analysis. Thus, the next step is to perform another processing step (e.g., screening for diabetes patients, age, etc.), represented by block 116, to transform the "raw" data into a more appropriate and useful database. That is, the output data from the processing (i.e., extraction) step is a subset of the "raw" information and represents an initial universe of diabetes patients upon which further processing is performed.
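The extraction step can be sketched as a simple screen over claim records. The sketch assumes, consistent with the claims, that a member qualifies when any claim carries a diabetes diagnosis or an anti-diabetes drug. The field names, the use of ICD-9-CM category 250 as the diagnosis screen, and the drug class labels are illustrative assumptions rather than details from this application.

```python
# Assumption: diagnoses are ICD-9-CM codes, where category 250 is diabetes mellitus.
DIABETES_DX_PREFIX = "250"
# Hypothetical drug class labels for anti-diabetes prescriptions.
ANTIDIABETIC_CLASSES = {"insulin", "oral_hypoglycemic"}


def is_diabetes_claim(claim):
    """Return True if a claim has a diabetes diagnosis or an anti-diabetes drug."""
    diagnosis = claim.get("diagnosis_code") or ""
    return (
        diagnosis.startswith(DIABETES_DX_PREFIX)
        or claim.get("drug_class") in ANTIDIABETIC_CLASSES
    )


def extract_diabetes_universe(claims):
    """Return every claim for members who have at least one qualifying claim."""
    members = {c["member_id"] for c in claims if is_diabetes_claim(c)}
    # Keep all claims for qualifying members, not only the diabetes claims,
    # so that non-diabetes utilization remains available as a predictor.
    return [c for c in claims if c["member_id"] in members]
```

Keeping the non-diabetes claims of qualifying members is a design choice of this sketch; it preserves data for variables such as visits not for diabetes.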

A next step, which is optional, is to perform a "quality check" on the initial universe of diabetes patients. This step is somewhat subjective. This processing step, represented by block 118, using intermediate output files, performs a refinement of the extracted information by, for example, checking to see if an imbalance exists in the extracted information. This is essentially a common sense check that can be performed as many times as necessary to ensure the integrity of the database data. At this point, the database data exists at the claim level.

The information existing at the claim level provides various information in the form of raw data elements. From the claims level data, the next processing step, represented by block 120, creates new files (e. g., primary file 1 and primary file 2) by reformatting the information into an event level format.

Before this occurs, a set of events (e.g., doctor visit for diabetes) relevant to diabetes is defined using a combination of both the raw data elements available from the claims information and clinical knowledge about diabetes. With these events defined, the claims level information is used to create new files based on events rather than claims. Having the information in an event level format is an important aspect of the present invention in that, among other things, it allows for added flexibility in subsequent analysis.

As depicted by block 122, further processing is performed on the event level data to generate an analysis file. In particular, the processing is performed using input information representative of a sliding time window and a plurality of variables. The time window imposes limits on the time periods in which the events from the primary files are considered. That is to say, the time window is used to identify an analysis region and a prediction region where activity in the analysis region is used to predict some predetermined outcome in the prediction region. The selection of variables, both dependent and independent, for analysis, is an important step impacting the accuracy of the final prediction model. The dependent variables are representative of the desired result (i. e., a potential adverse health outcome to be predicted); whereas, the independent variables are representative of predictors. This processing step, step 122, can be easily re-programmed, via the input parameters, for various time window adjustments as well as various variable modifications. The analysis file generated at this step is a member level file which means it is broken down by member.

With the analysis file in hand, a model or technique for identifying high risk subgroups is determined. That is, as represented by step 124, the analysis file is used to develop an identification technique represented by an equation incorporating a subset of the initial variables programmed into the above-mentioned processing step. The resulting subset comprises those variables which best reflect a correlation to adverse health outcomes and, consequently, to substantial use of health care resources (e.g., funds). It should be noted that the determination of the initial as well as the final variables is an important aspect of the present invention as the variables may significantly impact the accuracy of the identification of the subgroup.

The above model for identification can be developed, step 124, in various ways using statistical techniques. The technique used in the exemplary embodiment of the present invention for generating the model is multiple logistic regression.
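By way of illustration only, the manner in which a fitted multiple logistic regression model could be applied to one member's variables may be sketched as follows. The sketch is written in Python rather than SAS, and the function name, variable names and coefficient values are hypothetical; they are not taken from the exemplary embodiment.

```python
import math

def predicted_risk(intercept, coefficients, values):
    """Return the logistic-model probability of an adverse outcome for one member."""
    # Linear predictor: intercept plus the weighted sum of the member's variables.
    linear = intercept + sum(
        weight * values.get(name, 0.0) for name, weight in coefficients.items()
    )
    # Logistic link maps the linear predictor onto the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients, for illustration only.
coefficients = {"diabetes_admissions": 0.8, "er_visits": 0.4}
risk = predicted_risk(-2.0, coefficients, {"diabetes_admissions": 1, "er_visits": 2})
```

Members could then be ranked by the resulting probability, with those above a chosen cut-off treated as the high risk subgroup.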

Figure 1B is a high-level flowchart illustrating an exemplary process of the application of the present invention. Having developed the model, as shown in Figure 1A, it can then be applied to updated claims data, step 132, or to other databases of diabetes patients (e.g., claims information for other benefits providers), in order to identify at risk patients diagnosed with diabetes, step 134, allowing for various types of targeted intervention to properly care for suffering patients as well as maximize the effective allocation of health care resources.

Exemplary Embodiment of the Invention

Although the present invention is illustrated and described below with respect to specific examples of a method and system for identifying diabetes patients at high risk for adverse health outcomes, the invention is not intended to be limited to the details shown. Rather, various modifications may be made in the details within the scope and range of equivalents of the claims and without departing from the spirit of the invention.

As mentioned, the present invention is designed to identify patients with diabetes at high risk of adverse health outcomes. The identification of this high risk subgroup is the first step in being able to try different treatment techniques (e. g., targeted interventions).

Initially, a source of information is required which allows for the identification of a population of diabetes patients. A comprehensive source containing this type of information exists in the health care claims records of many benefit providers. As is known, claims for drugs, doctors and hospitals are received and processed for payment/reimbursement. In the exemplary embodiment of the present invention, this claims information is entered into a DB2 database on a benefits provider's computer system (not shown).

Figure 2 is a high-level block diagram illustrating three exemplary sources of information suitable for use with the present invention.

As illustrated in Figure 2, the claims information of such a provider would typically include three sources: pharmacy claims (Rx) 210, doctor (DR) claims 212, and hospital (HL) claims 214. As listed on the blocks representing the claims information, many types of information would be available from the respective claims including drug codes, physician's names, diagnosis codes, procedures, various dates and other important information.

Much of this information is referenced using codes, such as drug codes, procedure codes and illness codes. Appendices I-VII provide exemplary listings of various codes used with the present invention. These codes were selected for processing purposes of the present invention from a voluminous source of codes and, as will be appreciated by those skilled in the art, may be modified to include/exclude codes deemed more/less useful at the various stages of processing.

It should also be noted that in the health care industry various codes are used in claims information for indicating which procedures, treatments, diagnoses, drugs, etc. are being claimed. For the exemplary embodiment of the present invention, the selected codes are shown in Appendices I-VII. These codes were found in Physician's Current Procedural Terminology (CPT), American Medical Association (1995) and St. Anthony's ICD-9-CM Code Book (1994), which are both incorporated herein by reference for their teaching of codes and sources of codes. As will be appreciated by those skilled in the art, any set of codes representative of the various procedures, treatments, diagnoses, drugs, etc. relevant for use with the present invention would suffice. Reference to such codes occurs throughout this specification.

The first relational database file represents a source of "raw" data elements which require processing. A first step in processing this raw data is to perform data integrity checks (e.g., rejected or reconciled claims).

Subsequently, the data is routinely downloaded into a "research" database, called RDB hereinabove. The research database is a claims level database in the format of SAS.

Exemplary formats of the records contained in the research database, for each of the Rx, DR and HL claims, are shown in Figure 3. As shown in Figure 3, claims are listed from claim 1 to claim x and the appropriate information, for the particular service provider (e.g., Rx) being claimed, is also presented.

Once in the format of SAS, the information is processed using the procedures of SAS to 1) identify patients with diabetes (step 116), 2) process the claims level information into event level information (step 120), 3) using predetermined variables and time frame schemes, generate analysis files for analysis purposes (step 122) and 4) create a prediction model as a function of those variables most reflective of the correlation to an adverse health outcome (step 124).

It should be mentioned that, from a statistical perspective, sample size is an important consideration both in developing prediction models from datasets and in maximizing the integrity of the prediction model. Prevalence of diabetes, as mentioned in the Background section, is reported to be between approximately 3.5 and 4.5 percent; however, desirable sample sizes which may be used to determine prediction equations depend on the magnitude of association between variables. As these associations are initially unknown, all patients within any individual plan are initially included.

The first step, extracting patients with diabetes (step 116), uses various parameters to define which patients qualify for the overall initial universe of diabetes patients to be considered.

According to an exemplary embodiment of the present invention, the following patients are eligible for the diabetic cohort:
1) patients treated with a serum glucose lowering agent such as hypoglycemics and/or insulin,
2) patients with two or more ICD-9 codes for diabetes,
3) patients aged 10 years or older,
4) patients not diagnosed with or otherwise suspected of having gestational diabetes,
5) patients diagnosed with diabetes (ICD-9) during the preceding 24 months,
6) optionally patients who have filed at least 1 reimbursement claim during the preceding 12 months, (required for implementing the program but not necessarily for
7) patients for whom the oldest available diagnosis or prescription was made at least 12 months before the patient is considered (used in the analysis but not in the implementation of risk stratification),
8) patients for whom the total cost of diabetes care is greater than $0.00, or
9) patients who have been enrolled in a treatment plan for at least 24 continuous months (used in the analysis step but not for implementation).

Of course, these criteria are exemplary and could be modified such that 12 months or 3 months of enrollment is satisfactory or that an individual must be at least 18 years of age. In the exemplary embodiment of the present invention, the claims extraction step, step 116, extracts all claims data for patients with either an appropriate code for diabetes (see Appendix I) or for treatment with a diabetes-related drug (see Appendix VII).
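By way of illustration only, a simplified form of the extraction step (step 116) may be sketched as follows, using only three of the criteria (drug treatment, two or more diagnosis claims, and minimum age). The sketch is in Python rather than SAS. The record layout, the function name, the use of the ICD-9 prefix "250" in place of the Appendix I list, and the drug class labels in place of the Appendix VII list are all assumptions made for the example.

```python
from collections import defaultdict

# Assumption: codes beginning with "250" stand in for the Appendix I code list.
DIABETES_ICD9_PREFIXES = ("250",)
# Assumption: these labels stand in for the Appendix VII drug classes.
GLUCOSE_LOWERING_CLASSES = {"insulin", "oral_hypoglycemic"}

def extract_cohort(claims, ages, min_age=10, min_dx_claims=2):
    """Return the members who qualify by drug claim or by repeated diagnosis."""
    dx_counts = defaultdict(int)
    drug_members = set()
    for claim in claims:
        member = claim["member"]
        if claim.get("drug_class") in GLUCOSE_LOWERING_CLASSES:
            drug_members.add(member)
        # Count each claim carrying at least one diabetes diagnosis code.
        if any(code.startswith(DIABETES_ICD9_PREFIXES) for code in claim.get("icd9", [])):
            dx_counts[member] += 1
    qualified = drug_members | {m for m, n in dx_counts.items() if n >= min_dx_claims}
    # Apply the minimum-age criterion last.
    return {m for m in qualified if ages.get(m, 0) >= min_age}

claims = [
    {"member": "A", "icd9": ["250.00"]},
    {"member": "A", "icd9": ["401.9", "250.01"]},
    {"member": "B", "drug_class": "insulin"},
    {"member": "C", "icd9": ["250.00"]},
    {"member": "D", "drug_class": "insulin"},
]
ages = {"A": 54, "B": 33, "C": 61, "D": 8}
cohort = extract_cohort(claims, ages)
```

In this example, member C has only one diagnosis claim and member D is under the minimum age, so neither enters the cohort.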

Members having any of the claims which identify them as diabetic may be further classified as follows:

Category Label       Category Definition
I +/- DM dx          Insulin use only, with or without claims indicating diabetes diagnosis
H +/- DM dx          Oral use only, with or without claims indicating diabetes diagnosis
H and I +/- DM dx    Oral hypoglycemic and insulin use, with or without claims indicating diabetes diagnosis
2+ DM dx             Two or more claims indicating diabetes diagnosis, no claims evidence of drug therapy
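By way of illustration only, the classification described above may be sketched in Python as follows. The function name and the short category keys returned are hypothetical stand-ins for the category labels.

```python
def classify_member(has_insulin, has_oral, diagnosis_claims):
    """Assign a cohort member to one of the four entry categories."""
    # Drug-based categories apply with or without diagnosis claims.
    if has_insulin and has_oral:
        return "oral_and_insulin"
    if has_insulin:
        return "insulin_only"
    if has_oral:
        return "oral_only"
    # Diagnosis-only category: two or more claims and no drug evidence.
    if diagnosis_claims >= 2:
        return "diagnosis_only"
    return None  # Member does not fall into any of the four categories.

category = classify_member(has_insulin=False, has_oral=True, diagnosis_claims=0)
```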

Members having received one or more diagnoses for diabetes, but all diagnoses were the 3rd, 4th or 5th diagnoses in a hospital claim and for whom no other diabetes association existed in the claims database, are desirably quantified by the number of members at each diagnostic level. If these numbers are few, such members will be excluded from the analysis. The glycated hemoglobin test is desirably not used as an indicator of diabetes in the process of identifying individuals with diabetes due to the prevalence of the screening test among the non-diabetic population.

Subsequent to the claim extraction step, the claim adjustment and integrity checks are optionally performed, step 118. To do so, from the dataset defined above, intermediate output files are generated which contain information for processing purposes. In the exemplary embodiment of the present invention, information for the following items is generated for review to determine if the data within intermediate data files is in general agreement with the common clinical knowledge and experience and with literature evidence as to event frequencies.

A) First, a frequency count of the number of enrollment periods for the members is generated. Then, for members with multiple enrollment periods of at least 6 months duration, it is determined if a diabetes diagnosis is present in each enrollment period. Consequently, enrollment periods without a diabetes diagnosis are generally excluded and, for members with multiple enrollment periods that have a diabetes diagnosis, the most recent enrollment period that contains a diabetes diagnosis is generally kept.
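By way of illustration only, the selection of a single enrollment period per member may be sketched in Python as follows. The function name and the representation of periods as (start, end) date pairs are assumptions, and the six-month duration condition is omitted for brevity.

```python
from datetime import date

def select_enrollment_period(periods, diagnosis_dates):
    """Return the most recent period containing a diabetes diagnosis, or None."""
    with_dx = [
        (start, end)
        for start, end in periods
        if any(start <= d <= end for d in diagnosis_dates)
    ]
    # Periods without a diagnosis are dropped; the latest remaining one is kept.
    return max(with_dx, key=lambda period: period[1]) if with_dx else None

periods = [
    (date(1995, 1, 1), date(1995, 12, 31)),
    (date(1996, 6, 1), date(1997, 5, 31)),
    (date(1998, 1, 1), date(1998, 12, 31)),
]
kept = select_enrollment_period(periods, [date(1995, 3, 10), date(1996, 9, 2)])
```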

According to the exemplary embodiment of the present invention, the following frequency counts are also generated:
1) frequency counts for members classified according to sex, age group and enrollment duration by months, including:
   i) respective frequency counts for male members and female members,
   ii) respective frequency counts for members of particular age groups (e.g., age 10-19, 20-29, etc.),
   iii) respective frequency counts for members of the particular age groups further classified by sex, and
   iv) respective frequency counts for number of months of enrollment duration (e.g., 1 month to maximum number of months desired),
2) frequency counts for numbers of members having at least two of the ICD-9 codes listed in Fig. 4,
3) frequency counts of members using serum glucose lowering drugs, including the number of members having at least one claim for each or both of the drugs listed (e.g., insulin, oral hypoglycemics),
4) frequency counts of members qualifying by ICD-9 code only, by drug only, and by both ICD-9 code and drug,
5) frequency counts of ICD-9 codes using only the first three digits of ICD codes of any nature in DR (any position) and HL files, for example, for at least the top ten members with respective frequencies,
6) frequency counts of procedures related to diabetes and diabetes-related conditions (e.g., renal failure, amputation) using CPT and/or ICD-9 procedure codes, ordered in descending frequency,
7) frequency counts of all or, for example, the top ten CPT codes (to the level of the first three code digits), ordered in descending frequency,
8) frequency count from the hospital file by location or ranking of diabetes ICD-9 code,
9) frequency count of ICD-9 codes listed in each position (i.e., number of individuals with unfilled coding positions and fewer diagnoses),
10) frequency count by ICD-9 code of the first three positions, for example, based on a diagnosis listed above the diabetes ICD-9 code, and
11) frequency count of patients who, during the most recent 24 months, did and did not have a diabetes code.
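By way of illustration only, the demographic frequency counts in item 1) above may be sketched in Python as follows. The member record layout and the function names are assumptions made for the example.

```python
from collections import Counter

def age_group(age, width=10):
    """Return a label such as '10-19' for the band containing the given age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def frequency_counts(members):
    """Count members by sex, by age group, and by age group within sex."""
    by_sex = Counter(m["sex"] for m in members)
    by_age = Counter(age_group(m["age"]) for m in members)
    by_age_sex = Counter((age_group(m["age"]), m["sex"]) for m in members)
    return by_sex, by_age, by_age_sex

members = [
    {"sex": "F", "age": 34},
    {"sex": "M", "age": 15},
    {"sex": "F", "age": 39},
]
by_sex, by_age, by_age_sex = frequency_counts(members)
```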

According to one exemplary embodiment of the present invention, the above frequency counts are modified to exclude:
1) members enrolled for 24 or fewer months,
2) enrollment periods with no hits for diabetes or serum glucose lowering drugs among members included in the cohort because of another enrollment period with a hit,
3) members less than 10 years old (generally due to insufficient time for such members to manifest chronic morbidity and relatively high-cost health care needs),
4) members not continuously enrolled but receiving a diabetes diagnosis or serum glucose lowering drug in one or more of the periods of enrollment, to the extent of the periods of enrollment in which no such diagnosis or drug was received, and
5) gestational diabetes, as defined by:
   (a) abnormal glucose tolerance test in diabetes/gestational diabetes (ICD-9 648.8); or
   (b) any diabetes codes which appear, for example, two months prior or up to 10 months after the codes listed below:
       630.xx-676.9   Any and all conditions related to pregnancy and childbirth
       V22            Normal pregnancy
       V22.0          Supervision of normal first pregnancy
       V22.1          Supervision of other normal pregnancy
       V22.2          Pregnancy state, incidental
       V23            Supervision of high-risk pregnancy
       V23.x          Various high-risk pregnancies
       V28            Antenatal screening
       V28.x          Various antenatal screening
       V72.4          Pregnancy examination or test, pregnancy unconfirmed
   (c) any of the ICD-9 codes listed above occurring, for example, a maximum of 10 months prior to or two months following the codes below:
       V24            Post-partum care and examination
       V24.x          Various post-partum care
       V27            Outcome of delivery
       V27.x          Various outcomes of delivery

B) For the one enrollment period for all remaining members, ALL COSTS encountered by that member during the entire enrollment are identified. A complete proc univariate for ALL COSTS is provided for each plan separately and all plans together. It should be noted that "proc univariate" is a SAS procedure which generates descriptive statistics (e.g., mean, standard deviation, etc.).

C) From the ALL COSTS determined above, costs which are specifically DIABETES COSTS are identified. In doing so, a cost is considered to be a DIABETES COST if a claim from the DR or HL file has any diabetes ICD-9 code in the first or second position. If a claim from the Rx file is from a therapeutic class of agents used to treat hypoglycemia or hyperglycemia, then it is counted as a diabetes claim and its cost is counted as a DIABETES COST. A complete proc univariate for DIABETES COSTS is also provided for each plan separately and all plans together.

D) For all member enrollment periods remaining, the total member months for each plan is calculated separately and together. In doing so, a member is considered enrolled during any month that they were enrolled for at least one day. For this, a complete proc univariate is provided for member months for each plan separately and all plans together.
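By way of illustration only, the member month rule described above (a member is counted as enrolled in any calendar month containing at least one enrolled day) may be sketched in Python as follows. The function name is hypothetical.

```python
from datetime import date

def member_months(start, end):
    """Count calendar months in which the member was enrolled for at least one day."""
    if end < start:
        return 0
    # Every calendar month touched by the period counts as one member month.
    return (end.year - start.year) * 12 + (end.month - start.month) + 1

months = member_months(date(1997, 1, 31), date(1997, 3, 1))
```

In this example the period touches January, February and March, so three member months are counted even though the member was enrolled for only one day in each of the first and last months.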

E) Finally, a unique member count is provided for all patient status code = 20 within the remaining enrollment periods. It is noted that status code = 20 indicates that the patient has expired or did not recover.

It should be noted that, regarding the cost calculations, the following guidelines apply in the exemplary embodiment of the present invention:
a. the cost of inpatient hospitalizations, emergency services, physician/outpatient, and other medical services on a per claim basis is considered to be: AMTPAID + AMTCOPAY + AMTRESERVE + AMTDEDUCT
b. the cost of drugs is considered to be: AMTPAID + AMTCOPAY

It should also be noted that, for purposes of a cost hierarchy, the following rules were used in the exemplary embodiment of the present invention.

1. Only hospitalizations for diabetes can spawn other events.

2. Hospital costs include all Rx, procedure, physician charges.

3. Hospital visits can generate Rx and procedure events with costs set to zero (included in hospital cost).

4. Hospital visits cannot generate separate doctor visit events.
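By way of illustration only, the per claim cost guidelines given above may be sketched in Python as follows. The amount field names are those given in the guidelines; the dictionary layout, the "source" key and the function name are assumptions made for the example.

```python
# Amount fields summed for medical claims (hospital, emergency, physician, other).
MEDICAL_FIELDS = ("AMTPAID", "AMTCOPAY", "AMTRESERVE", "AMTDEDUCT")
# Amount fields summed for pharmacy claims.
DRUG_FIELDS = ("AMTPAID", "AMTCOPAY")

def claim_cost(claim):
    """Return the cost of one claim under the per claim cost guidelines."""
    fields = DRUG_FIELDS if claim["source"] == "Rx" else MEDICAL_FIELDS
    # Missing amount fields are treated as zero (an assumption of this sketch).
    return sum(claim.get(field, 0.0) for field in fields)

amounts = {"AMTPAID": 100.0, "AMTCOPAY": 10.0, "AMTRESERVE": 5.0, "AMTDEDUCT": 20.0}
hospital_cost = claim_cost({"source": "HL", **amounts})
drug_cost = claim_cost({"source": "Rx", **amounts})
```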

The above information for use in performing preliminary evaluations as to the integrity of the data is exemplary and could be modified to include/exclude parameters which are shown to be more/less useful within the spirit of the present invention.

With this information, a "quality check" is performed on the initial universe of diabetes patients to make sure that the final result (i.e., the prediction model) is not unreasonably skewed due to imbalanced input information. This processing step, block 118, using intermediate output files, allows for a refinement of the extracted information by, for example, checking to see if an imbalance exists in the extracted information which may otherwise taint the integrity of a prediction model. Step 118, in the exemplary embodiment, is performed manually by viewing the intermediate output files.

It is contemplated, however, that using various threshold values, this information could be automatically processed to flag a potential imbalance.
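By way of illustration only, one way in which such threshold-based flagging might be automated is sketched in Python below. The function name, the tolerance value and the expected proportions are hypothetical; the specification does not prescribe particular thresholds.

```python
def flag_imbalances(observed, expected, tolerance=0.10):
    """Return the categories whose observed share deviates from the expected share."""
    total = sum(observed.values())
    flags = []
    for name, share in expected.items():
        actual = observed.get(name, 0) / total if total else 0.0
        # Flag a category when the deviation exceeds the tolerance.
        if abs(actual - share) > tolerance:
            flags.append(name)
    return flags

expected = {"male": 0.5, "female": 0.5}  # Hypothetical reference proportions.
skewed = flag_imbalances({"male": 80, "female": 20}, expected)
balanced = flag_imbalances({"male": 52, "female": 48}, expected)
```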

Having now extracted and refined the claims level information according to various predetermined criteria deemed relevant for subsequent processing purposes, the information is converted into an event level format.

To provide processing flexibility, particularly in assigning time windows for analysis, the above-mentioned second step (i.e., converting the claims level information into event level information, step 120) is employed to generate two primary data files from which an analysis file can be created.

In the exemplary embodiment of the present invention, primary data file 1 is a member level file and contains all data of a static nature (i.e., not time sensitive) such as:
1) Member Key,
2) Date of birth,
3) Gender,
4) First enrollment date of most recent 24 months of enrollment (i.e., start of dataset or enrollment date),
5) End date of most recent 24 months of enrollment (i.e., end of dataset or last date of enrollment),
6) Date of first diabetes diagnosis (first prescription for serum glucose lowering agent or diabetes diagnosis or diabetes hospitalization),
7) Date of last hospitalization,
8) Mode of entry into the dataset:
   i) prescription of serum glucose lowering drug only - oral hypoglycemic,
   ii) prescription of serum glucose lowering drug only - insulin,
   iii) prescription of both serum glucose lowering drug and a diabetes diagnosis, and
   iv) two or more diabetes diagnoses.

Primary data file 2 is an event level file with one record for each event, ordered by member and by the date of the event; in the present invention, events are presented in descending order of event date.

It should be noted that an event is an occurrence which, based on clinical knowledge, is deemed relevant to diabetes. Having knowledge of what raw data elements are available from the claims, a set of events is defined directly or indirectly from the data elements where events can be based on an individual data element, combination of data elements or derived from individual or multiple data elements.

Figure 4 is an exemplary list of events and format for primary file 2 (an event level file).

With respect to Figure 4, in the exemplary embodiment of the present invention, the following exemplary ground rules are established for providing counts for the various events:

I. Count as a HOSPITALIZATION event (using both 1st and 2nd ICD-9 codes) a claim having a "from" and "through" date of at least one day AND having a site code of 04. It is noted that a site code distinguishes between the sites at which the service under consideration took place (e.g., emergency room, doctor's office, etc.). It should be noted that costs go to 1st ICD-9 code category only. Also, if a new hospitalization occurs within one day of discharge from a previous hospitalization, the two hospitalizations are bridged into one. If a new hospitalization occurs greater than one day following a previous hospitalization, the second hospitalization is considered a new one.

II. Count as an ER VISIT event (using both 1st and 2nd ICD-9 codes) a claim having a site code of 07, 08 or 10 OR a claim with the following Hospital Common Procedure Coding System (HCPCS) codes: A0010-A0070, A0215-A0225, A0999 with a provider code = 81. It should be noted that costs go to 1st ICD-9 code category only.

III. Count as an OFFICE VISIT event (using only one ICD-9 code) a claim having a site code of 01 or 06 AND having a unique date of service (DOS) but allow for different provider keys on the same DOS (if same provider key on same DOS, consider to be the same office visit) BUT if an office visit event occurs during a hospitalization, do not generate an office visit event (attribute all costs for this event to the hospitalization). ALSO count as an OFFICE VISIT a claim with the following HCPCS codes A0080-A0210 with provider code = 81. For all other office visit events, costs go to 1st ICD-9 code category only.
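By way of illustration only, the bridging of hospitalizations described in rule I (a new admission within one day of a prior discharge is merged with the prior stay) may be sketched in Python as follows. The function name and the representation of stays as (admit, discharge) date pairs are assumptions made for the example.

```python
from datetime import date, timedelta

def bridge_hospitalizations(stays):
    """Merge stays whose admission falls within one day of the prior discharge."""
    merged = []
    for admit, discharge in sorted(stays):
        if merged and admit - merged[-1][1] <= timedelta(days=1):
            # Within one day of the previous discharge: extend the previous stay.
            prev_admit, prev_discharge = merged[-1]
            merged[-1] = (prev_admit, max(prev_discharge, discharge))
        else:
            # More than one day after the previous discharge: a new hospitalization.
            merged.append((admit, discharge))
    return merged

stays = [
    (date(1998, 3, 1), date(1998, 3, 5)),
    (date(1998, 3, 6), date(1998, 3, 9)),
    (date(1998, 3, 12), date(1998, 3, 14)),
]
bridged = bridge_hospitalizations(stays)
```

In this example the first two stays are bridged into a single hospitalization running from March 1 to March 9, while the third remains a separate hospitalization.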

The Three Event Types above are further defined according to the associated Diagnoses that follow: Count as a Diabetes Hospital, Emergency Room (ER) or office visit if the first or second ICD-9 code is from APPENDIX I.

Count as a Diabetes-Related Hospital, ER or office visit if the first or second ICD-9 code is from APPENDIX II.

Count as a Conditions-Possibly-Contributory-to-Diabetes Hospital, Emergency Room or office visit if the first or second ICD-9 code is from APPENDIX III.

Count as an Other-High-Cost Hospital, ER or office visit if the first or second ICD-9 code is from APPENDIX IV.

Count as a Possibly-Diabetes-Related Hospital, ER or office visit if the first or second ICD-9 code is from APPENDIX V.

Count as an Event Indicating Effective Diabetes Care if the first or second ICD-9 code is from APPENDIX VI.

Count as a Diabetes-Related Rx Event if the first or second ICD-9 code is from APPENDIX VII.

Count as an Other Disease Hospital, ER or office visit if the first or second ICD-9 code is not from any of the above Appendices.

Count as a MISCELLANEOUS MEDICAL EVENT any claim having a site code of 02, 03, 04, or 09 AND any claim that cannot be linked to one of the other types of events (ER, Hospital or Office Visit).
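By way of illustration only, the assignment of a visit to a diagnosis category based on its first or second ICD-9 code may be sketched in Python as follows. The code sets shown are placeholders and do not reproduce the Appendices; the function name and the rule that the first matching category in the listed order prevails are assumptions made for the example.

```python
# Placeholder code sets; the actual lists are those of the Appendices.
APPENDIX_CATEGORIES = [
    ("Diabetes", {"250.00", "250.01"}),
    ("Diabetes-Related", {"362.01"}),
]

def categorize_visit(first_code, second_code, categories=APPENDIX_CATEGORIES):
    """Return the first category whose code set contains either ICD-9 code."""
    for label, codes in categories:
        if first_code in codes or second_code in codes:
            return label
    # Neither code appears in any listed set.
    return "Other Disease"

label = categorize_visit("401.9", "250.00")
```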

After generating the two primary files using the above described instructions and rules, corresponding to step 120 of Figure 1A, further processing is performed on the event level data to generate an analysis file, step 122. An exemplary format for the analysis file is shown in Figure 5. As shown, the format of the analysis file includes a list of members in a first column of a table. Across the top of the table is a list of variables, described in detail below. The body of the table provides indications as to a member's relation to a listed variable.

In particular, the processing from the primary files to the analysis files includes an algorithm defined, in part, by a time window and a plurality of variables. The algorithm can be re-programmed for various time window adjustments as well as variable modifications. The analysis file generated at this step is a member level file (i. e., organized with respect to members). The main analysis files are member level files derived from the information in the primary files.
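By way of illustration only, the derivation of a member level analysis file from event level records may be sketched in Python as follows, with one row per member and one count per event type. The record layout, the event type labels and the function name are assumptions made for the example.

```python
from collections import Counter, defaultdict

def build_analysis_rows(events):
    """Collapse event level records into one row of event counts per member."""
    rows = defaultdict(Counter)
    for event in events:
        rows[event["member"]][event["event_type"]] += 1
    return {member: dict(counts) for member, counts in rows.items()}

events = [
    {"member": "A", "event_type": "diabetes_office_visit"},
    {"member": "A", "event_type": "diabetes_office_visit"},
    {"member": "A", "event_type": "diabetes_er_visit"},
    {"member": "B", "event_type": "diabetes_hospitalization"},
]
analysis_rows = build_analysis_rows(events)
```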

Each main analysis file is created to take into account a single reference time window of censored events and prediction window of interest for that file. Each new time window applied to the data, in the exemplary embodiment, uses another main analysis file.

To generate the analysis file, a time window scheme, using a plurality of variables, is applied to the event level data.

Discussing the variables first, included in the processing are both independent and dependent variables. The independent variables represent potential predictors of the adverse health outcomes; whereas, the dependent variables represent the adverse health outcomes to be predicted.

To determine exemplary independent variables for step 122, as many of the original data elements as possible are used, assuming nothing about diabetes. Then, based on clinical knowledge, additional variables are created. Furthermore, combinations of the data elements and/or variables, based on clinical knowledge, are used as variables. Finally, some variables may be created and used based on their potential utility as leverage points in disease management.

The cost hierarchy is structured as follows: 1) all hospitalizations which are the result of diabetes are treated as diabetes costs, 2) hospital costs include all pharmacy, procedure, and physician charges, and 3) costs for separate doctor visit claims concurrent with any hospitalization are absorbed by the hospitalization.

In the exemplary embodiment of the present invention, the plurality of variables, in addition to each of the items in the event file, currently used by step 122 in the SAS routine for generating an analysis file are shown below in Table 1. Cases are defined by non-zero values of the dependent variable(s) listed above. The independent variables listed in Table 1 are created for the time frames dictated by the analysis schemes. For example, in one embodiment, a prediction zone of 12 months is created at the end of each member's enrollment. In this example, the model is built from the creation of the variables for up to 12 months prior to the prediction zone.
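By way of illustration only, the division of a member's events into an analysis region and a 12 month prediction zone at the end of enrollment may be sketched in Python as follows. The function names and the event record layout are assumptions made for the example.

```python
import calendar
from datetime import date

def shift_months(d, months):
    """Shift a date by whole months, clamping the day to the end of the month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)

def split_by_window(events, enrollment_end, prediction_months=12, analysis_months=12):
    """Split events into the analysis region and the prediction zone."""
    prediction_start = shift_months(enrollment_end, -prediction_months)
    analysis_start = shift_months(prediction_start, -analysis_months)
    # Events in the analysis region feed the independent variables.
    analysis = [e for e in events if analysis_start <= e["date"] < prediction_start]
    # Events in the prediction zone define the outcome to be predicted.
    prediction = [e for e in events if prediction_start <= e["date"] <= enrollment_end]
    return analysis, prediction

events = [
    {"date": date(1995, 1, 1), "event_type": "office_visit"},
    {"date": date(1997, 6, 15), "event_type": "er_visit"},
    {"date": date(1998, 3, 1), "event_type": "hospitalization"},
]
analysis, prediction = split_by_window(events, date(1998, 12, 31))
```

Because both window lengths are parameters, a different time window scheme can be applied by changing the arguments rather than the logic.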

It is noted that each of the events in Figure 4 is automatically considered an independent variable for processing.

Table 1

Independent Variables of Interest:

1. Hospital Admissions for Diabetes:
   a. Any hospital claim identified by hospital site code.
   b. Having a from and through duration of at least one (1) day (overnight or admit and discharge dates that are different).
   c. Having a specified diabetes ICD-9 code (i) in the 1st or 2nd position or (ii) in the 3rd, 4th and/or 5th position, provided that a diabetes-related ICD-9 code (to be defined) appears in position 1 or 2.

2. Number of Diabetes Inpatient Days per Year.

3-8. Presence and number of diabetes-related diagnoses (sequelae) evaluated individually as subgroups, and as a collapsed category:
3. Diseases of the Eye
4. Nerve Disorders
5. Skin and Foot Infection Disorders
6. Other Infections
7. Vascular/Circulatory/Cardiac Disorders
8. Renal Disorders

9-14. Presence and Number of Diabetes Hospitalizations Associated with a Major Diagnostic or Surgical Procedure by Organ System Affected by Diabetic Disease

9. Cardiovascular/Vascular (CPT codes):
   33503   CABG w/o cardiopulmonary bypass
   33504   CABG with cardiopulmonary bypass
   33510-16   CABG using venous grafting
   33517   CABG using venous and arterial grafts
   33518-23   CABG, variants of above
   33530   CABG re-operation
   33533-36, 42, 45   CABG, variants of above
   33570, 72, 75   Coronary endarterectomy
   33860, 61, 63, 70, 75, 77   Thoracic aortic aneurysm, variants
   35301, 11, 21, 31, 41, 51, 55, 61, 71, 72, 81, 90   Thromboendarterectomy, various
   35450, 52, 54, 56, 58, 59, 60   Transluminal angioplasty - open
   35470-76   Transluminal angioplasty - percutaneous
   35480-85, 90-95   Transluminal angioplasty - open and percutaneous
   35501, 06, 07, 08, 09, 11, 15, 16, 18, 21, 26, 31, 33, 36, 41, 46, 48, 49, 51, 56, 58, 60, 63, 65, 66, 71, 82, 83, 85, 87 / 35601, 06, 12, 16, 21, 23, 26, 31, 36, 41, 42, 45, 46, 50, 51, 54, 56, 61, 63, 65, 66, 71, 81   Bypass graft, various
   35700, 01, 21, 41, 61   Exploration
   35800, 20, 40, 60 70, 75, 75
   35901, 03, 05, 07
   36005, 10-15   Vascular injection procedures
   36100, 20, 45, 60
   36200, 15-18, 45-48, 60-62
   36410, 30, 88, 91, 93
   36530-35
   92950   CPR
   92960   Cardioversion
   92975-77   Thrombolysis, coronary
   92982-84   Percut translum coronary balloon angioplasty (medical)
   92995-96   Percut translum coronary atherectomy
   9300-18, 93224 and 93235   ECG and stress tests (24 hours)
   93307-50   Echocardiography
   93501-562   Cardiac catheterization
   93886   Cerebrovascular arterial studies
   93922-31   Extremity arterial studies

10. Integumentary (CPT codes):
   11000, 01, 40, 41, 42, 43, 44   Excision-debridement
   11300-13
   12001-13300   Repairs
   15920-99   Pressure/decubitus ulcers

11. Musculoskeletal Lower Limb (CPT codes):
   27590-98   Amputation: thigh
   27880-89   Amputation: below knee
   28800-25   Amputation: foot, toes

12. Renal/Urinary System (CPT codes):
   50300-50380   Transplantation
   90918-25, 90935-90999   Dialysis (hemo and peritoneal, and patient training)
   90920-21, 90924-25   End stage renal disease services
   ICD Procedure Code:
   V56.0   Extracorporeal dialysis

13. Eye and Ocular Adnexa (CPT codes):
   66830-66966   Removal cataract
   67005-67299   Retinal, vitreous, or choroid repair, prophylaxis, removal or destruction (including treatment of retinopathy)
   76519   Echography
   92002-92287   General ophthalmic services
   ICD-9-CM Codes:
   V72.0   Examination of eyes/vision
   V80-80.2   Special screening for neurological, eye and ear diseases
   16.2-29   Diagnostic procedures on orbit and eye

14. Radiology (CPT codes):
   74400-425   Urinary tract (urography)
   73700-725   CT/MRI lower extremity
   36000-299   Vascular radiology injection procedures
   93501-93556   Cardiac catheterization
   75552-75685   Cardiac MRI, aortography and angiography
   75710-16   Angiography lower extremity
   75722-24, 75754   Renal angiography
   75692-68   Transluminal balloon angioplasty
   76604   Chest echography
   76700, 76705, and 76770   Echo of abdomen
   76856   Echo of pelvis
   78414-483   Myocardial perfusion and imaging
   78600-615   Brain imaging
   78700-727   GU imaging

15. Number of Hospital Admissions not for Diabetes
16. Number of Emergency Room Visits for Diabetes
17. Number of Physician (Outpatient) Visits for Diabetes (collapsed)

For this independent variable, costs are desirably assigned to the doctor visit or hospitalization in which the procedure occurred. Procedures could have been performed for a condition other than diabetes, although these patients qualified for membership in the cohort by virtue of receiving a diabetes diagnosis, a diabetes-related diagnosis, or receiving a serum glucose lowering agent at some time.

18-23. Number of Physician (outpatient) Visits for Diabetes by Specialty:

18. Family Practitioner
19. Internist
20. Endocrinologist
21. Cardiologist
22. Nephrologist
23. Ophthalmologist
24. Number of Prescriptions for Insulin (DPS Code 8.1.1)
25. Number of Prescriptions for Oral Hypoglycemics (DPS Code 8.1.2)
26. Number of Hospital Admissions (HL) for Hemodialysis
27. Number of Emergency Room Visits (ER) for Hemodialysis
28. Number of Outpatient Visits (OV) for Hemodialysis
29. Frequency of Hospital Admissions (HL) to a Cardiologist
30. Frequency of Emergency Room Visits (ER) to a Cardiologist
31. Frequency of Outpatient Visits (OV) to a Cardiologist
32. Number of Emergency Room Visits Not for Diabetes (all non-diabetes ER visit codes)
33. Number of (non-hospital) Physician Visits Not for Diabetes
34. Number of Prescriptions for Drugs Possibly Related to Treatment of Diabetes Complications and Sequelae (collapsed pharmacy codes):
DPS Formulary Codes:
4.3, 4.5.1, 4.5.4, 4.5.6  Anti-hypertensive drugs
4.8  Lipid-lowering drugs
4.1, 4.2, 4.6  Cardiovascular drugs
5.5  Antidepressants
5.8, 5.4  Phenothiazines (Fluphenazine, Carbamazepine)
9.3  Metoclopramide

35. Number of Prescriptions for Drugs Possibly Related to Treatment of Diabetes Complications and Sequelae (by individual category of drug):
DPS Formulary Codes:
4.3, 4.5.1, 4.5.4, 4.5.6  Antihypertensive drugs
4.8  Lipid-lowering drugs
4.1, 4.2, 4.6  Cardiovascular drugs
5.8, 5.4  Fluphenazine, Carbamazepine
9.3  Metoclopramide
5.5  Antidepressants
36. Number of Prescriptions for Non-diabetes Drugs (excluding all codes given above for insulin, oral hypoglycemics, and possibly related drugs)
37. Total Value of All Claims Paid Out
38. Age at Time of Enrollment to the Plan
39. Sex

A number of the following disorders are associated with high hospital utilization and costs among patients with diabetes. Their entry into the analysis as independent variables is based on this association.

40. Number of Hospital Admissions (HL) for Depression
41. Number of Emergency Room Visits (ER) for Depression
42. Number of Outpatient Visits (OV) for Depression
43. Lifestyle Hospital Admissions
44. Lifestyle Emergency Room (ER) Visits
45. Lifestyle Outpatient Visits (OV)
46. Hospital Admission (HL) for Hyperplasia of Prostate
47. Emergency Room Visit (ER) for Hyperplasia of Prostate
48. Outpatient Visit (OV) for Hyperplasia of Prostate
49. Hospital Admission (HL) for Lumbar Disc Displacement
50. Emergency Room Visit (ER) for Lumbar Disc Displacement
51. Outpatient Visit (OV) for Lumbar Disc Displacement
52. Hospital Admission (HL) for Breast Cancer
57. Emergency Room (ER) Visit for Breast Cancer
58. Outpatient Visit (OV) for Breast Cancer
59. Hospital Admission for Abdominal Pain
60. Emergency Room (ER) Visit for Abdominal Pain
61. Outpatient Visit (OV) for Abdominal Pain
62. Hospital Admission (HL) for Respiratory Failure
63. Emergency Room Visit (ER) for Respiratory Failure
64. Outpatient Visit (OV) for Respiratory Failure
65. Hospital Admission (HL) for Chronic Myeloid Leukemia
66. Emergency Room Visit (ER) for Chronic Myeloid Leukemia
67. Outpatient Visit (OV) for Chronic Myeloid Leukemia
68. Presence and Number of Visits by Home Health Care Services
69. Number of Dietitian Visits
70. Number of Physical Therapy Visits
71. General +/- Diabetes-linked Hospital Admission (HL)
72. General +/- Diabetes-linked Emergency Room (ER) Visit
73. General +/- Diabetes-linked Outpatient (OV) Visit
74. Miscellaneous Medical Events (all other events not defined above)

According to another exemplary embodiment of the present invention, the following laboratory variables are also examined:

Laboratory CPT Codes:
80061  Lipid panel
82465  Total serum cholesterol
83718  Lipoprot and HDL
83721  LDL Chol
83719  VLDL Chol
84478  Triglycerides
83525  Insulin
84681x5  C-peptide
82497x5  Glucose
81000-005  Urinalysis
82042-44  Urine albumin and microalbumin
84520-25  BUN
82565 and 575  Creatinine
82947-50  Glucose
82962  Glucose by monitoring device
82951-52  GTT
84206  Proinsulin
83525  Insulin total
83527  Insulin free
80434-35  Insulin tolerance test
86337  Insulin antibodies
83036  Glycated hemoglobin
86431  Islet cell antibody

Exemplary dependent variables contemplated for use with the present invention as results to be predicted include:

1. Hospitalization (HL) for diabetes.
This is a dichotomous variable, referred to as the HL indicator, which is typically assigned a value according to the following scheme:

Value of dependent variable HL    Meaning
0                                 No diabetes hospitalization in the prediction period
1                                 One diabetes hospitalization in the prediction period

2. Change in cost - Absolute
3. Change in cost - Relative

The ratio of these two costs provides a prediction of risk.

Although only three dependent variables are listed above, as those of ordinary skill in the art will appreciate, other known or yet unknown variables consistent with the goals of the present invention may also suitably serve as a dependent variable within the scope of the present invention.

Turning to the time window aspect of the generation of the analysis file, it should be noted that there is one analysis record for each selected member.

In the present invention, a scheme, as described below, has been developed for defining prediction zones and censoring data to create the analysis file. That is, referring to Figure 6A, a time window basically defines a prediction zone or region 610 and an events window (analysis region) 612 from where activity is used to predict something in the prediction zone.

According to an exemplary embodiment of the present invention, the time window is defined between 12 months preceding the specified events and 12 months following the specified events. In the case of patients without specified events, all time is preferably considered. Duration of observation will be an analysis covariate. As those skilled in the art will appreciate, additional or alternative time window schemes may also adequately serve the present invention.

For purposes of explanation, the time that the claims history covers is referred to as the time window that starts at some point 'A' and ends at point 'C'. The time interval is divided into analysis and prediction regions by point 'B' such that A<B<C. That is to say, 'B' represents the present, 'A' represents the farthest past event and 'C' represents the farthest future event.

By way of example, Jane Doe's analysis record is based on claims from 1/1/91 through 6/30/93. Therefore, A=1/1/91, C=6/30/93 and B can be selected somewhere in between, such as 12/31/92. Generally, A is defined based on the data extraction protocol (i.e., from when the data is available) and C is defined by the last day for which the member is still enrolled and eligible for the benefits. Of course, variations of those general points of definition could be selected within the scope of the present invention.

The definition of the present instant B is important. In the subject invention, two basic definitions of B were devised in order to maximize the accuracy of the prediction model, although, as would be understood by those skilled in the art, alternative definitions of B may also be used.

Figure 6B is an exemplary time window scheme, referred to as Scheme 1, for use in processing the data from the event level files shown in Figure 4.

In Scheme 1, the event prediction region is set from B to C such that B = C - (x months) for all the members in the analysis. For example, if a 6-month diabetes hospitalization (HL) model (i.e., HL is used as a dependent variable) is to be built, then B = C - (6 months). In Jane Doe's example, B would equal 12/31/92. Therefore, only data covering from A through B (1/1/91-12/31/92) is used to predict the hospitalization for diabetes in the 'next 6 months'. The phrase 'next 6 months' in this context implies that the time point B is "NOW" and any time after it is in the FUTURE and any time before it is in the PAST. This is a key concept of Scheme 1 and is important to understanding the prediction model implementation and application.
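The Scheme 1 cutoff can be illustrated with a short sketch. This is a minimal illustration, not the implementation used in the invention: the function names (`shift_months`, `scheme1_cutoff`, `split_events`) and the event tuples are hypothetical, and the month arithmetic assumes the prediction region consists of whole calendar months ending on C, which reproduces the Jane Doe example (C=6/30/93 gives B=12/31/92).

```python
import calendar
from datetime import date, timedelta


def shift_months(d, months):
    """Shift a date by a whole number of months, clamping the day if needed."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def scheme1_cutoff(c, prediction_months):
    """Return B such that the prediction region (B, C] spans prediction_months.

    Assumption: the region is measured in calendar months ending on C,
    so B is the day before the first day of that region.
    """
    first_prediction_day = shift_months(c + timedelta(days=1), -prediction_months)
    return first_prediction_day - timedelta(days=1)


def split_events(events, a, b, c):
    """Split (event_date, event_type) tuples into analysis and prediction regions."""
    analysis = [e for e in events if a <= e[0] <= b]
    prediction = [e for e in events if b < e[0] <= c]
    return analysis, prediction


# Jane Doe's window from the text; the events themselves are made up.
A = date(1991, 1, 1)
C = date(1993, 6, 30)
B = scheme1_cutoff(C, 6)
events = [
    (date(1991, 3, 1), "OV"),
    (date(1992, 12, 31), "ER"),
    (date(1993, 2, 15), "HL"),
]
analysis_events, prediction_events = split_events(events, A, B, C)
```

Only the events in `analysis_events` would feed the independent variables; a diabetes hospitalization in `prediction_events` would set the HL indicator.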

Therefore, given a selected time window scheme and an appropriate set of predetermined variables, the processing step of 122 generates the analysis file.

Using the analysis file, the model for identification/prediction can then be developed in various ways using statistical techniques. In particular, the analysis file, now at a member level, is processed using statistical functions available in SAS. In the exemplary embodiment of the present invention, the statistical processing performed to generate the prediction model is multiple logistic regression. As will be appreciated by those skilled in the art, other statistical techniques may also be suitable for use with the present invention.

In the exemplary embodiment, the statistical processing, when applied to the analysis file, identifies variables which meet predetermined levels of significance (e.g., probability value < 0.05). These variables then form a prediction model which is a mathematical equation of the following form:

Logit(p) = a + b*x1 + c*x2 + ... + z*xi

where x1 ... xi are the identified variables and a ... z are their parameter estimates. An individual's probability (p) for the outcome under consideration is then determined using the following formula:

p = e^Logit(p) / (1 + e^Logit(p)).
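The two steps described here, keeping the variables that meet the significance level and then evaluating the resulting equation, can be sketched as follows. This is a simplified illustration under stated assumptions: the candidate variable names, estimates and probability values are invented, no regression is actually fitted (the exemplary embodiment uses SAS for that), and the probability is computed with the standard inverse-logit relationship p = 1 / (1 + e^(-Logit(p))).

```python
import math


def select_predictors(candidates, alpha=0.05):
    """Keep variables whose probability value is below the significance level.

    candidates maps a variable name to (parameter_estimate, probability_value).
    """
    return {
        name: estimate
        for name, (estimate, p_value) in candidates.items()
        if p_value < alpha
    }


def logit(intercept, coefficients, values):
    """Evaluate a + b*x1 + c*x2 + ... for one member (missing values count as 0)."""
    return intercept + sum(
        coef * values.get(name, 0.0) for name, coef in coefficients.items()
    )


def probability(logit_value):
    """Standard inverse logit: p = 1 / (1 + e^(-logit))."""
    return 1.0 / (1.0 + math.exp(-logit_value))


# Hypothetical fitted candidates: name -> (estimate, probability value).
candidates = {
    "prior_hl": (0.70, 0.001),
    "er_visits": (0.40, 0.020),
    "dietitian_visits": (0.05, 0.400),  # not significant, so dropped
}
model = select_predictors(candidates)
member = {"prior_hl": 1, "er_visits": 2}
p = probability(logit(-3.0, model, member))
```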

Using the above steps, several models were constructed, three of which relate to hospitalization, and three of which relate to relative change in cost. These six models were built to predict: (1) future hospitalization among all diabetics within the cohort, (2) future hospitalization among diabetics with no hospitalizations during the analysis window, (3) future hospitalization among diabetics with a hospitalization during the analysis window, (4) relative change in total health care costs among all diabetics in the cohort, (5) relative change in total health care costs among diabetics with no hospitalizations of any kind during the analysis window, and (6) relative change in total health care costs among diabetics with a hospitalization during the analysis window. The first model, model (1) is the preferred model.

According to the exemplary embodiment, the total number of diabetics analyzed in model building was 6858, from which up to 7% of extreme outliers were eliminated. Females comprised 45.2% of the cohort, and the mean age was 51 years. 78% of the cohort was 40 to 69 years of age, of which 30% were 50 to 59, and 25% were 60 to 69 years old. Three percent were age 10 to 19 and 4% were age 20 to 29. Five United Health Care (UHC) plans contributed members to the cohort: Atlanta (12%), Minneapolis (48%), Providence (25%), Salt Lake City (5%) and St. Louis (10%).

Of the total patient population, 72% of cohort members had their last diabetic event within one month, indicating that these patients are highly active with respect to health care utilization. Eighteen percent of the remainder had their last event within 2 to 3 months. For the average cohort member, 35 months had transpired since their first diabetic event. Therefore, the informational power of our database is strong insofar as we can capture an average of 3 years of observation on diabetics.

Among the members of the cohort within these plans, 549 experienced hospitalization specifically for diabetes and there were 736 episodes of hospitalization. Other major categories of hospitalizations occurring within the 12 month analysis window included: diabetes-related conditions (1059); cardiovascular disease (813); and unrelated causes (782).

The total number of hospitalizations among the cohort was 2807. Among the diabetes-related conditions, cardiovascular disease was predominant.

A total of 24,852 outpatient doctor visits were made for diabetes and 18,058 were made for diabetes-related conditions. Thus, the average diabetic visits a doctor's office roughly quarterly for their disease. Roughly 10 times as many visits were made to the emergency room for non-diabetic complaints (3332) as for diabetes (365), suggesting that diabetics are seeking much ER care for complaints that may not be classified or claimed as diabetes-related.

The most frequent prescriptions per member per year were for insulin (5.8), oral hypoglycemics (4.1), anti-hypertensives (3.3), and cardiovascular medications (2.2). Unrelated medications accounted for 10.7 prescriptions per member. The average cohort member had 2.3 prescriptions per month and 28.1 per year. The percentage of members who claimed at least one prescription over the 12 month analysis period by agent was: oral hypoglycemic only 44%; insulin only 40%; both agents 6%; neither agent 10%.

Diabetes hospitalization and outpatient visits contributed greatly to total cost of diabetes care, which in turn comprised 9.6% of total plan costs. Other leading diabetes-related costs include cardiovascular care (14.6% of total plan costs), renal care (2.4% of total costs and likely underestimated because of Medicare coverage), and possibly diabetes-related care (6.6% of total costs). Unrelated health care and pharmaceuticals comprised 47.1% of the plans' costs for this cohort of diabetic patients over one year.

The results of the six model building exercises, with the variables identified as predictive of outcome, are presented below. Those skilled in the art will recognize that, in the following presentation of results, the use of the term "predictors" defines a correlational association and not necessarily a causal relationship between a variable and an outcome. Any causal linkages are derived from inference.

Model 1: Predictors of Hospitalization Among Diabetics (Total Cohort)

The first model was based on all commercial members and used the HL indicator as a dependent variable. The results of this model are listed below. The resulting independent variables, most likely to predict an adverse health outcome, were 1) previous diabetes hospitalization, 2) previous diabetes-related hospitalization, 3) any ER visit during analysis period, 4) visits to cardiologist, 5) insulin usage during last 12 months, 6) number of months since first diabetes event, 7) age between 10 and 19 years, 8) age between 60 and 69 years, 9) membership in Atlanta plan, 10) membership in Providence plan, 11) use of neuropathy pharmacotherapy, 12) number of radiological procedures, and 13) non-use of oral hypoglycemic during last 12 months.

MODEL 1: DIABETES HL MODEL (MDL~HL2B): OVERALL NON DX-ONLY SUBSET

Parameter       Variable    Description
-3.37201760     (n/a)       INTERCEPT
+0.71577821     DIHL        # DIABETES HL
+0.16789085     DIPROC6     DIABETES DR VISITS: OPHTHALMOLOGIST
+0.03941667     VADR        VASCULAR DR VISITS
+0.06171879     RXNE        RX: DIABETES NEUROPATHIC
+0.00619757     DIAB~AGE    # MONTHS SINCE 1ST DIABETES EVENT
+0.42563967     ANY~ER      ANY ER IN ANALYSIS PERIOD
+0.38572138     ATL         ATLANTA (UHC HMO)
+0.40988992     PVD         PROVIDENCE (UHC HMO)
+0.58671938     AGE1019     10 - 19 AGE CATEGORY
+0.35911567     AGE6069     60 - 69 AGE CATEGORY
+0.20357855     DRHL        DIABETES RELATED HL
-0.520538560    DIRXGRP1    ORAL ONLY RX USE (12MOS)
+0.37256134     DIRXGRP3    INSULIN ONLY RX USE (12MOS)

The first model was driven primarily by a history of hospitalization during the analysis window. Prior hospitalization for diabetes or a diabetes-related condition probably reflects severity of disease, as does any ER visit, number of radiological procedures and outpatient visits to a cardiologist during the analysis period. However, part of the effect of prior hospitalization may also relate to physician and/or patient amenability to a second or subsequent hospital admission. That is, once the threshold of hospitalization is first crossed in the treatment of a particular clinical problem or patient, the physician may be more willing to admit, and the patient more willing to be admitted, on subsequent occasions.
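As a rough illustration of how the Model 1 parameter estimates might be applied to a single member, the sketch below evaluates the model equation using the coefficients listed in the table. The member record is invented, the function name `model_1_probability` is hypothetical, and the conversion from logit to probability assumes the standard inverse-logit relationship.

```python
import math

# Parameter estimates copied from the Model 1 table.
MODEL_1_INTERCEPT = -3.37201760
MODEL_1_COEFFICIENTS = {
    "DIHL": 0.71577821,
    "DIPROC6": 0.16789085,
    "VADR": 0.03941667,
    "RXNE": 0.06171879,
    "DIAB~AGE": 0.00619757,
    "ANY~ER": 0.42563967,
    "ATL": 0.38572138,
    "PVD": 0.40988992,
    "AGE1019": 0.58671938,
    "AGE6069": 0.35911567,
    "DRHL": 0.20357855,
    "DIRXGRP1": -0.520538560,
    "DIRXGRP3": 0.37256134,
}


def model_1_probability(member):
    """Return the predicted probability of a diabetes hospitalization.

    member maps variable names to values; variables not present count as 0.
    Assumes the standard inverse logit, p = 1 / (1 + e^(-logit)).
    """
    logit = MODEL_1_INTERCEPT + sum(
        coef * member.get(name, 0.0)
        for name, coef in MODEL_1_COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical member: one prior diabetes hospitalization, an ER visit,
# 35 months since the first diabetes event, insulin-only prescriptions.
member = {"DIHL": 1, "ANY~ER": 1, "DIAB~AGE": 35, "DIRXGRP3": 1}
p_member = model_1_probability(member)
p_baseline = model_1_probability({})
```

Under these assumptions, a member with all variables at zero falls at the intercept-only probability, and each positive predictor raises the estimate while oral-only prescription use (`DIRXGRP1`) lowers it.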

Use of neuropathy pharmacotherapy, that is, agents to manage the pain of diabetic neuropathy, was predictive of hospitalization. This association may indicate that severe neuropathy warranting multiple treatment courses also creates the possibility of other complications, such as chronic skin ulceration, skin infection, or amputation that eventually require hospitalization. Neuropathy treatment may co-exist with poor limb care, and together may portend future clinical exacerbation and need for admission.

Use of insulin only over the prior 12 months was also predictive.

This again may reflect severity; however, another possible explanation could involve problems with actual insulin use. For example, overcontrol of glucose may produce hypoglycemia necessitating hospital admission.

Oral hypoglycemic use during the prior 12 months was a negative predictor, i.e., associated with less frequent hospitalization, which is a mirror image of insulin use and likely a reflection of disease severity.

Number of months since first diabetes event probably reflects the chronic nature of the disease.

Plan membership (Atlanta, Providence) probably reflects variation in local conditions. It is possible that the epidemiological pattern of diabetes in these plans includes patients with greater disease severity and at greater consequent risk for hospitalization. Demography may also contribute, with these plans having higher proportions of ethnic and racial groups at particularly elevated risk for diabetes, such as Native Americans or African Americans. Variation in institutional practice may also contribute to plan membership as a predictor. For example, outpatient care for diabetics may be less aggressive or comprehensive in these plans, which could in turn result in a diminished ability to prevent clinical exacerbations that require hospital admission. Physician amenability or willingness to hospitalize their diabetic patients for a particular complication may also be greater in these plans for clinical or plan-related reasons.

Age was the other major demographic variable found to be predictive. The 10-19 year olds in the cohort are most likely to be suffering from insulin-dependent diabetes (IDDM). It is well established that in IDDM it is harder to exert good control of blood glucose, and these individuals may be at an elevated risk for inadequate (or excessively aggressive) glucose control that produces admissions for hypoglycemia, ketoacidosis, etc. From a psychosocial point of view, this age stratum is composed of adolescents, a group whose sense of invulnerability, independence, and resistance to authority is legendary. These age-related behavioral traits may be producing a high degree of non-compliance with therapy, with numerous complications and hospitalizations ensuing accordingly.

On the other hand, the 60-69 year olds are in the first decade of life where a broad range of chronic disease sequelae, and diabetes-related sequelae, begin to be manifest overtly and complexly. The contribution of 60-69 year olds to the predictive ability of the model is therefore not surprising.

However, it is also possible that physical morbidity co-exists with psychosocial risk, as this is also the decade of life when the rate of single living or unaccompanied habitation increases as the elderly become widows and widowers. Perhaps the sudden absence of a supportive and care providing spouse exacerbates the baseline psychological risk that diabetes sequelae present during this decade of life.

When risk stratification is performed on the model: (i) the top decile accounts for 45% of all hospitalizations for the total cohort during the prediction window, (ii) the top 2 deciles account for almost 60% of all hospitalizations in this model, (iii) for total health care costs in the total cohort, the top decile accounts for 26% of all health care expenditures, and (iv) the top decile accounts for 29% of diabetes-specific costs. These findings indicate that this model provides a powerful risk stratification of patients.

Model 2: Predictors of Hospitalization Among Diabetics With No Hospitalization of Any Kind During the Analysis Window

The second model was also based on all commercial members with no prior DIABETES hospitalization and used the HL indicator as a dependent variable. The results of this model are listed below. The resulting independent variables, most likely to predict an adverse health outcome, were: 1) visits to doctor for skin care, 2) visits to nephrologist, 3) total emergency room visits, 4) age between 60 and 69 years, 5) age between 10 and 19 years, 6) number of radiological procedures, 7) number of months since first diabetes event, 8) membership in Providence plan and 9) non-use of oral hypoglycemics.

MODEL 2: DIABETES HL MODEL (MDL~HL0B): NO PRIOR ANY HL NON DX-ONLY SUBSET

Parameter       Variable    Description
-3.1221429      (n/a)       INTERCEPT
+0.34685222     DIDR5       DIABETES DR VISITS: NEPHROLOGIST
+0.29097326     DIPROC6     DIABETES PROCS: RADIOLOGY
+0.11753116     SKDR        SKIN DR VISITS
+0.19895382     SUM~ER      TOTAL ER COUNT
+0.00809441     DIAB~AGE    # MONTHS SINCE 1ST DIABETES EVENT
+0.31762767     PVD         PROVIDENCE
+0.65546484     AGE1019     10 - 19 AGE CATEGORY
+0.42376983     AGE6069     60 - 69 AGE CATEGORY
-0.767079160    DIRXGRP1    ORAL ONLY RX USE (12MOS)

The overall model for the full cohort was driven by prior hospitalization, whereas the model for this subgroup of the cohort, lacking a hospitalization during the analysis window, is not.

Except for the variables 1) outpatient visits to a doctor for skin care, and 2) visits to a nephrologist, the predictors observed in this model were also observed in the model for the entire cohort. The persistence of these variables indicates a high degree of validity for the respective models.

The second variable, "visits to a nephrologist", is desirably interpreted with caution. A systematic loss of data with respect to renal care probably affects the data on which the model is built because Medicare provides coverage for renal dialysis and transplantation. Thus, any such clinical intervention is probably not reflected in the claims data. Even so, however, this bias would tend to be conservative and favor a lesser predictive ability, and the presence of visits to nephrologist among the predictive variables may reflect the dramatic "all-or-none" nature of clinical exacerbations that result from renal failure.

"Visits to a doctor for skin care" is a predictor that may be associated clinically with the neuropathic medications variable in the first model. Severe neuropathy and persistent skin problems are an often neglected, or inadequately addressed, area of diabetic care. Improved outpatient and home care for the skin lesions typically associated with neuropathy could prevent subsequent hospitalization and heavy resource use.

When a risk stratification is performed on the second model, the top decile accounts for 23% of all hospitalizations for members of the cohort with no hospitalization during the analysis window. For total health care costs, the top decile accounts for 16% of health care expenditures. The top decile accounts for 17% of all diabetes-specific costs.

Model 3: Predictors of Hospitalization Among Diabetics With a Hospitalization During the Analysis Window

A third model focused on Medicaid membership and uses the HL indicator as a dependent variable. The results of this model are listed below. The resulting independent variables, most likely to predict an adverse health outcome, were found to be 1) total number of hospitalizations, 2) prescriptions for neuropathy pharmacotherapy, 3) insulin only use over the last 12 months, 4) any ER visit during the analysis period, and 5) not having unrelated hospitalizations.

MODEL 3: DIABETES HL MODEL (MDL~HL1B): 1+ PRIOR ANY HL NON DX-ONLY SUBSET

Parameter       Variable    Description
-2.9547589      (n/a)       INTERCEPT
+0.08763124     RXNE        RX: DIABETES NEUROPATHIC
-0.280028930    ZZHL        UNRELATED HL
+0.34949717     SUM~HL      TOTAL HL COUNT
+0.45170745     ANY~ER      ANY ER IN ANALYSIS PERIOD
+0.87786381     DIRXGRP3    INSULIN ONLY RX USE (12MOS)

The above predictors, except for unrelated hospitalizations, have been observed previously in other models. Unrelated hospitalizations is new as a negative predictor and typically reflects the fact that patients are well enough (from the diabetes standpoint) to be hospitalized for another unrelated condition without a specific coding listed for diabetes on each unrelated admission. Alternatively, it is possible that these patients are prepared clinically for their diabetic needs during these unrelated admissions, which in turn prevents subsequent hospitalizations that are coded as caused by or related to diabetes.

Risk stratification on the third model finds the top decile accounting for 37% of all hospitalizations and the top 2 deciles accounting for almost 60% of all hospitalizations in the prediction window. For total health care costs, the top decile accounts for 23% of all health care expenditures. The top decile accounts for 27% of diabetes-specific costs.

Model 4: Predictors of Relative Change in Cost Among Diabetics (Total Cohort)

A fourth model also focused on Medicaid membership and uses the HL indicator as a dependent variable. The results of this model are listed below. The resulting independent variables, most likely to predict an adverse health outcome, were found to be 1) age, 2) female gender, 3) last diabetes event within 1 month, 4) visits to nephrologist, and the following independent variables as negative predictors: 5) any hospitalization during analysis period, 6) unrelated doctor visits, 7) number of months since any event, 8) visits to doctor for care of neuropathy, 9) any ER visit during the analysis period, 10) hospitalization LOS for conditions contributory to diabetes, 11) use of oral hypoglycemics, 12) prescriptions of drugs for diabetes sequelae, and 13) ophthalmic procedures.

MODEL 4: RELATIVE $ CHANGE MODEL (MDL~CH2B): OVERALL NON DX-ONLY SUBSET

Parameter       Variable    Description
+0.04885322     (n/a)       INTERCEPT
+0.00271857     AGE         AGE AT TIME OF CUTOFF
+0.04284061     FEMALE      FEMALE INDICATOR
+0.05878239     DIDR5       DIABETES DR VISITS: NEPHROLOGIST
-0.013322540    DIPROC5     DIABETES PROCS: EYE
-0.034616290    DIPROC6     DIABETES PROCS: RADIOLOGY
-0.016228130    NEDR        NERVE DR VISITS
-0.007130950    RXHY        RX: ORAL HYPOGLYCEMIC
-0.006677660    ZZDR        UNRELATED DR VISITS
-0.402889850    ANY~HL      ANY HL IN ANALYSIS PERIOD
-0.045917130    RECENCY     # MONTHS SINCE ANY EVENT
-0.105892950    DCLOS~SR    CONTRIBUTORY DIABETES HL LOS (SQRT)
+0.00205962     DIRX2       RELATED CONDITIONS RX

These predictors indicate the events that typically drive year-to-year variance in costs. The variables predict or negatively predict changes in cost over time, in this case one year. The negative predictors predict cost reduction over time. This model illustrates the variance of the cost changes by considering specific events. The model asks whether members can be identified who have a high relative change in costs with or without high risk.

A change-in-cost model permits: (i) the assessment of whether there are opportunities for disease management intervention, and (ii) the evaluation of such interventions.

Model 5: Predictors of Relative Change in Cost Among Diabetics With No Hospitalization of Any Kind During the Analysis Window

A fifth model focused on Medicaid members and used the HL indicator as a dependent variable. The results of this model are listed below.

The resulting independent variables, most likely to predict an adverse health outcome, were 1) age, 2) female gender, and the following independent variables as negative predictors: 3) number of months since any event, 4) use of oral hypoglycemics, 5) ophthalmic procedures, 6) radiological procedures, 7) unrelated doctor visits, and 8) unrelated emergency room visits. In this fifth experiment, the negative predictors are typically proxies for relative cost reduction.

MODEL 5: RELATIVE $ CHANGE MODEL (MDL~CH0B): NO PRIOR ANY HL NON DX-ONLY SUBSET

Parameter       Variable    Description
+0.07783285     (n/a)       INTERCEPT
+0.00293687     AGE         AGE AT TIME OF CUTOFF
+0.04927050     FEMALE      FEMALE INDICATOR
-0.017085150    DIPROC5     DIABETES PROCS: EYE
-0.047355500    DIPROC6     DIABETES PROCS: RADIOLOGY
-0.007339420    RXHY        RX: ORAL HYPOGLYCEMIC
-0.064964790    ZZER        UNRELATED ER
-0.008264700    ZZDR        UNRELATED DR VISITS
-0.043964180    RECENCY     # MONTHS SINCE ANY EVENT

Model 6: Predictors of Relative Change in Cost Among Diabetics With a Hospitalization During the Analysis Window

A sixth model focused on Medicaid members and used the HL indicator as a dependent variable. The results of this model are listed below.

The resulting independent variables, most likely to predict an adverse health outcome, were 1) pharmacotherapy for related conditions, 2) visits to nephrologists, 3) total ER visits, and the following independent variables as negative predictors: 4) last event within 2-3 months, 5) cardiovascular hospitalization length of stay (LOS), 6) unrelated hospitalization LOS, and 7) hospital LOS for conditions possibly contributing to diabetes. The negative predictors likely reflect previous severity and resource utilization.

MODEL 6: RELATIVE $ CHANGE MODEL (MDL~CH1B): 1+ PRIOR ANY HL NON DX-ONLY SUBSET

Parameter       Variable    Description
-0.443774930    (n/a)       INTERCEPT
+0.11070908     DIDR5       DIABETES DR VISITS: NEPHROLOGIST
-0.007068960    VALOS       VASCULAR HL LENGTH OF STAY
-0.006497110    ZZLOS       UNRELATED HL LENGTH OF STAY
+0.02819513     SUM~ER      TOTAL ER COUNT
-0.376527890    ACT091A     LAST EVENT W/IN 2-3 MONTHS
-0.019113950    DCLOS       CONTRIBUTORY DIABETES HL LOS
+0.04680708     DIRX2SR     RELATED CONDITIONS RX (SQRT)

It should be noted that each of the models uses a different number of independent variables; and, depending on the precision of the models desired, more or fewer independent variables may be used based on their individual ability to accurately predict the selected dependent variable.

Next, the determined model is applied to the data. The determined model can be applied to the existing data, to the data as it is regularly updated or to other claims databases for other benefits providers. To do so, only the determined independent variables of interest need to be processed. Of course, as new claims databases are to be analyzed, the entire process can be repeated to generate a new model in order to determine if other variables may be better predictors. The output generated by applying the model is a file containing a list of all of the diabetes patients ordered by an indicator representative of the likelihood that the patient will have an adverse health outcome (i.e., the experience defined by the dependent variable).

This list can then be divided into subgroups such as in 5% or 10% increments of patients likely to have the adverse health outcome.

Model performance can now be assessed by determining the number of actual adverse health outcomes occurring in the prediction window for each 5% or 10% subgroup.
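The ranking, subgrouping and performance assessment described above can be sketched as follows. This is a minimal illustration with invented data; the function name `stratify` and the record layout (member identifier, predicted probability, whether the adverse outcome actually occurred in the prediction window) are assumptions made for the example.

```python
def stratify(scored, n_groups=10):
    """Rank members by predicted probability and summarize each subgroup.

    scored is a list of (member_id, predicted_probability, had_outcome).
    Group 1 holds the highest-risk members. Each summary reports the share
    of all actual adverse outcomes that fell into that group.
    """
    ranked = sorted(scored, key=lambda record: record[1], reverse=True)
    total_events = sum(1 for record in ranked if record[2])
    n = len(ranked)
    groups = []
    for g in range(n_groups):
        chunk = ranked[g * n // n_groups:(g + 1) * n // n_groups]
        events = sum(1 for record in chunk if record[2])
        share = events / total_events if total_events else 0.0
        groups.append(
            {"group": g + 1, "members": len(chunk), "events": events, "share": share}
        )
    return groups


# Invented example: 20 members, three of whom had the adverse outcome.
scored = [("m%d" % i, i / 20, i in (19, 18, 5)) for i in range(20)]
deciles = stratify(scored, n_groups=10)
top_decile_share = deciles[0]["share"]
```

Passing `n_groups=20` would give the 5% increments mentioned in the text. The top-decile share is the same kind of figure as the percentages quoted for the models above.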

Applying the model to future claims data or other databases of diabetes patients, or building a new model in a new database as described above, the following items may be identified: 1) diabetes patients who consume high cost resources, 2) resources commonly used by diabetes patients, 3) the differences between relatively high-cost diabetes patients and other (low-cost) patients, and 4) the resources used by relatively high-cost diabetes patients versus the resources used by other (low-cost) diabetes patients. Identifying diabetes patients at high risk allows for various types of intervention to maximize the effective allocation of health care resources for diabetes patients. Such intervention may take the form of: 1) specific case management, 2) novel interventions based on subgroup characteristics, 3) high risk intervention, 4) high (relative) cost intervention, or 5) plan modification, all adhering to the best practice guidelines.

According to another exemplary embodiment of the present invention, a computer-implemented method for identifying at risk patients diagnosed with diabetes uses information about patients existing in a claims database to: a) process, based on predetermined criteria, the patient information in the claims database to extract claims information for a group of diabetes patients; b) define a set of events relevant to diabetes using the information available in the claims database; c) convert the extracted claims information and the defined events into files containing event level information; d) define a time window for providing a time frame from which to judge whether specific ones of the defined events should be considered in subsequent processing; e) define a first set of variables as potential predictors; f) define a second set of variables as potential predictors; g) process the event level information, using the time window and the first set of variables, to generate a first analysis file; h) process the event level information, using the time window and the second set of variables, to generate a second analysis file; i) perform statistical analysis on the first analysis file to generate a first prediction model to predict a relative change in health care costs among at risk patients diagnosed with diabetes, said first prediction model using a subset of the first set of variables; j) perform statistical analysis on the second analysis file to generate a second prediction model to predict future hospitalization among at risk patients diagnosed with diabetes, said second prediction model using a subset of the second set of variables; k) perform risk stratifications for the first prediction model and the second prediction model, respectively; and l) determine a combined risk stratification as the intersection of the risk stratification for the first prediction model and the risk stratification for the second prediction model.
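The final two steps, stratifying each model's output and intersecting the results, can be illustrated with a minimal sketch. It assumes that each prediction model has already produced a numeric score per patient and that a "high risk" stratum is simply the top fraction of patients by score; the function name `top_stratum`, the patient identifiers, the scores, and the 40% cutoff are all hypothetical and are not taken from this application.

```python
def top_stratum(scores, fraction):
    """Return the IDs of the highest-scoring `fraction` of patients.

    `scores` maps a patient ID to a model score. Treating the top
    fraction as the high-risk stratum is an illustrative assumption.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))
    return set(ranked[:cutoff])


# Hypothetical scores from the first model (relative change in cost)
cost_scores = {"P1": 0.91, "P2": 0.15, "P3": 0.78, "P4": 0.40, "P5": 0.66}
# Hypothetical scores from the second model (future hospitalization)
hosp_scores = {"P1": 0.85, "P2": 0.72, "P3": 0.30, "P4": 0.10, "P5": 0.81}

high_cost = top_stratum(cost_scores, 0.4)   # stratification for model 1
high_hosp = top_stratum(hosp_scores, 0.4)   # stratification for model 2

# Combined risk stratification: patients flagged by BOTH models
combined = high_cost & high_hosp
print(sorted(combined))
```

Using a set intersection keeps only patients who fall in the high-risk stratum of both models, which narrows the list to those for whom the two predictions agree.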

According to another exemplary embodiment of the present invention, integrated and independent analyses of a clinical laboratory claims database are performed with respect to diabetes. The panels and profiles of interest to the analysis include the Diabetes Assessment Profile (DAP), the Diabetes Monitoring Evaluation I (DME I), and the Diabetes Monitoring Evaluation II (DME II).

DAP includes c-peptide, fasting glucose, glycosylated albumin, hemoglobin A1c, insulin, urine microalbumin, and potassium. DME I includes fasting glucose, hemoglobin A1c, and potassium. DME II includes DME I plus glycosylated albumin and urine microalbumin.

Up to 50% of diabetes patients are believed to remain undiagnosed. Therefore, in any reasonably random and representative database, the number of pathogenetically diabetic individuals is typically double the number of those determined on the basis of ICD diagnostic, procedure, and pharmacy criteria.

Thus, a substantial population of individuals exists within any plan who could demonstrate subclinically or clinically evolving diabetic disease without current physician detection. By definition, these individuals are not likely to have glycated hemoglobin values in the SCBL database, nor are they likely to receive the Diabetes Assessment Profile or Monitoring Evaluations. However, the SCBL database captures the glucose status of patients through a multiplicity of other clinical laboratory measurements, including:

Panel/Profile Name       Test Code
Chemzyme                 7962
Chemzyme Plus            7030
Multi-chem               8888
Multi-chem II            8999
Wellness Profile #1      6792
Wellness Profile #70     6420
Wellness Profile #72     7222
Wellness Profile #75     8767
Wellness Profile #76     8771
Wellness Profile #77     8772
Wellness Profile #78     7840
Wellness Profile #79     7889
Wellness Profile #80     7004
Wellness Profile #82     7015
Wellness Profile #83     7076
Wellness Profile #85     7269

It should be possible to analyze clinical laboratory aggregate data and model serial glucose lab values to assess whether certain patients are pathogenetically in evolution towards diabetes or at risk for diabetes.
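One simple way to model serial glucose values is to fit a straight line to each patient's readings over time and examine the slope. The sketch below does this with an ordinary least-squares fit. The application does not specify a modeling technique, so the choice of a linear trend, the function names, the sample readings, and the 5 mg/dL-per-year threshold are all illustrative assumptions and not clinical criteria.

```python
from datetime import date


def glucose_slope(readings):
    """Least-squares slope, in mg/dL per year, of (date, value) pairs."""
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    start = min(d for d, _ in readings)
    xs = [(d - start).days / 365.25 for d, _ in readings]  # years from start
    ys = [value for _, value in readings]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("readings must fall on different dates")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def is_trending_up(readings, min_slope=5.0):
    """Flag a patient whose glucose rises by at least `min_slope` mg/dL/year.

    The default threshold is a hypothetical placeholder.
    """
    return glucose_slope(readings) >= min_slope


# Hypothetical serial fasting glucose values (mg/dL) for two patients
rising = [(date(1996, 1, 15), 92), (date(1997, 1, 15), 101), (date(1998, 1, 15), 112)]
stable = [(date(1996, 1, 15), 90), (date(1997, 1, 15), 91), (date(1998, 1, 15), 89)]

print(is_trending_up(rising), is_trending_up(stable))
```

A per-patient slope is only a screening signal; any real use would need thresholds chosen and validated against clinical outcomes.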

Appendices I-VII follow.

Appendix I:

250 Diabetes mellitus
250.0 Diabetes mellitus without mention of complications
250.1 Diabetes with ketoacidosis
250.2 Diabetes with hyperosmolarity (non-ketotic)
250.3 Diabetes with other coma (ketoacidosis, hypoglycemic, insulin)
250.4 Diabetes with renal manifestations
250.5 Diabetes with ophthalmic manifestations
250.6 Diabetes with neurological manifestations
250.7 Diabetes with peripheral circulatory disorders
250.8 Diabetes with other specified manifestations (hypoglycemia, hypoglycemic shock)
250.9 Diabetes with unspecified complications

The following 5th digit sub-classifications will be used with the above:
0 - Type II or unspecified type, not stated as uncontrolled
1 - Type I, not stated as uncontrolled
2 - Type II, or unspecified type, uncontrolled
3 - Type I, uncontrolled

357.2 Polyneuropathy in diabetes
362.0x Diabetic retinopathy
366.41 Diabetic cataract
648.0 Pre-existing diabetes in child delivery

Other code: J1820, Injection, Insulin, up to 100 units

Excluded ICD-9 Codes
648.8 abnormal glucose tolerance test in pregnancy
790.2 abnormal glucose tolerance test
962.3 poisoning by insulin or antidiabetic agents

Appendix II:

Diseases of the Eye
369.xx Blindness (either eye, both eyes)
362.83 Retinal edema
365.44 Glaucoma in diabetes
366.41 Diabetic cataract
369.0 - 08 Profound impairment, both eyes
369.4 Legal blindness

Nerve Disorders
337.1 Peripheral autonomic neuropathy in diseases classified elsewhere
354.x Mononeuritis of upper limb (carpal tunnel syndrome)
357.2 Polyneuropathy in Diabetes
358.3 Amyotrophy/myoneural disorder
607.84 Impotence, organic origin
713.5 Arthropathy associated with neurologic disorders
729.2 Acute neuralgia/neuritis

Skin and Foot Infection Disorders
680.x Carbuncle and furuncle (any site)
681.xx Cellulitis or abscess of finger and toe
682.x Cellulitis and abscess (any location except finger and toe)
686.9 Unspecified local infection of skin, subcutaneous
707.1 Chronic ulcer of skin, unspecified site
707.2 Chronic ulcer of lower limb, except decubitus
707.3 Chronic ulcer of other specified sites
707.4 Chronic ulcer of skin, unspecified site
707.5 Gangrene/gangrene in diabetes, any site (i.e. necrosis)
707.8 Chronic ulcer of other specified sites
707.9 Chronic ulcer of skin, unspecified site
728.89 Infection muscle, NEC
730.0x Acute osteomyelitis
730.1x Chronic osteomyelitis
730.2x Osteomyelitis, unspecified
785.10 Gangrene/gangrene in diabetes, any site

Other Infection:
482.9 Bacterial pneumonia
481 Lobar pneumonia
480.9 Viral pneumonia
486 Pneumonia, organism unspecified
487.0 Influenza with pneumonia
599.0 Urinary tract infection

Vascular/Circulatory/Cardiac:
250.7 Peripheral angiopathy
250.8 Peripheral vascular disease, unspecified
401.x Essential hypertension
402.xx Hypertensive heart disease with and without congestive heart failure
403.x Hypertensive renal disease
404.x Hypertensive heart and renal disease
405.x Secondary hypertension
410.xx Acute myocardial infarction
411.xx Other acute and subacute forms of ischemic heart disease
412 Old myocardial infarction
413.x Angina pectoris
414.xx Other forms of IHD:
    .0 - Coronary atherosclerosis
    .1x - Aneurysm of heart
    .8 - Other specified IHD
    .9 - Chronic IHD, unspecified
427 Cardiac dysrhythmia
428.x Heart failure
    .0 - Congestive heart failure
    .1 - Left heart failure
    .9 - Heart failure
431 Intracerebral hemorrhage
432.xx Other and unspecified intracranial hemorrhage
433.xx Occlusion and stenosis of precerebral arteries
434.xx Occlusion of cerebral arteries
435.x Transient cerebral ischemia (includes stroke in evolution, progressive)
436 Acute, but ill-defined cerebrovascular disease (includes stroke, CVA)
437.x Other and ill-defined CVD
440.x Atherosclerosis:
    .0 - of aorta
    .1 - of renal artery
    .2x - of extremities
    .8 - of other specified arteries
    .9 - generalized and unspecified atherosclerosis
443.81 Peripheral angiopathy
443.82 Peripheral vascular diseases, unspecified
276.x Cardiac dysrhythmia

Renal Disorders:
580 Acute glomerulonephritis
583.5 Nephropathy in diseases specified elsewhere (i.e., diabetes)
581.6 Nephrosis, intercapillary glomerulosclerosis in diseases specified elsewhere
581.xx Nephrotic syndrome
582.xx Chronic glomerulonephritis
583.xx Nephritis and nephropathy not specified as acute or chronic
584.x Acute renal failure
585 Chronic renal failure
586 Renal failure, unspecified

Hospital Admissions for Depression:
296.2x Major depressive disorder, single episode
296.3x Major depressive disorder, recurrent episode
296.5x Bipolar affective disorder, depressed
296.82 Atypical depressive disorder
300.4x Neurotic depression (dysthymia)
311.xx Depressive disorder, NEC

Appendix III:

2780x Obesity
272xx Lipid disorders
305lu Alcohol abuse
303xx Alcohol dependence syndrome
305lu Tobacco use disorder
V11.3 Personal history of mental disorder, alcoholism

Appendix IV:

600 hyperplasia of prostate
722.10 lumbar disc displacement
174.x Neoplasm, breast, malignant primary
196.81 Neoplasm, breast, malignant secondary
233.0 Neoplasm, breast, carcinoma in situ
217 Neoplasm, breast, benign
238.3 Neoplasm, breast, uncertain behavior
239.3 Neoplasm, breast, unspecified
205.01 Chronic Myeloid Leukemia

Appendix V:

789.0 Pain, abdominal
536.8 Pain, stomach
787.3 Pain, intestinal
518.81 Respiratory Failure
786 Symptoms of respiratory systems and/or chest
276 Disorder of fluids, electrolytes, and acid-base balance
V65 Consultation and complaint w/o sickness
780 General symptoms
996 Complications peculiar to certain specified procedures
V72 Special investigations and examinations

Appendix VI:

99341-9353 (CPT code) home health care
V65.3 dietary care
93.39 physical therapy

Appendix VII:

DPS Formulary Codes:
8.1.1 insulin prescription
8.1.2 oral hypoglycemic prescription
4.3, 4.5.1, 4.5.4, 4.5.6 Anti-hypertensive drugs
4.8 Lipid-lowering drugs
4.1, 4.2, 4.6 Cardiovascular drugs
5.5 Antidepressants
5.8, 5.4 Phenothiazines (Fluphenazine, Carbamazepine)
9.3 Metoclopramide
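The code lists in the appendices use "x" as a placeholder for a digit (for example 362.0x or 401.x). A minimal sketch of how the Appendix I inclusion and exclusion codes could be applied to a claim's diagnosis code follows. The matching semantics are an assumption: each "x" is taken to match exactly one digit, and a listed code is taken to match any recorded code that begins with it (so that 250 covers 250.01). The function and variable names are hypothetical.

```python
import re

# Subset of Appendix I codes, written as they appear in the appendix
INCLUDED = ["250", "357.2", "362.0x", "366.41", "648.0"]
EXCLUDED = ["648.8", "790.2", "962.3"]


def pattern_to_regex(pattern):
    """Turn an appendix-style code into a regex: 'x' matches one digit."""
    parts = [r"\d" if ch == "x" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))


def code_matches(code, patterns):
    """True if `code` begins with any of the listed patterns (assumed rule)."""
    return any(pattern_to_regex(p).match(code) for p in patterns)


def is_diabetes_code(code):
    """Apply the inclusion list first, then the exclusion list."""
    return code_matches(code, INCLUDED) and not code_matches(code, EXCLUDED)


for code in ["250.01", "362.01", "648.83", "401.9"]:
    print(code, is_diabetes_code(code))
```

Because `re.match` anchors only at the start of the string, a three-digit category such as 250 also covers its four- and five-digit subcodes; a stricter rule would need a full-string match.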
