Advanced analytics using non-medical data may hold the key to a more nuanced understanding of high-risk patient populations, enabling providers and plans to craft targeted interventions that improve care quality while controlling costs.
This report describes an analytical approach to segmenting the Medicare-eligible population that is substantially more effective in identifying attributes associated with dual-eligible status than a traditional, unsegmented approach.
There has never been a better time for health care organizations to understand how to efficiently and effectively leverage data analytics to improve care and lower costs. The aging US population, the increased prevalence of chronic disease, and the emergence of payment models that reward value over volume are all driving health care stakeholders to operate more efficiently. At the same time, the tools and know-how to perform analytics, even on very large data sets, have been proliferating. Marrying analytics with the information that health care organizations collect—or could collect—can allow providers and plans to understand their patients and beneficiaries in far more detail than has hitherto been feasible. This, in turn, could create the opportunity to more closely tailor care to the needs of specific patient or beneficiary populations, improving the quality of care while helping to control costs.
The imperative to improve care and lower costs is especially urgent for organizations serving vulnerable, high-risk groups such as “dual eligibles,” individuals who are eligible for both Medicare and Medicaid coverage. Dual eligibles are among the health care system’s sickest, most complex, and most expensive consumers, and research underscores the importance of “establishing well-targeted interventions for specific subsets of the dual-eligible population in order to reduce unnecessary hospitalizations and potentially achieve savings.”1
This report describes an analytical approach to segmenting the Medicare-eligible population that is substantially more effective in identifying attributes associated with dual-eligible status than a traditional, unsegmented approach. Central to the approach is the use of sociodemographic, lifestyle, and other non-medical data that health plans may collect but do not regularly use to segment or describe patient populations. Health plans could use an expanded version of this approach to target distinct subsets of dual eligibles for care interventions.
While much of the cost of caring for dual eligibles can be attributed to their low income and high health care needs, at least some of the cost is thought to be an artifact of ineffective coordination between Medicare and Medicaid services.2 As such, dual eligibles have been the focus of a number of policy initiatives aimed at slowing the growth in this group’s health care costs through better service coordination.
The Affordable Care Act of 2010 included several initiatives focused on dual eligibles. One such initiative, the Financial Alignment Initiative (FAI), is testing two new payment models—capitated payments and managed fee-for-service—for the dual-eligible population through a number of state-based “demonstrations.”3 States participating in the demonstrations are responsible for coordinating Medicare and Medicaid benefits and spending for dual-eligible beneficiaries through contracts with private managed-care plans.4 Among the United States’ 9.6 million dual eligibles,5 1.4 million (15 percent) are getting their care from these demonstrations; of the rest, more than half a million (6 percent) are in non-demonstration managed fee-for-service programs, where the states can share in savings, and the remaining 7.6 million (79 percent) are in traditional fee-for-service Medicare and Medicaid.6 More than 70 unique health plans are participating in the FAI across the country.7
Our analysis of the peer-reviewed and gray literature, along with conversations with industry experts, suggests that few health plans use sociodemographic and lifestyle data in combination with claims data to segment their dual-eligible beneficiaries. (For two examples of exceptions, see the sidebar “Health plans putting analytics into action.”) This may not be surprising, considering the many challenges that can arise in collecting this type of data. One potential barrier is consumers’ concerns about privacy and security: Members may refuse to provide what they may regard as sensitive information about their habits and preferences. Another challenge is that demographic data coming from government agencies or prior health plans or providers may be incomplete or inaccurate when transmitted to a new health plan: Plans often spend a lot of time validating the data received upon enrollment. Finally, one issue specific to dual eligibles is that these individuals, with their low incomes, tend not to generate the shopping-pattern and lifestyle data that many retailers use today to understand their customers.
Health plans putting analytics into action
Amerigroup, a wholly owned subsidiary of Anthem, is a managed care company serving dual eligibles through Medicaid and Medicare Advantage.8 As part of its care management solutions, the organization uses social workers and nurses for activities such as nursing home community reintegration and assistance in finding affordable and accessible housing.9 To identify, segment, and tailor care coordination programs, the company uses medical and pharmacy claims and demographic data.10
UnitedHealth has invested heavily in its big data strategy, which includes proprietary analytics and the use of other technologies like health information exchanges. With approximately 700 terabytes of structured and unstructured data, the organization runs queries and models data using a variety of tools, such as business intelligence, data mining, and online analytical processing. Future plans include potentially mining social network data and unstructured data from internal applications.11
Perhaps because of such constraints, many plans take relatively few variables into consideration when segmenting their beneficiaries and running predictive analyses. For instance, many plans tend to rely solely on claims data for predicting which enrollees are likely to be high-risk. The traditional method of segmenting Medicare beneficiaries is even simpler: Many plans divide them into just two categories, the under-65 disabled and all others. Yet, as we demonstrate here, gathering and analyzing a combination of nontraditional, non-medical data and medical/pharmacy data can allow for a more detailed picture of an individual’s health and risk that can help health plans more clearly understand care needs.
To demonstrate the potential power of leveraging more information to understand this heterogeneous population, we used data from the Medicare Current Beneficiary Survey (MCBS), an annual nationally representative survey of Medicare beneficiaries administered by the Centers for Medicare & Medicaid Services.12 This survey provides detailed information on health care insurance coverage, spending by payment source (Medicare, Medicaid, private insurance, and out-of-pocket spending), and the use of prescription drugs and long-term care services, as well as information on a rich set of socioeconomic and demographic characteristics, such as marital status, living arrangements, education, and income level. It also provides information on survey participants’ Medicare claims. Our dataset included MCBS records for 15,573 individuals, with 19 percent of the sample identified as dual eligibles.
The analysis proceeded in three stages. First, we conducted a cluster analysis to divide the 15,573 MCBS respondents in our dataset into discrete segments based on similarities and differences in their demographic characteristics and health status. Then, we looked at the entire set of 15,573 individuals to determine which attributes, as described in the MCBS data, were significantly associated with dual-eligible status. Finally, we examined each of the resulting segments to determine which attributes were significantly associated with dual-eligible status within that segment. The goal was to understand whether looking at each segment independently would allow us to identify attributes associated with dual-eligible status that would have been “hidden” in the analysis of the entire, unsegmented group. The results confirmed that the segment-by-segment analysis did, in fact, identify more attributes connected with dual-eligible status than the analysis of the group as a whole.
The cluster analysis identified five segments of Medicare beneficiaries in the MCBS: “Older and likely to be widowed,” “Young and disabled,” “Older and overweight,” “Older and healthy weight,” and “The survivors.”13 To establish the clusters, we used key demographic and health status variables that we believed would paint the most illustrative demographic profile and would produce meaningful clusters. Table 1 shows the variables that were significant in distinguishing the segments.
Table 1. Variables that established the segments
|Older and likely to be widowed (N=3,619)||Young and disabled (N=1,894)||Older and overweight (N=4,620)||Older and healthy weight (N=2,507)||The survivors (N=2,933)|
|Body mass index||X||X||X||X|
|Year-over-year change in health status||X|
The segments differed in their sociodemographics, lifestyle variables, health conditions, and health care costs (table 2). Each segment contained roughly 20 percent dual eligibles, except for the “Young and disabled” segment, in which 71 percent of beneficiaries were dual eligibles.
Table 2. Descriptive statistics for each segment
|Older and likely to be widowed (N=3,619)||Young and disabled (N=1,894)||Older and overweight (N=4,620)||Older and healthy weight (N=2,507)||The survivors (N=2,933)||Total (N=15,573)|
|% of dual eligibles in cluster||21%||71%||17%||20%||21%||19%|
|Less or equal to $10,000||11.7%||33.6%||9.4%||11.8%||13.0%||12.8%|
|Greater than $50,000||14.3%||5.6%||22.8%||22.0%||11.4%||18.0%|
|Negative change in health|
|Current or previous smoker|
|Mental condition count|
|2 or more||8.6%||40.3%||6.8%||7.7%||12.1%||10.7%|
|High school graduation or above|
|Physical condition count|
|5 or more||30.3%||12.4%||18.8%||15.1%||26.2%||20.7%|
|Part A reimbursement in calendar year (mean)||$5,216.51||$3,880.17||$2,216.35||$3,088.28||$6,719.14||$3,732.64|
|Part B reimbursement in calendar year (mean)||$3,817.01||$4,070.03||$2,641.12||$2,892.59||$3,830.34||$3,198.11|
|Total health insurance premiums (mean)||$1,132.65||$306.64||$966.51||$958.11||$1,148.42||$967.41|
To identify attributes significantly associated with dual-eligible status, we ran logistic regressions on the entire MCBS dataset and on each of the five segments, looking for variables that differentiated the dual-eligible population from the non-dual-eligible population. The results, given in terms of odds ratios, are shown in table 3.
An odds ratio measures the association between an “exposure” (in this case, an attribute such as smoking status, marital status, or diagnosis) and an “outcome” (in this case, being a dual eligible). If the odds ratio is 1, the exposure has no relationship with the outcome; the odds of the outcome occurring given a particular exposure are equal to the odds of the outcome occurring in the absence of that exposure. An odds ratio greater than 1 indicates a positive relationship between the exposure and the outcome; that is, the exposure increases the odds of experiencing the outcome. Conversely, an odds ratio of less than 1 indicates a negative relationship between the exposure and the outcome; the exposure decreases the odds of experiencing the outcome.
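The definition above can be made concrete with a short calculation. The following is a minimal sketch, not the code used in this study: the function names are hypothetical and the counts are invented for illustration. It computes an odds ratio from a two-by-two table of counts and shows the standard relationship between a logistic regression coefficient and its odds ratio.

```python
import math


def odds_ratio(exposed_with, exposed_without, unexposed_with, unexposed_without):
    """Odds ratio from the four cells of a 2x2 exposure-by-outcome table."""
    odds_exposed = exposed_with / exposed_without
    odds_unexposed = unexposed_with / unexposed_without
    return odds_exposed / odds_unexposed


def odds_ratio_from_coefficient(beta):
    """In logistic regression, exp(coefficient) is the odds ratio for a one-unit change."""
    return math.exp(beta)


# Hypothetical counts, invented for illustration (not MCBS data):
# exposed group: 60 dual eligibles and 140 others;
# unexposed group: 40 dual eligibles and 260 others.
example = odds_ratio(60, 140, 40, 260)
print(round(example, 2))  # about 2.79: exposure is associated with higher odds
```

A coefficient of zero corresponds to an odds ratio of exactly 1, the "no relationship" case described above.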
Table 3. Odds ratios for variables associated with dual-eligible status at a statistically significant level
|Total (N=15,573)||Older and likely to be widowed (N=3,619)||Young and disabled (N=1,894)||Older and overweight (N=4,620)||Older and healthy weight (N=2,507)||The survivors (N=2,933)|
|Respondent's home: Apartment||2.61||0.44|
|Respondent's home: One-family detached||0.36||0.34||0.50||0.18||0.24|
|Respondent's home: A two-family or duplex home||0.46||3.77||3.00|
|Congestive heart failure||3.25|
|Has had a stroke||3.28|
|History of a psychiatric disorder||1.73||1.82|
|Has trouble hearing||2.02|
|Self-reported health is poor compared to others his/her age||3.32|
|Activities of daily living|
|Tooth loss makes it difficult to eat||3.26|
|Has difficulty eating solid foods because of teeth problems||2.19||4.02||3.38|
|Uses special equipment to eat||23.65|
|History of military service|
|Ever served in the army||0.49||0.38|
|Less than high school education||5.87||2.49||3.16|
|Problem making decisions||2.33|
|Total condition count|
|Total condition count||1.14||1.29||0.83||1.14||1.15||1.20|
A look at table 3 yields two important insights. First, many attributes that are associated with the dual-eligible status at the segment level did not show up as being associated with it at the whole-group level. This indicates that the relationships between these attributes and dual-eligible status can be “masked” in analyses that consider the Medicare-eligible population as a whole (that is, without segmentation). Using a broad brush to paint a picture of dual eligibles, in other words, misses potentially clinically significant nuances that emerge when taking a more finely grained approach.
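The masking effect can be illustrated with a toy calculation. This is a sketch under stated assumptions: the two segments and all counts are invented, and it uses simple unadjusted odds ratios rather than the logistic regressions behind table 3. An attribute that raises the odds of dual-eligible status in one segment and lowers them in another can show no association at all once the segments are pooled.

```python
def odds_ratio(table):
    """Odds ratio for a 2x2 table given as
    (exposed_dual, exposed_non_dual, unexposed_dual, unexposed_non_dual)."""
    a, b, c, d = table
    return (a * d) / (b * c)


def pool(tables):
    """Add 2x2 tables cell by cell to get the unsegmented table."""
    return tuple(sum(cells) for cells in zip(*tables))


# Invented counts for two hypothetical segments (not MCBS data).
segments = {
    "segment_a": (30, 10, 10, 30),  # attribute raises the odds
    "segment_b": (10, 30, 30, 10),  # attribute lowers the odds
}

by_segment = {name: odds_ratio(table) for name, table in segments.items()}
pooled = odds_ratio(pool(segments.values()))

print(by_segment)  # 9.0 in segment_a, about 0.11 in segment_b
print(pooled)      # 1.0: no association is visible in the pooled data
```

The “Total condition count” row in table 3 shows a milder version of the same pattern, with odds ratios above 1 in some segments and below 1 in another.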
In fact, our complete analysis showed that more than half of the variables associated with dual-eligible status would likely be missed with an unsegmented approach. Overall, we found 71 variables that were significantly associated with dual-eligible status in either the total sample or in the segments. However, 39 of these variables, or 55 percent, were only apparent when we looked within the segments.14
The second insight is that many of the attributes that are associated with dual-eligible status, especially at the segment level, are attributes that would not typically be assessed on a claims form. Such attributes include not only factors such as an individual’s education level, history of military service, and housing situation, but also factors such as smoking status and difficulty with activities of daily living (ADLs) that would be important in formulating an intervention plan appropriate to the individual. This points out the desirability of capturing and using non-claims data to understand and manage dual eligibles, as claims forms only capture a fraction of the information that plans and providers may find useful.
The odds ratios in table 3 make it possible to develop profiles of the dual-eligible population within each segment of the broader population of Medicare beneficiaries. These profiles, illustrated in figure 1, make it clear that the dual eligibles within each segment look quite different from one another—underscoring the importance of segmentation to obtain a more accurate and useful view of the dual-eligible population.
Our analysis may actually understate the level of insight that health plans could derive from an advanced segmentation approach similar to the one outlined above. To retain statistical validity, we limited the number of segments to five, which preserved a minimum sample size of 1,500 people per segment. Health plans, which often have hundreds of thousands or even millions of customers, could develop even more nuanced profiles than the ones that we outline in this study while still identifying segments large enough to have a business impact.
One challenge that accompanies a rich data source is determining the most salient and useful variables or characteristics to consider when trying to understand a population of interest. For health plans, states, and care management companies, this challenge manifests in the problem of how to most efficiently identify vulnerable populations. How can plans determine whom to reach out to in the first 30 or 90 days? What characteristics are most meaningful when trying to separate highest-risk beneficiaries from the rest?
To answer these questions for dual eligibles within our five MCBS segments, we conducted a random forest analysis to identify the five variables most strongly associated with dual-eligible status within each segment. The random forest technique accounts for more complex relationships between variables than does logistic regression. This allows it to detect nuanced effects that the logistic regressions may not have been able to uncover, and to rank variables in order of their importance in predicting the outcome of interest (in this case, dual-eligible status).
To illustrate the type of insights available from this technique, table 4 gives the top five attributes associated with dual-eligible status in the “older and overweight” segment.
Table 4. Ranked list of variables associated with dual-eligible status
|Top five variables associated with dual-eligible status in “older and overweight” segment|
|Respondent’s home: One-family detached|
|Wears eyeglasses or contact lenses|
|Current mental illness|
|Less than high school diploma|
The practical value of performing a random forest analysis is that it can allow health plans to understand what to look for first when identifying populations for potential care interventions. In practice, it can be expensive and time-consuming to gather new information about a given patient population, making it important to identify variables that have the greatest fundamental strength over a wide set of population subsets. Random forests (ensembles of decision trees) fill this gap by performing two key functions: splitting the population into deep segments and explicitly ranking variable importance. Because of the way a random forest works, it does not simply look for variables that have the greatest strength when considered in isolation; it also searches for variables that, in the presence of other variables, may have significant predictive value. By taking such a detailed approach, plans can have increased confidence that the variables they choose to examine will be valuable, not only in an original study of the population of interest, but also in future attempts to combine their newly collected variables in insightful ways.
Our analysis points to two main implications. The first concerns the value of using expanded data to understand patient and beneficiary populations. The analysis described above shows that psychosocial, lifestyle, behavioral, and other non-medical variables are useful both in segmenting broad populations of beneficiaries and in identifying the distinguishing attributes of certain groups (in this case, dual eligibles) within those segments. For example, wearing eyeglasses, a history of military service, and employment status, along with many other characteristics, are not likely to be found in claims data. Yet we found these attributes to be significantly associated with certain clusters in the Medicare population and associated with dual-eligible status.
Plans or health systems could use non-claims data to supplement claims data, forming the basis for the tailoring of care. Such assessments would need to be collected regularly to be accurate and actionable,15 and they may run into challenges due to privacy and security concerns or other complicating factors. However, these challenges are not insuperable. In fact, one could argue that lifestyle, demographic, and other nontraditional data could be easier to come by for dual eligibles than for other groups of beneficiaries. This is because many FAI demonstrations, as well as non-FAI programs serving dual eligibles, collect detailed data on beneficiaries in order to perform a health risk assessment (see sidebar, “Targeting care through health risk assessments”).
The second implication is that, by using an analytical approach to understanding their dual-eligible beneficiaries, health care organizations can tease out interventions to manage this heterogeneous population. An analytical approach such as the one demonstrated here can offer insights and a method to stratify members for different interventions. Because each segment has unique characteristics associated with being a dual eligible, health care organizations can identify which characteristics are most important to better serve the individuals within the segment. For example, for those segments where social support may be lacking (for instance, those living alone or widowed), that lack of social support may help predict an admission to a nursing home or an acute admission. Some segments are at higher risk for multiple comorbidities. And for healthier segments, interventions that focus on wellness may be just as effective as higher-cost interventions.
In summary: Segmenting the beneficiary population to develop well-targeted, tailored interventions is more likely to yield positive results, in terms of both health and cost outcomes, than taking a one-size-fits-all approach. Analytical methods such as those described here can help organizations effectively segment their Medicare beneficiaries to gain a more finely grained understanding of various subpopulations of dual eligibles. Many organizations have the data to adopt and implement the model we have used. Successfully building and deploying the model and using the information to target and coordinate care could make a meaningful difference in providing better care to this diverse population at a lower cost.
Targeting care through health risk assessments
California’s FAI demonstration, the Coordinated Care Initiative (CCI), delivers a full continuum of services to dual eligibles through the capitated payment model. Beneficiaries participating in CCI can access acute, primary, institutional, and home- and community-based services (HCBS). Participating health plans are using health risk assessments (HRAs) to identify beneficiaries’ needs around medical care, behavioral health, chronic conditions, activities of daily living, and other areas. Cal MediConnect plans perform HRAs on high-risk beneficiaries* within 45 days and on low-risk beneficiaries within 90 days of enrollment, identifying high-risk enrollees using a beneficiary’s historic Medicaid and Medicare fee-for-service utilization data.16 As of September 2014, 78 percent of Cal MediConnect HRAs were completed within 90 days of enrollment.17
Three health plans are participating in Massachusetts’ FAI demonstration, One Care, which launched in late 2013. Interdisciplinary care teams—which include the beneficiary, a care coordinator, and an independent-living and long-term services and supports coordinator—work together to develop a customized care plan reflective of each beneficiary’s needs and preferences. Approximately 95,000 individuals are eligible for the program, and more than 17,000 were enrolled as of October 2014.18
Within the first 90 days of a beneficiary’s enrollment, One Care plans must conduct a comprehensive assessment of each individual. Beneficiaries may choose the setting in which they prefer to complete the assessment, and family members and other providers can be present. The assessment tool identifies immediate needs and current services, as well as the individual’s health conditions, medications, functional status, and more. This assessment is then turned into the core components of an individualized care plan, which includes steps that the beneficiary and One Care team will take to address goals and concerns.19 The assessment is also used to assign beneficiaries to four rating categories.20
Arizona pulled out of the FAI demonstration program in early 2013, citing slow implementation progress and uncertainty around the future of the program as chief complaints. However, the state will continue to test innovative models of care for its dual eligibles through the Arizona Health Care Cost Containment System (AHCCCS),21 which has provided care to its Medicaid and dual-eligible population through contracts with managed care organizations for more than 30 years.22 All plans in the AHCCCS program use health risk assessments and predictive modeling to target appropriate interventions.
* High-risk beneficiaries are defined as those who are “at increased risk of having adverse health outcomes, or worsening of their health and functional status, or whose health conditions require careful monitoring and coordination of multiple medical, long-term services and supports, or behavioral health services.”23
To group our 15,573 records into mutually exclusive groups with similar characteristics, we used K-means clustering, an unsupervised technique.24 The iterative algorithm used demographic and health status variables to separate the data into five distinct clusters (table 1). Health care spending and utilization data were not used to define the segments, but were used to describe them (that is, to show how cost differs among the segments). Some variables did not differ significantly among the segments; they were included in the model but did not have a strong impact on the clusters.
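The iterative procedure can be sketched in a few lines. The following is a minimal illustration of the standard K-means (Lloyd's) algorithm, not the implementation used for this study: the function names, the starting rule (k records chosen at random), and the toy records are all assumptions. It also assumes the inputs are numeric and already on comparable scales.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two records."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Column-wise mean of a non-empty list of records."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k records chosen at random
    labels = []
    for _ in range(iterations):
        # Assignment step: each record joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = mean_point(members)
    return centroids, labels


# Toy records with two made-up, pre-scaled variables (not MCBS data).
records = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, labels = k_means(records, k=2)
print(labels)
```

Each record ends up with exactly one label, which is what makes the resulting segments mutually exclusive.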
In performing our regression analysis on the five clusters, we divided each cluster into five random, equal parts. This allowed us to run the model five times for each cluster, each time using 80 percent of the data to train the model and the remaining 20 percent to validate it. For example, the first iteration used parts 1, 2, 3, and 4 of each cluster to build the model and part 5 to validate the results. The second iteration used parts 1, 2, 3, and 5 to build the model and part 4 to validate the results. This process was repeated until we had generated five sets of model coefficients for each cluster. The final coefficients for each cluster were the averages of the five estimates for each predictive variable (for example, age, body mass index, marital status, ethnicity, and so on).
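The fold-and-average procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the logistic regression is a plain gradient-descent fit written for brevity, the function names are hypothetical, and the single-attribute data set is invented.

```python
import math
import random


def fit_logistic(rows, labels, steps=500, rate=0.1):
    """Gradient-descent logistic regression; returns [intercept, coefficients...]."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
            error = 1.0 / (1.0 + math.exp(-z)) - y
            gradient[0] += error
            for i, v in enumerate(x):
                gradient[i + 1] += error * v
        weights = [w - rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights


def accuracy(weights, rows, labels):
    """Share of records whose predicted class matches the observed one."""
    hits = 0
    for x, y in zip(rows, labels):
        z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
        hits += int((z > 0) == (y == 1))
    return hits / len(rows)


def five_fold_average(rows, labels, folds=5, seed=0):
    """Fit once per held-out part and average the coefficient vectors."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    parts = [order[i::folds] for i in range(folds)]  # random, near-equal parts
    fitted, scores = [], []
    for held_out in range(folds):
        train = [i for j, part in enumerate(parts) if j != held_out for i in part]
        test = parts[held_out]
        w = fit_logistic([rows[i] for i in train], [labels[i] for i in train])
        fitted.append(w)
        scores.append(accuracy(w, [rows[i] for i in test], [labels[i] for i in test]))
    averaged = [sum(ws) / folds for ws in zip(*fitted)]
    return averaged, scores


# Invented data: one binary attribute; 35 of 50 exposed and 10 of 50
# unexposed records are dual eligibles (not MCBS data).
rows = [(1,)] * 50 + [(0,)] * 50
labels = [1] * 35 + [0] * 15 + [1] * 10 + [0] * 40
averaged, scores = five_fold_average(rows, labels)
print(averaged, scores)
```

Exponentiating an averaged coefficient, for example `math.exp(averaged[1])`, gives an odds ratio on the same scale as those in table 3.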
Some groups of variables were highly correlated with one another. To reduce this collinearity, we replaced each such group with its first principal component. For the most part, these components did not turn out to be key variables, but they did serve to inform our model selection.
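One way to compute a first principal component is power iteration on the covariance matrix of the correlated variables. The sketch below is an assumption about method, since the study does not say how the components were computed. The function name and toy values are hypothetical, and in practice the variables would usually be standardized first.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Return (direction, scores) for the leading principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered columns.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration from a random start converges to the eigenvector
    # with the largest eigenvalue (assumes the variables are not all constant).
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Each record's score is its projection onto that direction.
    scores = [sum(x * y for x, y in zip(r, v)) for r in centered]
    return v, scores


# Two perfectly correlated made-up variables (not MCBS data).
toy = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, scores = first_principal_component(toy)
print(direction, scores)
```

The single column of scores can then stand in for the whole group of correlated variables in the regression. The sign of the direction is arbitrary, so only the relative ordering of the scores is meaningful.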
In conjunction with performing a logistic regression on each cluster, we also applied an ensemble classification method known as a random forest to each cluster. To be consistent with our logistic regression analysis, we used the exact same division of each cluster into five equal parts in fitting the random forests. In other words, the data used in the random forest analysis were identical to those used in the logistic regression analysis. However, because of the robust nature of the random forest, the model was fit only once to each cluster, using a standard 80/20 split within each cluster for training and validation data, respectively. For each cluster, a random forest was fit using a maximum number of trees ranging from 50 to 500. Each tree within a random forest was trained using a 60/40 split of the training data for each cluster.
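The ranking step can be illustrated with a deliberately small forest. This is a simplified sketch, not the study's model. It handles only 0/1 attributes, grows shallow trees, and records impurity decreases without keeping the trees for prediction. It also reads the 60/40 split as each tree seeing a random 60 percent of the training records, which is an assumption. All names and data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def grow(rows, labels, depth, n_try, rng, importance):
    """Grow one tree on binary features, crediting impurity decreases to features."""
    if depth == 0 or gini(labels) == 0.0:
        return
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):  # random feature subset
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:
            continue
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        gain = gini(labels) - weighted
        if best is None or gain > best[0]:
            best = (gain, f)
    if best is None or best[0] <= 0.0:
        return
    gain, f = best
    importance[f] += gain * len(labels)
    for value in (0, 1):
        subset = [(x, y) for x, y in zip(rows, labels) if x[f] == value]
        grow([x for x, _ in subset], [y for _, y in subset],
             depth - 1, n_try, rng, importance)


def rank_features(rows, labels, n_trees=100, depth=3, sample_share=0.6, seed=0):
    """Return (feature index, importance share) pairs, most important first."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    n_try = max(1, int(n_features ** 0.5))
    importance = Counter()
    for _ in range(n_trees):
        picked = rng.sample(range(len(rows)), int(sample_share * len(rows)))
        grow([rows[i] for i in picked], [labels[i] for i in picked],
             depth, n_try, rng, importance)
    total = sum(importance.values()) or 1.0
    return sorted(((f, importance[f] / total) for f in range(n_features)),
                  key=lambda pair: pair[1], reverse=True)


# Invented data: four binary attributes, outcome driven by attribute 0 only.
data_rng = random.Random(1)
rows = [[data_rng.randint(0, 1) for _ in range(4)] for _ in range(200)]
labels = [r[0] for r in rows]
ranking = rank_features(rows, labels)
print(ranking)
```

Because each split considers only a random subset of attributes, an attribute can earn importance deep in a tree, after other attributes have already divided the records. This is how the technique surfaces variables that matter mainly in combination with others.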
The Deloitte Center for Health Solutions (DCHS) is the research division of Deloitte LLP’s Life Sciences and Health Care practice. The goal of DCHS is to inform stakeholders across the health care system about emerging trends, challenges, and opportunities. Using primary research and rigorous analysis, and providing unique perspectives, DCHS seeks to be a trusted source for relevant, timely, and reliable insights.