In this large-scale retrospective study of more than 5.2 million individual patient encounters, we investigated three approaches for identifying transition points between stages of patient care: DIANA-based clustering of patient ages, the OPERAND method based on FNN analysis by Neuman et al., and transitivity analysis derived from recurrence plots in the phase space of patient embeddings12. These methods represent a range of approaches examining patients at the individual and group level. Across all methodologies, we found some alignment with pre-established age transitions, which lends support to conventional pediatric/geriatric age cutoffs. However, we found that a greater number of transition points was needed to divide the entire age-ordered population into like-cohorts. More specifically, we found more transitions in the 20–40 age range. A potential interpretation of this result is that ages 20–40 are when many patients develop a first chronic diagnosis, or when earlier health events or choices lead to downstream outcomes. Conversely, while numerous transition points were found in the 20–40 age range, a range considered clinically homogeneous, few points were found between ages 3 and 17, a range often considered more heterogeneous. Although there are significant social and developmental changes in this age range, the structured clinical presentation of these patients may be consistent as a result of the minimal multimorbidity burden at those ages.
Examining first the similarity to conventional transitions, even with the precision offered by these methods, there was a striking resemblance to the clinical gestalt, or physicians’ expectations. For instance, DIANA-based clustering at k = 3 led to clusters that roughly match the ages that delineate pediatric from adult care and adult from geriatric care. However, the optimal clustering performance by the silhouette metric occurred with five clusters. The groups 0–2, 3–17, 18–41, 42–66, and 67–95 represent the infancy/toddler, childhood/adolescence, early adulthood, late adulthood, and geriatric populations. This may indicate that current clinical parameters overvalue certain age delineations, e.g., the difference between ages 2–11 as childhood and 11–18 as adolescence13. Current clinical breakdowns may also undervalue further stratification in other age ranges, for example, the difference between early adulthood and late adulthood.
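To make the silhouette comparison concrete, the following minimal sketch scores candidate age partitions by their mean silhouette width. It is not the DIANA implementation used in the study: the function names, the Euclidean distance, and the toy one-dimensional feature vectors are all assumptions made for illustration, and the partitions are supplied by hand as cut points rather than produced by divisive clustering.

```python
import math
from bisect import bisect_right


def labels_from_cuts(ages, cuts):
    """Label each age by contiguous group; `cuts` is the sorted list of first ages of groups 2..k."""
    return [bisect_right(cuts, age) for age in ages]


def mean_silhouette(vectors, labels):
    """Mean silhouette width under Euclidean distance; needs at least two distinct labels."""
    groups = sorted(set(labels))
    members = {g: [v for v, lab in zip(vectors, labels) if lab == g] for g in groups}
    total = 0.0
    for v, label in zip(vectors, labels):
        own = members[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # The distance from v to itself is 0, so dividing the full sum by n - 1 excludes it.
        a = sum(math.dist(v, u) for u in own) / (len(own) - 1)
        # Smallest mean distance from v to the members of any other cluster.
        b = min(
            sum(math.dist(v, u) for u in members[g]) / len(members[g])
            for g in groups
            if g != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(vectors)


# Toy data (hypothetical): one feature vector per age, with a shift at age 5.
ages = list(range(10))
vectors = [(0.0,) if age < 5 else (10.0,) for age in ages]
score_true_cut = mean_silhouette(vectors, labels_from_cuts(ages, [5]))
score_wrong_cut = mean_silhouette(vectors, labels_from_cuts(ages, [3]))
```

In this toy case the cut placed at the true shift scores higher than the misplaced one. Under the same convention, the five groups reported above would correspond to the cut points `[3, 18, 42, 67]`.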
However, the age groups identified through cluster-based analysis are not consistent across the different feature sets: ICD-10CM, CPT, LOINC, and RxNORM. For example, the age breakdowns of ICD-10CM codes are similar to those of all features together, while CPT-based clustering with k = 5 yielded the groups 0–3, 3–18, 18–64, 64–78, and 78–95. This breakdown groups all adults into one cluster while providing more stratification in the geriatric population. It is difficult to elucidate the exact reason for the difference; however, one hypothesis is that a younger geriatric population receives screening procedures at increased rates, whereas at 75 most screening procedures are suspended and procedures become more diagnosis specific.
By employing the OPERAND methodology, we are able to investigate transition points with greater granularity than a one-year interval, and the range of possible locations of each point may be interpreted as a potential “transition zone.” These transition zones represent clinical heterogeneity at the individual and population levels. It is unreasonable to expect every patient to transition between clinical phases at a precise age, and it is also not accurate to say that any one patient suddenly transitions at a specific age, whatever that age may be.
The OPERAND-derived age groups shed light on the potential oversimplification of treating all adult patients as a single cohort. Notably, these transitions predominantly fall in the 20–40 age range, as stated previously, which is conventionally viewed as clinically homogeneous. As shown in Fig. 2(d), three transition zones, around ages 4.25, 5.8, and 14, occur in the pediatric age range, and two, around ages 55.6 and 70, occur in the older adult phase. The remaining six, around ages 20.6, 24.75, 29.1, 33.1, 35.7, and 38.7, occur within a window of only about 18 years. As noted earlier, these transition points may signify the onset of chronic diseases in patients. Patients tend to exhibit fewer associated codes for their care at younger ages, indicating a period of relative health, while at older ages they tend to manifest a stable heterogeneity, indicating diverse health outcomes. Older patients may not resemble one another, but they have each already reached their various health outcomes. The ages of 20–40 encompass the overall transition between these stages.
Supplementary Fig. 3 provides another mechanism for interpreting the OPERAND results. Through the use of a 1-year running sum, the graph approximates the overall perturbations in the phase space of the patients within that window. Three peaks appear around ages 4, 14, and 24. Although all transition points should be considered changes between clinical phases of patients, the magnitude of the peaks in the 1-year window could be interpreted as the degree of change between the phases before and after the peak. For example, the stage of healthcare may differ more before and after 24 years of age than before and after 70, which has a lower peak.
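As an illustration of the 1-year running sum, the sketch below sums a per-patient quantity over every window of one year of age. The function name, the choice of window start ages, and the toy values are assumptions; the quantity actually summed in Supplementary Fig. 3 is whatever OPERAND reports per patient, which is not reproduced here.

```python
from bisect import bisect_left
from itertools import accumulate


def running_sum(ages, scores, starts, window=1.0):
    """For each start age t, sum the scores of patients whose age lies in [t, t + window)."""
    pairs = sorted(zip(ages, scores))
    sorted_ages = [age for age, _ in pairs]
    # prefix[i] is the sum of the first i scores in age order.
    prefix = [0.0] + list(accumulate(score for _, score in pairs))
    totals = []
    for t in starts:
        lo = bisect_left(sorted_ages, t)
        hi = bisect_left(sorted_ages, t + window)
        totals.append(prefix[hi] - prefix[lo])
    return totals


# Toy data (hypothetical): four patients with fractional ages and unit scores.
toy_ages = [0.2, 0.5, 1.3, 2.7]
toy_scores = [1, 1, 1, 1]
toy_starts = [0, 1, 2, 3]
toy_totals = running_sum(toy_ages, toy_scores, toy_starts)
toy_peak_age = toy_starts[toy_totals.index(max(toy_totals))]
```

The start age with the largest total marks the window with the most accumulated change, which is how a peak such as the one near age 24 would be read off.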
Finally, we applied transitivity analysis to the same phase-space representation. Figure 2c shows the DT of groups of 1000 patients ordered by age. Although transition points are not identified directly, this graph may represent the flux, or how much change is occurring, in the phase space at that age range. For instance, between ages 18 and 22 there is a significant increase in DT, which may indicate a large transition in the nature of patient complexity over that interval. Supplementary Fig. 2 shows the coefficient of variation of DT. Peaks in this coefficient represent areas with higher than average variation in DT at that age. These peaks may be interpreted as transition points identified through the transitivity analysis.
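One way to obtain such a variation curve is to compute the coefficient of variation (standard deviation divided by mean) of DT over runs of consecutive age-ordered groups. The sketch below does this with a fixed-width window; the window width, the use of the population standard deviation, and the toy DT values are assumptions and may differ from how Supplementary Fig. 2 was produced.

```python
from statistics import fmean, pstdev


def rolling_cv(values, width):
    """Coefficient of variation (population std / mean) of each run of `width` consecutive values."""
    cvs = []
    for i in range(len(values) - width + 1):
        window = values[i:i + width]
        # Assumes a nonzero window mean, which holds for strictly positive DT values.
        cvs.append(pstdev(window) / fmean(window))
    return cvs


# Toy DT series (hypothetical): flat, then a rise, then flat again.
toy_dt = [0.5, 0.5, 0.5, 0.7, 0.9, 0.9, 0.9]
toy_cv = rolling_cv(toy_dt, 3)
toy_peak_window = max(range(len(toy_cv)), key=toy_cv.__getitem__)
```

In the toy series the largest coefficient falls on the window that spans the rise, which corresponds to reading a peak as a candidate transition point.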
A limitation of this analysis is that it may be identifying the self-fulfilling nature of the current clinical gestalt. Some of the transition points identified may resemble current clinical age cutoffs because those cutoffs influence the diagnoses, labs, medications, and procedures that patients receive from physicians once they cross them. For example, was a transition point at 18 years old identified because there truly is a change in the stage of patient care independent of current clinical gestalt, or was it found because patients are already changing from pediatric to adult providers at that time, and our analysis is simply identifying that change? Validation in a dataset of patients from other institutions would help confirm these results. The methodology employed in this analysis offers a high level of reproducibility, largely owing to its reliance on established methodologies and readily available tools. The foundation of our approach rests heavily on the work of Neuman et al., who have provided comprehensive Python and MATLAB packages for both the OPERAND and transitivity-based methodologies12. Furthermore, the DIANA-based clustering can be executed using numerous pre-existing packages. Additionally, all structured data was obtained through the Observational Medical Outcomes Partnership (OMOP) Common Data Model created by the Observational Health Data Sciences and Informatics (OHDSI) program14. Facilitation by OHDSI would allow a large-scale multi-institutional dataset to be analyzed with the existing pipelines described previously.
We have proposed three methods to investigate the stages of patient care in the healthcare setting. Our results indicate that current clinical gestalt may be similar to transition points identified through unsupervised approaches. An important distinction is that, across all analysis methods, we identified multiple subgroups within the “adult” population, conventionally thought of as ages 18–65 and often treated as one patient population. This heterogeneity in the adult population may be a starting point for more granular consideration of age-specific care needs among adults. Furthermore, it may provide insight into resource allocation and into when interventions that precede the development of new diseases in a patient may be most efficacious. More testing is needed to examine the clinical and epidemiological potential and validity of these transition points.