Irem Kar, n/a: No financial relationships to disclose
Selen Bozkurt, PhD: No financial relationships to disclose
Key Message: Time-to-event fairness analysis exposes hidden inequities in prognostic models used for survival prediction. Even when overall accuracy is high, subgroup and intersectional disparities can persist, underscoring the need to integrate fairness evaluation so that survival predictions support equitable healthcare planning.
Abstract:
Background: Survival prediction models are central to serious-illness analytics, guiding treatment planning and the timing of palliative and hospice referrals. However, conventional metrics (e.g., C-index, Brier score) capture overall accuracy and do not assess whether predictions remain reliable across patient subgroups.
Objectives: To evaluate subgroup and intersectional fairness of a metastatic NSCLC survival modeling workflow by applying fairness metrics tailored to time-to-event outcomes.
Methods: We analyzed SEER records for adults diagnosed with metastatic NSCLC (2004-2016). Cox proportional hazards (CoxPH)1, Random Survival Forest (RSF)2, and BlackBoost3 models were trained using demographic, clinical, pathologic, and tumor characteristics. Model performance was evaluated with the concordance index (C-index)4, integrated Brier score (IBS), and integrated time-dependent area under the ROC curve (iAUC). Fairness was assessed using three complementary metrics: individual fairness (Fi; consistency of predictions for clinically similar patients), group fairness (Fg; parity across single subgroups), and intersectional fairness (F∩; parity across intersecting subgroups)5. Subgroups were defined by age, sex, race, income, marital status, and rural/urban residence. For all three fairness metrics, lower values indicate greater parity.
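As a reference point for the C-index4, a minimal, unoptimized Python sketch of Harrell's concordance index follows. It assumes arrays of follow-up times, event indicators (1 = death observed, 0 = censored), and model risk scores in which higher values mean worse prognosis; the function name and the four-patient data are hypothetical. Pairs with tied follow-up times are treated as non-comparable, which is only one of several conventions, so library implementations may return slightly different values.

```python
import numpy as np


def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    A pair (i, j) is comparable when patient i has an observed event and a
    strictly shorter follow-up time than patient j. The pair is concordant
    when the model assigns i the higher risk; tied risk scores count as 0.5.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)

    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: the third patient is censored at time 6.
time = [2, 4, 6, 8]
event = [1, 1, 0, 1]
print(harrell_c_index(time, event, [0.9, 0.7, 0.4, 0.2]))  # 1.0
```

With these four patients, a perfectly ordered risk score gives 1.0, a fully reversed one gives 0.0, and constant scores give 0.5. The O(n²) double loop keeps the comparable-pair logic explicit but is too slow for a registry the size of SEER, where an optimized library routine is the practical choice.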
Results: The BlackBoost model achieved the highest overall predictive performance (C-index 0.69, IBS 0.06, iAUC 0.82). RSF had the lowest Fi (0.15), indicating the most consistent treatment of clinically similar patients, whereas CoxPH showed the largest individual-level disparities (Fi 2.19). Group fairness (Fg) values indicated generally balanced predictions across single sociodemographic categories; however, intersectional fairness (F∩) analyses revealed substantial disparities, particularly for age × income (CoxPH 0.86; RSF 0.51; BlackBoost 0.75) and race × age (CoxPH 1.04; RSF 0.59; BlackBoost 0.82).
Conclusions: Fairness in survival prediction is an emerging research area. Unlike fairness analyses of binary models, which rely on confusion-matrix metrics, survival fairness analysis incorporates time-to-event information and can reveal disparities not captured by standard performance metrics. Integrating fairness assessments into survival modeling is essential to ensure that prognostic tools are both accurate and equitable in supporting decision-making in serious illness care.
References:
1. Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187-202.
2. Ishwaran, H., Kogalur, U. B., Blackstone, E. H., & Lauer, M. S. (2008). Random survival forests.
3. Hothorn, T., Bühlmann, P., Kneib, T., Schmid, M., & Hofner, B. (2010). Model-based boosting 2.0.
4. Harrell, F. E., Califf, R. M., Pryor, D. B., et al. (1982). Evaluating the yield of medical tests. JAMA, 247(18), 2543-2546. doi:10.1001/jama.1982.03320430047030
5. Hu, S., & Chen, G. H. (2024). Fairness in survival analysis with distributionally robust optimization. Journal of Machine Learning Research, 25(246), 1-85.
Learning Objectives:
1. To describe fairness metrics for survival prediction models, including Individual Fairness (Fi), Group Fairness (Fg), and Intersectional Fairness (F∩).
2. To explain how fairness analysis in survival settings differs from binary outcomes and how intersectional metrics can uncover disparities hidden in single-group comparisons.
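To make these objectives concrete, the Python sketch below contrasts single-attribute and intersectional parity on four hypothetical patients. The scoring functions are simplified stand-ins assumed for illustration: group and intersectional scores use the largest gap between a subgroup's mean predicted risk and the overall mean, and individual fairness uses a Lipschitz-style penalty on pairwise risk differences. They are not the exact Fi, Fg, and F∩ formulations of reference 5, and the function names, the `lipschitz` constant, and the data are all hypothetical.

```python
import itertools

import numpy as np

# Hypothetical, simplified metrics for illustration only; they are NOT the
# exact Fi, Fg, and F-intersectional definitions of reference 5.


def group_fairness(risk, groups):
    """Largest absolute gap between any subgroup's mean predicted risk
    and the overall mean predicted risk (0 = perfect parity)."""
    risk = np.asarray(risk, dtype=float)
    groups = np.asarray(groups)
    overall = risk.mean()
    return max(abs(risk[groups == g].mean() - overall) for g in np.unique(groups))


def intersectional_fairness(risk, *attributes):
    """Same gap, computed over every observed combination of attributes
    (e.g., sex x age) instead of one attribute at a time."""
    cells = ["|".join(map(str, combo)) for combo in zip(*attributes)]
    return group_fairness(risk, cells)


def individual_fairness(risk, features, lipschitz=1.0):
    """Mean Lipschitz-style penalty over all patient pairs: how far the gap in
    predicted risk exceeds `lipschitz` times the Euclidean feature distance."""
    risk = np.asarray(risk, dtype=float)
    x = np.asarray(features, dtype=float)
    penalties = [
        max(0.0, abs(risk[i] - risk[j]) - lipschitz * np.linalg.norm(x[i] - x[j]))
        for i, j in itertools.combinations(range(len(risk)), 2)
    ]
    return float(np.mean(penalties))


# Four hypothetical patients: risk is high only for older women and younger men.
sex = ["F", "F", "M", "M"]
age = ["<65", ">=65", "<65", ">=65"]
risk = [0.2, 0.8, 0.8, 0.2]

print(group_fairness(risk, sex))                # ~0.0: sexes look balanced
print(group_fairness(risk, age))                # ~0.0: age groups look balanced
print(intersectional_fairness(risk, sex, age))  # ~0.3: sex x age cells differ

# Two clinically identical patients (same features) with different risks.
print(individual_fairness([0.1, 0.6], [[1.0, 0.0], [1.0, 0.0]]))  # ~0.5
```

Both single-attribute scores are zero because the high-risk and low-risk cells cancel within each sex and each age band, while the intersectional score exposes a 0.3 gap in every sex × age cell, which is the kind of hidden disparity objective 2 refers to. A real analysis would compute such scores on survival-model outputs (for example, predicted risk at a fixed horizon) using the published metric definitions.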