OncoViz USA: ML-Driven Insights into Cancer Incidence, Mortality, and Screening Disparities
Abstract
Cancer incidence and mortality in the United States have shown overall improvement in recent decades, yet not all populations and regions have benefited equally. We present OncoViz USA, a comprehensive analysis integrating public datasets from 2010 to 2020 to uncover factors underlying the rising cancer burden and lagging screening rates. Using national cancer registries (CDC USCS) alongside data on screening (BRFSS), social determinants (ACS), and healthcare access (HRSA), we applied machine learning models (gradient boosting and random forests) to identify key predictors of high cancer incidence/mortality and low screening uptake. Model explainability tools (SHAP values) highlighted the contributions of demographics, socioeconomic status, health behaviors, and healthcare access to geographic disparities. We further conducted time-series forecasting (Prophet/ARIMA) to project short-term cancer trends and spatial analyses (Moran’s I and cluster detection) to identify high-risk “hotspots.” Our results indicate that socioeconomic and healthcare-access variables, including poverty, educational attainment, insurance coverage, and provider availability, are the strongest drivers of persistent cancer disparities, alongside behavioral factors such as smoking. Areas with increasing cancer death rates or poor screening uptake clustered in economically disadvantaged and rural regions. The findings provide data-driven insight into where targeted screening and prevention efforts are most urgently needed. The OncoViz USA approach demonstrates the power of machine learning and multi-source data integration to inform public health strategies aimed at achieving equitable cancer outcomes.
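As an illustration of the spatial autocorrelation statistic named above, the following is a minimal sketch of global Moran's I in plain Python. It is not the authors' implementation: the function name `morans_i`, the four-region chain adjacency matrix, and the rate values are all hypothetical toy inputs chosen only to show how clustered and alternating patterns produce positive and negative values of the statistic.

```python
from typing import Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I for one variable and a spatial weights matrix.

    I = (n / W) * sum_ij w_ij * (x_i - mean) * (x_j - mean) / sum_i (x_i - mean)^2
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]                      # deviations from the mean
    total_weight = sum(sum(row) for row in weights)       # W
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )                                                     # weighted cross-products
    squared_dev = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared_dev)


# Hypothetical toy layout: four regions in a row, each adjacent to its neighbors.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Made-up rates: a smooth gradient (similar neighbors) vs. an alternating pattern.
clustered = morans_i([1.0, 2.0, 3.0, 4.0], chain)
alternating = morans_i([1.0, 4.0, 1.0, 4.0], chain)
print(f"gradient pattern:    I = {clustered:.3f}")
print(f"alternating pattern: I = {alternating:.3f}")
```

Values near +1 suggest that similar rates cluster among neighboring regions, values near -1 suggest a checkerboard-like pattern, and values near 0 suggest no spatial structure. A real analysis would use county- or state-level adjacency and a permutation test for significance, neither of which is shown here.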