Defined benefit (DB) plans are designed to provide financial security in one’s golden years; however, retirement preparedness can be influenced by various factors.
A data-driven clustering analysis can uncover insights to help improve retirement outcomes and lead to informed decisions and resource allocation. The data for this analysis is taken from federalreserve.gov.
Below are some code cells with plots that highlight patterns in the data. Having a DB plan is associated with a higher likelihood of falling within the middle to upper-middle income percentiles, while individuals without DB plans are more frequently represented in the lower income percentiles.
Most households have debt levels below $20 million. For households with Defined Benefit Plans, there is a notable cluster with high-value homes in the range of $120–130 million.
Individuals with higher education levels are considerably more likely to have a Defined Benefit Plan (DBP) compared to those with lower educational attainment. This disparity in DBP participation suggests that individuals with lower educational attainment may face greater challenges in retirement planning.
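One way to check whether an association like this is statistically meaningful is a chi-square test of independence, which matches the chi2_contingency function listed among the project's imports. The sketch below is illustrative only: the column names (education, has_db_plan) and the counts are made up for the example and are not taken from the survey data.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data: column names and counts are illustrative only,
# not values from the actual survey.
toy = pd.DataFrame({
    "education": ["no_degree"] * 40 + ["college"] * 40,
    "has_db_plan": [1] * 8 + [0] * 32 + [1] * 20 + [0] * 20,
})

# Contingency table: rows are education levels, columns are plan status.
table = pd.crosstab(toy["education"], toy["has_db_plan"])

# Test whether plan status is independent of education level.
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2={chi2:.2f}, p={p_value:.4f}, dof={dof}")
```

A small p-value would suggest that plan participation and education are not independent in the sample, though it says nothing about the direction or cause of the relationship.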
Individuals with Defined Benefit Plans (DBPs) are more frequently found in higher net worth percentiles, suggesting greater wealth accumulation. The significant presence of DBP holders in the upper net worth categories indicates that this group is likely better prepared for retirement and enjoys greater financial security.
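A comparison like this can be tabulated by binning net worth into percentile bands and computing the share of each group that falls in each band. The following is a minimal sketch under stated assumptions: the data is randomly generated, the column names (net_worth, has_db_plan, nw_band) are hypothetical, and the band edges are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration; the random plan indicator means this toy
# table will NOT show the pattern described in the text.
rng = np.random.default_rng(1)
n = 1_000
toy = pd.DataFrame({
    "net_worth": rng.lognormal(mean=11, sigma=1.2, size=n),
    "has_db_plan": rng.integers(0, 2, size=n),
})

# Assign each household to a net worth percentile band (assumed band edges).
toy["nw_band"] = pd.qcut(
    toy["net_worth"],
    q=[0, 0.25, 0.5, 0.75, 0.9, 1.0],
    labels=["0-25", "25-50", "50-75", "75-90", "90-100"],
)

# Share of each group (with / without a plan) in each band; columns sum to 1.
share = pd.crosstab(toy["nw_band"], toy["has_db_plan"], normalize="columns")
print(share.round(3))
```

With real survey data, a larger share of plan holders in the upper bands than of non-holders would correspond to the pattern described above.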
Segmenting the population through clustering revealed how income, assets, and debts impact financial security. These insights can inform policymakers and financial planners in developing strategies to improve retirement outcomes across various segments.
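The segmentation step can be sketched with the tools named in the project's library list (StandardScaler, make_pipeline, silhouette_score, and sklearn.cluster). The example below assumes k-means as the clustering algorithm and uses synthetic households with hypothetical income, assets, and debt columns; the project's actual features, algorithm settings, and number of clusters may differ.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic households in two loose groups (illustrative values only).
rng = np.random.default_rng(0)
low = rng.normal(loc=[40_000, 50_000, 20_000],
                 scale=[5_000, 10_000, 5_000], size=(100, 3))
high = rng.normal(loc=[150_000, 600_000, 100_000],
                  scale=[15_000, 50_000, 15_000], size=(100, 3))
X = pd.DataFrame(np.vstack([low, high]), columns=["income", "assets", "debt"])

# Try several cluster counts and score each with the silhouette coefficient.
scores = {}
for k in range(2, 6):
    model = make_pipeline(
        StandardScaler(),  # put dollar-valued features on a common scale
        KMeans(n_clusters=k, n_init=10, random_state=42),
    )
    labels = model.fit_predict(X)
    # Score in the scaled space, since that is where k-means measured distance.
    X_scaled = model.named_steps["standardscaler"].transform(X)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best k by silhouette:", best_k)
```

Scaling before clustering matters here because assets and debts span much larger ranges than income, and unscaled k-means would let the largest-valued feature dominate the distance calculation.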
The complete source code and detailed methodology for this project are available on GitHub.
This project can help explain the differing levels of retirement preparedness among individuals with DB plans and can support policymakers and financial planners in developing strategies to improve retirement outcomes.
The analysis used pandas, numpy, plotly.express, matplotlib.pyplot, and seaborn (imported under the aliases pd, np, px, plt, and sns), along with scipy.stats.mstats, sklearn.cluster, sklearn.decomposition, sklearn.metrics, and sklearn.pipeline. Specific functions and classes imported from these libraries included silhouette_score, make_pipeline, StandardScaler, and chi2_contingency.