SafeTuneBed is a benchmark and toolkit that unifies the evaluation of fine-tuning methods and defenses for large language models (LLMs).
The toolkit curates a repository of fine-tuning datasets spanning a range of tasks, integrates state-of-the-art defenses, and provides evaluators for both safety and utility metrics.
Built in Python, SafeTuneBed uses dataclass-driven configurations and a plugin architecture, so new fine-tuning regimes, defense methods, and metric suites can be specified with minimal additional code, as sketched below.
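To illustrate this design, here is a minimal sketch of what such a dataclass-driven experiment configuration might look like; all class names, field names, and default values (FineTuneConfig, DefenseConfig, ExperimentConfig, poison_ratio, and so on) are illustrative assumptions, not SafeTuneBed's actual API.

```python
# A hypothetical sketch of a dataclass-driven config, assuming a plugin
# registry keyed by string names; identifiers are illustrative, not the
# toolkit's real API.
from dataclasses import dataclass, field

@dataclass
class FineTuneConfig:
    """Specifies one fine-tuning regime: dataset, poisoning level, optimizer."""
    dataset: str = "sst2"        # which curated fine-tuning dataset to use
    poison_ratio: float = 0.05   # fraction of harmful examples mixed in
    learning_rate: float = 2e-5
    epochs: int = 3

@dataclass
class DefenseConfig:
    """Selects a defense plugin by its registered name, with hyperparameters."""
    method: str = "example_defense"           # registered plugin name
    params: dict = field(default_factory=dict)

@dataclass
class ExperimentConfig:
    """One benchmark run: a fine-tuning regime, a defense, and metric suites."""
    finetune: FineTuneConfig = field(default_factory=FineTuneConfig)
    defense: DefenseConfig = field(default_factory=DefenseConfig)
    metrics: tuple[str, ...] = ("harmfulness", "utility")
```

Under this kind of scheme, adding a new defense or metric suite would reduce to registering a plugin and pointing a config field at it, which is what keeps per-experiment code minimal.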
It aims to standardize data, code, and metrics to facilitate rigorous and comparable research in safe LLM fine-tuning, serving as the first toolkit focused on this problem.