AI systems in high-consequence domains need to detect rare, high-impact events while operating under tight constraints.
Traditional annotation strategies may introduce noise, limiting model generalization.
Smart-sizing training data strategy introduced, emphasizing label diversity and model-guided selection.
Experiments show that models trained on 20 to 40 percent of curated data can match or exceed full-data baselines, especially in rare-class recall and edge-case generalization.