Recent advances in large language models (LLMs) have enabled automated dataset labeling with minimal human supervision.
A novel online framework called Cost-aware Majority Voting (CaMVo) is proposed for efficient and accurate LLM-based dataset annotation.
CaMVo adaptively selects a subset of LLMs for each data instance based on contextual embeddings, balancing confidence against cost, and requires neither pre-training nor ground-truth labels.
Empirical evaluation on the MMLU and IMDB Movie Review datasets shows that CaMVo achieves accuracy comparable or superior to that of full majority voting while significantly reducing labeling costs. This makes it a practical and robust solution for cost-efficient annotation.
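As a rough illustration of per-instance, cost-aware subset selection, the following Python sketch picks the cheapest subset of models whose estimated majority-vote accuracy clears a target threshold. It is not the CaMVo algorithm itself. The per-model accuracy estimates (`est_acc`), the costs, the `threshold` parameter, the binary-label and independent-error assumptions, and both function names are hypothetical, and any embedding-based estimation of `est_acc` is assumed to have happened upstream.

```python
from itertools import combinations, product


def majority_correct_prob(accs):
    """Probability that a strict majority of voters is correct.

    Assumes binary labels and independent errors (an illustrative
    simplification); ties are counted as incorrect.
    """
    n = len(accs)
    total = 0.0
    # Enumerate every correct/incorrect pattern across the voters.
    for outcome in product([0, 1], repeat=n):
        if 2 * sum(outcome) > n:
            p = 1.0
            for acc, is_correct in zip(accs, outcome):
                p *= acc if is_correct else (1.0 - acc)
            total += p
    return total


def select_subset(est_acc, cost, threshold):
    """Cheapest subset whose estimated vote accuracy meets the threshold.

    est_acc: hypothetical per-instance accuracy estimate for each model.
    cost: per-query cost for each model.
    Falls back to querying all models if no subset qualifies.
    """
    models = range(len(est_acc))
    best, best_cost = None, float("inf")
    for k in range(1, len(est_acc) + 1):
        for subset in combinations(models, k):
            subset_cost = sum(cost[i] for i in subset)
            if subset_cost >= best_cost:
                continue  # cannot improve on the current best
            vote_acc = majority_correct_prob([est_acc[i] for i in subset])
            if vote_acc >= threshold:
                best, best_cost = subset, subset_cost
    return best if best is not None else tuple(models)
```

Exhaustive enumeration is only practical for a handful of models; it is used here to keep the cost-versus-confidence trade-off explicit. Raising `threshold` pushes the selection toward larger or more expensive subsets, and an unreachable threshold falls back to full majority voting.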