<ul><li>Recent advances in large language models (LLMs) have enabled automated dataset labeling with minimal human supervision.</li><li>A novel online framework called Cost-aware Majority Voting (CaMVo) is proposed for efficient and accurate LLM-based dataset annotation.</li><li>CaMVo adaptively selects a subset of LLMs for each data instance based on contextual embeddings to balance confidence and cost without pre-training or ground-truth labels.</li><li>Empirical evaluation on the MMLU and IMDB Movie Review datasets shows that CaMVo achieves comparable or superior accuracy to full majority voting while significantly reducing labeling costs, making it a practical and robust solution for cost-efficient annotation.</li></ul>
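The selection idea in the summary above can be illustrated with a minimal sketch. This is not the CaMVo algorithm itself: the summary does not give its details, so the sketch assumes that a per-instance accuracy estimate for each LLM is already available (in CaMVo these would be derived from contextual embeddings), that the models err independently, and that labels are binary. The names `majority_vote_accuracy`, `select_subset`, `accs`, `costs`, and `threshold` are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product


def majority_vote_accuracy(accs):
    """Probability that a strict majority of voters is correct.

    Assumes independent voters and a binary label (illustrative assumption).
    """
    n = len(accs)
    total = 0.0
    # Enumerate every correct/incorrect pattern across the voters.
    for outcome in product([0, 1], repeat=n):
        if 2 * sum(outcome) > n:  # strict majority is correct
            p = 1.0
            for a, ok in zip(accs, outcome):
                p *= a if ok else 1.0 - a
            total += p
    return total


def select_subset(accs, costs, threshold):
    """Return the cheapest subset of model indices whose estimated
    majority-vote accuracy reaches `threshold`.

    Falls back to all models when no subset reaches the threshold.
    Exhaustive search: only suitable for a small pool of models.
    """
    best = None  # (subset, cost)
    for k in range(1, len(accs) + 1):
        for subset in combinations(range(len(accs)), k):
            conf = majority_vote_accuracy([accs[i] for i in subset])
            cost = sum(costs[i] for i in subset)
            if conf >= threshold and (best is None or cost < best[1]):
                best = (subset, cost)
    if best is None:
        return tuple(range(len(accs)))
    return best[0]


# Hypothetical per-instance accuracy estimates and per-query costs.
accs = [0.9, 0.7, 0.6]
costs = [5.0, 1.0, 1.0]
chosen = select_subset(accs, costs, threshold=0.65)
```

With these made-up numbers, a low threshold is met by a single cheap model, while a threshold that no subset can reach falls back to querying every model, which corresponds to full majority voting.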

Cost-aware LLM-based Online Dataset Annotation
