This study examines Large Language Models (LLMs) used to generate tabular data via in-context learning.
LLMs have become a valuable tool for data augmentation in settings where data is scarce.
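As a concrete illustration of such a pipeline (a minimal sketch under assumed details, not the study's exact setup), the in-context examples can be serialized into a prompt and the model's completion parsed back into rows; `llm_complete`, `seed_rows`, and the column names below are hypothetical placeholders:

```python
import csv
import io

def build_prompt(example_rows, columns, n_new=5):
    """Serialize the in-context example rows as CSV and request new synthetic rows."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(example_rows)
    return (
        "You are a tabular data generator. Here are example records:\n"
        f"{buf.getvalue()}\n"
        f"Generate {n_new} additional records in the same CSV format (no header), "
        "matching the statistical patterns of the examples."
    )

def parse_rows(llm_output, columns):
    """Parse the model's CSV-formatted completion back into row dictionaries.

    Assumes the model returns data rows only, with no header line."""
    reader = csv.DictReader(io.StringIO(llm_output.strip()), fieldnames=columns)
    return [row for row in reader if None not in row.values()]

# Usage with a hypothetical `llm_complete(prompt: str) -> str` callable:
# columns = ["age", "income", "gender", "label"]
# prompt = build_prompt(seed_rows, columns)
# synthetic_rows = parse_rows(llm_complete(prompt), columns)
```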
Previous research has shown that LLMs can improve downstream task performance by augmenting underrepresented groups.
However, these gains typically assume access to unbiased in-context examples.
Real-world data, by contrast, is often noisy and skewed, so this assumption rarely holds in practice.
The research investigates how biases within the in-context examples affect the distribution of the synthetic tabular data.
Even subtle biases in the in-context examples can propagate into significant global statistical distortions in the generated data.
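One simple way to quantify this kind of distortion (a generic sketch, not necessarily the metric used in the study) is to compare the marginal distribution of a sensitive attribute in the synthetic data against a reference sample:

```python
from collections import Counter

def group_proportions(rows, key):
    """Empirical distribution of a (sensitive) attribute over a set of rows."""
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

# Example: quantify how far the synthetic marginal of a sensitive attribute
# drifts from a reference sample when the in-context examples are skewed.
# drift = total_variation(group_proportions(reference_rows, "gender"),
#                         group_proportions(synthetic_rows, "gender"))
```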
An adversarial scenario is introduced in which a malicious contributor injects bias through the in-context examples, degrading classifier fairness for a specific subgroup.
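A hypothetical sketch of such an injection and a downstream fairness audit is shown below; the label-flipping strategy, the `demographic_parity_gap` helper, and the field names are illustrative assumptions rather than the study's actual method:

```python
def biased_in_context_selection(rows, sensitive_key, target_group, k=8, forced_label=0):
    """Adversarial selection: take k examples from the target subgroup and
    overwrite their labels so the injected association is mirrored by the generator."""
    poisoned = []
    for row in rows:
        if row[sensitive_key] == target_group:
            poisoned.append({**row, "label": forced_label})
            if len(poisoned) == k:
                break
    return poisoned

def demographic_parity_gap(rows, sensitive_key, group_a, group_b):
    """Absolute difference in positive-label rates between two subgroups."""
    def positive_rate(group):
        members = [r for r in rows if r[sensitive_key] == group]
        return sum(int(r["label"]) for r in members) / max(len(members), 1)
    return abs(positive_rate(group_a) - positive_rate(group_b))

# A classifier trained on synthetic data generated from the poisoned examples
# can then be audited by computing demographic_parity_gap on its predictions.
```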
These findings expose a vulnerability of LLM-based data generation pipelines that rely on in-context prompting, particularly in sensitive domains.