Differentially private (DP) language model inference can be used to generate private synthetic text with large language models (LLMs).
Clustering the input data before selecting inference batches improves the quality of the privately generated text, especially when the data spans heterogeneous topics; a sketch of this batching step follows.
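A minimal sketch of per-cluster batch selection, assuming the private records have already been embedded with some sentence encoder; the cluster count, batch size, and use of k-means are illustrative assumptions, not details taken from the summary.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_batches(embeddings: np.ndarray, records: list[str],
                    n_clusters: int = 8, batch_size: int = 32,
                    seed: int = 0) -> list[list[str]]:
    """Cluster records by embedding, then draw inference batches within each cluster."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, random_state=seed,
                    n_init="auto").fit_predict(embeddings)
    batches = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Each batch now holds topically similar records, so the aggregated
        # next-token statistics are less diluted across unrelated topics.
        for start in range(0, len(idx), batch_size):
            batch = [records[i] for i in idx[start:start + batch_size]]
            if batch:
                batches.append(batch)
    return batches
```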
A new algorithm aggregates next-token statistics by privately computing medians instead of averages, which benefits from the median's lower local sensitivity.
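A hedged sketch of one standard way to compute a DP median, via the exponential mechanism over a discretized bounded range; this is a generic construction for illustration, not necessarily the exact aggregation rule of the paper, and in this setting it would be applied per vocabulary coordinate to the per-batch next-token statistics.

```python
import numpy as np

def dp_median(values: np.ndarray, lo: float, hi: float,
              epsilon: float, n_bins: int = 256,
              rng: np.random.Generator | None = None) -> float:
    """Return an epsilon-DP estimate of the median of `values`, clipped to [lo, hi]."""
    rng = rng or np.random.default_rng()
    x = np.clip(values, lo, hi)
    candidates = np.linspace(lo, hi, n_bins)
    # Utility: negative rank distance of each candidate from the true median.
    below = np.array([(x < c).sum() for c in candidates])
    above = np.array([(x > c).sum() for c in candidates])
    utility = -np.abs(below - above)  # sensitivity 1 w.r.t. adding/removing one record
    # Exponential mechanism: sample a candidate with probability ∝ exp(eps * u / 2).
    logits = epsilon * utility / 2.0
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return float(rng.choice(candidates, p=probs))
```

Because changing one record shifts a candidate's rank distance by at most one, the utility has sensitivity 1, which is what keeps the privacy cost of each aggregation step small compared with noisy averaging.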
This approach yields high-quality synthetic data at a lower privacy cost than the previous state-of-the-art method, with improvements in representativeness metrics and downstream task performance.