A new technique has been developed to reconstruct the exact input that produced a language model's output, aiding post-incident analysis and the detection of fabricated outputs.
The technique, called SODA, is a gradient-based algorithm that outperforms existing methods at recovering short, out-of-distribution inputs from language models.
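SODA's exact objective is not spelled out here, but the general idea behind gradient-based input reconstruction can be sketched on a toy model: relax the discrete input to a softmax over the vocabulary, run gradient descent so the model's output matches the observed output, then discretize with argmax. Everything below (the one-layer "model", sizes, learning rate, step count) is illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 8, 16
E = rng.normal(size=(VOCAB, DIM))   # input embedding matrix
W = rng.normal(size=(DIM, VOCAB))   # output projection

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def forward(x_soft):
    # x_soft: relaxed (soft) one-hot vector over the vocabulary
    h = x_soft @ E                  # soft embedding lookup
    return softmax(h @ W)           # next-token distribution

# The "observed" output, produced by a secret input token we try to recover.
secret = 3
target = forward(np.eye(VOCAB)[secret])

# Optimize logits over the input vocabulary so the model's output
# matches the observed output (cross-entropy loss, manual gradients).
logits = np.zeros(VOCAB)
for _ in range(500):
    x = softmax(logits)
    p = forward(x)
    dh = (p - target) @ W.T         # dLoss/dh for softmax + cross-entropy
    dx = dh @ E.T                   # dLoss/dx_soft
    grad = x * (dx - x @ dx)        # chain rule through softmax(logits)
    logits -= 0.5 * grad

recovered = int(np.argmax(logits))  # discretize the relaxed input
print(recovered)
```

In this toy setup the relaxed problem is convex in the soft input, so descent reliably lands on the hidden token; on a real multi-token LLM input the landscape is far harder, which is consistent with the reported drop-off on longer sequences.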
In experiments on LLMs ranging from 33M to 3B parameters, SODA exactly recovered 79.5% of shorter inputs but struggled with longer input sequences.
The study suggests that standard deployment practices may currently offer sufficient protection against misuse of such reconstruction methods.