Gradient-based optimization in deep learning raises privacy and security concerns, including susceptibility to data poisoning attacks and the risk of overfitting.
Black-box optimization methods offer an alternative by treating the model as an opaque function, but they face challenges in scalability and computational cost, especially for large language models (LLMs).
A new method, BBoxER, is introduced for LLM post-training; it induces an information bottleneck through implicit compression of the training data.
BBoxER comes with theoretical bounds on generalization, privacy, susceptibility to data poisoning, and robustness to extraction attacks, and it demonstrates promising results in experiments with LLMs.
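To make the black-box paradigm concrete, the sketch below shows a minimal (1+1) evolution strategy that queries a scoring function only through its scalar return value, never through gradients. This is an illustrative toy, not BBoxER itself: the function names (`one_plus_one_es`, `evaluate`) and the quadratic objective standing in for a model-quality metric are assumptions for the example.

```python
import numpy as np

def one_plus_one_es(evaluate, theta0, sigma=0.1, iterations=200, rng=None):
    """Minimal (1+1) evolution strategy: treats `evaluate` as an opaque
    black box, using only its scalar output to accept or reject mutations."""
    rng = np.random.default_rng() if rng is None else rng
    theta, best = theta0.copy(), evaluate(theta0)
    for _ in range(iterations):
        candidate = theta + sigma * rng.standard_normal(theta.shape)
        score = evaluate(candidate)
        if score >= best:  # keep the mutation only if it does not hurt the score
            theta, best = candidate, score
    return theta, best

# Toy usage: a quadratic stands in for a model-quality metric (higher is better).
if __name__ == "__main__":
    target = np.array([1.0, -2.0, 0.5])
    evaluate = lambda w: -np.sum((w - target) ** 2)
    theta, best = one_plus_one_es(evaluate, np.zeros(3))
    print(theta, best)
```

Because the optimizer only ever observes scores, the training data influences the final parameters solely through the sequence of accept/reject decisions, which is the kind of limited information flow that underlies the bounds summarized above.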