Large language models have transformed natural language processing, yet supervised fine-tuning (SFT) remains computationally intensive.
Capabilities acquired through SFT can be approximated by a base transformer model using inference-time techniques such as in-context learning (ICL), without altering the model's parameters.
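To make the contrast concrete, the sketch below packs a few dataset examples into the prompt of a frozen base model and generates a completion with no gradient updates. The model name, prompt format, and example selection are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch: approximating fine-tuned behavior via in-context learning (ICL)
# instead of parameter updates. Model choice and prompt format are hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal base model; illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A handful of (input, output) pairs drawn from the fine-tuning dataset.
demonstrations = [
    ("Translate to French: cat", "chat"),
    ("Translate to French: dog", "chien"),
]
query = "Translate to French: bird"

# Pack demonstrations into the context window; no gradient step is taken.
prompt = "".join(f"{x}\n{y}\n\n" for x, y in demonstrations) + query + "\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)
completion = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(completion, skip_special_tokens=True))
```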
The paper extends these results to practical scenarios with finite context lengths and partial dataset access, with implications for resource-efficient deployment of large language models.
For text generation tasks and linear classification, the paper identifies dataset sizes sufficient to approximate fine-tuned behavior within specified error margins, offering a theoretical foundation for real-world applications.
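As a rough illustration of the linear-classification setting only (the synthetic data, subset sizes, and error margin below are assumptions, not the paper's bounds), the sketch trains a classifier on progressively larger random subsets and reports the smallest tested size whose predictions agree with a reference classifier, trained on the full dataset, within a chosen margin epsilon.

```python
# Illustrative only: how large a subset is needed so that a classifier trained on
# it agrees with the full-dataset reference within an error margin epsilon?
# Data, epsilon, and subset sizes are hypothetical; the paper's bounds differ.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))
w_true = rng.normal(size=10)
y = (X @ w_true > 0).astype(int)

# Reference: a classifier trained on the full dataset (stand-in for full fine-tuning).
reference = LogisticRegression().fit(X, y)
X_eval = rng.normal(size=(2000, 10))
ref_pred = reference.predict(X_eval)

epsilon = 0.02  # target disagreement rate (assumption)
for n in (50, 100, 200, 500, 1000, 2000):
    idx = rng.choice(len(X), size=n, replace=False)
    partial = LogisticRegression().fit(X[idx], y[idx])
    disagreement = np.mean(partial.predict(X_eval) != ref_pred)
    print(f"n={n:5d}  disagreement={disagreement:.3f}")
    if disagreement <= epsilon:
        print(f"subset of size {n} matches the reference within epsilon={epsilon}")
        break
```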