It is common in supervised machine learning to combine the feature extraction capabilities of neural networks with traditional algorithms such as k-nearest neighbors (k-NN).
Supervised fine-tuning (SFT) of a domain-appropriate feature extractor, followed by training a traditional predictor on the resulting SFT embeddings, often improves performance.
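As an illustration of this two-stage pipeline, the sketch below fine-tunes a small feature extractor with a temporary regression head and then fits ordinary k-NN regression on the frozen SFT embeddings. The toy encoder, synthetic data, and use of scikit-learn's KNeighborsRegressor are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical encoder and temporary regression head used only for SFT.
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
head = nn.Linear(16, 1)
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)

# Synthetic stand-in data.
X = torch.randn(256, 32)
y = torch.randn(256, 1)

# Stage 1: supervised fine-tuning of the feature extractor.
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(head(encoder(X)), y)
    loss.backward()
    opt.step()

# Stage 2: discard the head and fit a traditional predictor on the SFT embeddings.
with torch.no_grad():
    Z = encoder(X).numpy()
knn = KNeighborsRegressor(n_neighbors=5).fit(Z, y.squeeze(1).numpy())
```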
Incorporating traditional algorithms directly into SFT as prediction layers could improve performance further, but their non-differentiable nature makes this challenging.
The Nearness of Neighbors Attention (NONA) regression layer, introduced as a solution, uses neural network attention mechanics and a novel attention-masking scheme to create a differentiable proxy of the k-NN regression algorithm, yielding improved regression performance on a variety of datasets.
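The paper's exact formulation is not reproduced here, but a minimal sketch of an attention-style, differentiable stand-in for k-NN regression might look as follows: each query embedding attends over a set of support embeddings, and the prediction is a softmax-weighted average of the support labels, with a diagonal mask preventing a training sample from attending to its own label. The class name and the choice of negative squared Euclidean distance as the attention score are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SoftKNNRegressionHead(nn.Module):
    """Attention-style, differentiable proxy for k-NN regression (illustrative sketch)."""

    def forward(self, query_emb, support_emb, support_y, mask_self=False):
        # Attention scores from negative squared Euclidean distance between embeddings.
        logits = -torch.cdist(query_emb, support_emb) ** 2          # (Q, S)

        if mask_self:
            # Assumes query and support are the same batch; block the diagonal
            # so a sample cannot predict itself from its own label during training.
            eye = torch.eye(logits.size(0), device=logits.device, dtype=torch.bool)
            logits = logits.masked_fill(eye, float("-inf"))

        weights = torch.softmax(logits, dim=-1)                      # (Q, S)
        # Prediction is a convex combination of support labels.
        return weights @ support_y                                   # (Q,) if support_y is (S,)
```

Because the attention weights are a differentiable function of the embeddings, such a head can sit on top of the encoder during SFT so that gradients flow back into the feature extractor; at inference, the training set (or a subset of it) would serve as the support set.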