It is common in supervised machine learning to combine the feature extraction capabilities of neural networks with traditional algorithms such as k-nearest neighbors (k-NN).
Supervised fine-tuning (SFT) of a domain-appropriate feature extractor, followed by training a traditional predictor on the resulting SFT embeddings, often improves performance.
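As an illustration of this two-stage pipeline, the sketch below fine-tunes a small feature extractor with a temporary regression head and then fits ordinary k-NN regression on the frozen SFT embeddings. The toy encoder, synthetic data, and use of scikit-learn's KNeighborsRegressor are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical encoder and temporary regression head used only for SFT.
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
head = nn.Linear(16, 1)
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)

# Synthetic stand-in data.
X = torch.randn(256, 32)
y = torch.randn(256, 1)

# Stage 1: supervised fine-tuning of the feature extractor.
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(head(encoder(X)), y)
    loss.backward()
    opt.step()

# Stage 2: discard the head and fit a traditional predictor on the SFT embeddings.
with torch.no_grad():
    Z = encoder(X).numpy()
knn = KNeighborsRegressor(n_neighbors=5).fit(Z, y.squeeze(1).numpy())
```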
Incorporating traditional algorithms directly into SFT as prediction layers could improve performance further, but their non-differentiable nature makes this challenging.
The Nearness of Neighbors Attention (NONA) regression layer, introduced as a solution, uses neural network attention mechanics and a novel attention-masking scheme to create a differentiable proxy of the k-NN regression algorithm, yielding improved regression performance on a variety of datasets.
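The paper's exact formulation is not reproduced here, but a minimal sketch of an attention-style, differentiable stand-in for k-NN regression might look as follows: each query embedding attends over a set of support embeddings, and the prediction is a softmax-weighted average of the support labels, with a diagonal mask preventing a training sample from attending to its own label. The class name and the choice of negative squared Euclidean distance as the attention score are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SoftKNNRegressionHead(nn.Module):
    """Attention-style, differentiable proxy for k-NN regression (illustrative sketch)."""

    def forward(self, query_emb, support_emb, support_y, mask_self=False):
        # Attention scores from negative squared Euclidean distance between embeddings.
        logits = -torch.cdist(query_emb, support_emb) ** 2          # (Q, S)

        if mask_self:
            # Assumes query and support are the same batch; block the diagonal
            # so a sample cannot predict itself from its own label during training.
            eye = torch.eye(logits.size(0), device=logits.device, dtype=torch.bool)
            logits = logits.masked_fill(eye, float("-inf"))

        weights = torch.softmax(logits, dim=-1)                      # (Q, S)
        # Prediction is a convex combination of support labels.
        return weights @ support_y                                   # (Q,) if support_y is (S,)
```

Because the attention weights are a differentiable function of the embeddings, such a head can sit on top of the encoder during SFT so that gradients flow back into the feature extractor; at inference, the training set (or a subset of it) would serve as the support set.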