A new benchmarking paradigm, leave-one-chromosome-out (LOCO) cross-validation, has been proposed for deep-learning-based prediction of enhancer-promoter interactions (EPI).
Traditional evaluations split the dataset into training and testing subsets at random, which overestimates performance because information leaks between the two subsets.
Under LOCO cross-validation, where all samples from one chromosome are held out for testing in turn, a deep learning algorithm's performance drops drastically, which shows how much random splitting overestimates performance.
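
As a rough illustration of the LOCO idea, the sketch below builds one train/test split per chromosome. It assumes each enhancer-promoter pair carries a single chromosome label; the function name `loco_splits` and the toy labels are hypothetical and not taken from the original work.

```python
from collections import defaultdict


def loco_splits(chromosomes):
    """Yield (held_out, train_idx, test_idx) once per distinct chromosome.

    `chromosomes` holds the chromosome label of each sample, in sample order.
    """
    # Group sample indices by chromosome label.
    by_chrom = defaultdict(list)
    for idx, chrom in enumerate(chromosomes):
        by_chrom[chrom].append(idx)

    for held_out in sorted(by_chrom):
        # Test on every sample from the held-out chromosome...
        test_idx = list(by_chrom[held_out])
        # ...and train on every sample from all other chromosomes.
        train_idx = sorted(
            i
            for chrom, indices in by_chrom.items()
            if chrom != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Hypothetical toy labels: six samples spread over three chromosomes.
sample_chroms = ["chr1", "chr1", "chr2", "chr3", "chr2", "chr3"]
splits = list(loco_splits(sample_chroms))
```

Because no chromosome contributes to both sides of a split, samples from the same chromosome cannot leak from training into testing, which is the property random splitting lacks.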
A novel hybrid deep neural network that incorporates k-mer features of the nucleotide sequence is also proposed, and it performs significantly better in the LOCO setting.
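
The text does not specify how the k-mer features are computed or how the network combines them, so the following is only a minimal sketch of one common encoding: a fixed-length vector of overlapping k-mer counts over the alphabet A, C, G, T. The function name `kmer_counts` and the example sequence are hypothetical.

```python
from itertools import product


def kmer_counts(sequence, k=3):
    """Return a list of 4**k counts, one per k-mer in lexicographic order."""
    # Map every possible k-mer over ACGT to a fixed position in the vector.
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)

    seq = sequence.upper()
    # Slide a window of width k across the sequence, one base at a time.
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        # Assumption: windows containing N or other ambiguous bases are skipped.
        if kmer in index:
            counts[index[kmer]] += 1
    return counts


# Hypothetical short sequence; real enhancer and promoter windows are far longer.
features = kmer_counts("ACGTACGN", k=2)
```

A vector like this has the same length for every input sequence, so it can be fed to a dense layer alongside whatever sequence representation the rest of a network uses.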