Recent breakthroughs in single-cell technology have created a need for efficient annotation of long-tailed single-cell data from disease conditions, in which a few cell categories are abundant and many are rare.
To address this challenge, Celler has been introduced: a generative pre-training model that incorporates the Gaussian Inflation (GInf) Loss function and a Hard Data Mining (HDM) strategy.
The GInf Loss function dynamically adjusts sample weights, improving the model's ability to learn from rare categories while reducing the risk of overfitting to common categories.
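The general idea of frequency-aware sample weighting can be illustrated with a minimal sketch. This is NOT the GInf Loss: its exact formula is not given here, so the sketch swaps in a plain inverse-frequency weighting, `(n_max / n_c) ** power`, applied to cross-entropy. The function names and the `power` parameter are hypothetical. Recomputing the weights from each batch's label counts is one assumed way to make them "dynamic".

```python
import math
from collections import Counter


def class_weights(labels, power=0.5):
    """Hypothetical stand-in for GInf-style reweighting (not the actual GInf formula).

    Each class c gets weight (n_max / n_c) ** power, so the most common class
    has weight 1.0 and rarer classes get larger weights. Calling this on every
    batch is one assumed way to make the weights change during training.
    """
    counts = Counter(labels)
    n_max = max(counts.values())
    return {c: (n_max / n) ** power for c, n in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Mean of -w[y] * log(p[y]) over the batch.

    probs[i] is a list of predicted class probabilities for sample i,
    and labels[i] is the integer index of its true class.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(p[y])
    return total / len(labels)
```

For a batch with labels `[0, 0, 0, 0, 1]`, `class_weights` gives class 0 a weight of 1.0 and the rare class 1 a weight of 2.0, so a misprediction on the rare sample contributes twice as much to the loss. A smaller `power` softens the reweighting, which limits how strongly the common classes are down-weighted.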
The HDM strategy targets difficult-to-learn samples from minority categories, significantly improving the model's predictive accuracy.
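A generic form of hard-example mining restricted to minority categories might look like the sketch below. It is an assumption for illustration, not the HDM procedure itself: "minority" is defined here by a hypothetical count `threshold`, and "difficult" is approximated by per-sample loss. Both helper names are hypothetical.

```python
from collections import Counter


def minority_classes(labels, threshold):
    """Assumed definition: a class is 'minority' if it has fewer than `threshold` samples."""
    counts = Counter(labels)
    return {c for c, n in counts.items() if n < threshold}


def mine_hard_minority(losses, labels, minority, k):
    """Return indices of the k highest-loss samples whose label is a minority class.

    High per-sample loss is used here as a proxy for 'difficult to learn'.
    """
    candidates = [i for i, y in enumerate(labels) if y in minority]
    candidates.sort(key=lambda i: losses[i], reverse=True)
    return candidates[:k]
```

With labels `[0, 0, 0, 1, 1, 2]`, per-sample losses `[0.9, 0.1, 0.2, 0.3, 0.8, 0.5]`, and `threshold=3`, the minority classes are `{1, 2}` and the two hardest minority samples are indices `[4, 5]`. Note that the high-loss sample at index 0 is skipped because it belongs to the common class. How the mined samples are then used (for example, oversampled or up-weighted in the next epoch) is likewise an assumption here.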