Cherry-picking datasets in time series forecasting can significantly distort the perceived performance of forecasting methods, exaggerating how effective they appear. With just four selectively chosen datasets, 46% of methods could be considered best in class. Increasing the number of datasets tested from 3 to 6 reduces the risk of incorrectly identifying an algorithm as the best one by approximately 40%.
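The effect described above can be illustrated with a minimal sketch. It is not the procedure behind the reported figures. It assumes a matrix of per-dataset errors (rows are methods, columns are datasets, lower is better) and uses mean rank across the chosen datasets as the "best in class" criterion. The function name `fraction_best_on_some_subset` and the toy numbers are hypothetical and chosen only for illustration.

```python
from itertools import combinations

import numpy as np


def fraction_best_on_some_subset(errors: np.ndarray, k: int) -> float:
    """Fraction of methods that rank best on at least one subset of k datasets.

    errors: array of shape (n_methods, n_datasets), lower is better.
    Assumption: "best" means lowest mean rank over the chosen datasets;
    methods tied for the lowest mean rank are all counted as winners.
    """
    n_methods, n_datasets = errors.shape
    # Rank methods within each dataset (0 = best). Equal errors within a
    # dataset are ordered arbitrarily by argsort.
    ranks = errors.argsort(axis=0).argsort(axis=0)

    winners = set()
    for subset in combinations(range(n_datasets), k):
        mean_rank = ranks[:, list(subset)].mean(axis=1)
        winners.update(np.flatnonzero(mean_rank == mean_rank.min()).tolist())
    return len(winners) / n_methods


# Toy, made-up errors for 3 methods on 4 datasets (illustration only).
toy_errors = np.array([
    [1.0, 1.0, 1.0, 3.0],  # method 0: best on datasets 0, 1 and 2
    [2.0, 2.0, 2.0, 1.0],  # method 1: best only on dataset 3
    [3.0, 3.0, 3.0, 2.0],  # method 2: never best
])

if __name__ == "__main__":
    for k in (1, 4):
        share = fraction_best_on_some_subset(toy_errors, k)
        print(f"k={k}: {share:.2f} of methods can be made to look best")
```

In this toy case, two of the three methods can be presented as the winner when a single dataset is hand-picked, while only one remains when all four datasets are used. Real benchmarks would need actual error measurements and possibly a different ranking criterion.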