Machine learning has been used for network traffic classification for over two decades.
Recent findings suggest that a simple k-NN baseline using packet-sequence metadata can perform as well as, or even better than, complex neural networks.
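
To make the baseline concrete, the following is a minimal sketch of a k-NN classifier over packet-sequence metadata. It is not the study's implementation. The feature layout (signed sizes of the first four packets, positive for client to server), the Euclidean distance, the value of k, and the toy flows and labels are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training flows nearest to query.

    train: list of (feature_vector, label) pairs.
    query: feature vector with the same length as the training vectors.
    """
    # Rank training flows by Euclidean distance to the query flow.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k nearest flows.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical flows: signed sizes of the first four packets
# (positive = client to server, negative = server to client).
TRAIN = [
    ((120, -1400, 80, -1400), "video"),
    ((130, -1380, 90, -1420), "video"),
    ((110, -1390, 85, -1410), "video"),
    ((60, -70, 62, -68), "dns"),
    ((58, -75, 60, -72), "dns"),
    ((64, -66, 61, -70), "dns"),
]

prediction = knn_predict(TRAIN, (125, -1395, 82, -1405), k=3)
print(prediction)
```

A baseline of this kind has no training phase and only one hyperparameter, which is what makes it a useful reference point when judging more complex models.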
Analysis reveals that many datasets contain over 50% redundant samples, which affects both model performance and the reliability of accuracy estimates.
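
One way to check a dataset for this kind of redundancy is sketched below. The text above does not say how redundancy is defined, so this example assumes the simplest case: a sample is redundant if an identical (features, label) pair has already been seen. The helper names and the toy samples are hypothetical.

```python
def deduplicate(samples):
    """Return samples with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for sample in samples:
        if sample not in seen:
            seen.add(sample)
            unique.append(sample)
    return unique


def redundancy_ratio(samples):
    """Fraction of samples that are exact duplicates of an earlier sample."""
    if not samples:
        return 0.0
    return 1 - len(deduplicate(samples)) / len(samples)


# Hypothetical dataset: (packet-size tuple, label). Assumption: redundancy
# means an exact repeat of both the features and the label.
SAMPLES = [
    ((60, -70, 62), "dns"),
    ((60, -70, 62), "dns"),
    ((60, -70, 62), "dns"),
    ((60, -70, 62), "dns"),
    ((120, -1400, 80), "video"),
    ((120, -1400, 80), "video"),
    ((120, -1400, 80), "video"),
    ((300, -500, 310), "web"),
]

ratio = redundancy_ratio(SAMPLES)
unique_samples = deduplicate(SAMPLES)
print(f"{ratio:.1%} redundant, {len(unique_samples)} unique samples")
```

Deduplicating before the train/test split matters because a duplicate that lands on both sides of the split lets a nearest-neighbour lookup answer a test query by memorisation, which can inflate the measured accuracy.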
The study suggests that standard machine learning practices may not be suitable for network traffic classification and proposes new directions for evaluation in the field.