Machine learning (ML) plays a pivotal role in detecting malicious software. Reported results in malware detection are often inflated by spatial and temporal biases in experimental design. TESSERACT introduces constraints for fair experiment design and proposes a new metric, AUT, for measuring classifier robustness over time. Performance can be improved through periodic tuning and mitigation strategies.
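
To make the AUT idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes AUT is the area under a performance-over-time curve (for example, F1 measured on consecutive test periods), computed with the trapezoidal rule and normalized to [0, 1]. The function name `aut` and the example scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def aut(scores: Sequence[float]) -> float:
    """Area under a metric-over-time curve, normalized to [0, 1].

    Assumption: `scores` holds one metric value (e.g. F1) per consecutive,
    equally spaced test period, each value in [0, 1].
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two test periods")
    # Trapezoidal rule with unit spacing between periods.
    area = sum((scores[k] + scores[k + 1]) / 2 for k in range(n - 1))
    # Divide by the number of intervals so a perfect classifier scores 1.0.
    return area / (n - 1)


# Hypothetical scores: a classifier that decays steadily over three periods.
decaying = [1.0, 0.5, 0.0]
# Hypothetical scores: a classifier that stays perfect over the same periods.
stable = [1.0, 1.0, 1.0]

print(aut(decaying))  # 0.5
print(aut(stable))    # 1.0
```

Under this reading, a single number summarizes how well a classifier holds up as test data moves further from the training period: a value near 1 indicates performance that stays high across periods, and a lower value indicates decay.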