The article discusses a peculiar intermittent failure encountered in a testing pipeline.
The author describes maintenance of a suite of automated tests, with one test repeatedly failing intermittently.
The failing test involved checking a chart plotting data from the last 30 days against data retrieved from an API.
Despite local transformations to align the data structures, the test would intermittently fail without a clear reason.
Deepdiff comparisons were used for data validation, highlighting differences that were challenging to interpret in the logs.
The author attempted to isolate the issue by running the test locally but was unable to reproduce the problem.
Further analysis revealed discrepancies in the date ranges requested from the API and displayed on the UI, leading to the intermittent failures.
A pattern was observed where the test failed only in the first run of the day shortly after midnight UTC, hinting at a time-related issue.
Additional logging was added to gather more information on the discrepancy between JavaScript and Python times during test execution.
Ultimately, the issue was traced back to clock synchronization problems on machines running Selenium Grid, resolved by configuring NTP to use a specific server.