Modernizing legacy software is difficult because documentation is often missing and the original decision logic is poorly understood.
A novel pipeline is proposed that combines reinforcement learning (RL) with counterfactual analysis to extract interpretable decision logic from legacy systems, which are treated as black boxes.
An RL agent explores the input space to identify decision boundaries; the resulting counterfactual state transitions are clustered, and decision trees are then trained to extract human-readable rules that approximate the system's decision logic.
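The steps above can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: uniform random probing stands in for the RL agent, the clustering step is omitted, and a single-threshold decision stump stands in for a full decision tree. The `legacy_system` function, its `age` and `income` inputs, and the 50,000 cutoff are hypothetical and exist only to give the sketch something to probe.

```python
import random


def legacy_system(age, income):
    # Hypothetical black box standing in for an undocumented legacy routine.
    return "approve" if income >= 50_000 else "reject"


def find_counterfactuals(system, n_probes=20_000, step=1_000, seed=0):
    """Collect (before, after) input pairs where a one-step change flips the output.

    Assumption: uniform random probing replaces the RL agent described in the text.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_probes):
        before = (rng.randint(18, 80), rng.randrange(0, 100_000, step))
        # Perturb exactly one feature by one step.
        if rng.random() < 0.5:
            after = (before[0] + 1, before[1])
        else:
            after = (before[0], before[1] + step)
        if system(*before) != system(*after):
            pairs.append((before, after))
    return pairs


def fit_stump(samples, labels):
    """Find the single feature/threshold split that best reproduces the labels.

    Assumes exactly two distinct labels (a binary decision).
    """
    classes = sorted(set(labels))
    best = {"accuracy": 0.0}
    for feature in range(len(samples[0])):
        values = sorted({s[feature] for s in samples})
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            for above, below in (classes, classes[::-1]):
                hits = sum(
                    (above if s[feature] >= threshold else below) == y
                    for s, y in zip(samples, labels)
                )
                accuracy = hits / len(samples)
                if accuracy > best["accuracy"]:
                    best = {
                        "feature": feature,
                        "threshold": threshold,
                        "above": above,
                        "below": below,
                        "accuracy": accuracy,
                    }
    return best


pairs = find_counterfactuals(legacy_system)
# Train only on points that sit on either side of an observed output flip.
samples = [point for pair in pairs for point in pair]
labels = [legacy_system(*point) for point in samples]
rule = fit_stump(samples, labels)
print(
    f"if feature[{rule['feature']}] >= {rule['threshold']}: "
    f"{rule['above']} else: {rule['below']}"
)
```

Because only inputs adjacent to an output flip are kept, the stump sees samples concentrated at the boundary and recovers a threshold midway between the two probe values that straddle the hypothetical cutoff. A real system with several interacting conditions would need the clustering step and a deeper tree, which this sketch leaves out.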
The pipeline was evaluated on dummy legacy systems of varying complexity, where it focused exploration on the relevant boundary regions and accurately extracted the core logic. These results suggest the approach could help generate specifications and test cases during legacy migration.