To verify the reproducibility of an application, it is often necessary to execute it multiple times, each time with a different input, and evaluate whether the application outcome changes. Given the complexity of the software, any unexpected behavior in the outcome requires quick insight into application behavior. Efficient execution tracing is instrumental, but traces must often be compared to attain that insight. Comparing execution traces is a significant challenge, especially for parallel MPI applications, as tasks may exchange messages in a non-deterministic order. Our results demonstrate that, without selective replay, multiple sources of non-determinism lead to numerous false positives when comparing two runs. Through selective replay, we uncover and explain all divergences and convergences, reducing false positives by more than 50%.
@article{lu2025accurate,
  title={Accurate Differential Analysis using Record and Selective Replay},
  author={Nakamura, Yuta and Chu, Xulu and Malik, Tanu and Laguna, Ignacio},
  journal={International Conference on Scientific and Statistical Database Management (SSDBM)},
  year={2025},
}