Orthrus: Efficient and Timely Detection of Silent User Data Corruption in the Cloud with Resource-Adaptive Computation Validation
Orthrus is a system for the timely detection of silent user-data corruption (SDC) caused by post-installation CPU errors. Orthrus enables high-coverage protection of user data in the cloud with minimal performance impact.
For more details, please refer to our SOSP 2025 paper:
Orthrus: Efficient and Timely Detection of Silent User Data Corruption in the Cloud with Resource-Adaptive Computation Validation
This repository contains the research artifact for evaluating the performance of the Orthrus Runtime.
For the automated fault-injection testing framework (SDC error injection using LLVM compilers), please refer to the FaultInjection repository. We are continuing to develop that repository into an easy-to-use automated test platform for fault injection.
Orthrus has been evaluated on Ubuntu 18.04 and Ubuntu 20.04 (recommended).
- CPU: Minimum 48 cores required for testing scripts.
- CMake: Version 3.20 or higher.
- Detailed requirements can be found in docs/prerequisite.md.
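The core-count and CMake requirements above can be checked before running anything. The following is a minimal sketch, not part of the Orthrus repository: the function names `version_ge`, `has_min_cores`, and `check_prerequisites` are hypothetical, and the script assumes GNU coreutils (`nproc`, `sort -V`), which ship with Ubuntu.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; these helpers are illustrative and are
# not provided by the Orthrus repository.

# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeed if the machine has at least $1 available cores.
has_min_cores() {
    [ "$(nproc)" -ge "$1" ]
}

# Print a warning for each unmet requirement; return non-zero if any is unmet.
check_prerequisites() {
    local status=0
    if ! has_min_cores 48; then
        echo "warning: found $(nproc) cores; the testing scripts need at least 48"
        status=1
    fi
    local cmake_version
    # `cmake --version` prints "cmake version X.Y.Z" on its first line.
    cmake_version="$(cmake --version 2>/dev/null | head -n 1 | awk '{print $3}')"
    if [ -z "$cmake_version" ] || ! version_ge "$cmake_version" "3.20"; then
        echo "warning: CMake 3.20 or higher is required (found: ${cmake_version:-none})"
        status=1
    fi
    return "$status"
}
```

Calling `check_prerequisites` after sourcing this snippet prints one warning per unmet requirement. It covers only the two items listed here; docs/prerequisite.md remains the authoritative list.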
On Ubuntu 20.04, you can initialize the environment using the provided script:
sudo ./init.sh
source env.sh
Alternatively, you can use Docker Compose to run evaluations without manual setup.
To calculate the detection rate, you need pre-generated fault injection results.
- Download the file fault_injection.tar.gz from: OneDrive Link
- Extract it to the datasets/ directory.
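The extraction step can be sketched as follows. This assumes the archive has already been downloaded manually into the repository root; the helper name `extract_dataset` is hypothetical, and the internal layout of the archive is not documented here, so the sketch simply unpacks it as-is.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the Orthrus repository.

# Unpack the gzip-compressed tarball $1 into directory $2,
# creating the directory if it does not exist yet.
extract_dataset() {
    local archive="$1" dest="$2"
    if [ ! -f "$archive" ]; then
        echo "error: $archive not found; download it first" >&2
        return 1
    fi
    mkdir -p "$dest"
    tar -xzf "$archive" -C "$dest"
}
```

From the repository root, this would be invoked as `extract_dataset fault_injection.tar.gz datasets/`.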
You can run the full evaluation suite using one of the following methods:
Method A: Native Environment
just test-all
Method B: Docker Compose
docker-compose run test-all
Performance results will be saved in the results/ folder.
This section corresponds to Figures 6-9 in the paper. The tests cover: Memcached, Masstree, Phoenix, and LSMTree.
- The complete performance evaluation takes approximately 7 hours.
- For details of individual tests, please refer to:
This section corresponds to Table 2 in the paper.
- Full Test: A complete error injection test may take over 30 hours. Results in the paper are based on the full test.
- Fast Check: We provide a partial test for quick verification (~2-3 hours).
- More details can be found in Table 2 Fast Check and Table 2 Full.
- FaultInjection: Automated testing framework for SDC error injection using LLVM compilers.
If you use Orthrus in your research, please cite our SOSP 2025 paper:
@inproceedings{liu2025orthrus,
title={Orthrus: Efficient and Timely Detection of Silent User Data Corruption in the Cloud with Resource-Adaptive Computation Validation},
author={Liu, Chenxiao and Zhu, Zhenting and Li, Quanxi and Xia, Yanwen and Qiao, Yifan and Deng, Xiangyun and Lu, Youyou and Xie, Tao and Cui, Huimin and Du, Zidong and others},
booktitle={Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles},
pages={286--304},
year={2025}
}