Both proudly and embarrassedly, all the inconsistency inspections in our ASE '25 work were done manually. We publish the dataset here to support future research on automating this process, in particular:
- How to obtain a standalone, reduced program that retains the same buggy symptom from the original large, multi-file, dependent Debian source code?
- How to cluster bug-triggering code (reduced or not) so that each cluster ideally represents a distinct bug? How to attribute newly seen bug-triggering code to known bugs?
- What are the unique challenges in reducing or deduplicating coverage tool bugs compared with general compiler bugs?
The dataset can be found in ./reduce/dataset/ with the following structure [1]:
reduce/dataset/
├── ET-inconsistencies
│ ├── line_coverage.csv
│ ├── branch_coverage.csv
│ └── mcdc.csv
├── ET-inspection
│ ├── line_coverage.csv
│ ├── branch_coverage.csv
│ └── mcdc.csv
├── SC-inconsistencies
│ ├── line_coverage.csv
│ ├── branch_coverage.csv
│ └── mcdc.csv
└── SC-inspection
  ├── line_coverage.csv
  ├── branch_coverage.csv
  └── mcdc.csv
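The layout above is regular: two workloads (ET, SC), two kinds of table (inconsistencies, inspection), and three coverage metrics. As a minimal sketch, the twelve CSV paths can be enumerated as follows. The function name `dataset_files` and the default root are illustrative assumptions, not part of DebCovDiff.

```python
from itertools import product
from pathlib import Path

WORKLOADS = ("ET", "SC")                      # Existing Tests, Small Commands
KINDS = ("inconsistencies", "inspection")     # raw differences, manual labels
METRICS = ("line_coverage", "branch_coverage", "mcdc")


def dataset_files(root="reduce/dataset"):
    """Return the CSV paths implied by the directory tree above."""
    return [
        Path(root) / f"{workload}-{kind}" / f"{metric}.csv"
        for workload, kind, metric in product(WORKLOADS, KINDS, METRICS)
    ]


for path in dataset_files():
    print(path)
```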
- "Inconsistencies": different coverage reported by Gcov and LLVM-cov
  Columns:
      package,file name,line number,inconsistency type,gcov report,llvm-cov report
  Examples:
      apache2,apache2-2.4.62/server/mpm_unix.c,901,line_val,4,2
      grep,grep-3.8/lib/stackvma.c,363,branch_val,"[0, 52, 52, 4038, 4038, 4090]","[0, 52, 52, 3962, 3962, 4014]"
- "Inspection": manually labeled cause
  Columns:
      package,file name,line number,reason type,reason
  Examples:
      bzip2,bzip2-1.0.8/blocksort.c,514,bug,GCC#121901
      less,less-590/main.c,145,bug,KNOWN BUG LLVM#UCF
- Small Commands ("SC") and Existing Tests ("ET"): please refer to Section III-C of the paper.
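The two column layouts above can be read with Python's standard `csv` module. The following is a minimal sketch, assuming the columns appear in the listed order. Whether the CSV files carry a header row is not stated here, so the sketch parses the example rows from inline strings with explicit field names. All helper names (`parse_inconsistencies`, `differing_positions`, `group_by_reason`) are hypothetical.

```python
import ast
import csv
import io
from collections import defaultdict

INCONSISTENCY_COLUMNS = ["package", "file name", "line number",
                         "inconsistency type", "gcov report", "llvm-cov report"]
INSPECTION_COLUMNS = ["package", "file name", "line number", "reason type", "reason"]

# Example rows copied from the descriptions above (no header row assumed).
INCONSISTENCY_SAMPLE = '''apache2,apache2-2.4.62/server/mpm_unix.c,901,line_val,4,2
grep,grep-3.8/lib/stackvma.c,363,branch_val,"[0, 52, 52, 4038, 4038, 4090]","[0, 52, 52, 3962, 3962, 4014]"
'''
INSPECTION_SAMPLE = '''bzip2,bzip2-1.0.8/blocksort.c,514,bug,GCC#121901
less,less-590/main.c,145,bug,KNOWN BUG LLVM#UCF
'''


def parse_report(text):
    """Parse a report cell: a scalar count or a bracketed list of counts.

    Falls back to the raw string for any cell that is not a Python literal.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def parse_inconsistencies(text):
    """Parse inconsistency rows into dicts with typed line numbers and reports."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), fieldnames=INCONSISTENCY_COLUMNS):
        rec["line number"] = int(rec["line number"])
        rec["gcov report"] = parse_report(rec["gcov report"])
        rec["llvm-cov report"] = parse_report(rec["llvm-cov report"])
        rows.append(rec)
    return rows


def differing_positions(gcov, llvm):
    """Return the indices at which the two reports disagree.

    Scalars are treated as one-element lists; zip() stops at the shorter list.
    """
    gcov = gcov if isinstance(gcov, list) else [gcov]
    llvm = llvm if isinstance(llvm, list) else [llvm]
    return [i for i, (g, l) in enumerate(zip(gcov, llvm)) if g != l]


def group_by_reason(text):
    """Group inspected locations by their manual label (reason type, reason)."""
    groups = defaultdict(list)
    for rec in csv.DictReader(io.StringIO(text), fieldnames=INSPECTION_COLUMNS):
        location = (rec["package"], rec["file name"], int(rec["line number"]))
        groups[(rec["reason type"], rec["reason"])].append(location)
    return dict(groups)


for row in parse_inconsistencies(INCONSISTENCY_SAMPLE):
    print(row["package"], differing_positions(row["gcov report"], row["llvm-cov report"]))
for label, locations in group_by_reason(INSPECTION_SAMPLE).items():
    print(label, locations)
```

Grouping by the manual label gives one group per labeled cause, which could serve as a reference point when evaluating an automated clustering of bug-triggering code.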
Given an entry such as bzip2,bzip2-1.0.8/blocksort.c,514,bug,GCC#121901,
how do I find and view this blocksort.c?
- If you've run sections 1, 2, and 3 in README.md, the source code is under the /var/lib/sbuild/build* directories. In this particular example, the file is /var/lib/sbuild/build-ET/bzip2-gcc-1/bzip2-1.0.8/blocksort.c. Please refer to "4. Inspect Raw DebCovDiff Results" for details.
- Alternatively, you can skip the end-to-end run and download only the source code.
      git clone https://github.com/xlab-uiuc/DebCovDiff.git
      cd DebCovDiff/reduce
      bash download-source.sh
  The source code is now in the ./source directory. In this particular example, the file is source/bzip2-gcc-1/bzip2-1.0.8/blocksort.c.
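The lookup from a dataset entry to a file on disk can be sketched as below. This is a rough illustration under one assumption: that each package is unpacked under a directory named `<package>-<suffix>`, generalized from the single `bzip2-gcc-1` example above. A prefix match can also pick up a different package whose name starts with the same string, so the results should be checked by eye. The helper names `find_source` and `show_context` are hypothetical.

```python
from pathlib import Path


def find_source(source_root, package, file_name):
    """Return candidate paths for a dataset entry under source_root.

    Assumes directories named "<package>-<suffix>" (e.g. bzip2-gcc-1);
    this naming is inferred from one example and may not hold everywhere.
    """
    root = Path(source_root)
    if not root.is_dir():
        return []
    hits = []
    for build_dir in sorted(root.iterdir()):
        if build_dir.is_dir() and build_dir.name.startswith(package + "-"):
            candidate = build_dir / file_name
            if candidate.is_file():
                hits.append(candidate)
    return hits


def show_context(path, line_number, radius=3):
    """Return the lines around line_number (1-indexed), prefixed with numbers."""
    lines = Path(path).read_text(errors="replace").splitlines()
    start = max(line_number - radius, 1)
    end = min(line_number + radius, len(lines))
    return [f"{n:>6}  {lines[n - 1]}" for n in range(start, end + 1)]


# Entry: bzip2,bzip2-1.0.8/blocksort.c,514,bug,GCC#121901
for path in find_source("source", "bzip2", "bzip2-1.0.8/blocksort.c"):
    print(path)
    print("\n".join(show_context(path, 514)))
```

The same functions work against the end-to-end build tree by passing a directory such as /var/lib/sbuild/build-ET as `source_root`.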
For now please run DebCovDiff from end to end. We are working on a fast path for you, potentially by providing you a tarball of the authors' own run.
Footnotes
1. This dataset is derived and cleaned from the version used for the ASE '25 paper (./tables-and-figures), with unified terminology and labels and slight changes. The conversion is reproduced and documented in this script.