Add Meson WrapDB mining pipeline #803 #823
Adityakk9031 wants to merge 3 commits into aboutcode-org:main from
Signed-off-by: Aditya kumar singh <143548997+Adityakk9031@users.noreply.github.com>
@pombredanne please have a look

Out of curiosity, did you use any AI tool to create this code?
pombredanne left a comment
Thanks. Please see comments. I look forward to your detailed replies.
@@ -0,0 +1,33 @@
{

Please use a real extract from the real JSON, not truncated.

Test data: replaced with real, complete entries taken verbatim from the WrapDB releases.json (adamyaxley-obfuscate, aklomp-base64, apache-orc, bzip2, catch2), all untruncated.
self.assertEqual(len(all_results), 3)  # ogg, zlib, catch2

# Check ogg

Do not check these one by one. Reuse the data-driven approach with a JSON expected file.

Switched to a fully data-driven approach comparing against expected_purls.json. No more one-by-one assertions.
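A data-driven check of the kind discussed here might look like the following minimal sketch. The helper name `check_against_expected` and its `regen` flag are hypothetical and not taken from the PR; only the idea of comparing results with a JSON expected file (such as expected_purls.json) comes from the thread.

```python
import json
from pathlib import Path


def check_against_expected(results, expected_path, regen=False):
    """Compare ``results`` with the JSON document stored at ``expected_path``.

    Hypothetical helper: with ``regen=True`` the expected file is first
    rewritten from ``results``, so refreshing a fixture is a one-flag change.
    """
    expected_path = Path(expected_path)
    if regen:
        expected_path.write_text(
            json.dumps(results, indent=2) + "\n", encoding="utf-8"
        )
    expected = json.loads(expected_path.read_text(encoding="utf-8"))
    # A single comparison replaces per-package assertions.
    assert results == expected, f"results differ from {expected_path.name}"
```

With this shape, adding a package to the fixture changes only the JSON files, not the test code.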
releases_path = Path(self.wrapdb_repo.working_dir) / "releases.json"
if not releases_path.exists():
    return 0
with open(releases_path, encoding="utf-8") as f:

Are you really loading the whole file just to get a count?

HTTP fetch: removed repo cloning entirely. Now uses requests.get() to fetch just releases.json directly.
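One way to fetch the index once and reuse it for both counting and mining is sketched below. It uses the standard library's `urllib.request` so that it stays self-contained, whereas the reply above mentions requests.get(). The function names and the `opener` parameter are assumptions made for illustration, and the URL is left to the caller because the value of MESON_WRAPDB_RELEASES_URL is not shown in the thread.

```python
import json
from urllib.request import urlopen


def fetch_releases(url, opener=urlopen, timeout=30):
    """Download and decode the releases index exactly once.

    ``opener`` is a hypothetical injection point so tests can avoid the network.
    """
    with opener(url, timeout=timeout) as response:
        return json.load(response)


def packages_count(releases):
    """Count packages from the already-decoded index; nothing is re-read."""
    return len(releases)
```

Passing the decoded mapping around also avoids opening the same file twice, which is the concern raised in the later comments.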
WrapDB versions use a ``-N`` suffix to denote build recipe revisions that
are specific to the WrapDB and do not exist upstream.
"""
base_purl = PackageURL(type="meson", name=package_name)

Is this a registered PURL type in the spec repo? If not, we need one there first.

Removed from pipes/meson.py, kept only in mine_meson.py where it's used.
minecode_pipelines/pipes/meson.py
Outdated
    logger(f"releases.json not found at {releases_path}")
    return

with open(releases_path, encoding="utf-8") as f:

You already opened that file earlier.
minecode_pipelines/pipes/meson.py
Outdated
from packageurl import PackageURL


MESON_WRAPDB_RELEASES_URL = (

Where is this used? Why not use that rather than a repo clone?
)

def clone_wrapdb_index(self):
    """Clone the Meson WrapDB repository."""

Why clone the whole repo when you are only using a single releases file? And why also try a releases.json URL elsewhere?
    logger=self.log,
)

def packages_count(self):
Signed-off-by: Aditya kumar singh <143548997+Adityakk9031@users.noreply.github.com>
@pombredanne Addressed all comments: real test data, data-driven tests, HTTP fetch instead of clone, removed duplicates and unused imports. Regarding the PURL type: meson
Closes #803
Description
Add a new mining pipeline to collect package metadata from the
Meson WrapDB repository.
The WrapDB maintains a curated registry of ~350+ upstream C/C++ packages
used as subproject dependencies in the Meson build system. The pipeline
clones the WrapDB repo, parses its releases.json index, and yields
PURLs in the format pkg:meson/<name>@<version>.
Changes
New Files
… releases.json and yields (base_purl, [versioned_purls]) tuples
… MineMeson that clones WrapDB, counts packages, and publishes PURLs via FederatedCode
… get_meson_packages (4 test cases)
… fixture with a 3-package subset of real WrapDB data
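The (base_purl, [versioned_purls]) tuples mentioned in this list could be produced along these lines. This is a sketch under assumptions, not the PR's code: `iter_wrapdb_purls` is a hypothetical name, the input is assumed to be the decoded releases.json mapping of package name to an entry with a "versions" array, and PURLs are built as plain strings rather than with the `PackageURL` class so the example has no third-party dependency.

```python
from urllib.parse import quote


def iter_wrapdb_purls(releases):
    """Yield ``(base_purl, [versioned_purls])`` for each package in ``releases``.

    Packages without any version are skipped, which is a choice made for
    this sketch rather than documented behaviour.
    """
    for name, entry in sorted(releases.items()):
        versions = entry.get("versions") or []
        if not versions:
            continue
        # Percent-encode name and version so unusual characters stay valid.
        base_purl = f"pkg:meson/{quote(name, safe='')}"
        yield base_purl, [f"{base_purl}@{quote(v, safe='')}" for v in versions]
```

Yielding one tuple per package keeps all versions of a package together, which suits publishing them as a group.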
Modified Files
… point under [project.entry-points."scancodeio_pipelines"]
Design Notes
… pkg:meson since WrapDB versions carry a -N suffix for build recipe revisions that don't exist upstream (e.g., 1.3.6-1), warranting a distinct type.
… rather than individual .wrap files, since .wrap files only describe the latest version while releases.json contains all historical versions.
… pipeline (clone repo → count → yield PURLs → publish).
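The -N suffix described in these notes can be separated from the upstream version with a small helper. The following is only a sketch: `split_wrapdb_version` is a hypothetical name, and the PR itself may keep the full string as the PURL version without splitting it.

```python
def split_wrapdb_version(version):
    """Split a WrapDB version into ``(upstream_version, revision)``.

    ``revision`` is the integer after the last ``-`` when that part is
    purely numeric; otherwise the string is returned unchanged with ``None``.
    """
    upstream, sep, revision = version.rpartition("-")
    if sep and revision.isdecimal():
        return upstream, int(revision)
    return version, None
```

For the example in the notes, `split_wrapdb_version("1.3.6-1")` gives `("1.3.6", 1)`, while a version without a numeric suffix comes back unchanged with `None` as the revision.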
References
… dependency_names and versions arrays