
Add Meson WrapDB mining pipeline #803 #823

Open
Adityakk9031 wants to merge 3 commits into aboutcode-org:main from Adityakk9031:#803


@Adityakk9031

Closes #803

Description

Add a new mining pipeline to collect package metadata from the
Meson WrapDB repository.

The WrapDB maintains a curated registry of ~350+ upstream C/C++ packages
used as subproject dependencies in the Meson build system. The pipeline
clones the WrapDB repo, parses its releases.json index, and yields
PURLs in the format pkg:meson/<name>@<version>.
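For illustration, here is a minimal, dependency-free sketch of rendering a name and version in that format. The helper name meson_purl is hypothetical, and the example values are illustrative only. It assumes standard percent-encoding of the name and version components. The pipeline itself builds PURLs with packageurl's PackageURL(type="meson", ...), and the meson type is not yet registered in the PURL spec.

```python
from urllib.parse import quote


def meson_purl(name, version=None):
    """Return a pkg:meson/<name>[@<version>] string (hypothetical helper).

    quote(..., safe="") percent-encodes everything except ASCII letters,
    digits and "-._~", so reserved characters such as "/" or "+" cannot
    break the PURL structure.
    """
    purl = "pkg:meson/" + quote(name, safe="")
    if version:
        purl += "@" + quote(version, safe="")
    return purl


print(meson_purl("ogg", "1.3.6-1"))  # pkg:meson/ogg@1.3.6-1
print(meson_purl("ogg"))             # pkg:meson/ogg
```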

Changes

New Files

  • minecode_pipelines/pipes/meson.py — Pipe module that parses
    releases.json and yields (base_purl, [versioned_purls]) tuples
  • minecode_pipelines/pipelines/mine_meson.py — Pipeline class
    MineMeson that clones WrapDB, counts packages, and publishes PURLs
    via FederatedCode
  • minecode_pipelines/tests/pipes/test_meson.py — Unit tests for
    get_meson_packages (4 test cases)
  • minecode_pipelines/tests/test_data/meson/releases.json — Test
    fixture with a 3-package subset of real WrapDB data
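The pipe's contract of yielding (base_purl, [versioned_purls]) tuples can be sketched as follows. This is not the PR's get_meson_packages. The name iter_wrapdb_packages is hypothetical, and the sketch makes three assumptions: releases.json parses to a mapping of package name to an object with a "versions" list, names and versions need no percent-encoding, and the sample data is made up rather than taken from WrapDB.

```python
import json


def iter_wrapdb_packages(releases):
    """Yield (base_purl, [versioned_purls]) for each package in ``releases``.

    Assumes ``releases`` is the parsed releases.json mapping of
    package name -> {"versions": [...], ...}. Packages are visited in
    sorted order so the output is deterministic for tests.
    """
    for name in sorted(releases):
        base_purl = f"pkg:meson/{name}"
        # A package with no "versions" key yields an empty list.
        versions = releases[name].get("versions", [])
        yield base_purl, [f"{base_purl}@{version}" for version in versions]


# Made-up sample shaped like the assumed releases.json layout.
sample = json.loads(
    '{"libfoo": {"versions": ["1.0-2", "1.0-1"]},'
    ' "libbar": {"versions": ["2.0-1"]}}'
)
for base_purl, versioned in iter_wrapdb_packages(sample):
    print(base_purl, versioned)
```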

Modified Files

  • pyproject-minecode_pipelines.toml — Registered mine_meson entry
    point under [project.entry-points."scancodeio_pipelines"]

Design Notes

  • PURL type: Uses pkg:meson since WrapDB versions carry a -N
    suffix for build recipe revisions that don't exist upstream (e.g.,
    1.3.6-1), warranting a distinct type.
  • Data source: Uses releases.json (single JSON index at repo root)
    rather than individual .wrap files, since .wrap files only describe
    the latest version while releases.json contains all historical
    versions.
  • Architecture: Follows the same pattern as the existing Conan
    pipeline (clone repo → count → yield PURLs → publish).
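If the upstream version is ever needed separately from the WrapDB recipe revision, the -N suffix could be split off as shown below. The helper is hypothetical. It assumes the revision is always a decimal integer after the last hyphen.

```python
def split_wrapdb_version(version):
    """Split a WrapDB version like "1.3.6-1" into ("1.3.6", 1).

    Assumes the recipe revision is a decimal integer after the last hyphen.
    Versions without such a suffix are returned unchanged with None.
    """
    upstream, sep, revision = version.rpartition("-")
    if sep and upstream and revision.isdecimal():
        return upstream, int(revision)
    return version, None


print(split_wrapdb_version("1.3.6-1"))  # ('1.3.6', 1)
print(split_wrapdb_version("2.0-rc1"))  # ('2.0-rc1', None)
```

Because the split is purely textual, an upstream version that itself ends in -<digits> and carries no WrapDB suffix would be misread as having a revision.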

References

Signed-off-by: Aditya kumar singh <143548997+Adityakk9031@users.noreply.github.com>
@Adityakk9031
Author

@pombredanne please have a look

@pombredanne
Member

Out of curiosity, did you use any AI tool to create this code?

Member

pombredanne left a comment


Thanks. Please see comments. I look forward to your detailed replies.

@@ -0,0 +1,33 @@
{
Member


Please use a real extract from the real JSON, not truncated.

Author


Test data: Replaced with real, complete entries copied verbatim from WrapDB releases.json (adamyaxley-obfuscate, aklomp-base64, apache-orc, bzip2, catch2, all untruncated).


self.assertEqual(len(all_results), 3) # ogg, zlib, catch2

# Check ogg
Member


Do not check these one by one. Reuse the data-driven approach with an expected JSON file.

Author


Switched to a fully data-driven approach comparing against expected_purls.json. No more one-by-one assertions.
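A data-driven check of the kind the reviewer asked for can be sketched as a small helper. It compares results with an expected JSON file and can optionally regenerate that file. The helper name and the REGEN_TEST_FIXTURES environment toggle are hypothetical and are not existing minecode_pipelines APIs. Only the expected_purls.json file name comes from the reply above.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical switch: set REGEN_TEST_FIXTURES=1 to rewrite expected files.
REGEN_TEST_FIXTURES = os.environ.get("REGEN_TEST_FIXTURES") == "1"


def check_results_against_json(results, expected_path, regen=REGEN_TEST_FIXTURES):
    """Assert that ``results`` equals the JSON stored at ``expected_path``."""
    expected_path = Path(expected_path)
    # Round-trip through JSON so tuples compare equal to the stored lists.
    results = json.loads(json.dumps(results))
    if regen:
        expected_path.write_text(json.dumps(results, indent=2), encoding="utf-8")
    expected = json.loads(expected_path.read_text(encoding="utf-8"))
    assert results == expected, f"{results!r} != {expected!r}"


# Usage: write a fixture once, then compare a later run against it.
with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp) / "expected_purls.json"
    run = [("pkg:meson/libfoo", ["pkg:meson/libfoo@1.0-1"])]
    check_results_against_json(run, fixture, regen=True)
    check_results_against_json(run, fixture, regen=False)
```

Inside a unittest.TestCase, self.assertEqual would replace the bare assert. This gives a readable diff and keeps the check from being stripped when Python runs with -O.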

releases_path = Path(self.wrapdb_repo.working_dir) / "releases.json"
if not releases_path.exists():
    return 0
with open(releases_path, encoding="utf-8") as f:
Member


Are you really loading the whole file just to get a count?

Author


HTTP fetch: Removed repo cloning entirely. Now uses requests.get() to fetch just releases.json directly.
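One way to address the "whole file just to get a count" concern is to fetch and parse releases.json once. Both the package count and the PURLs can then be derived from the same mapping. The sketch uses urllib.request instead of requests only to stay dependency-free, and the function names are hypothetical. The URL is passed in rather than hard-coded because the value of MESON_WRAPDB_RELEASES_URL is not shown here.

```python
import json
import urllib.request


def fetch_wrapdb_releases(url, opener=urllib.request.urlopen, timeout=30):
    """Download and parse releases.json once and return the mapping.

    ``opener`` is injectable so tests can supply canned bytes instead of
    hitting the network.
    """
    with opener(url, timeout=timeout) as response:
        return json.load(response)


def count_packages(releases):
    # Reuses the already-parsed mapping: one top-level key per package.
    return len(releases)
```

With this split, the count step and the PURL step share one mapping (for example, stored on the pipeline instance). This also removes the second open() flagged later in the review.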

WrapDB versions use a ``-N`` suffix to denote build recipe revisions that
are specific to the WrapDB and do not exist upstream.
"""
base_purl = PackageURL(type="meson", name=package_name)
Member


Is this a registered PURL type in the spec repo? If not, we need one there first.

Author


Removed from pipes/meson.py, kept only in mine_meson.py where it's used.

logger(f"releases.json not found at {releases_path}")
return

with open(releases_path, encoding="utf-8") as f:
Member


You already open that file earlier.

from packageurl import PackageURL


MESON_WRAPDB_RELEASES_URL = (
Member


The reason will be displayed to describe this comment to others. Learn more.

Where is this used? Why not use that rather than a repo clone?

)

def clone_wrapdb_index(self):
"""Clone the Meson WrapDB repository."""
Member


Why do you clone the whole repo when you only use a single releases file? And why also try a releases.json URL elsewhere?

logger=self.log,
)

def packages_count(self):
Member


What is this for?

Adityakk9031 reopened this Mar 3, 2026
Adityakk9031 reopened this Mar 4, 2026
Signed-off-by: Aditya kumar singh <143548997+Adityakk9031@users.noreply.github.com>
@Adityakk9031
Author

Adityakk9031 commented Mar 4, 2026

@pombredanne Addressed all comments: real test data, data-driven tests, HTTP fetch instead of clone, and removal of duplicates and unused imports. Regarding the PURL type: meson isn't registered yet. Should I open a PR at purl-spec first, or use pkg:generic for now? And yes, I used AI for the initial scaffolding but reviewed everything myself.


Development

Successfully merging this pull request may close these issues.

Collect meson wrapdb

2 participants