chore(medcat-gliner): CU-869c3bvm0 Migrate gliner implementation to public #328
Merged: mart-r merged 28 commits into main from chore/medcat-gliner/CU-869c3bvm0-migrate-gliner-implementation-to-public on Feb 13, 2026
Changes from 10 commits
Commits in this pull request:
2f5f644 CU-869bepjj9 Add gliner for medcat (#165) (mart-r)
b8688f3 CU-869bepjj9: Adopt rename of external (to MedCAT) projects as plugin… (mart-r)
7bbc7ad Add NER recall comparison section to README (#196) (mart-r)
8620d20 Merge remote-tracking branch 'temp-source-for-gliner/main' into chore… (mart-r)
884ba50 CU-869c3bvm0: Add workflow for gliner (mart-r)
28f8dd3 CU-869c3bvm0: Add release workflow for gliner (mart-r)
f549067 CU-869c3bvm0: Only do lazy registration for gliner (mart-r)
6dff54b CU-869c3bvm0: Update pyproject toml to only support medcat 2.5 and up… (mart-r)
393ad21 CU-869c3bvm0: Add publish to TestPyPI to workflow (mart-r)
f10a918 CU-869c3bvm0: Update version scheme in pyproject.toml (mart-r)
ce819a8 CU-869c3bvm0: Update pyproject.toml with root / git path for gliner (mart-r)
7a7a18e CU-869c3bvm0: Update pyproject.toml with git describe command (mart-r)
f09b22e CU-869c3bvm0: Setup dev version before dep install (mart-r)
25f2b8e CU-869c3bvm0: Update some of the actions (mart-r)
b9c5f81 CU-869c36ruk: Update tag regex (mart-r)
2c15273 CU-869c3bvm0: [TODO: REMOVE] Add debug output regarding tags to main … (mart-r)
42ad9f0 CU-869c3bvm0: Fix workflow actions version issue (mart-r)
88c0f8e Revert "CU-869c3bvm0: [TODO: REMOVE] Add debug output regarding tags … (mart-r)
d912335 CU-869c3bvm0: Fix linting issue (mart-r)
9fae9d9 CU-869c3bvm0: Fix permissions issue with workflow PyPI push (mart-r)
2b3102c CU-869c3bvm0: Update gliner plugin details in plugin catalog (mart-r)
dfdec2f CU-869c3bvm0: Fix small issue with plugin catalog (mart-r)
8ee4b78 CU-869c3bvm0: Update docstring for clarity in gliner_ner.py (mart-r)
64362a9 CU-869c3bvm0: Rename workflow file (mart-r)
230494e CU-869c3bvm0: Centralise workflow file (mart-r)
89bbd71 CU-869c3bvm0: Remove unnecessary line (mart-r)
d0bed62 CU-869c3bvm0: Moved to uv in workflows (mart-r)
eebb351 CU-869c3bvm0: Add transitive dependency (with description) to pyproje… (mart-r)
@@ -0,0 +1,88 @@
+name: MedCAT-Gliner-main
+
+on:
+  push:
+    branches: [ main ]
+    paths:
+      - 'medcat-plugins/medcat-gliner/**'
+      - '.github/workflows/medcat-gliner**'
+  pull_request:
+    paths:
+      - 'medcat-plugins/medcat-gliner/**'
+      - '.github/workflows/medcat-gliner**'
+
+defaults:
+  run:
+    working-directory: ./medcat-plugins/medcat-gliner
+
+jobs:
+  tests:
+
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    strategy:
+      matrix:
+        python-version: [ '3.10', '3.11', '3.12', '3.13' ]
+      max-parallel: 4
+
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install dependencies
+        run: |
+          df -h # check spaces before
+          python -m pip install --upgrade pip
+          pip install -e ".[dev]" --extra-index-url https://download.pytorch.org/whl/cpu/
+          df -h # check spaces after
+      - name: Check types
+        run: |
+          python -m mypy --follow-imports=normal src/medcat_gliner --follow-untyped-imports
+      - name: Lint
+        run: |
+          ruff check src/medcat_gliner --preview
+      - name: Test
+        run: |
+          python -m unittest discover
+
+  publish-to-test-PyPI:
+
+    runs-on: ubuntu-latest
+    needs: tests
+    steps:
+      - name: Checkout main
+        uses: actions/checkout@v6
+        with:
+          fetch-depth: 0 # fetch all history
+          fetch-tags: true # fetch tags explicitly
+
+      - name: Set up Python
+        uses: actions/setup-python@v6
+        with:
+          python-version: '3.10'
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install --upgrade build
+
+      - name: Set timestamp-based dev version
+        run: |
+          TS=$(date -u +"%Y%m%d%H%M%S")
+          echo "SETUPTOOLS_SCM_PRETEND_VERSION_FOR_MEDCAT_GLINER=0.2.2.dev${TS}" >> $GITHUB_ENV
+
+      - name: Install package in development mode
+        run: |
+          pip install -e .
+
+      - name: Build package
+        run: |
+          python -m build
+
+      - name: Publish distribution to TestPyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          repository_url: https://test.pypi.org/legacy/
+          packages_dir: medcat-plugins/medcat-gliner/dist
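The "Set timestamp-based dev version" step above builds a version of the form `0.2.2.dev<UTC timestamp>` and exports it through `SETUPTOOLS_SCM_PRETEND_VERSION_FOR_MEDCAT_GLINER`. As a minimal sketch of what that string looks like, the Python below mirrors the shell format `date -u +"%Y%m%d%H%M%S"`. The helper name `dev_version` and the simplified shape check are illustrative assumptions, not part of the PR.

```python
import re
from datetime import datetime, timezone

BASE_VERSION = "0.2.2"  # base version hard-coded in the workflow step


def dev_version(now: datetime, base: str = BASE_VERSION) -> str:
    """Hypothetical helper: <base>.dev<YYYYmmddHHMMSS>, with the timestamp in UTC."""
    ts = now.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{base}.dev{ts}"


# Simplified check for the "N.N.N.devN" shape (not the full PEP 440 grammar).
DEV_PATTERN = re.compile(r"^\d+(\.\d+)*\.dev(\d+)$")

earlier = dev_version(datetime(2026, 2, 12, 9, 30, 0, tzinfo=timezone.utc))
later = dev_version(datetime(2026, 2, 13, 17, 5, 42, tzinfo=timezone.utc))

print(earlier)
print(later)

# The dev segment is a plain integer, so a later build gets a larger dev number.
earlier_n = int(DEV_PATTERN.match(earlier).group(2))
later_n = int(DEV_PATTERN.match(later).group(2))
print(later_n > earlier_n)
```

Because the timestamp has one-second resolution, two runs that start in different seconds produce different versions. That presumably matters here because a package index rejects a second upload of an already published file name.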
@@ -0,0 +1,51 @@
+name: medcat-gliner release-build
+
+on:
+  push:
+    tags:
+      - 'medcat-gliner/v*.*.*'
+
+permissions:
+  id-token: write
+
+defaults:
+  run:
+    working-directory: ./medcat-plugins/medcat-gliner
+
+jobs:
+  test-and-publish-to-PyPI:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout main
+        uses: actions/checkout@v6
+
+      - name: Release Tag
+        # If GITHUB_REF=refs/tags/medcat-gliner/v0.1.2, this returns v0.1.2. Note that it includes the "v", though it probably shouldn't
+        run: echo "RELEASE_VERSION=${GITHUB_REF##refs/*/}" >> $GITHUB_ENV
+
+      - name: Set up Python
+        uses: actions/setup-python@v6
+        with:
+          python-version: '3.10'
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install --upgrade build
+
+      - name: Install client package in development mode
+        run: |
+          pip install -e ".[dev]"
+
+      - name: Test
+        run: |
+          pytest tests
+
+      - name: Build client package
+        run: |
+          python -m build
+
+      - name: Publish production distribution to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          packages_dir: medcat-plugins/medcat-gliner/dist
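The "Release Tag" step relies on the shell expansion `${GITHUB_REF##refs/*/}`, which strips the longest prefix matching the glob `refs/*/`. The sketch below reproduces that behaviour in Python with a greedy regex, to make it easier to see why a namespaced tag such as `medcat-gliner/v0.1.2` ends up as `v0.1.2`. The function names are hypothetical, and `strip_v` is only an illustration of the follow-up hinted at in the workflow comment; the workflow itself keeps the "v".

```python
import re


def release_version(github_ref: str) -> str:
    """Python counterpart of ${GITHUB_REF##refs/*/}.

    '##' removes the LONGEST prefix matching 'refs/*/'; the greedy
    regex '^refs/.*/' matches the same prefix.
    """
    return re.sub(r"^refs/.*/", "", github_ref, count=1)


def strip_v(version: str) -> str:
    """Hypothetical extra step: drop a leading 'v' if present."""
    return version.removeprefix("v")


ref = "refs/tags/medcat-gliner/v0.1.2"
print(release_version(ref))
print(strip_v(release_version(ref)))
```

A ref that does not start with `refs/` is returned unchanged, which matches what the shell expansion does when the pattern fails to match.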
@@ -0,0 +1,38 @@
+# MedCAT-gliner
+
+This provides a [gliner](https://github.com/urchade/GLiNER) based NER step for the MedCAT core library.
+
+# Usage
+
+First install from PyPI, e.g.:
+```
+pip install medcat-gliner
+```
+Subsequently, if you have an existing model, you should be able to just change the NER component:
+```
+cat = CAT.load_model_pack("path/to/existing/model")
+# change component
+from medcat_gliner import GLiNERConfig
+cat.config.components.ner.comp_name = "gliner_ner"
+cat.config.components.ner.custom_cnf = GLiNERConfig()
+# recreate pipe with new NER component
+cat._recreate_pipe()
+# use as needed
+```
+
+## NER recall comparison (linkable SNOMED entities)
+
+The following results compare the existing NER implementation (vocab based NER with spell checking) with the GLiNER implementation when each is used as the NER component within MedCAT.
+Evaluation was performed on the **2023 SNOMED CT Linking Challenge** dataset.
+
+> **Important caveat**
+> This is **not a measure of general NER quality**.
+> Recall is computed only with respect to annotated, linkable SNOMED CT entities present in the linking dataset.
+> Mentions outside the annotation scope are treated as false positives by construction, so precision is not meaningful here.
+
+| Implementation         | True Positives | False Negatives | Recall | Runtime |
+| ---------------------- | -------------- | --------------- | ------ | ------- |
+| Vocab based NER        | 10,545         | 3,917           | 0.729  | ~5m 50s |
+| GLiNER implementation  | 7,971          | 6,491           | 0.551  | ~34m    |
+
+As we can see, for this dataset, GLiNER is roughly 6x slower (~34m vs ~5m 50s) and achieves lower recall (0.551 vs 0.729) than the standard vocab based implementation. This is likely because the vocab based NER step has been configured and tuned to work best within the MedCAT pipeline. It is likely that with additional tuning the GLiNER implementation could perform as well as or better than the vocab based NER does.
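The recall column in the table above follows from the true positive and false negative counts as TP / (TP + FN). The short sketch below recomputes it from the reported numbers and also derives the runtime ratio. The dictionary layout and key names are assumptions made for illustration, and the runtimes are the approximate values quoted in the README.

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn)


# Counts and approximate runtimes as reported in the README table.
results = {
    "vocab": {"tp": 10_545, "fn": 3_917, "runtime_s": 5 * 60 + 50},
    "gliner": {"tp": 7_971, "fn": 6_491, "runtime_s": 34 * 60},
}

for name, r in results.items():
    total = r["tp"] + r["fn"]  # number of annotated, linkable entities
    print(f"{name}: recall={recall(r['tp'], r['fn']):.3f} over {total} entities")

slowdown = results["gliner"]["runtime_s"] / results["vocab"]["runtime_s"]
print(f"gliner runtime is about {slowdown:.1f}x the vocab runtime")
```

Both rows sum to the same 14,462 annotated entities, which is consistent with the two implementations having been scored against the same annotation set.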
@@ -0,0 +1,60 @@
+[build-system]
+requires = ["setuptools>=61.0", "wheel", "setuptools_scm>=8"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "medcat_gliner"
+dynamic = ["version"]
+description = ""
+readme = "README.md"
+license = { text = "Apache-2.0" }
+authors = [
+    { name="Mart Ratas", email="mart.ratas@kcl.ac.uk" }
+]
+requires-python = ">=3.10"
+
+keywords = ["NLP", "NER", "medical", "MedCAT", "gliner"]
+
+classifiers = [
+    "Development Status :: 3 - Alpha",
+    "Intended Audience :: Science/Research",
+    "Topic :: Scientific/Engineering :: Artificial Intelligence",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "License :: OSI Approved :: Apache Software License"
+]
+
+dependencies = [
+    "medcat>=2.5",
+    "gliner",
+]
+
+[project.optional-dependencies]
+dev = [
+    "ruff",
+    "mypy",
+]
+
+# entry-points to add onto medcat
+[project.entry-points."medcat.plugins"]
+ner_gliner = "medcat_gliner"
+
+[project.urls]
+Homepage = "https://github.com/CogStack/medcat-ops/tree/main/medcat-gliner"
+Repository = "https://github.com/CogStack/medcat-ops/tree/main/medcat-gliner"
+Issues = "https://github.com/CogStack/medcat-ops/issues"
+
+[tool.setuptools_scm]
+root = ".."
+tag_regex = "^medcat-gliner/v(?P<version>[0-9]+(?:\\.[0-9]+)*)$"
+version_scheme = "post-release"
+local_scheme = "no-local-version"
+
+[tool.setuptools.packages.find]
+where = ["src"]
+
+[tool.setuptools.package-data]
+"medcat_gliner" = ["py.typed"]
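The `tag_regex` setting above only accepts tags in the `medcat-gliner/v<numbers>` namespace and captures the numeric part in a group named `version`. As a rough check of which tags it would accept, the sketch below applies the same pattern with Python's `re` module. The helper `version_from_tag` and the sample tags are hypothetical; how setuptools_scm applies the pattern internally is not shown in this diff.

```python
import re
from typing import Optional

# Pattern copied from [tool.setuptools_scm].tag_regex; the TOML string "\\."
# denotes the regex escape "\." (a literal dot).
TAG_REGEX = re.compile(r"^medcat-gliner/v(?P<version>[0-9]+(?:\.[0-9]+)*)$")


def version_from_tag(tag: str) -> Optional[str]:
    """Hypothetical helper: return the captured version, or None if the tag is rejected."""
    match = TAG_REGEX.match(tag)
    return match.group("version") if match else None


samples = [
    "medcat-gliner/v0.2.2",     # namespaced release tag
    "medcat-gliner/v1.0",       # two-part versions are allowed as well
    "v2.5.0",                   # tag without the package prefix
    "medcat-gliner/v0.2.2rc1",  # suffixes after the digits are not allowed
]
for tag in samples:
    print(f"{tag!r} -> {version_from_tag(tag)!r}")
```

Only the first two samples match. Restricting the prefix this way presumably keeps tags that belong to other packages in the same repository from influencing this package's version.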
@@ -0,0 +1,3 @@
+from .registration import do_registration as __register
+
+__register()