This repository was archived by the owner on Sep 9, 2025. It is now read-only.

Active learning sampling strategy #125

@aenglebert

Hello!

I am trying to use MedCAT with MedCATTrainer in an active learning setup to label a subset of a large collection of unannotated French documents.

In the MedCATTrainer paper (https://arxiv.org/pdf/1907.07322.pdf), Section 3.2 (Active Learning) specifies the use of selective certainty-based sampling to guide which documents are sampled for annotation.

But the only parameter I found related to active learning in MedCATTrainer is the "train_model_on_submit" parameter in ProjectAnnotateEntities.

train_model_on_submit = models.BooleanField(default=True, help_text='Active learning - configured CDB is trained '

From what I found, this parameter is responsible for a call to the train_medcat function when a document is submitted, but it seems to have no influence on the order/sampling of documents in the project annotation interface.

Is there another option I missed or misunderstood that allows for replicating the certainty-based sampling described in the paper?

Or does this part need to be done outside of MedCATTrainer with the creation of a new project at each annotation step containing only the sampled documents?
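To make the second option concrete, here is a minimal sketch of the external sampling step I have in mind. It assumes I already have a list of model confidence scores for the entities detected in each document; how those scores are obtained is left out. The function name `select_least_certain`, the document ids, and the score values are all hypothetical, and this is only one simple form of certainty-based sampling (lowest mean confidence first), not necessarily the one used in the paper.

```python
from statistics import mean
from typing import Dict, List


def select_least_certain(doc_scores: Dict[str, List[float]], k: int) -> List[str]:
    """Return the ids of the k documents with the lowest mean confidence."""
    # Documents with no detected entities carry no certainty signal here,
    # so this sketch simply skips them (an assumption, not a recommendation).
    scored = {doc_id: mean(s) for doc_id, s in doc_scores.items() if s}
    # Sort ascending so the least certain documents come first.
    return sorted(scored, key=scored.get)[:k]


# Hypothetical per-document confidence scores (made-up values).
scores = {
    "doc_a": [0.95, 0.90],
    "doc_b": [0.40, 0.55],
    "doc_c": [0.70],
    "doc_d": [],
}

# Documents to load into the next annotation project.
batch = select_least_certain(scores, k=2)
```

The selected ids would then be used to build the document set for a new project at each annotation round.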

By the way, thank you for this amazing tool!
