Active learning sampling strategy #125

Hello!

I am trying to use MedCAT with MedCATtrainer in an active learning setup to label a subset of a large collection of unannotated French documents.

In the MedCATtrainer paper (https://arxiv.org/pdf/1907.07322.pdf), Section 3.2 (Active Learning) describes using selective certainty-based sampling to choose which documents to annotate next.
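To make the idea concrete, here is a minimal sketch of certainty-based selection. It is not taken from the paper or from MedCATtrainer. It assumes each unannotated document has already been run through the current model and reduced to a list of per-entity confidence scores in [0, 1]; where those scores come from, the "least-confident entity" rule, and the helper names are all hypothetical choices for illustration.

```python
from typing import Dict, List


def document_certainty(scores: List[float]) -> float:
    """Certainty of one document = its least-confident entity (hypothetical rule).

    Documents with no detected entities are treated as fully certain (1.0),
    so they are never preferred for annotation.
    """
    return min(scores) if scores else 1.0


def select_least_certain(doc_scores: Dict[str, List[float]], k: int) -> List[str]:
    """Return the ids of the k documents with the lowest certainty.

    Ties are broken by document id so the selection is deterministic.
    """
    ranked = sorted(
        doc_scores,
        key=lambda doc_id: (document_certainty(doc_scores[doc_id]), doc_id),
    )
    return ranked[:k]


if __name__ == "__main__":
    # Made-up scores, only to show the call shape.
    scores = {
        "doc_a": [0.95, 0.91],
        "doc_b": [0.40, 0.88],
        "doc_c": [],
        "doc_d": [0.62],
    }
    print(select_least_certain(scores, k=2))  # ['doc_b', 'doc_d']
```

Using the minimum rather than the mean means a single doubtful entity is enough to pull a document forward; a mean or an entropy-style score would be equally reasonable choices.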

However, the only active-learning-related parameter I found in MedCATtrainer is "train_model_on_submit" in ProjectAnnotateEntities (https://github.com/CogStack/MedCATtrainer/blob/ec7900feeaf753cfaf60353659827a60e201b0ff/webapp/api/api/models.py#L246).
From what I can tell, this parameter makes MedCATtrainer call the train_medcat function whenever a document is submitted, but it does not seem to affect the order or selection of documents shown in the project annotation interface.

Is there another option that I missed or misunderstood that would let me replicate the certainty-based sampling described in the paper?

Or does this step need to be done outside MedCATtrainer, by creating a new project at each annotation round that contains only the sampled documents?
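If the sampling does have to happen outside MedCATtrainer, one round of that workaround might look roughly like the sketch below: rescore the pool with the latest model, pick the least-certain documents that have not been sent for annotation yet, and export them as a dataset for a new project. This is only a sketch under assumptions: the certainty scores are assumed to be computed elsewhere, the helper names are hypothetical, and the `name`/`text` CSV layout is an assumption about the dataset upload format that should be checked against the MedCATtrainer documentation.

```python
import csv
import io
from typing import Dict, List, Set


def next_round(
    certainty: Dict[str, float], already_annotated: Set[str], batch_size: int
) -> List[str]:
    """Pick the next batch: least-certain documents not yet sent for annotation."""
    candidates = [doc_id for doc_id in certainty if doc_id not in already_annotated]
    # Lowest certainty first; ties broken by id for a deterministic order.
    candidates.sort(key=lambda doc_id: (certainty[doc_id], doc_id))
    return candidates[:batch_size]


def batch_to_csv(batch: List[str], texts: Dict[str, str]) -> str:
    """Serialise a batch as CSV with `name` and `text` columns (assumed upload format)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "text"])
    writer.writeheader()
    for doc_id in batch:
        writer.writerow({"name": doc_id, "text": texts[doc_id]})
    return buffer.getvalue()


if __name__ == "__main__":
    texts = {
        "d1": "Patient on metformin.",
        "d2": "No relevant history.",
        "d3": "Treated hypertension.",
    }
    # Hypothetical certainty scores, recomputed with the model after the previous round.
    certainty = {"d1": 0.55, "d2": 0.97, "d3": 0.30}
    annotated: Set[str] = set()

    batch = next_round(certainty, annotated, batch_size=2)  # ['d3', 'd1']
    annotated.update(batch)
    print(batch_to_csv(batch, texts))
```

In practice the CSV string would be written to a file (opened with `newline=""`) and uploaded as the dataset of the next round's project; the `annotated` set is what keeps the same documents from being proposed twice across rounds.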

By the way, thank you for this amazing tool!
