Hello!
I am trying to use MedCAT with MedCATTrainer in an active learning setup to label a subset of a large collection of unannotated French documents.
In the MedCATTrainer paper ( https://arxiv.org/pdf/1907.07322.pdf ), section 3.2 (Active Learning) describes using selective certainty-based sampling to decide which documents are sampled for annotation.
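To make sure I understand the idea, here is a rough sketch of what I take certainty-based sampling to mean. This is my own assumption, not code from MedCATTrainer or the paper: each document is assumed to already have a list of confidence scores for the annotations the model predicted in it (how to obtain them from MedCAT is left open), a document's certainty is taken to be the confidence of its weakest annotation, and the k least certain documents are picked. The functions `document_certainty` and `select_least_certain` are hypothetical names.

```python
from typing import Dict, List


def document_certainty(scores: List[float]) -> float:
    """Certainty of a document = confidence of its weakest annotation.

    Documents with no predicted annotations get 0.0, so they are treated as
    maximally uncertain (an illustrative choice, not taken from MedCAT).
    """
    return min(scores) if scores else 0.0


def select_least_certain(doc_scores: Dict[str, List[float]], k: int) -> List[str]:
    """Return the ids of the k documents the model is least certain about.

    doc_scores maps a (hypothetical) document id to the confidence scores of
    the annotations predicted in that document. Ties are broken by id so the
    selection is deterministic.
    """
    ranked = sorted(
        doc_scores,
        key=lambda doc_id: (document_certainty(doc_scores[doc_id]), doc_id),
    )
    return ranked[:k]


if __name__ == "__main__":
    scores = {
        "doc_a": [0.95, 0.90],
        "doc_b": [0.40, 0.99],
        "doc_c": [],
        "doc_d": [0.75],
    }
    print(select_least_certain(scores, 2))  # ['doc_c', 'doc_b']
```

Using the minimum rather than the mean means a single doubtful concept is enough to send a document back for annotation; the paper may well use a different aggregation.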
However, the only active-learning-related parameter I could find in MedCATTrainer is train_model_on_submit in ProjectAnnotateEntities:
train_model_on_submit = models.BooleanField(default=True, help_text='Active learning - configured CDB is trained '
From what I can tell, this parameter triggers a call to the train_medcat function whenever a document is submitted, but it does not seem to affect the order or sampling of documents in the project's annotation interface.
Is there another option I have missed or misunderstood that would let me replicate the certainty-based sampling described in the paper?
Or does this step have to be done outside MedCATTrainer, by creating a new project at each annotation round that contains only the sampled documents?
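If it does have to happen outside the tool, I imagine the bookkeeping for each round would look roughly like the sketch below. This is an assumption about the workflow, not MedCATTrainer functionality: `next_batch` is a hypothetical helper, `certainty` stands in for whatever score the current model gives a text (lower means less certain), and uploading the batch as a new project is left as a comment because I don't know the right API for it.

```python
from typing import Callable, Dict, List, Set


def next_batch(
    pool: Dict[str, str],
    annotated: Set[str],
    certainty: Callable[[str], float],
    batch_size: int,
) -> List[str]:
    """Pick the documents to annotate in one active learning round.

    pool: document id -> raw text for the whole unannotated corpus.
    annotated: ids already sent to an earlier annotation project.
    certainty: hypothetical scoring function using the current model; it is
        re-evaluated every round because the model changes after training.
    """
    candidates = [doc_id for doc_id in pool if doc_id not in annotated]
    # Least certain first; tie-break on id for a deterministic order.
    candidates.sort(key=lambda doc_id: (certainty(pool[doc_id]), doc_id))
    return candidates[:batch_size]


def toy_certainty(text: str) -> float:
    """Toy stand-in for a model score: shorter texts count as less certain."""
    return float(len(text))


if __name__ == "__main__":
    corpus = {
        "d1": "texte un",
        "d2": "texte deux",
        "d3": "texte trois",
        "d4": "texte quatre",
    }
    done: Set[str] = set()
    for round_no in range(2):
        batch = next_batch(corpus, done, toy_certainty, batch_size=2)
        # Here one would create a new MedCATTrainer project containing only
        # `batch`, annotate it, and retrain the model before the next round.
        print(round_no, batch)
        done.update(batch)
```

Run as-is, this prints `0 ['d1', 'd2']` and then `1 ['d3', 'd4']`: documents from earlier rounds are never re-selected, and the ranking of the remaining pool is recomputed each time.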
By the way, thank you for this amazing tool!