Reliability‑first actions and timeline extraction from unstructured doctor’s notes
BioBERT (NER + Action→Time linking) + deterministic date normalization
Paper draft: Read the manuscript (also at Paper/index.html)
Clinical outpatient notes often contain follow‑up instructions such as:
“Order MRI brain in two weeks.”
These instructions are crucial for scheduling, care coordination, and downstream EHR validation — but they are embedded in free text and can be ambiguous when multiple actions and time expressions appear in the same note.
Key challenge: robustly extract actions and their execution dates while avoiding arithmetic errors common in end‑to‑end text generation.
Inputs:
- `note_text` (free-text clinical note)
- `visit_date` (anchor date)

Output: a JSON list of structured follow-up items:
[
{
"action": "MRI Brain",
"period_text": "in 2 weeks",
"period_date": "2026-01-24"
}
]

This decomposes into:
- Action span detection (what to do)
- Time span detection (when)
- Action→Time linking (which time belongs to which action)
- Date normalization anchored to `visit_date`
The system consists of:
1. Clinical Note
2. Sliding Window Tokenization
3. BioBERT Encoder
4. Head A: BIO NER (Action/Time spans)
5. Head B: Biaffine Linker (Action→Time)
6. Date Normalization (`visit_date` anchored)
7. Structured JSON Output
We implement a joint multi‑task architecture with a shared BioBERT encoder feeding two heads:
Head A (BIO NER):
- Tags: `O`, `B-ACT`, `I-ACT`, `B-TIME`, `I-TIME`
- Learns to identify Action spans and Time spans
- Uses weighted cross-entropy to reduce bias toward `O` tokens
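To make the tag scheme concrete, the sketch below decodes a BIO tag sequence into token-level spans. The helper name `bio_to_spans` and the lenient handling of a stray `I-` tag are illustrative assumptions, not taken from the project code.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (label, start, end) token spans, end exclusive."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags):
        # An I- tag continues the open span only if its label matches.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the span that is currently open.
        if label is not None:
            spans.append((label, start, i))
            label, start = None, None
        # B- opens a new span; a stray I- is also treated as a span start
        # (a lenient choice, assumed here for illustration).
        if tag.startswith(("B-", "I-")):
            label, start = tag[2:], i
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


tokens = ["Order", "MRI", "brain", "in", "two", "weeks", "."]
tags = ["O", "B-ACT", "I-ACT", "B-TIME", "I-TIME", "I-TIME", "O"]
spans = bio_to_spans(tags)
```

Here `spans` holds one `ACT` span covering "MRI brain" and one `TIME` span covering "in two weeks".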
Head B (biaffine linker):
- Builds a span representation per entity: `[start_state; end_state; width_embedding]`
- Scores compatibility between each Action and each Time span using:
  - biaffine semantic compatibility
  - distance embeddings to encode proximity bias
  - a NONE option for actions with no explicit time
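The linking step can be pictured with a small pure-Python sketch. It assumes a common biaffine form (a bilinear term plus a linear term over the concatenated span vectors) and models the NONE option as a fixed score that competes with every Time span. The function names, the scalar `dist_bias` standing in for the distance embedding, and the toy weights are all hypothetical.

```python
from typing import List, Optional

Vector = List[float]


def biaffine_score(a: Vector, t: Vector, U: List[Vector], w: Vector,
                   bias: float = 0.0, dist_bias: float = 0.0) -> float:
    """Score one Action/Time pair: a^T U t + w^T [a; t] + bias + dist_bias."""
    bilinear = sum(a[i] * U[i][j] * t[j]
                   for i in range(len(a)) for j in range(len(t)))
    linear = sum(wk * xk for wk, xk in zip(w, a + t))
    return bilinear + linear + bias + dist_bias


def link_actions(action_vecs: List[Vector], time_vecs: List[Vector],
                 U: List[Vector], w: Vector, bias: float = 0.0,
                 none_score: float = 0.0) -> List[Optional[int]]:
    """For each action, return the index of its best Time span, or None."""
    links = []
    for a in action_vecs:
        # Index 0 is the NONE option; Time span j sits at index j + 1.
        scores = [none_score] + [biaffine_score(a, t, U, w, bias)
                                 for t in time_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        links.append(None if best == 0 else best - 1)
    return links


# Toy 2-dimensional span vectors and identity weights (illustrative only).
actions = [[1, 0], [0, 1], [0, 0]]
times = [[1, 0], [0, 1]]
U = [[1, 0], [0, 1]]
w = [0, 0, 0, 0]
links = link_actions(actions, times, U, w, none_score=0.5)
```

With these toy values the first two actions link to the matching Time span and the third falls back to NONE, since no pair score beats `none_score`.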
Date normalization:
- Uses `dateparser` with `visit_date` as the relative base to compute exact ISO dates.
- This separates semantic understanding (learned) from date arithmetic (deterministic), reducing hallucinated or inconsistent dates.
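The idea of deterministic, anchored date arithmetic can be shown without any third-party dependency. The sketch below is a simplified standard-library stand-in for the `dateparser` step, not the project's implementation: it only handles patterns such as "in 2 weeks" or "two months", and the names `normalize_period` and `add_months` are hypothetical. The example visit date of 2026-01-10 is assumed so that the result matches the sample output above.

```python
import calendar
import re
from datetime import date, timedelta
from typing import Optional

_UNIT_DAYS = {"day": 1, "week": 7}
_WORD_NUMS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}
_PATTERN = re.compile(r"(?:in\s+)?(\d+|[a-z]+)\s+(day|week|month)s?",
                      re.IGNORECASE)


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    idx = d.month - 1 + months
    year, month = d.year + idx // 12, idx % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def normalize_period(period_text: str, visit_date: date) -> Optional[str]:
    """Resolve a relative period against visit_date; None if unsupported."""
    m = _PATTERN.fullmatch(period_text.strip())
    if m is None:
        return None
    raw, unit = m.group(1).lower(), m.group(2).lower()
    count = int(raw) if raw.isdigit() else _WORD_NUMS.get(raw)
    if count is None:
        return None
    if unit == "month":
        return add_months(visit_date, count).isoformat()
    return (visit_date + timedelta(days=count * _UNIT_DAYS[unit])).isoformat()


visit = date(2026, 1, 10)  # assumed anchor date for the example
period_date = normalize_period("in 2 weeks", visit)
```

Returning `None` for anything the rules do not cover keeps the step from guessing; a caller can then flag the item for review instead of emitting a wrong date.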
To avoid PHI/HIPAA risks, the project uses a synthetic dataset generated with an LLM under a controlled schema.
- Inputs: `note_text`, `visit_date`
- Labels:
  - `action_text`, `action_char_start`, `action_char_end`
  - `time_text`, `time_char_start`, `time_char_end`
  - `period_date` (ISO date computed from `visit_date` + `time_text`)
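A quick consistency check on this schema is to confirm that each character span actually slices out its labelled text. The sketch below assumes end offsets are exclusive (Python slice convention), which the source does not state; the record values and the helper name `check_offsets` are made up for illustration.

```python
def check_offsets(record: dict) -> bool:
    """Return True if both labelled spans slice to their stated text."""
    note = record["note_text"]
    for prefix in ("action", "time"):
        start = record[f"{prefix}_char_start"]
        end = record[f"{prefix}_char_end"]  # assumed exclusive
        if note[start:end] != record[f"{prefix}_text"]:
            return False
    return True


# Illustrative record; field names follow the schema, values are invented.
record = {
    "note_text": "Order MRI brain in two weeks.",
    "visit_date": "2026-01-10",
    "action_text": "MRI brain",
    "action_char_start": 6,
    "action_char_end": 15,
    "time_text": "in two weeks",
    "time_char_start": 16,
    "time_char_end": 28,
    "period_date": "2026-01-24",
}
is_valid = check_offsets(record)
```

Running such a check over the whole CSV before training would catch off-by-one offsets introduced during synthetic generation.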
Clinical notes can exceed 512 tokens, so we use sliding windows:
- `MAX_LEN = 512`
- `DOC_STRIDE = 128`
The overlap reduces boundary truncation and preserves context around entities.
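The windowing arithmetic can be sketched as follows. This assumes `DOC_STRIDE` counts the tokens shared by consecutive windows, as the `stride` argument of Hugging Face tokenizers does, and it ignores special tokens such as `[CLS]` and `[SEP]`. The function name `window_spans` is hypothetical.

```python
from typing import List, Tuple

MAX_LEN = 512
DOC_STRIDE = 128  # assumed to be the overlap between consecutive windows


def window_spans(n_tokens: int, max_len: int = MAX_LEN,
                 overlap: int = DOC_STRIDE) -> List[Tuple[int, int]]:
    """Return (start, end) token ranges, end exclusive, covering the note."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be in [0, max_len)")
    step = max_len - overlap
    spans = []
    start = 0
    while True:
        end = min(start + max_len, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            break
        start += step
    return spans


spans_1000 = window_spans(1000)
```

For a 1000-token note this yields three windows, each sharing 128 tokens with its neighbor, so an entity cut off at one window's edge appears whole in the next.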
We applied:
- Template randomization
- Section reordering
- Multi-action injection
- History distractors
- Clinical shorthand generation (x2w, q6mo, RTC 3mo)
- Surface-form variation (weeks vs days vs months)
Stress-test subsets include:
- Proximity traps
- List-swapping traps
- History traps
- Section ambiguity
- Shorthand temporal noise
Recommended structure (matches course repo requirements):
.
├── Code/
│ ├── ll_project_follow_up_instruction_extraction_2k_dataset_submit.ipynb
│ └── requirements.txt
├── Data/
│ └── synthetic_clinical_notes_2000.csv
├── Results/
│ ├── biobert_metrics.json
│ ├── chatgpt_metrics.json
│ └── llama_metrics.json
├── Slides/
│ ├── follow up instructions extraction final presentation pdf.pdf
│ ├── follow up instructions extraction final presentation.pptx
│ ├── follow up instructions extraction first presentation pdf.pdf
│ ├── follow up instructions extraction first presentation.pptx
│ ├── follow up instructions extraction interim presentation.pptx
│ └── follow up instructions extraction interim presentation pdf.pdf
├── Visuals/
│ ├── confusion matrix.png
│ ├── date error mae.png
│ ├── error distribution.png
│ ├── model comparison f1.png
│ └── note length variation by specialty plot.png
└── README.md
--batch_size 16 --lr 2e-5 --epochs 20 --fp16
Training details:
- Mixed precision: FP16
- Early stopping: patience 4 (based on validation loss)
- Joint objective:
\( \mathcal{L} = \mathcal{L}_{NER} + \alpha\,\mathcal{L}_{LINK} \) with \(\alpha=1\)
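
The joint objective and the patience rule can be illustrated in a few lines. This is a framework-free sketch: the class `EarlyStopping`, the function `joint_loss`, and the validation losses are invented for the example, and treating any non-improving epoch as a "bad" epoch is an assumption about how patience is counted.

```python
class EarlyStopping:
    """Stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 4):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def joint_loss(ner_loss: float, link_loss: float, alpha: float = 1.0) -> float:
    """L = L_NER + alpha * L_LINK, with alpha = 1 as in the text."""
    return ner_loss + alpha * link_loss


# Invented validation losses: the best value occurs at epoch 3.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.66, 0.68, 0.60]
stopper = EarlyStopping(patience=4)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

In this run training halts at epoch 7, four epochs after the best loss, so the later improvement at epoch 8 is never seen; that is the usual trade-off of a fixed patience.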
---
Expected output columns include:
- predicted action text
- predicted period text
- normalized ISO date (`period_date`)
---
Reported metrics:
- NER span F1 (Action, Time)
- Linking F1 (Action→Time)
- End‑to‑End Action+Date F1
- Date MAE (days)
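
Two of these metrics are simple enough to sketch directly. The version below assumes exact-match span scoring (a predicted span counts only if label, start, and end all agree) and computes date MAE over aligned gold/predicted pairs; the project's evaluation code may differ, and the function names are hypothetical.

```python
from datetime import date
from typing import List, Set, Tuple

Span = Tuple[str, int, int]


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match F1 over (label, start, end) spans."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def date_mae_days(gold_dates: List[str], pred_dates: List[str]) -> float:
    """Mean absolute error in days between aligned ISO date strings."""
    diffs = [abs((date.fromisoformat(g) - date.fromisoformat(p)).days)
             for g, p in zip(gold_dates, pred_dates)]
    return sum(diffs) / len(diffs)


gold = {("ACT", 1, 3), ("TIME", 3, 6)}
pred = {("ACT", 1, 3), ("TIME", 4, 6)}  # TIME boundary is off by one token
f1 = span_f1(gold, pred)
mae = date_mae_days(["2026-01-24", "2026-02-10"], ["2026-01-24", "2026-02-12"])
```

Exact matching is strict: the off-by-one `TIME` span above counts as both a false positive and a false negative.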
---
## Results Summary (synthetic held‑out)
Reported in the project report on a held‑out synthetic test split:
- **NER span F1:** > 99.7%
- **Linking F1:** > 98.2%
- **End‑to‑end date accuracy:** ~ 98.4%
---
## Ethics & Data Privacy
- **No real patient data** is included in this repository.
- The dataset is synthetically generated to avoid PHI/HIPAA concerns.
- The system is intended as a research prototype; deployment requires governance, clinical validation, and privacy review.
---
## Limitations
- **Synthetic‑only training/evaluation** may not fully reflect real EHR note variability.
- Date normalization depends on `dateparser` coverage and assumptions (e.g., locale).
- Complex discourse cases (implicit times, cross‑sentence references) may require additional modeling or global constraints.
---
## Citation
If you use this code or dataset, cite as:
```bibtex
@misc{clinical_temporal_action_extraction_2026,
title = {Follow Up Instructions Extraction: Hybrid BioBERT + Deterministic Date Normalization},
author = {Michal Laufer and Alexander Apartsin and Yehudit Aperstein},
year = {2026},
}
```
