feat(nlp+tasks): Add sentence-level VADER sentiment scoring and MIMIC… by vtewari2 · Pull Request #968 · sunlabuiuc/PyHealth

vtewari2 · 2026-04-12T15:44:22Z

Summary

Adds clinical-text sentiment scoring infrastructure and a MIMIC-III task
that implements the negative-sentiment mistrust proxy from:

Boag et al. "Racial Disparities and Mistrust in End-of-Life Care."
MLHC 2018. arXiv:1808.03827

This addresses a gap in PyHealth: no support for sentiment analysis as a
clinical feature, and no mechanism to extract affective signals from
unstructured discharge notes for downstream ML tasks.

The saturation problem with full-text VADER

Standard full-text VADER compound scoring is unsuitable for clinical notes:

| Approach | Score distribution | Usable |
|---|---|---|
| Full-text VADER compound | saturates at −1.0 for >94% of discharge notes | ✗ |
| Sentence-level mean compound | std ≈ 0.086, well-distributed | ✓ |

Clinical discharge language is lexically negative ("pain", "failure",
"respiratory distress"), causing full-text VADER to bottom out.
Sentence-level averaging avoids saturation and closely approximates
the word-averaged pattern.en approach used in the original paper.
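
The saturation follows from how VADER forms its compound score: the summed valence x is squashed with x / sqrt(x² + α), with α = 15 by default. The sketch below illustrates the effect under simplifying assumptions: the per-sentence valence sums are made-up values (not MIMIC data), and VADER's negation and emphasis heuristics are ignored.

```python
import math


def vader_normalize(valence_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into (-1, 1), as VADER does for `compound`."""
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)


# Hypothetical valence sums for a 40-sentence note (illustrative values only).
sentence_sums = [-1.5, 0.0, -0.8, 0.4] * 10

# Full-text scoring: valences accumulate over the whole note before squashing,
# so a long, mildly negative note is pushed towards -1.
full_text = vader_normalize(sum(sentence_sums))

# Sentence-level scoring: squash each sentence first, then average.
sentence_mean = sum(vader_normalize(s) for s in sentence_sums) / len(sentence_sums)

print(f"full-text compound:  {full_text:.3f}")
print(f"sentence-level mean: {sentence_mean:.3f}")
```

Because the accumulated sum grows with note length while each sentence's sum stays small, only the sentence-level mean keeps scores in a range where notes can be told apart.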

Changes

`pyhealth/nlp/__init__.py` (new)

Initialises pyhealth.nlp as a proper Python package and exports
SentimentScorer and normalize_sentiment_scores.

`pyhealth/nlp/sentiment_scorer.py` (new)

SentimentScorer

- Wraps NLTK VADER with sentence-level averaging via `sent_tokenize`
- Stateless and thread-safe after initialisation; NLTK imports are deferred to instantiation time (lazy import)
- `score(text)` → mean sentence-level VADER compound in [−1.0, +1.0]
- `score_batch(texts)` → batch variant
- `negate_and_zscore(raw_scores)` → applies the Boag et al. normalisation: `neg_score = -(raw - μ) / σ`
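
The scorer described above can be sketched roughly as follows. This is not the PR's code: the class name `SentenceMeanScorer`, the injectable `split` and `compound` callables, and the choice to return 0.0 for empty text are all assumptions made for illustration. The NLTK calls used for the defaults (`sent_tokenize`, `SentimentIntensityAnalyzer.polarity_scores`) are the standard ones and are imported lazily.

```python
from typing import Callable, List, Optional


class SentenceMeanScorer:
    """Hypothetical stand-in for SentimentScorer; not the PR's actual code."""

    def __init__(
        self,
        split: Optional[Callable[[str], List[str]]] = None,
        compound: Optional[Callable[[str], float]] = None,
    ) -> None:
        if split is None or compound is None:
            # Lazy import: NLTK is only loaded when a default is actually needed.
            from nltk.sentiment.vader import SentimentIntensityAnalyzer
            from nltk.tokenize import sent_tokenize

            analyzer = SentimentIntensityAnalyzer()
            split = split or sent_tokenize
            compound = compound or (
                lambda sentence: analyzer.polarity_scores(sentence)["compound"]
            )
        self._split = split
        self._compound = compound

    def score(self, text: str) -> float:
        """Mean per-sentence compound score; 0.0 for empty text (assumption)."""
        sentences = [s for s in self._split(text) if s.strip()]
        if not sentences:
            return 0.0
        return sum(self._compound(s) for s in sentences) / len(sentences)

    def score_batch(self, texts: List[str]) -> List[float]:
        """Score each document independently."""
        return [self.score(t) for t in texts]


# Stub tokenizer and scorer so the demo runs without NLTK data (illustrative only).
demo = SentenceMeanScorer(
    split=lambda text: text.split(". "),
    compound=lambda s: -0.5 if "pain" in s else 0.0,
)
print(demo.score("Patient reports chest pain. Vitals stable"))
```

Passing the tokenizer and scorer in as callables also makes the averaging logic testable without downloading the NLTK corpora.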

normalize_sentiment_scores(sample_dataset, feature_key)

- Post-task utility that Z-score normalises the `neg_sentiment` column across all samples in a SampleDataset, in place
- Z-scoring requires the global mean/std across all patients, so it cannot be done inside `__call__` (which processes one patient at a time); this utility fills that gap
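
A minimal sketch of the two normalisation paths, assuming samples are plain dicts holding a one-element list (a stand-in for SampleDataset, whose real interface is not shown here) and assuming the population standard deviation is used; the PR may use the sample standard deviation instead. The helper name `zscore_feature_in_place` is hypothetical.

```python
import statistics
from typing import Dict, List


def negate_and_zscore(raw_scores: List[float]) -> List[float]:
    """neg_score = -(raw - mu) / sigma, computed over the whole cohort."""
    mu = statistics.fmean(raw_scores)
    sigma = statistics.pstdev(raw_scores)  # assumption: population std
    if sigma == 0.0:
        return [0.0 for _ in raw_scores]
    return [-(r - mu) / sigma for r in raw_scores]


def zscore_feature_in_place(
    samples: List[Dict[str, list]], feature_key: str = "neg_sentiment"
) -> None:
    """Z-score an already-negated one-element feature across all samples."""
    values = [s[feature_key][0] for s in samples]
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    for sample in samples:
        centred = sample[feature_key][0] - mu
        sample[feature_key] = [centred / sigma if sigma else 0.0]


# Illustrative raw mean compound scores for three admissions (made-up values).
raw = [-0.10, -0.30, -0.20]
samples = [{"neg_sentiment": [-r]} for r in raw]  # the task stores the negated score
zscore_feature_in_place(samples)
print([s["neg_sentiment"][0] for s in samples])
print(negate_and_zscore(raw))
```

Both paths give the same values, since Z-scoring the negated score equals negating the Z-score. That is why the task can store the negated raw score per patient and defer the cohort-level statistics to a later pass.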

`pyhealth/tasks/sentiment_mimic3.py` (new)

MistrustSentimentMIMIC3

input_schema = {"neg_sentiment": "tensor"} # 1-element list [float]
output_schema = {"noncompliance": "binary"} # configurable via output_label

For each admission:

1. Collect all NOTEEVENTS rows where `CATEGORY = 'Discharge summary'`
2. Score each note with `SentimentScorer.score()`
3. Average across notes: `raw_mean = mean(note_scores)`
4. Negate: `raw_neg = -raw_mean` (higher → more negative → more mistrust)
5. Derive binary label from NOTEEVENTS (noncompliance or autopsy consent)
6. Return `{"neg_sentiment": [raw_neg], output_label: 0/1}`

output_label parameter (default "noncompliance") aligns the output
schema with MistrustNoncomplianceMIMIC3 or MistrustAutopsyMIMIC3
for direct comparison across all three mistrust proxies.
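
The per-admission steps can be sketched as a plain function. Several things here are assumptions: notes are represented as dicts with `category` and `text` keys, admissions without a discharge summary are skipped, and the binary label is passed in rather than derived, because the label-derivation rules are not spelled out in this description. The function name `build_sample` is hypothetical.

```python
from typing import Callable, Dict, List, Optional

DISCHARGE_CATEGORY = "Discharge summary"


def build_sample(
    notes: List[Dict[str, str]],
    label: int,
    score_fn: Callable[[str], float],
    output_label: str = "noncompliance",
) -> Optional[dict]:
    """Turn one admission's notes into a sample with a negated mean sentiment."""
    # Step 1: keep only discharge summaries.
    texts = [n["text"] for n in notes if n["category"] == DISCHARGE_CATEGORY]
    if not texts:
        return None  # assumption: nothing to score, so no sample is emitted
    # Steps 2-3: score each note, then average across notes.
    note_scores = [score_fn(t) for t in texts]
    raw_mean = sum(note_scores) / len(note_scores)
    # Steps 4 and 6: negate so that higher means more negative sentiment.
    return {"neg_sentiment": [-raw_mean], output_label: int(label)}


# Stub scorer and made-up notes, for illustration only.
notes = [
    {"category": "Discharge summary", "text": "note A"},
    {"category": "Nursing", "text": "ignored"},
    {"category": "Discharge summary", "text": "note B"},
]
stub_scores = {"note A": -0.2, "note B": -0.4}
sample = build_sample(notes, 1, stub_scores.__getitem__, output_label="autopsy_consent")
print(sample)
```

Keeping the label key configurable means the same feature can be paired with either target without changing the scoring path.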

NLTK initialisation is lazy — no import at module load time.

`pyhealth/tasks/__init__.py`

Exports MistrustSentimentMIMIC3.

Usage

from pyhealth.datasets import MIMIC3Dataset, split_by_patient, get_dataloader
from pyhealth.tasks import MistrustSentimentMIMIC3
from pyhealth.nlp import SentimentScorer, normalize_sentiment_scores
from pyhealth.models import LogisticRegression
from pyhealth.trainer import Trainer

# 1. Load dataset — only NOTEEVENTS needed
base_dataset = MIMIC3Dataset(
    root="/path/to/mimic-iii/1.4",
    tables=["NOTEEVENTS"],
)                                                 

# 2. Set task — scores discharge notes with sentence-level VADER
sample_dataset = base_dataset.set_task(MistrustSentimentMIMIC3())

# 3. Z-score normalise in-place (requires global stats — must be post set_task)
normalize_sentiment_scores(sample_dataset)

# 4. Train                                        
train_ds, val_ds, test_ds = split_by_patient(sample_dataset, [0.7, 0.15, 0.15])
model = LogisticRegression(dataset=sample_dataset)
trainer = Trainer(model=model)
trainer.train(                                    
    train_dataloader=get_dataloader(train_ds, batch_size=256, shuffle=True),
    val_dataloader=get_dataloader(val_ds, batch_size=256, shuffle=False),
    epochs=50,                                    
    monitor="roc_auc",                            
)                                                 
print(trainer.evaluate(get_dataloader(test_ds, batch_size=256)))
# Expected AUC-ROC: ~0.53–0.56 (weaker signal than supervised mistrust models)

---                                               
Expected results (MIMIC-III v1.4)

┌──────────────────────────────────┬───────────────────────────┐
│              Metric              │           Value           │
├──────────────────────────────────┼───────────────────────────┤
│ Discharge notes scored           │ 59,652                    │
├──────────────────────────────────┼───────────────────────────┤
│ Unique hadm_ids                  │ 52,726                    │
├──────────────────────────────────┼───────────────────────────┤
│ EOL cohort coverage              │ 96.8%                     │
├──────────────────────────────────┼───────────────────────────┤
│ Raw score mean / std             │ −0.069 / 0.067            │
├──────────────────────────────────┼───────────────────────────┤
│ White vs Black (neg_score) MWU p │ 0.106 (direction correct) │
└──────────────────────────────────┴───────────────────────────┘

The sentiment score is the weakest of the three mistrust proxies
(Pearson r with noncompliance score: +0.10; with autopsy score: −0.08)
but contributes additive signal in the BASELINE+ALL outcome model
(mortality AUC 0.629 → 0.661).

---                                               
Dependencies                                      

- nltk with vader_lexicon and punkt_tab corpora downloaded:
pip install nltk                                  
python -c "import nltk; nltk.download('vader_lexicon'); nltk.download('punkt_tab')"
- MIMIC-III v1.4 with PhysioNet credentialed access

References                                        

┌───────────────────┬────────────────────────────────────────────────────┐
│     Resource      │                        Link                        │
├───────────────────┼────────────────────────────────────────────────────┤
│ Paper (MLHC 2018) │ https://arxiv.org/abs/1808.03827                   │
├───────────────────┼────────────────────────────────────────────────────┤
│ NLTK VADER        │ https://www.nltk.org/api/nltk.sentiment.vader.html │
├───────────────────┼────────────────────────────────────────────────────┤
│ MIMIC-III v1.4    │ https://physionet.org/content/mimiciii/1.4/        │
├───────────────────┼────────────────────────────────────────────────────┤
│ Course            │ UIUC CS 598 DLH                                    │
└───────────────────┴────────────────────────────────────────────────────┘

…-III mistrust task

Implements the negative-sentiment mistrust proxy from Boag et al. 2018 "Racial Disparities and Mistrust in End-of-Life Care" (MLHC 2018, arXiv:1808.03827) using sentence-level VADER averaging to avoid the full-text saturation problem specific to clinical discharge notes.

pyhealth/nlp/__init__.py [new]
- Initialise pyhealth.nlp as a proper Python package
- Export SentimentScorer and normalize_sentiment_scores

pyhealth/nlp/sentiment_scorer.py [new]
- SentimentScorer: wraps NLTK VADER with sentence-level averaging (score each sentence, take mean); avoids full-text compound saturation (>94% of clinical notes saturate at -1.0)
- score(text): mean sentence-level VADER compound score for a document
- score_batch(texts): batch variant
- negate_and_zscore(raw_scores): applies Boag et al. normalisation: neg_score = -(raw - mu) / sigma
- normalize_sentiment_scores(sample_dataset): post-task Z-score normalisation utility for MistrustSentimentMIMIC3 samples

pyhealth/tasks/sentiment_mimic3.py [new]
- MistrustSentimentMIMIC3: extracts discharge summary notes from NOTEEVENTS, scores with SentimentScorer, returns:
    input: neg_sentiment (tensor, 1-element list, raw negated score)
    output: noncompliance or autopsy_consent (binary, configurable)
- Lazy NLTK initialisation (no import at module load time)
- output_label param: switch between noncompliance / autopsy_consent to align with MistrustNoncomplianceMIMIC3 / MistrustAutopsyMIMIC3

pyhealth/tasks/__init__.py
- Export MistrustSentimentMIMIC3

Co-Authored-By: Varun Tewari <vtewari2@illinois.edu>

vtewari2 wants to merge 1 commit into sunlabuiuc:master from vtewari2:pr/uiuccs598dlh/paper-sentiment-analysis
