feat(nlp+tasks): Add sentence-level VADER sentiment scoring and MIMIC… #968

Open
vtewari2 wants to merge 1 commit into sunlabuiuc:master from vtewari2:pr/uiuccs598dlh/paper-sentiment-analysis

Summary

Adds clinical-text sentiment scoring infrastructure and a MIMIC-III task
that implements the negative-sentiment mistrust proxy from:

Boag et al. "Racial Disparities and Mistrust in End-of-Life Care."
MLHC 2018. arXiv:1808.03827

This addresses a gap in PyHealth: no support for sentiment analysis as a
clinical feature, and no mechanism to extract affective signals from
unstructured discharge notes for downstream ML tasks.


The saturation problem with full-text VADER

Standard full-text VADER compound scoring is unsuitable for clinical notes:

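┌──────────────────────────────┬────────────────────────────────────────────────┬────────┐
│ Approach                     │ Score spread (std)                             │ Usable │
├──────────────────────────────┼────────────────────────────────────────────────┼────────┤
│ Full-text VADER compound     │ Saturates at −1.0 for >94% of discharge notes  │ No     │
├──────────────────────────────┼────────────────────────────────────────────────┼────────┤
│ Sentence-level mean compound │ std ≈ 0.086, well-distributed                  │ Yes    │
└──────────────────────────────┴────────────────────────────────────────────────┴────────┘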

Clinical discharge language is lexically negative ("pain", "failure",
"respiratory distress"), causing full-text VADER to bottom out.
Sentence-level averaging avoids saturation and closely approximates
the word-averaged pattern.en approach used in the original paper.
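
The saturation effect can be reproduced without NLTK. The sketch below is a toy, not the PR's implementation: `TOY_LEXICON` and its valences are made up for illustration, and the sentence splitter is a naive regex rather than `sent_tokenize`. The only part taken from VADER is the final squashing step, `x / sqrt(x² + 15)`, which is what drives long, lexically negative documents toward −1.0.

```python
import math
import re

# Toy valence lexicon: illustrative values only, NOT the real VADER lexicon.
TOY_LEXICON = {
    "pain": -2.0,
    "failure": -2.5,
    "distress": -2.0,
    "stable": 1.5,
    "improved": 2.0,
}


def compound(text: str, alpha: float = 15.0) -> float:
    """Sum word valences, then squash into (-1, 1) as VADER's normaliser does."""
    words = re.findall(r"[a-z]+", text.lower())
    total = sum(TOY_LEXICON.get(w, 0.0) for w in words)
    return total / math.sqrt(total * total + alpha)


def sentence_mean_compound(text: str) -> float:
    """Score each sentence separately and average the compound scores."""
    # Naive splitter for this sketch; the PR uses NLTK's sent_tokenize.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    return sum(compound(s) for s in sentences) / len(sentences)


# A synthetic "discharge note": four short sentences repeated ten times.
note = " ".join(
    [
        "Patient reports pain.",
        "Respiratory distress noted.",
        "Vitals stable.",
        "History of heart failure.",
    ]
    * 10
)

full_text_score = compound(note)
sentence_level_score = sentence_mean_compound(note)
print(f"full-text: {full_text_score:.3f}  sentence-mean: {sentence_level_score:.3f}")
```

Because the valences are summed before squashing, the full-text score depends on document length as much as on tone; averaging per-sentence scores removes that length dependence.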


Changes

pyhealth/nlp/__init__.py (new)

Initialises pyhealth.nlp as a proper Python package and exports
SentimentScorer and normalize_sentiment_scores.

pyhealth/nlp/sentiment_scorer.py (new)

SentimentScorer

  • Wraps NLTK VADER with sentence-level averaging via sent_tokenize
  • Stateless and thread-safe after initialisation; NLTK imports are
    deferred to instantiation time (lazy import)
  • score(text) → mean sentence-level VADER compound in [−1.0, +1.0]
  • score_batch(texts) → batch variant
  • negate_and_zscore(raw_scores) → applies Boag et al. normalisation:
    neg_score = -(raw - μ) / σ
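
As a worked example of the formula above, here is a minimal standalone sketch. The function name `negate_and_zscore_sketch` is hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption; the PR text does not say which estimator the method uses.

```python
from statistics import fmean, pstdev


def negate_and_zscore_sketch(raw_scores):
    """Return -(raw - mu) / sigma for each score (population sigma assumed)."""
    if not raw_scores:
        raise ValueError("raw_scores must not be empty")
    mu = fmean(raw_scores)
    sigma = pstdev(raw_scores)
    if sigma == 0:
        # All scores identical: no spread to normalise, so return zeros.
        return [0.0 for _ in raw_scores]
    return [-(raw - mu) / sigma for raw in raw_scores]


raw = [-0.20, -0.05, 0.10]
neg = negate_and_zscore_sketch(raw)
print(neg)
```

The most negative raw score maps to the largest normalised value, so a higher `neg_score` reads as more negative sentiment.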

normalize_sentiment_scores(sample_dataset, feature_key)

  • Post-task utility that Z-score normalises the neg_sentiment column
    across all samples in a SampleDataset in-place
  • Z-scoring requires the global mean/std across all patients, so it
    cannot be done inside __call__ (which processes one patient at a
    time) — this utility fills that gap
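
The two-pass structure this implies can be sketched as follows. This is an illustration under assumptions, not the utility itself: a plain list of dicts stands in for `SampleDataset`, `normalize_in_place` is a hypothetical name, and population standard deviation is assumed. Since the task already stores the negated score, plain Z-scoring of that column is equivalent to `-(raw - μ) / σ` on the raw scores.

```python
from statistics import fmean, pstdev


def normalize_in_place(samples, feature_key="neg_sentiment"):
    """Z-score a 1-element-list feature across all samples, mutating them."""
    # Pass 1: global statistics need every sample, not one patient at a time.
    values = [sample[feature_key][0] for sample in samples]
    mu = fmean(values)
    sigma = pstdev(values)
    # Pass 2: rewrite each sample's feature with the shared mu and sigma.
    for sample in samples:
        value = sample[feature_key][0]
        sample[feature_key] = [(value - mu) / sigma if sigma > 0 else 0.0]
    return mu, sigma


# Hypothetical samples shaped like the task output described in this PR.
samples = [
    {"patient_id": "p1", "neg_sentiment": [0.20], "noncompliance": 0},
    {"patient_id": "p2", "neg_sentiment": [0.05], "noncompliance": 1},
    {"patient_id": "p3", "neg_sentiment": [-0.10], "noncompliance": 0},
]
mu, sigma = normalize_in_place(samples)
normalized = [s["neg_sentiment"][0] for s in samples]
print(mu, sigma, normalized)
```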

pyhealth/tasks/sentiment_mimic3.py (new)

MistrustSentimentMIMIC3

input_schema = {"neg_sentiment": "tensor"} # 1-element list [float]
output_schema = {"noncompliance": "binary"} # configurable via output_label

For each admission:

  1. Collect all NOTEEVENTS rows where CATEGORY = 'Discharge summary'
  2. Score each note with SentimentScorer.score()
  3. Average across notes: raw_mean = mean(note_scores)
  4. Negate: raw_neg = -raw_mean (higher → more negative → more mistrust)
  5. Derive binary label from NOTEEVENTS (noncompliance or autopsy consent)
  6. Return {"neg_sentiment": [raw_neg], output_label: 0/1}
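
Steps 1 to 4 and 6 can be sketched as a plain function. Everything here is illustrative: `build_sample`, the `category`/`text` row shape, and the stub scorer are hypothetical, the label from step 5 is passed in rather than derived, and skipping admissions without a discharge summary is an assumption.

```python
from statistics import fmean


def build_sample(notes, label, score_fn, output_label="noncompliance"):
    """Turn one admission's notes into a sample dict, or None if no summary."""
    # Step 1: keep only discharge summaries.
    summaries = [n["text"] for n in notes if n["category"] == "Discharge summary"]
    if not summaries:
        return None  # assumption: such admissions are skipped
    # Steps 2-3: score each note, then average across notes.
    raw_mean = fmean([score_fn(text) for text in summaries])
    # Step 4: negate so that higher means more negative sentiment.
    raw_neg = -raw_mean
    # Step 6: 1-element list feature plus the binary label.
    return {"neg_sentiment": [raw_neg], output_label: int(label)}


def stub_score(text):
    """Stand-in for SentimentScorer.score(); returns fixed toy values."""
    return -0.5 if "pain" in text else 0.1


admission_notes = [
    {"category": "Discharge summary", "text": "Patient in pain on arrival."},
    {"category": "Discharge summary", "text": "Discharged home in good condition."},
    {"category": "Nursing", "text": "Severe pain overnight."},
]
sample = build_sample(admission_notes, label=1, score_fn=stub_score)
print(sample)
```

Passing `output_label="autopsy_consent"` changes only the key under which the label is stored.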

output_label parameter (default "noncompliance") aligns the output
schema with MistrustNoncomplianceMIMIC3 or MistrustAutopsyMIMIC3
for direct comparison across all three mistrust proxies.

NLTK initialisation is lazy — no import at module load time.
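
One common way to get this behaviour is to defer the import until the analyzer is first requested. The class below is a hypothetical sketch of that pattern, not the code in this PR; only `nltk.sentiment.vader.SentimentIntensityAnalyzer` is a real NLTK name.

```python
class LazyVaderHolder:
    """Holds a VADER analyzer that is created on first access."""

    def __init__(self):
        # Nothing from NLTK is imported here, so constructing is cheap
        # and works even before the NLTK corpora are downloaded.
        self._analyzer = None

    @property
    def analyzer(self):
        if self._analyzer is None:
            # Deferred import: only paid for when scoring is requested.
            from nltk.sentiment.vader import SentimentIntensityAnalyzer

            self._analyzer = SentimentIntensityAnalyzer()
        return self._analyzer


holder = LazyVaderHolder()  # no NLTK import has happened yet
```

Accessing `holder.analyzer` for the first time triggers the import and requires the `vader_lexicon` resource to be installed.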

pyhealth/tasks/__init__.py

Exports MistrustSentimentMIMIC3.


Usage

from pyhealth.datasets import MIMIC3Dataset, split_by_patient, get_dataloader
from pyhealth.tasks import MistrustSentimentMIMIC3
from pyhealth.nlp import SentimentScorer, normalize_sentiment_scores
from pyhealth.models import LogisticRegression
from pyhealth.trainer import Trainer

# 1. Load dataset — only NOTEEVENTS needed
base_dataset = MIMIC3Dataset(
    root="/path/to/mimic-iii/1.4",
    tables=["NOTEEVENTS"],
)                                                 

# 2. Set task — scores discharge notes with sentence-level VADER
sample_dataset = base_dataset.set_task(MistrustSentimentMIMIC3())

# 3. Z-score normalise in-place (requires global stats — must be post set_task)
normalize_sentiment_scores(sample_dataset)

# 4. Train                                        
train_ds, val_ds, test_ds = split_by_patient(sample_dataset, [0.7, 0.15, 0.15])
model = LogisticRegression(dataset=sample_dataset)
trainer = Trainer(model=model)
trainer.train(                                    
    train_dataloader=get_dataloader(train_ds, batch_size=256, shuffle=True),
    val_dataloader=get_dataloader(val_ds, batch_size=256, shuffle=False),
    epochs=50,                                    
    monitor="roc_auc",                            
)                                                 
print(trainer.evaluate(get_dataloader(test_ds, batch_size=256)))
# Expected AUC-ROC: ~0.53–0.56 (weaker signal than supervised mistrust models)

---                                               
Expected results (MIMIC-III v1.4)

┌──────────────────────────────────┬───────────────────────────┐
│ Metric                           │ Value                     │
├──────────────────────────────────┼───────────────────────────┤
│ Discharge notes scored           │ 59,652                    │
├──────────────────────────────────┼───────────────────────────┤
│ Unique hadm_ids                  │ 52,726                    │
├──────────────────────────────────┼───────────────────────────┤
│ EOL cohort coverage              │ 96.8%                     │
├──────────────────────────────────┼───────────────────────────┤
│ Raw score mean / std             │ −0.069 / 0.067            │
├──────────────────────────────────┼───────────────────────────┤
│ White vs Black (neg_score) MWU p │ 0.106 (direction correct) │
└──────────────────────────────────┴───────────────────────────┘

The sentiment score is the weakest of the three mistrust proxies
(Pearson r with noncompliance score: +0.10; with autopsy score: −0.08)
but contributes additive signal in the BASELINE+ALL outcome model
(mortality AUC 0.629 → 0.661).

---                                               
Dependencies                                      

- nltk with vader_lexicon and punkt_tab corpora downloaded:
pip install nltk                                  
python -c "import nltk; nltk.download('vader_lexicon'); nltk.download('punkt_tab')"
- MIMIC-III v1.4 with PhysioNet credentialed access
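
A quick pre-flight check for the NLTK resources might look like the sketch below. `missing_nltk_resources` is a hypothetical helper, and the two resource paths assume NLTK's standard download layout; adjust them if your installation differs.

```python
def missing_nltk_resources():
    """Return the names of required NLTK pieces that cannot be found."""
    try:
        import nltk
    except ImportError:
        return ["nltk"]

    # Assumed locations under nltk_data after nltk.download(...).
    required = {
        "vader_lexicon": "sentiment/vader_lexicon.zip",
        "punkt_tab": "tokenizers/punkt_tab",
    }
    missing = []
    for name, path in required.items():
        try:
            nltk.data.find(path)
        except LookupError:
            missing.append(name)
    return missing


missing = missing_nltk_resources()
if missing:
    print("Missing:", ", ".join(missing))
```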

References                                        

┌───────────────────┬────────────────────────────────────────────────────┐
│ Resource          │ Link                                               │
├───────────────────┼────────────────────────────────────────────────────┤
│ Paper (MLHC 2018) │ https://arxiv.org/abs/1808.03827                   │
├───────────────────┼────────────────────────────────────────────────────┤
│ NLTK VADER        │ https://www.nltk.org/api/nltk.sentiment.vader.html │
├───────────────────┼────────────────────────────────────────────────────┤
│ MIMIC-III v1.4    │ https://physionet.org/content/mimiciii/1.4/        │
├───────────────────┼────────────────────────────────────────────────────┤
│ Course            │ UIUC CS 598 DLH                                    │
└───────────────────┴────────────────────────────────────────────────────┘

…-III mistrust task

Implements the negative-sentiment mistrust proxy from Boag et al. 2018
"Racial Disparities and Mistrust in End-of-Life Care" (MLHC 2018,
arXiv:1808.03827) using sentence-level VADER averaging to avoid the
full-text saturation problem specific to clinical discharge notes.

pyhealth/nlp/__init__.py  [new]
  - Initialise pyhealth.nlp as a proper Python package
  - Export SentimentScorer and normalize_sentiment_scores

pyhealth/nlp/sentiment_scorer.py  [new]
  - SentimentScorer: wraps NLTK VADER with sentence-level averaging
    (score each sentence, take mean) — avoids full-text compound
    saturation (>94% of clinical notes saturate at -1.0)
  - score(text): mean sentence-level VADER compound score for a document
  - score_batch(texts): batch variant
  - negate_and_zscore(raw_scores): applies Boag et al. normalisation:
    neg_score = -(raw - mu) / sigma
  - normalize_sentiment_scores(sample_dataset): post-task Z-score
    normalisation utility for MistrustSentimentMIMIC3 samples

pyhealth/tasks/sentiment_mimic3.py  [new]
  - MistrustSentimentMIMIC3: extracts discharge summary notes from
    NOTEEVENTS, scores with SentimentScorer, returns:
      input:  neg_sentiment (tensor, 1-element list, raw negated score)
      output: noncompliance or autopsy_consent (binary, configurable)
  - Lazy NLTK initialisation (no import at module load time)
  - output_label param: switch between noncompliance / autopsy_consent
    to align with MistrustNoncomplianceMIMIC3 / MistrustAutopsyMIMIC3

pyhealth/tasks/__init__.py
  - Export MistrustSentimentMIMIC3

Co-Authored-By: Varun Tewari <vtewari2@illinois.edu>