GitHub - pfizer-opensource/tio2-sentiment-analysis

This is a repository for the "Using sentiment analysis to quantify the relative desirability and acceptability of drug product attributes" publication.

This repository aims to aid reproduction of the methods described in the Sentiment Analysis Publication. Because the publication uses private and protected patient data, we have repeated the methods by annotating a small section of the DrugLib dataset, an open-source dataset of public drug reviews. This dataset is provided in the 'data' folder and is used to highlight several features of this repo. Approximately 1500 entries from the public dataset were labeled in total, 304 of them by both reviewers. On these doubly labeled entries, the reviewers achieved an inter-rater reliability (Kappa/IRR) score of 0.91.
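For readers unfamiliar with the agreement score mentioned above, here is a minimal sketch of how a kappa value can be computed for two annotators. It assumes Cohen's kappa, the usual choice for two reviewers; the README does not state which variant was used. The function name and the toy labels are invented for illustration and are not taken from the repository.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label marginals.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)

# Toy labels, invented for illustration (not from the repository's data).
reviewer_1 = ["size", "taste", "size", "other", "taste", "size"]
reviewer_2 = ["size", "taste", "other", "other", "taste", "size"]
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, so it is lower than the plain fraction of matching labels whenever some matches could occur at random.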

Here, we reproduce the two main analyses featured in the publication:

Frequency Analysis: Review Preprocessing + Training DistilBERT for classification of reviews
Sentiment Analysis and Visualization

Frequency analysis allows you to view the prevalence of certain complaints or comments within your data. These review classifications can be used to visualize trends and concepts present in drug review data. To avoid manually classifying every review in a large dataset, we fine-tune a DistilBERT model that helps predict leading terms/qualifiers. The scripts for frequency analysis and DistilBERT tuning are primarily located in the 'scripts' folder. You can use 'omni_script.py' to train a model on your own dataset.
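Once each review has a class label, whether assigned manually or predicted by a model, the frequency step amounts to counting labels. The sketch below shows that idea only; it is not the repository's implementation, and the function name and the class labels are hypothetical.

```python
from collections import Counter

def label_frequencies(labels):
    """Return (label, count, share) tuples sorted by descending count."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, n, n / total) for label, n in counts.most_common()]

# Hypothetical predicted classes for six reviews (illustrative only).
predicted = ["side effects", "efficacy", "side effects",
             "cost", "side effects", "efficacy"]
freqs = label_frequencies(predicted)
for label, n, share in freqs:
    print(f"{label:<14}{n:>3}  {share:.0%}")
```

The resulting shares can be passed directly to a bar chart to visualize which topics dominate the reviews.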

Sentiment analysis is a powerful complement to frequency analysis: it shows not only the prevalence of certain opinions, but also the extent to which these opinions matter to patients. The sentiment analysis tools let you determine the sentiment of your text toward any given set of words and then visualize the results. They are primarily located in the 'tools' folder. You can use 'sentimentSeeker' to calculate the sentiment on your own dataset. Refer to 'examples/tools_demo.ipynb' to see the tool in action.

Getting started:

conda env create -n sentiment_analysis -f environment.yml
conda activate sentiment_analysis

This creates an environment and installs the prerequisite packages. A few resources are not covered by the environment file and must be installed separately:

If you are interested in training DistilBERT: You will also need to install NLTK. This is a one-time requirement, and an example is provided in the 'examples/demo.ipynb' notebook.

If you are interested in sentiment analysis: You will need to install spaCy's en_core_web_lg model. This is a one-time requirement, and an example is provided in the 'examples/tools_demo.ipynb' notebook.
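To check whether the one-time spaCy model install has already been done, a small sketch like the following may help. It relies on the fact that spaCy models are installed as ordinary Python packages; the helper name is hypothetical, and the notebook may take a different route.

```python
import importlib.util

def spacy_model_installed(model_name="en_core_web_lg"):
    """Return True if the model package is importable in this environment."""
    return importlib.util.find_spec(model_name) is not None

installed = spacy_model_installed()
if not installed:
    # Standard spaCy CLI command for fetching a model package.
    print("Model missing; run: python -m spacy download en_core_web_lg")
```

Run this inside the activated sentiment_analysis environment so that the check reflects the packages that environment actually contains.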

