drug_name_generator

This is a deployable Dash app that generates [new] drug names and displays them in a word cloud.

Installing Dependencies

To install dependencies, run the following command:

pip install -r requirements.txt

Data Download and Preprocessing

You can use a direct download link to obtain current brand drug names from the FDA: https://www.fda.gov/media/89850/download?attachment

If you save the product.txt file to the data folder and then run the command python3 app.py, this will (re)generate the files df_prefix.csv, df_middle.csv, and df_suffix.csv in venv/data/.

Since the app needs these files in order to run, I have included them in the repo. The preprocessing step isn't necessary to run the app, but if you want to change any of the preprocessing steps, you may want to overwrite the included CSVs.

Algorithm

The name generation is based on a simple algorithm that uses regexes to extract [plausible] prefix, middle, and suffix tokens from existing brand drug names. The tokens are generated in two steps: we split after one or two consecutive vowels (preferring two consecutive vowels), and then merge any single character left at the beginning or end of a name into its neighboring token to create true prefixes and suffixes.

Consider the four-word drug name "corpus" below.

EXAMPLES:

wegovy --> we + go + vy
xanax --> xa + na + x --> xa + nax
mounjaro --> mou + nja + ro
amyvid --> a + my + vi + d --> amy + vid (recombining single characters)
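
The splitting rule can be sketched with a single regex. This is a minimal illustration, not the code used in this repo: the names VOWELS, CHUNK, and tokenize are hypothetical, and treating y as a vowel is an assumption inferred from the wegovy and amyvid examples.

```python
import re

# Assumption: 'y' counts as a vowel, as in "vy" (wegovy) and "my" (amyvid).
VOWELS = "aeiouy"

# A chunk is an optional run of non-vowels followed by one or two vowels
# (the quantifier is greedy, so two vowels are preferred), or a trailing
# run of non-vowels at the end of the name.
CHUNK = re.compile(rf"[^{VOWELS}]*[{VOWELS}]{{1,2}}|[^{VOWELS}]+")


def tokenize(name):
    """Split a drug name into tokens, merging stray single characters."""
    tokens = CHUNK.findall(name.lower())
    # Merge a single leading character into the token that follows it.
    if len(tokens) > 1 and len(tokens[0]) == 1:
        tokens = [tokens[0] + tokens[1]] + tokens[2:]
    # Merge a single trailing character into the token that precedes it.
    if len(tokens) > 1 and len(tokens[-1]) == 1:
        tokens = tokens[:-2] + [tokens[-2] + tokens[-1]]
    return tokens


for name in ["wegovy", "xanax", "mounjaro", "amyvid"]:
    print(name, "-->", " + ".join(tokenize(name)))
```

In this sketch the first token of each name is treated as its prefix, the last as its suffix, and anything in between as a middle token, so two-token names such as xanax contribute no middle.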

We can now extract the following information.

Prefixes: [we, xa, mou, amy]
Middle: [go, nja]
Suffixes: [vy, nax, ro, vid]

Then we can calculate the count and relative frequency (probability) of each prefix, middle, and suffix within its set. In this simplified example, each prefix would have a count of 1 and a probability of 1/4. To create a new drug name, we select a prefix, middle, and suffix at random according to their probabilities, such as we + nja + vy = wenjavy.
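
The counting and sampling step might look like the following. It is a sketch under stated assumptions: the token splits are copied from the four examples above, and frequency_table, sample, and generate_name are hypothetical helper names, not functions from this repo.

```python
import random
from collections import Counter

# Token splits copied from the four example names.
SPLITS = [["we", "go", "vy"], ["xa", "nax"], ["mou", "nja", "ro"], ["amy", "vid"]]


def frequency_table(tokens):
    """Map each token to its relative frequency within its own set."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


# First token = prefix, last token = suffix, everything between = middle.
prefixes = frequency_table(s[0] for s in SPLITS)
middles = frequency_table(tok for s in SPLITS for tok in s[1:-1])
suffixes = frequency_table(s[-1] for s in SPLITS)


def sample(table, rng):
    """Draw one token with probability equal to its relative frequency."""
    tokens, weights = zip(*table.items())
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate_name(rng=random):
    """Concatenate one sampled prefix, middle, and suffix."""
    return sample(prefixes, rng) + sample(middles, rng) + sample(suffixes, rng)


print(prefixes)         # each prefix has probability 0.25
print(generate_name())  # a random combination, e.g. one like "wenjavy"
```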

Word Cloud Visualization

A silly drug name generator deserves a surprisingly elegant-looking app with fun and completely unnecessary features. There are two sliders: (1) the temperature slider determines how selective we are (a higher temperature allows lower-probability tokens to be selected), and (2) the number-of-drugs slider lets us display anywhere from 5 to 15 newly generated drug names in a cloud. Moving either slider automatically generates new drug names, and clicking the [Regenerate Drug Names] button generates new drug names with the current slider settings. Higher-probability drug names are displayed in a larger font.
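
This README does not spell out how the temperature is applied to the token probabilities. One common approach, shown here purely as an assumption, is to raise each probability to the power 1/temperature and renormalize; apply_temperature is a hypothetical name, not a function from this repo.

```python
def apply_temperature(probs, temperature):
    """Reshape a probability table with a temperature parameter.

    Assumed scheme: p ** (1 / temperature), then renormalize.
    temperature > 1 flattens the distribution (rare tokens become likelier),
    temperature < 1 sharpens it, and temperature == 1 leaves it unchanged.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: p ** (1.0 / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    return {tok: w / total for tok, w in scaled.items()}


# Made-up probabilities, for illustration only.
probs = {"xa": 0.7, "we": 0.2, "amy": 0.1}
print(apply_temperature(probs, 2.0))  # flatter: "amy" gains probability
print(apply_temperature(probs, 0.5))  # sharper: "amy" loses probability
```

Under this scheme the reshaped table can be passed straight to a weighted sampler, so the slider value only changes the weights and not the sampling code.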

Running the app

In a terminal, cd into this directory and run the following command: python3 app.py

Future Work

Currently there is only one corpus of brand drug names, and the data set is rather small, with only 6,986 drug names. It would be interesting to expand this data set and to introduce trade classes for specific types of drugs, so that new drug names for a given trade class can be constructed using prefix + middle + suffix tokens from existing drugs in that particular trade class.
