
merillium/drug_name_generator


drug_name_generator

This is a deployable Dash app that generates [new] drug names and displays them in a word cloud.


Installing Dependencies

To install dependencies, run the following command:

pip install -r requirements.txt

Data Download and Preprocessing

You can use a direct download link to obtain current brand drug names from the FDA: https://www.fda.gov/media/89850/download?attachment

If you save the product.txt file to the data folder and then run the command python3 app.py, the app will (re)generate the files df_prefix.csv, df_middle.csv, and df_suffix.csv in venv/data/.
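As a rough illustration of what this preprocessing step produces, the sketch below reads names from product.txt, tokenizes them, and writes one CSV per token position. It is a minimal sketch under several assumptions, not the code in app.py: the file is assumed to be tab-delimited with a header row and Latin-1 readable, the brand-name column is passed in as column because its header is not named in this README, and the helper names and the CSV layout (token, count, probability) are hypothetical.

```python
import csv
import re
from collections import Counter
from pathlib import Path

# Consonant run + one or two vowels ('y' counted as a vowel), or trailing consonants.
TOKEN_RE = re.compile(r"[^aeiouy]*[aeiouy]{1,2}|[^aeiouy]+$")


def load_brand_names(path, column):
    """Read unique, lowercased names from one column of a tab-delimited file.

    Assumption: the file has a header row and `column` names the brand-name field.
    """
    with open(path, newline="", encoding="latin-1") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return sorted({row[column].strip().lower() for row in reader if row.get(column)})


def tokenize(name):
    """Split after vowel runs, then merge single leading/trailing characters."""
    tokens = TOKEN_RE.findall(re.sub(r"[^a-z]", "", name.lower()))
    if len(tokens) >= 2 and len(tokens[0]) == 1:
        tokens = [tokens[0] + tokens[1]] + tokens[2:]
    if len(tokens) >= 2 and len(tokens[-1]) == 1:
        tokens = tokens[:-2] + [tokens[-2] + tokens[-1]]
    return tokens


def write_token_csv(tokens, path):
    """Write each token with its count and relative frequency, most common first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["token", "count", "probability"])
        for token, n in counts.most_common():
            writer.writerow([token, n, n / total])


def build_tables(product_path, column, out_dir):
    """Hypothetical pipeline: product.txt -> df_prefix/df_middle/df_suffix CSVs."""
    prefixes, middles, suffixes = [], [], []
    for name in load_brand_names(product_path, column):
        tokens = tokenize(name)
        if len(tokens) < 2:
            continue  # assumption: single-token names are skipped
        prefixes.append(tokens[0])
        middles.extend(tokens[1:-1])  # every interior token counts as a middle
        suffixes.append(tokens[-1])
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    write_token_csv(prefixes, out / "df_prefix.csv")
    write_token_csv(middles, out / "df_middle.csv")
    write_token_csv(suffixes, out / "df_suffix.csv")
```

It would be called as build_tables(path_to_product_txt, name_of_brand_column, "venv/data"). Deduplicating names first means each brand name contributes its tokens once, however many product rows it appears in.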

Since the app expects these files to exist in order to run, I have included them in the repo. This preprocessing step isn't necessary to run the app, but if you want to tweak any of the preprocessing steps, you can rerun it to overwrite the included CSVs.

Algorithm

The name generation is based on a simple algorithm that uses regexes to extract [plausible] prefix, middle, and suffix tokens from existing brand drug names. Each name is split after every run of one or two consecutive vowels (preferring a run of two), so most tokens are a consonant cluster followed by its vowel(s). Any single character left over at the beginning or end is then merged into its neighboring token to create true prefixes and suffixes.
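The splitting and merging rules can be sketched in a few lines of Python. This is a minimal sketch, not the app's actual code: the single combined regex and the helper name tokenize are assumptions, names are assumed to be lowercased with non-letters dropped, and 'y' is counted as a vowel because the amyvid example (a + my + vi + d) only splits that way if it is.

```python
import re

# Assumption: 'y' counts as a vowel. Match a (possibly empty) consonant run followed
# by one or two vowels (greedy, so two are preferred), or trailing consonants at the end.
TOKEN_RE = re.compile(r"[^aeiouy]*[aeiouy]{1,2}|[^aeiouy]+$")


def tokenize(name: str) -> list[str]:
    """Split a drug name into syllable-like tokens (hypothetical helper)."""
    cleaned = re.sub(r"[^a-z]", "", name.lower())  # keep letters only
    tokens = TOKEN_RE.findall(cleaned)
    # Merge a single leading character into the following token (a + my -> amy).
    if len(tokens) >= 2 and len(tokens[0]) == 1:
        tokens = [tokens[0] + tokens[1]] + tokens[2:]
    # Merge a single trailing character into the preceding token (na + x -> nax).
    if len(tokens) >= 2 and len(tokens[-1]) == 1:
        tokens = tokens[:-2] + [tokens[-2] + tokens[-1]]
    return tokens


for name in ["wegovy", "xanax", "mounjaro", "amyvid"]:
    print(name, "-->", " + ".join(tokenize(name)))
# wegovy --> we + go + vy
# xanax --> xa + nax
# mounjaro --> mou + nja + ro
# amyvid --> amy + vid
```

Using one alternation keeps the split to a single findall call; the second branch only fires for consonants at the very end of a name, which are then merged by the trailing-character rule when they form a single letter.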

Consider the four-name drug "corpus" below.

EXAMPLES:

  1. wegovy --> we + go + vy
  2. xanax --> xa + na + x --> xa + nax
  3. mounjaro --> mou + nja + ro
  4. amyvid --> a + my + vi + d --> amy + vid (recombining single characters)

We can now extract the following information.

Prefixes: [we, xa, mou, amy]
Middle: [go, nja]
Suffixes: [vy, nax, ro, vid]
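Collecting these lists from tokenized names is a positional split: the first token of each name is a prefix, the last is a suffix, and anything in between is a middle. A minimal sketch, assuming (hypothetically) that names with fewer than two tokens are skipped and that every interior token of a longer name counts as a middle:

```python
def partition(token_lists):
    """Collect prefix, middle, and suffix tokens from tokenized names."""
    prefixes, middles, suffixes = [], [], []
    for tokens in token_lists:
        if len(tokens) < 2:
            continue  # assumption: single-token names contribute nothing
        prefixes.append(tokens[0])
        middles.extend(tokens[1:-1])  # every interior token counts as a middle
        suffixes.append(tokens[-1])
    return prefixes, middles, suffixes


# Tokens from the four-name example corpus.
corpus_tokens = [
    ["we", "go", "vy"],
    ["xa", "nax"],
    ["mou", "nja", "ro"],
    ["amy", "vid"],
]
prefixes, middles, suffixes = partition(corpus_tokens)
print("Prefixes:", prefixes)  # Prefixes: ['we', 'xa', 'mou', 'amy']
print("Middle:", middles)     # Middle: ['go', 'nja']
print("Suffixes:", suffixes)  # Suffixes: ['vy', 'nax', 'ro', 'vid']
```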

Then we can calculate the count and relative frequency (probability) of each prefix, middle, and suffix within its set. In this simplified example, each prefix has a count of 1 and a probability of 1/4. To create a new drug name, we randomly select a prefix, a middle, and a suffix, each weighted by its probability, for example we + nja + vy = wenjavy.
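The counting and weighted draw map directly onto collections.Counter and random.choices from the standard library. This sketch hard-codes the token sets from the example corpus and uses hypothetical helper names; the real app may store and sample its tables differently (for instance, by reading them from the CSVs).

```python
import random
from collections import Counter


def to_probabilities(tokens):
    """Map each token to its relative frequency within its own set."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def sample(probs, rng):
    """Draw one token, weighted by its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


prefix_probs = to_probabilities(["we", "xa", "mou", "amy"])
middle_probs = to_probabilities(["go", "nja"])
suffix_probs = to_probabilities(["vy", "nax", "ro", "vid"])

print(prefix_probs["we"])  # 0.25, i.e. a count of 1 out of 4 prefixes

rng = random.Random(0)  # seeded so a run is reproducible
name = sample(prefix_probs, rng) + sample(middle_probs, rng) + sample(suffix_probs, rng)
print(name)
```

Passing an explicit random.Random instance keeps sampling reproducible in tests without touching the global random state.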

Word Cloud Visualization

A silly drug name generator deserves a surprisingly elegant-looking app with fun and completely unnecessary features. There are two sliders: (1) the temperature slider controls how selective we are (a higher temperature allows lower-probability tokens to be selected), and (2) the number-of-drugs slider displays anywhere from 5 to 15 newly generated drug names in the cloud. Moving either slider automatically generates new drug names, and clicking the [Regenerate Drug Names] button generates new drug names with the current slider settings. Higher-probability drug names are displayed in a larger font.
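One common way to implement a temperature control, assumed here rather than taken from the app, is to raise each probability to the power 1/T and renormalize. This flattens the distribution when T > 1, so rarer tokens become likelier, and sharpens it when T < 1. The font-size mapping is likewise hypothetical: it takes a score per generated name (for example, the product of its prefix, middle, and suffix probabilities) and scales it linearly between two assumed point sizes.

```python
def apply_temperature(probs, temperature):
    """Rescale probabilities as p ** (1 / T), then renormalize.

    T > 1 flattens the distribution (rarer tokens become likelier);
    T < 1 sharpens it toward the most common tokens.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: p ** (1.0 / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    return {tok: s / total for tok, s in scaled.items()}


def font_sizes(name_probs, min_size=16, max_size=48):
    """Hypothetical mapping: linearly scale each name's probability to a font size."""
    lo, hi = min(name_probs.values()), max(name_probs.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all probabilities match
    return {
        name: min_size + (p - lo) / span * (max_size - min_size)
        for name, p in name_probs.items()
    }


probs = {"we": 0.7, "xa": 0.2, "amy": 0.1}
for t in (0.5, 1.0, 2.0):
    print(t, {k: round(v, 3) for k, v in apply_temperature(probs, t).items()})

# Illustrative scores: product of (made-up) prefix, middle, and suffix probabilities.
print(font_sizes({"wenjavy": 0.7 * 0.5 * 0.25, "xagoro": 0.2 * 0.5 * 0.25}))
```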

Running the app

In a terminal, cd into this directory and run the following command: python3 app.py

Future Work

Currently there is only one corpus of brand drug names, and the data set is rather small, with only 6,986 drug names. It would be interesting to expand this data set and to introduce trade classes for specific types of drugs, so that new drug names for a given trade class can be constructed from the prefixes, middles, and suffixes of existing drugs in that particular trade class.

About

An app displaying the results of a simple token-based drug name generating algorithm.
