The OpenFDA BigData Pipeline enables the collection, processing, and real-time presentation of data on adverse drug events from the openFDA database.
The solution uses Apache Kafka as a message broker, MongoDB as a document store, and Spring Boot for its services, and it is Dockerized.
This repository contains the code for the openFDA BigData Pipeline solution:
- openfda-producer: a microservice built with Spring Boot and written in Java
- openfda-consumer: a microservice built with Spring Boot and written in Java
- openfda-live-dashboard: a web application built with Flask and Dash and written in Python
The project runs with the default configuration defined in each of the services and in pipeline.yml. For more details refer directly to:
If you intend to try running the project yourself, I have put together a pipeline.yml configuration that can help you get started.
Calling the following command
docker-compose -f pipeline.yml up
will:
- Start the openfda-producer container
- Start the zookeper container
- Start the kafka container
- Start the mongodb container
- Start the openfda-consumer container
- Start the openfda-live-dashboard container, which will expose port 8050
- Start the jupyter-notebook container, which will expose port 8888
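
To confirm that the two exposed ports are accepting connections after startup, a small check along these lines may help. It is only a sketch: the ports 8050 and 8888 come from the list above, while the host `localhost` and the helper names `port_is_open` and `check_services` are assumptions made for illustration and are not part of this repository.

```python
import socket

# Ports come from the container list above; the names are used only as labels.
SERVICES = {"openfda-live-dashboard": 8050, "jupyter-notebook": 8888}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service label to whether its port accepts connections.

    The default host is an assumption; adjust it if Docker runs elsewhere.
    """
    return {name: port_is_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, reachable in check_services().items():
        print(f"{name}: {'reachable' if reachable else 'not reachable'}")
```

An open port only shows that something is listening; it does not prove that the dashboard or notebook has finished initializing.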
Once all your Docker containers are up and running, you can access the openfda-live-dashboard web dashboard via a browser under the following URL:
In addition, you can access Jupyter Notebook (jupyter-notebook) via a browser under the following URL:
Bug reports and pull requests are welcome on GitHub at https://github.com/koziolk/openfda-bigdata-pipeline

