GitHub - wanadzhar913/multi-application-model-serving-ray-serve: This small project serves Qwen3 embedding and reranker models using Ray Serve, specifically following the Multi-Application pattern in the Ray Serve docs.

Introduction

Our aim is to serve Qwen/Qwen3-Embedding-0.6B and Qwen/Qwen3-Reranker-0.6B using Ray Serve. Specifically, we're aiming to follow the multi-application design pattern from the Ray Serve docs.

How to set up your environment for testing & development

OPTIONAL (if your CUDA drivers aren't up to date, etc.): In VS Code, open the project in the dev container defined by devcontainer.json by pressing Ctrl + Shift + P and selecting Reopen in Container.

Set up uv. Really goated package manager. It's blazing fast! Other installation methods are covered in the uv docs.

pip install --upgrade pip
pip install uv
# uv self update

Once everything is set up, run the commands below:

uv venv
source .venv/bin/activate
uv pip install -r pyproject.toml --group dev # to add dev dependencies

How to deploy the project (Locally)

Run the commands below; the Ray dashboard should then be available at http://localhost:8265/#/serve.*

serve build app.text_embedding:app -o config.yaml # generate `config.yaml` (if you haven't)
ray start --head --dashboard-port=8265
serve run config.yaml

ray stop # shut down the Ray cluster once you're done testing
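
While the cluster is still running (that is, before `ray stop`), a quick way to confirm that something is listening on the expected ports is a small TCP probe. The sketch below is only an illustration: `port_open` and `report_ports` are hypothetical helper names that are not part of this repo, and the probe relies on bash's `/dev/tcp` pseudo-device, so it needs bash, not plain sh.

```shell
# port_open HOST PORT: exit 0 if a TCP connection to HOST:PORT succeeds.
# Uses bash's /dev/tcp pseudo-device; the subshell closes the socket on exit.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# report_ports HOST PORT...: print one "PORT: open" or "PORT: closed" line
# for each port given after the host.
report_ports() {
    host=$1
    shift
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "$port: open"
        else
            echo "$port: closed"
        fi
    done
}
```

For the local setup above, `report_ports 127.0.0.1 8000 8265` probes Ray Serve's default HTTP port (8000) and the dashboard port passed to `ray start` (8265). An open port only shows that a listener exists; it does not prove the Serve application itself is healthy.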

How to deploy the project (with Docker)

docker build -t ray-embedding-service .
docker run -it --rm --gpus all -p 8000:8000 -p 8265:8265 -p 6379:6379 ray-embedding-service

To do's

Figure out why Ray Dashboard isn't showing up at port 8265
Serve Reranker model
Add dynamic check to see if Ray Cluster is up in scripts/entrypoint.sh
Use smaller Docker Image for Dockerfile
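
For the dynamic cluster check listed above, one possible approach is a small retry helper that re-runs a command until it succeeds or a retry budget is used up. This is a minimal sketch, not the contents of scripts/entrypoint.sh: `wait_for` is a hypothetical function name, and using `ray status` as the probe assumes that it exits non-zero while no cluster is reachable.

```shell
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0 (returns 0) or until ATTEMPTS
# runs have all failed (returns 1). COMMAND's output is discarded.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

In an entrypoint script this could be used as `wait_for 30 2 ray status || exit 1` before calling `serve run config.yaml`, which would give the head node up to about a minute to come up. The attempt count and delay here are illustrative values, not tuned ones.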

Resources

Multi-application for Ray Serve (main project inspiration): https://docs.ray.io/en/latest/serve/multi-app.html
For FastAPI integration: https://github.com/ray-project/ray/blob/cfcc68f13798eb5c2c9888a089d4b9c95d21b7fc/python/ray/serve/tests/test_fastapi.py#L153-L325
How to install flash-attn with --no-build-isolation using uv: astral-sh/uv#6437 (comment) & https://docs.astral.sh/uv/concepts/projects/config/#build-isolation
Devcontainer: Diff between Remote & Container Users: https://stackoverflow.com/questions/67468439/vs-code-devcontainers-what-is-the-difference-between-remoteuser-and-containeru
Ray Dashboard is empty: https://discuss.ray.io/t/ray-dashboard-is-empty/12883/6 *(we solved this by bumping up Ray version from 2.8.2 to 2.9.0)
