11 changes: 6 additions & 5 deletions .github/ISSUE_TEMPLATE/bug_report.md
Original file line number Diff line number Diff line change
@@ -1,17 +1,17 @@
---
name: " \U0001F41E Bug report "
about: Create a report to help us improve
title: ''
title: ""
labels: bug
assignees: ''

assignees: ""
---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:

1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
@@ -24,8 +24,9 @@ A clear and concise description of what you expected to happen.
If applicable, add screenshots to help explain your problem.

**Desktop (please complete the following information):**
- OS: [e.g. iOS]
- Browser [e.g. chrome, safari]

- OS: [e.g. iOS]
- Browser [e.g. chrome, safari]

**Additional context**
Add any other context about the problem here.
5 changes: 2 additions & 3 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -1,10 +1,9 @@
---
name: "\U0001F680 Feature request"
about: Suggest an idea for this project
title: ''
title: ""
labels: enhancement
assignees: ''

assignees: ""
---

**Is your feature request related to a problem? Please describe.**
35 changes: 33 additions & 2 deletions .github/workflows/lint-test-build.yml
@@ -18,8 +18,32 @@ jobs:
key: datasets-${{ hashFiles('datasets/**') }}
- run: datasets/populate

format-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/setup-node@v6
with:
node-version-file: .nvmrc
- uses: actions/cache@v5
with:
path: |
~/.npm
~/.cache/Cypress
key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
- run: npm ci
- run: npm run format:check

lint-most:
needs: [build-cli, build-lib, build-lib-node, build-lib-web, build-server, build-webapp]
needs:
[
build-cli,
build-lib,
build-lib-node,
build-lib-web,
build-server,
build-webapp,
]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
@@ -175,7 +199,14 @@ jobs:
working-directory: docs/examples

test-most:
needs: [build-lib, build-lib-node, build-lib-web, build-server, download-datasets]
needs:
[
build-lib,
build-lib-node,
build-lib-web,
build-server,
download-datasets,
]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
2 changes: 1 addition & 1 deletion .github/workflows/record-cypress.yml
@@ -76,7 +76,7 @@ jobs:
working-directory: webapp
install: false
start: npm start
wait-on: 'http://localhost:8081' # Waits for above
wait-on: "http://localhost:8081" # Waits for above
# Records to Cypress Cloud
# https://docs.cypress.io/guides/cloud/projects#Set-up-a-project-to-record
record: true
2 changes: 2 additions & 0 deletions .prettierignore
@@ -0,0 +1,2 @@
package-lock.json
datasets
43 changes: 25 additions & 18 deletions README.md
@@ -2,42 +2,49 @@
<img
src="https://storage.googleapis.com/deai-313515.appspot.com/gifs/disco_middle.gif"
/>
</p>
</p>

# **DISCO** - DIStributed COllaborative Machine Learning

DISCO leverages federated :star2: and decentralized :sparkles: learning to allow several data owners to collaboratively build machine learning models without sharing any original data.

The latest version is always running on the following link, for web and mobile:

<p align="center">
<b> :man_dancing: https://discolab.ai/ :man_dancing:</b>
</p>

___
---

:magic_wand: **DEVELOPERS:** DISCO is written fully in JavaScript/TypeScript. Have a look at our [developer guide](DEV.md).
___

:question: **WHY DISCO?**
---

:question: **WHY DISCO?**

- To build deep learning models across private datasets without compromising data privacy, ownership, sovereignty, or model performance
- To create an easy-to-use platform that allows non-specialists to participate in collaborative learning

___
---

:gear: **HOW DISCO WORKS**
- DISCO has a *public model – private data* approach
- Private and secure model updates – *not data* – are communicated to either:
- a central server : **federated** learning ( :star2: )
- directly between users : **decentralized** learning ( :sparkles: ) i.e. no central coordination

- DISCO has a _public model – private data_ approach
- Private and secure model updates – _not data_ – are communicated to either:
- a central server : **federated** learning ( :star2: )
- directly between users : **decentralized** learning ( :sparkles: ) i.e. no central coordination
- Model updates are then securely aggregated into a trained model
- See more [HERE](https://discolab.ai/#/information)

___
:question: **DISCO TECHNOLOGY**
---

:question: **DISCO TECHNOLOGY**

- DISCO runs arbitrary deep learning tasks and model architectures in your browser, via [TF.js](https://www.tensorflow.org/js)
- Decentralized learning :sparkles: relies on [peer2peer](https://github.com/feross/simple-peer) communication
- Have a look at how DISCO ensures privacy and confidentiality [HERE](docs/PRIVACY.md)

___
---

:test_tube: **RESEARCH-BASED DESIGN**

@@ -56,13 +63,13 @@ And more on the roadmap
- :mirror: personalizable ([R10](https://arxiv.org/abs/2103.00710))
- :carrot: fairly incentivizing participation

___

---

:checkered_flag: **HOW TO USE DISCO**
- Start by exploring our examples tasks in the [`DISCOllaboratives` page](https://discolab.ai/#/list).

- Start by exploring our examples tasks in the [`DISCOllaboratives` page](https://discolab.ai/#/list).
- The example DISCOllaboratives are based on popular ML tasks such as [GPT2](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf), [Titanic](https://www.kaggle.com/c/titanic), [MNIST](https://www.kaggle.com/c/digit-recognizer) or [CIFAR-10](https://www.kaggle.com/pankrzysiu/cifar10-python)
- It is also possible to create your own DISCOllaboratives without coding on the [custom training page](https://discolab.ai/#/create):
- Upload the initial model
- Choose between federated and decentralized for your DISCO training scheme ... connect your data and... done! :bar_chart:
- For more details on ML tasks and custom training have a look at [this guide](./docs/TASK.md)
- Upload the initial model
- Choose between federated and decentralized for your DISCO training scheme ... connect your data and... done! :bar_chart:
- For more details on ML tasks and custom training have a look at [this guide](./docs/TASK.md)
4 changes: 2 additions & 2 deletions app.yaml
@@ -1,8 +1,8 @@
# Custom means that it will use the Dockerfile
runtime: custom
runtime: custom

# Flex environment required for WebSocket support, which is required for PeerJS.
env: flex
env: flex

# Limit resources to one instance, one CPU, very little memory or disk.
instance_class: F1
12 changes: 11 additions & 1 deletion cli/README.md
@@ -27,23 +27,32 @@ npm -w cli start -- --help # or -h
```

## Command arguments

Based on the task specification, we can adjust the command arguments. Available arguments are listed below.
Non-mandatory fields will automatically use values from the task specification.

### Test specification arguments

- `testID`: (mandatory) arbitrary test ID defined by the user for the test run
- `task`: (mandatory) pre-defined task (adding a new task is described in the next section)
- `numberOfUsers`: number of users participating in the learning round
- `save`: whether to save the logs of the test run

### Learning hyperparameters

- `epochs`: total number of training epochs
- `roundDuration`: number of epochs per round
- `batchSize`: batch size
- `validationSplit`: ratio of the validation set used for evaluation

### Aggregator parameters

- `aggregator`: aggregator specification
- `clippingRadius`, `maxIterations`, `beta`: (optional, for byzantine aggregator settings) byzantine aggregator hyperparameters
- `maxShareValue`: (optional, for secure aggregator settings) secure aggregator hyperparameter

### Differential Privacy parameters

- `epsilon`, `delta`, `dpDefaultClippingRadius`: (optional, for testing with differential privacy) differential privacy hyperparameters
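
As a rough illustration of the fallback rule above (non-mandatory fields take their values from the task specification), here is a minimal TypeScript sketch. The field names mirror the listed arguments, but the `CliArgs` and `TrainingDefaults` types and the `resolveArgs` helper are hypothetical and are not taken from the CLI source.

```typescript
// Hypothetical shapes: only the field names come from the argument list above.
interface TrainingDefaults {
  epochs: number;
  roundDuration: number;
  batchSize: number;
  validationSplit: number;
}

interface CliArgs extends Partial<TrainingDefaults> {
  testID: string; // mandatory
  task: string; // mandatory
}

// Assumed behaviour: an argument given on the command line wins,
// otherwise the value from the task specification is used.
function resolveArgs(
  args: CliArgs,
  taskDefaults: TrainingDefaults,
): TrainingDefaults {
  return {
    epochs: args.epochs ?? taskDefaults.epochs,
    roundDuration: args.roundDuration ?? taskDefaults.roundDuration,
    batchSize: args.batchSize ?? taskDefaults.batchSize,
    validationSplit: args.validationSplit ?? taskDefaults.validationSplit,
  };
}
```

Using `??` rather than `||` keeps an explicit `0` (for example a `validationSplit` of `0`) instead of silently replacing it with the task default.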

## Adding new tasks
Expand Down Expand Up @@ -80,18 +89,19 @@ The CLI includes a script to evaluate GPT models on the [HellaSwag](https://rowa
To run the evaluation: `npm -w cli run hellaswag_gpt`

The script benchmarks the following models:

- A TensorFlow.js implementation of GPT (`gpt-tfjs`)
- A pre-trained ONNX model (`Xenova/gpt2`)

Both models are evaluated using a shared tokenizer (`Xenova/gpt2`), and the script reports:

- Accuracy (proportion of correct multiple-choice predictions)
- Total evaluation time (in seconds)
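
The accuracy metric described above, the proportion of correct multiple-choice predictions, can be sketched as follows. This is an illustrative helper under assumed inputs (one predicted choice index and one gold index per example); the name `accuracy` and its signature are hypothetical and are not the script's actual code.

```typescript
// Hypothetical helper: predictions[i] and labels[i] are the chosen and the
// correct answer index for example i.
function accuracy(predictions: number[], labels: number[]): number {
  if (predictions.length !== labels.length) {
    throw new Error("predictions and labels must have the same length");
  }
  if (labels.length === 0) return 0; // avoid dividing by zero on empty input

  let correct = 0;
  for (let i = 0; i < labels.length; i++) {
    if (predictions[i] === labels[i]) correct++;
  }
  return correct / labels.length;
}
```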

### Output

Results are printed to the console and saved to a log file: `../datasets/logFile_hellaswag.txt`


This allows for a direct comparison between the inference performance and accuracy of the two architectures.

The TFJS implementation is generally slower and more memory-intensive than ONNX, but offers compatibility with browser-based environments and custom training workflows. See the [Benchmarking GPT-TF.js](#benchmarking-gpt-tfjs) section for more details on performance tradeoffs.
48 changes: 24 additions & 24 deletions cli/package.json
@@ -1,26 +1,26 @@
{
"name": "cli",
"private": true,
"type": "module",
"main": "dist/cli.js",
"scripts": {
"watch": "nodemon --ext ts --ignore dist --watch ../discojs-node/dist --watch ../server/dist --watch . --exec npm run",
"start": "npm run build && node dist/cli.js",
"benchmark_gpt": "npm run build && node dist/benchmark_gpt.js",
"train_gpt": "npm run build && node dist/train_gpt.js",
"hellaswag_gpt": "npm run build && node dist/hellaswag_gpt.js",
"build": "tsc --build",
"test": ": nothing"
},
"author": "",
"license": "ISC",
"dependencies": {
"@epfml/discojs-node": "*",
"server": "*",
"tslib": "2"
},
"devDependencies": {
"nodemon": "3",
"ts-command-line-args": "2"
}
"name": "cli",
"private": true,
"type": "module",
"main": "dist/cli.js",
"scripts": {
"watch": "nodemon --ext ts --ignore dist --watch ../discojs-node/dist --watch ../server/dist --watch . --exec npm run",
"start": "npm run build && node dist/cli.js",
"benchmark_gpt": "npm run build && node dist/benchmark_gpt.js",
"train_gpt": "npm run build && node dist/train_gpt.js",
"hellaswag_gpt": "npm run build && node dist/hellaswag_gpt.js",
"build": "tsc --build",
"test": ": nothing"
},
"author": "",
"license": "ISC",
"dependencies": {
"@epfml/discojs-node": "*",
"server": "*",
"tslib": "2"
},
"devDependencies": {
"nodemon": "3",
"ts-command-line-args": "2"
}
}