
Add DiskANN benchmark pipeline to GitHub Actions #857

Open

YuanyuanTian-hh wants to merge 22 commits into main from user/tianyuanyuan/add-benchmark-pipeline

Conversation


@YuanyuanTian-hh YuanyuanTian-hh commented Mar 20, 2026

Add DiskANN benchmark pipeline to GitHub Actions

Summary

Adds an automated benchmark pipeline to GitHub Actions, enabling regression detection for every PR. The pipeline builds and searches two public ANN datasets, compares performance against a baseline branch, and gates on configurable thresholds for recall, QPS, latency, and I/O metrics.

What's Added

| File | Description |
| --- | --- |
| `.github/workflows/benchmarks.yml` | Main workflow — 2 parallel jobs (Wikipedia-100K + OpenAI-100K), triggered via `workflow_dispatch` with a configurable baseline ref |
| `.github/workflows/benchmarks-aa.yml` | Daily A/A stability test (main vs main) at 9 AM UTC — creates a GitHub issue on failure to notify @microsoft/diskann-admin |
| `.github/scripts/compare_disk_index_json_output.py` | Diffs two benchmark crate JSON outputs → CSV with deviation % for every metric |
| `.github/scripts/benchmark_result_parse.py` | Parses the diff CSV, validates against relative/absolute thresholds, posts PR comments on regression |
| `diskann-benchmark/perf_test_inputs/wikipedia-100K-disk-index.json` | Benchmark config: 768-dim, inner_product, R=59, L=72, 4 threads |
| `diskann-benchmark/perf_test_inputs/openai-100K-disk-index.json` | Benchmark config: 1536-dim, squared_l2, R=59, L=80, SQ_1_2.0, 4 threads |

How It Works

  1. Checkout both current branch and baseline (defaults to main)
  2. Download public datasets from big-ann-benchmarks (Wikipedia-100K and OpenAI ArXiv-100K)
  3. Build & search disk index on both branches using diskann-benchmark crate
  4. Compare baseline vs target metrics (recall, QPS, latency, I/O, CPU time, etc.)
  5. Validate against configurable thresholds — fail the workflow if regression exceeds allowed %
  6. Upload artifacts (CSV + JSON) for 30-day retention
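Steps 4–5 reduce to computing a percentage deviation per metric and writing it out as CSV rows for the threshold check. A minimal sketch of that diff step (the flat `{metric: number}` JSON layout and function name here are illustrative, not the exact schema or API of the diskann-benchmark crate's output):

```python
import csv
import json

def diff_metrics(baseline_path: str, target_path: str, csv_path: str) -> None:
    """Write one CSV row per metric with the target-vs-baseline deviation in %.

    Hypothetical sketch: assumes both files are flat {metric: number} JSON,
    which is a simplification of the real benchmark output.
    """
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(target_path) as f:
        target = json.load(f)

    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["metric", "target", "baseline", "change_pct"])
        for name, base_val in baseline.items():
            tgt_val = target.get(name)
            if tgt_val is None or base_val == 0:
                continue  # skip metrics missing on one side; avoid div-by-zero
            change = (tgt_val - base_val) / base_val * 100.0
            writer.writerow([name, tgt_val, base_val, f"{change:.2f}"])
```

A validator can then flag any `change_pct` that falls outside the configured band.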

Datasets

| Dataset | Dimensions | Distance | Vectors | Queries | Groundtruth K |
| --- | --- | --- | --- | --- | --- |
| Wikipedia-100K | 768 | inner_product | 100K | 5,000 | 100 |
| OpenAI ArXiv-100K | 1,536 | squared_l2 | 100K | 20,000 | 100 |

Reliability Testing (A/A)

Ran 20 A/A workflow executions (both sides build identical benchmark code) to validate pipeline stability on shared GitHub runners:

| Runs | Passed | Pass Rate |
| --- | --- | --- |
| 20 | 19 | 95% |
  • Recall is deterministic: Wikipedia 99.87% ±0.005%, OpenAI 99.67% ±0.003%
  • QPS stable at 9.6 across all runs
  • The 1 failure was a false positive: mean_cpus deviated 13.19% on a noisy shared runner (threshold is ±10%)
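The pass/fail rule in A/A mode is a symmetric band around zero deviation. A sketch of that check using the numbers above (illustrative, not the pipeline's exact configuration):

```python
def within_aa_band(change_pct: float, threshold_pct: float) -> bool:
    """A/A check: both sides run identical code, so any deviation is runner
    noise; fail only if |change| exceeds the symmetric threshold."""
    return abs(change_pct) <= threshold_pct

# The single observed failure: mean_cpus deviated 13.19% against a ±10% band.
assert not within_aa_band(13.19, 10.0)
# Recall jitter (±0.005%) sits comfortably inside the band.
assert within_aa_band(-0.005, 10.0)
```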

Pipeline Runtime

| Dataset | Job Duration | Baseline Step | Target Step |
| --- | --- | --- | --- |
| Wikipedia-100K | ~25 min | ~11 min | ~12.5 min |
| OpenAI-100K | ~78 min | ~38 min | ~39 min |

Both jobs run in parallel → ~78 min wall-clock per workflow run, gated by OpenAI-100K.

Before Merge TODO

  • Remove push trigger for feature branch (keep only workflow_dispatch)
  • Consider loosening mean_cpus threshold from ±10% to ±15–20% for shared runners

Yuanyuan Tian (from Dev Box) added 4 commits March 19, 2026 16:08
- Add benchmarks.yml workflow using workflow_dispatch, comparing current
  branch against a configurable baseline ref
- Add compare_disk_index_json_output.py to diff benchmark crate JSON outputs
  into a CSV suitable for benchmark_result_parse.py
- Add benchmark_result_parse.py for validating results and posting PR comments
- Add wikipedia-100K-disk-index.json benchmark config using the public
  Wikipedia-100K dataset from big-ann-benchmarks (100K Cohere embeddings,
  768-dim, cosine distance) to replace internal ADO datasets
@codecov-commenter

codecov-commenter commented Mar 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.45%. Comparing base (504b2ca) to head (5b0e7f7).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files


```diff
@@            Coverage Diff             @@
##             main     #857      +/-   ##
==========================================
+ Coverage   89.28%   90.45%   +1.17%
==========================================
  Files         442      442
  Lines       83009    83248     +239
==========================================
+ Hits        74111    75301    +1190
+ Misses       8898     7947     -951
```

| Flag | Coverage Δ |
| --- | --- |
| miri | 90.45% <ø> (+1.17%) ⬆️ |
| unittests | 90.41% <ø> (+1.29%) ⬆️ |

Flags with carried forward coverage won't be shown.
see 72 files with indirect coverage changes


@YuanyuanTian-hh YuanyuanTian-hh marked this pull request as ready for review March 25, 2026 02:34
@YuanyuanTian-hh YuanyuanTian-hh requested review from a team and Copilot March 25, 2026 02:34
Contributor

Copilot AI left a comment


Pull request overview

Adds a GitHub Actions workflow to run DiskANN macro-benchmarks (disk-index build + search) on two public 100K datasets, compare current vs a baseline ref, and validate performance regressions via CSV-based threshold checks.

Changes:

  • Add benchmark input configs for Wikipedia-100K and OpenAI ArXiv 100K disk-index benchmarks.
  • Add a new Benchmarks GitHub Actions workflow that runs baseline + target, diffs results, and uploads artifacts.
  • Add Python scripts to (1) diff disk-index JSON outputs into a CSV and (2) parse/validate the CSV against thresholds (optionally commenting on PRs).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `diskann-benchmark/perf_test_inputs/wikipedia-100K-disk-index.json` | Adds a disk-index build/search benchmark config for Wikipedia-100K. |
| `diskann-benchmark/perf_test_inputs/openai-100K-disk-index.json` | Adds a disk-index build/search benchmark config for OpenAI ArXiv 100K. |
| `.github/workflows/benchmarks.yml` | Introduces a workflow to run baseline vs current benchmarks, diff, validate, and upload artifacts. |
| `.github/scripts/compare_disk_index_json_output.py` | Generates a comparison CSV from two disk-index benchmark JSON outputs. |
| `.github/scripts/benchmark_result_parse.py` | Parses the comparison CSV, checks thresholds/contracts, and reports regressions (with optional PR commenting). |


```python
metrics["mean_hops"] = sr.get("mean_hops", 0)
metrics["mean_io_time"] = sr.get("mean_io_time", 0)
metrics["mean_cpus"] = sr.get("mean_cpu_time", 0)
metrics["latency_95"] = sr.get("p999_latency", 0)  # Use p999 as proxy for 95th percentile
```

Copilot AI Mar 25, 2026


latency_95 is currently populated from p999_latency, which makes the reported/validated latency metric incorrect (the benchmark JSON includes p95_latency separately). Use the real p95 field (or rename the metric/key everywhere to p999) so the CSV and threshold checks match what they claim to measure.

Suggested change:

```diff
-metrics["latency_95"] = sr.get("p999_latency", 0)  # Use p999 as proxy for 95th percentile
+metrics["latency_95"] = sr.get("p95_latency", 0)  # Use actual 95th percentile latency
```


```python
# Total build time (in seconds)
build_time = build.get("build_time")
if build_time:
```

Copilot AI Mar 25, 2026


build_time is checked with a truthiness test (if build_time:), which will skip recording total_time if the value is 0. Prefer an explicit is not None check so 0 is handled consistently and the intent is clear.

Suggested change:

```diff
-if build_time:
+if build_time is not None:
```

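The pitfall is easy to demonstrate: `0.0` is falsy in Python, so a truthiness guard silently drops a legitimate zero-valued measurement, while the explicit `is not None` check keeps it (contrived values, for illustration only):

```python
build = {"build_time": 0.0}  # a legitimate, if unlikely, zero-second measurement
metrics = {}

build_time = build.get("build_time")
if build_time:                  # truthiness: 0.0 is falsy, so this branch is skipped
    metrics["total_time_truthy"] = build_time
if build_time is not None:      # explicit check: 0.0 is recorded as intended
    metrics["total_time_explicit"] = build_time

assert "total_time_truthy" not in metrics
assert metrics["total_time_explicit"] == 0.0
```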
Comment on lines +351 to +381
```python
# Parse values: [current, baseline, change%]
try:
    value_current = float(values[0])
    value_baseline = float(values[1])
    change = float(values[2]) if values[2] else 0.0
except (ValueError, IndexError) as e:
    print(f"ERROR: Failed to parse {category}/{metric}: {e}")
    return True, f"Parse error for {category}/{metric}"

# Get threshold config
threshold_config = thresholds[category][metric]
threshold_pct = threshold_config[0]
direction = threshold_config[1]
contract_value = threshold_config[2]

# Check thresholds
target_range = get_target_change_range(threshold_pct, direction, mode)
threshold_failed = is_change_threshold_failed(change, target_range)
promise_broken, target_formatted = is_promise_broken(value_current, contract_value, direction)

if threshold_failed:
    print(f"THRESHOLD FAILED: {category}/{metric} change={change}% allowed={format_interval(*target_range)}")
if promise_broken:
    print(f"PROMISE BROKEN: {category}/{metric} value={value_current} required={target_formatted}")

if threshold_failed or promise_broken:
    outcome = get_outcome_message(threshold_failed, promise_broken)
    failed_rows.append(
        f"| {category}/{metric} | {value_baseline} | {value_current} | "
        f"{target_formatted} | {change}% | {format_interval(*target_range)} | {outcome} |"
    )
```

Copilot AI Mar 25, 2026


parse_csv() appends values for every matching row, but check_thresholds() only reads values[0..2]. If the CSV contains multiple runs/jobs for the same category/metric, later entries will be ignored (or worse, the list will be misinterpreted). Consider storing each row as a structured triple (or iterating values in chunks of 3) and validating all entries.

Suggested change (the removed half is the block quoted above; the replacement iterates `values` in chunks of three):

```python
# Values may contain multiple triples: [current, baseline, change%] * N
# Validate each triple independently.
for i in range(0, len(values), 3):
    triple = values[i:i + 3]
    if len(triple) < 3:
        # Malformed data: incomplete triple
        print(f"ERROR: Incomplete data for {category}/{metric} at index {i}: {triple}")
        return True, f"Parse error for {category}/{metric}"
    try:
        value_current = float(triple[0])
        value_baseline = float(triple[1])
        change = float(triple[2]) if triple[2] else 0.0
    except (ValueError, IndexError) as e:
        print(f"ERROR: Failed to parse {category}/{metric} at index {i}: {e}")
        return True, f"Parse error for {category}/{metric}"

    # Get threshold config
    threshold_config = thresholds[category][metric]
    threshold_pct = threshold_config[0]
    direction = threshold_config[1]
    contract_value = threshold_config[2]

    # Check thresholds
    target_range = get_target_change_range(threshold_pct, direction, mode)
    threshold_failed = is_change_threshold_failed(change, target_range)
    promise_broken, target_formatted = is_promise_broken(value_current, contract_value, direction)

    if threshold_failed:
        print(f"THRESHOLD FAILED: {category}/{metric} change={change}% allowed={format_interval(*target_range)}")
    if promise_broken:
        print(f"PROMISE BROKEN: {category}/{metric} value={value_current} required={target_formatted}")

    if threshold_failed or promise_broken:
        outcome = get_outcome_message(threshold_failed, promise_broken)
        failed_rows.append(
            f"| {category}/{metric} | {value_baseline} | {value_current} | "
            f"{target_formatted} | {change}% | {format_interval(*target_range)} | {outcome} |"
        )
```

Comment on lines +490 to +491
```python
choices=['aa', 'pr', 'lkg'],
help='Benchmark mode: aa=A/A test (symmetric), pr=PR test (directional), lkg=last known good'
```

Copilot AI Mar 25, 2026


--mode accepts lkg, but the threshold logic treats any non-aa mode as PR-directional. Either implement distinct lkg behavior (and document it), or remove lkg from the accepted choices to avoid misleading callers.

Suggested change:

```diff
-choices=['aa', 'pr', 'lkg'],
-help='Benchmark mode: aa=A/A test (symmetric), pr=PR test (directional), lkg=last known good'
+choices=['aa', 'pr'],
+help='Benchmark mode: aa=A/A test (symmetric), pr=PR test (directional)'
```

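To make the ambiguity concrete, here is a hypothetical sketch of `get_target_change_range` (the real script's body may differ; this is illustrative only), showing why `lkg` currently collapses into the directional PR behavior:

```python
def get_target_change_range(threshold_pct: float, direction: str, mode: str):
    """Return the allowed (low, high) band for change_pct.

    Hypothetical sketch: 'aa' gets a symmetric band; every other mode,
    including the advertised but unimplemented 'lkg', falls through to the
    directional PR logic, which is the ambiguity the review comment flags.
    """
    if mode == "aa":
        return (-threshold_pct, threshold_pct)
    if direction == "higher_is_better":
        return (-threshold_pct, float("inf"))  # only regressions are bounded
    return (float("-inf"), threshold_pct)

assert get_target_change_range(10.0, "higher_is_better", "aa") == (-10.0, 10.0)
# 'lkg' behaves identically to 'pr'; callers get no last-known-good semantics.
assert (get_target_change_range(10.0, "higher_is_better", "lkg")
        == get_target_change_range(10.0, "higher_is_better", "pr"))
```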
Comment on lines +19 to +22
```yaml
push:
  branches:
    - 'user/tianyuanyuan/add-benchmark-pipeline'
  paths:
```

Copilot AI Mar 25, 2026


The workflow is titled as a general benchmark pipeline, but the push trigger is hard-coded to a single user branch. That means it won’t run for typical PRs/branches in this repo. Consider switching to pull_request and/or push on main (or removing the push trigger entirely if this is intended to be manual-only via workflow_dispatch).

Comment on lines +155 to +162
```yaml
  python .github/scripts/benchmark_result_parse.py \
    --mode pr \
    --file target/tmp/wikipedia-100K_change.csv
env:
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  GITHUB_REPOSITORY: ${{ github.repository }}
  GITHUB_PR_NUMBER: ${{ github.event.pull_request.number }}
  GITHUB_RUN_ID: ${{ github.run_id }}
```

Copilot AI Mar 25, 2026


GITHUB_PR_NUMBER is populated from github.event.pull_request.number, but this workflow currently triggers on workflow_dispatch/push (no pull_request payload), so this env var will be empty and PR commenting can’t work. Either add a pull_request trigger or gate PR-comment behavior on the event type / provide the PR number as an input.

@hildebrandmw
Contributor

Hey @YuanyuanTian-hh — thanks for putting this together! Porting automated benchmark regression detection into our CI is definitely something we need!

I've been thinking about the overall architecture and want to share some thoughts for discussion among the broader team. This is a tricky design problem, and I think a bit of planning now will save us a lot of time in the future.

My main concern is cross-language hot-potato with results and data coupling. The pipeline currently flows as Rust structs -> JSON -> Python (CSV conversion) -> Python (threshold check + Markdown). That means the field names from #[derive(Serialize)] in our Rust benchmark structs are implicitly depended on by two Python scripts. If someone renames a field in Rust (say mean_cpu_time to cpu_time_mean), the Python won't crash, it'll silently read 0 via .get() defaults and either false-alarm or quietly pass. That kind of failure mode is really hard to debug because it looks like the pipeline is working fine. Or if it does fail, it's only after a long CI run.
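The silent-failure mode described here is mechanical and worth seeing once. A minimal reproduction with simplified, made-up data:

```python
# The Rust side renames mean_cpu_time -> cpu_time_mean; the serialized JSON
# changes shape, but the Python consumer keeps reading the old key with a
# .get() default and never raises.
result_before = {"mean_cpu_time": 41.7}
result_after = {"cpu_time_mean": 41.7}  # same measurement, renamed field

def read_cpu(search_result: dict) -> float:
    return search_result.get("mean_cpu_time", 0)  # silently 0 after the rename

assert read_cpu(result_before) == 41.7
assert read_cpu(result_after) == 0  # no crash, just a bogus metric downstream
```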

Every step in this chain introduces a place where something can go wrong and with GitHub-centric checking scripts, it's difficult to validate changes locally.

The benchmark crate already owns the Rust types for these results. I think the right long-term approach is to have the benchmark crate itself handle the comparison and threshold checking with something like (this is just a sketch):

```shell
cargo run -p diskann-benchmark -- compare \
    baseline.json target.json \
    --thresholds thresholds.toml
```

This way:

  • Field renames either become compile errors, not silent bugs, or get fixed automatically.
  • We eliminate two serialization boundaries (JSON→CSV→threshold check becomes just: deserialize both JSONs into the same Rust types and compare).
  • The comparison logic is testable with cargo test instead of only being exercisable by running the full workflow.
  • Adding new metrics to the benchmark automatically makes them available for comparison.
  • A/B comparison becomes available for local runs as well, resulting in an overall more broadly useful tool.

Suggested path forward

I don't want this effort to go to waste, there's good stuff here. Here's what I'd suggest:

  1. Keep the workflow structure and benchmark JSON configs from this PR. This is good!
  2. Add comparison utilities to diskann-benchmark: these would take two JSON output files and a threshold config, do the diff, and emit results directly. This takes a little design work to make robust and to allow early detection of errors (rather than failing at the very end of a run), but it is much more impactful to the project long term.
  3. Simplify the workflow to call that single command and deduplicate the nearly identical jobs into a matrix strategy or reusable workflow.

I know this is a bigger scope change than you might have expected, and I'm sorry about that. I want to make sure we build this in a way that's maintainable as the benchmark crate evolves and can be reused more broadly. I'll put together an issue with enough detail that we can work through the design async — let me know if you have questions or want to discuss any of this further!

A smaller note on dataset downloads

The big-ann-benchmarks clone + create_dataset.py flow works, but it pulls in a full Python project (plus numpy/scipy) just to download a handful of data files. We could likely replace that step with a couple of curl commands, which would remove the transitive dependency and speed up the job. Not a blocker for now, just something to consider simplifying.

@arrayka
Contributor

arrayka commented Mar 26, 2026

The "A/B (branch vs main)" number in the description doesn't make sense. It is essentially the same as "A/A (branch vs itself)", no? Why do we want to present both numbers?

```yaml
    type: string
push:
  branches:
    - 'user/tianyuanyuan/add-benchmark-pipeline'
```
Contributor


Why is this specific branch here?

Author


This allows the new pipeline to be tested on my branch; I will update it to 'main' before merging.

```yaml
push:
  branches:
    - 'user/tianyuanyuan/add-benchmark-pipeline'
  paths:
```
Contributor


Why do we limit the triggering to these paths?

Author


This code allows the new pipeline to be tested on my branch, I will update it to 'main' before I merge it.


```yaml
name: Benchmarks

on:
```
Contributor


We want A/A testing to happen daily and fail if deviation is more than expected threshold - ideally notifying diskann-admins about failure. Is it enabled?

Author


I added it in this PR as a separate workflow, benchmarks-aa.yml. It runs daily, and any deviation beyond the calibrated range fails the job. On failure, a notify-on-failure job automatically creates a GitHub issue and cc's diskann-admin.
It will activate once this PR is merged to main.

@harsha-simhadri
Contributor

Could the datasets be hosted on GitHub artifacts? This would reduce the chance of upstream data being unavailable and potentially simplifies the code.

@YuanyuanTian-hh
Author

> The "A/B (branch vs main)" number in the description doesn't make sense. It is essentially the same as "A/A (branch vs itself)", no? Why do we want to present both numbers?

You are right. Since this PR only adds workflow files and scripts and doesn't change the core library, the "A/B (branch vs main)" runs build and run the exact same benchmark binary as the "A/A (main vs main)" runs. Both sides compile identical Rust code, so the results are functionally A/A in both cases. I consolidated the A/A and A/B tables in the PR summary into a single "20 A/A runs, 19 passed, 95%" entry, since both test identical benchmark code.

@YuanyuanTian-hh
Author

> Could the datasets be hosted on GitHub artifacts? This would reduce the chance of upstream data being unavailable and potentially simplifies the code.

I pre-downloaded the data to https://github.com/microsoft/DiskANN/releases/tag/benchmark-data-v1; the pipeline now downloads it directly from that release. 1d18ae5

@randybird

> (quoted @hildebrandmw's architecture and dataset-download comment above in full)

Agreed that we should adopt this new framework to keep it less error-prone; please share the issue once you have it.
For this current PR, we can land it first so the original quality gate starts working immediately.

@YuanyuanTian-hh YuanyuanTian-hh force-pushed the user/tianyuanyuan/add-benchmark-pipeline branch from 1d18ae5 to f58e084 Compare March 27, 2026 05:59
@microsoft-github-policy-service

@YuanyuanTian-hh please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"
Contributor License Agreement

Contribution License Agreement

This Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
and conveys certain license rights to Microsoft Corporation and its affiliates (“Microsoft”) for Your
contributions to Microsoft open source projects. This Agreement is effective as of the latest signature
date below.

  1. Definitions.
    “Code” means the computer software code, whether in human-readable or machine-executable form,
    that is delivered by You to Microsoft under this Agreement.
    “Project” means any of the projects owned or managed by Microsoft and offered under a license
    approved by the Open Source Initiative (www.opensource.org).
    “Submit” is the act of uploading, submitting, transmitting, or distributing code or other content to any
    Project, including but not limited to communication on electronic mailing lists, source code control
    systems, and issue tracking systems that are managed by, or on behalf of, the Project for the purpose of
    discussing and improving that Project, but excluding communication that is conspicuously marked or
    otherwise designated in writing by You as “Not a Submission.”
    “Submission” means the Code and any other copyrightable material Submitted by You, including any
    associated comments and documentation.
  2. Your Submission. You must agree to the terms of this Agreement before making a Submission to any
    Project. This Agreement covers any and all Submissions that You, now or in the future (except as
    described in Section 4 below), Submit to any Project.
  3. Originality of Work. You represent that each of Your Submissions is entirely Your original work.
    Should You wish to Submit materials that are not Your original work, You may Submit them separately
    to the Project if You (a) retain all copyright and license information that was in the materials as You
    received them, (b) in the description accompanying Your Submission, include the phrase “Submission
    containing materials of a third party:” followed by the names of the third party and any licenses or other
    restrictions of which You are aware, and (c) follow any other instructions in the Project’s written
    guidelines concerning Submissions.
  4. Your Employer. References to “employer” in this Agreement include Your employer or anyone else
    for whom You are acting in making Your Submission, e.g. as a contractor, vendor, or agent. If Your
    Submission is made in the course of Your work for an employer or Your employer has intellectual
    property rights in Your Submission by contract or applicable law, You must secure permission from Your
    employer to make the Submission before signing this Agreement. In that case, the term “You” in this
    Agreement will refer to You and the employer collectively. If You change employers in the future and
    desire to Submit additional Submissions for the new employer, then You agree to sign a new Agreement
    and secure permission from the new employer before Submitting those Submissions.
  5. Licenses.
  • Copyright License. You grant Microsoft, and those who receive the Submission directly or
    indirectly from Microsoft, a perpetual, worldwide, non-exclusive, royalty-free, irrevocable license in the
    Submission to reproduce, prepare derivative works of, publicly display, publicly perform, and distribute
    the Submission and such derivative works, and to sublicense any or all of the foregoing rights to third
    parties.
  • Patent License. You grant Microsoft, and those who receive the Submission directly or
    indirectly from Microsoft, a perpetual, worldwide, non-exclusive, royalty-free, irrevocable license under
    Your patent claims that are necessarily infringed by the Submission or the combination of the
    Submission with the Project to which it was Submitted to make, have made, use, offer to sell, sell and
    import or otherwise dispose of the Submission alone or with the Project.
  • Other Rights Reserved. Each party reserves all rights not expressly granted in this Agreement.
    No additional licenses or rights whatsoever (including, without limitation, any implied licenses) are
    granted by implication, exhaustion, estoppel or otherwise.
  6. Representations and Warranties. You represent that You are legally entitled to grant the above
    licenses. You represent that each of Your Submissions is entirely Your original work (except as You may
    have disclosed under Section 3). You represent that You have secured permission from Your employer to
    make the Submission in cases where Your Submission is made in the course of Your work for Your
    employer or Your employer has intellectual property rights in Your Submission by contract or applicable
    law. If You are signing this Agreement on behalf of Your employer, You represent and warrant that You
    have the necessary authority to bind the listed employer to the obligations contained in this Agreement.
    You are not expected to provide support for Your Submission, unless You choose to do so. UNLESS
    REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING, AND EXCEPT FOR THE WARRANTIES
    EXPRESSLY STATED IN SECTIONS 3, 4, AND 6, THE SUBMISSION PROVIDED UNDER THIS AGREEMENT IS
    PROVIDED WITHOUT WARRANTY OF ANY KIND, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTY OF
    NONINFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
  7. Notice to Microsoft. You agree to notify Microsoft in writing of any facts or circumstances of which
    You later become aware that would make Your representations in this Agreement inaccurate in any
    respect.
  8. Information about Submissions. You agree that contributions to Projects and information about
    contributions may be maintained indefinitely and disclosed publicly, including Your name and other
    information that You submit with Your Submission.
  9. Governing Law/Jurisdiction. This Agreement is governed by the laws of the State of Washington, and
    the parties consent to exclusive jurisdiction and venue in the federal courts sitting in King County,
    Washington, unless no federal subject matter jurisdiction exists, in which case the parties consent to
    exclusive jurisdiction and venue in the Superior Court of King County, Washington. The parties waive all
    defenses of lack of personal jurisdiction and forum non-conveniens.
  10. Entire Agreement/Assignment. This Agreement is the entire agreement between the parties, and
    supersedes any and all prior agreements, understandings or communications, written or oral, between
    the parties relating to the subject matter hereof. This Agreement may be assigned by Microsoft.

