Commit 7c33a66

MOSTLY CI committed
Updating OpenAPI Specification for release 4.6.0
1 parent 561c00b commit 7c33a66

1 file changed

Lines changed: 55 additions & 30 deletions

public-api.yaml

@@ -1369,10 +1369,6 @@ paths:
 properties:
 status:
 $ref: "#/components/schemas/AssistantThreadSessionStatus"
-totalVirtualCPUTime:
-type: "number"
-format: "double"
-description: "Total virtual CPU time"
 /assistant/threads/{id}/export:
 parameters:
 - $ref: "#/components/parameters/assistantThreadIdPath"
@@ -2070,7 +2066,7 @@ components:
 $ref: "#/components/schemas/ProgressStatus"
 filterBySearchTerm:
 name: "searchTerm"
-description: "Filter by search term"
+description: "Filter by search term in the name or description."
 in: "query"
 style: "form"
 explode: false
@@ -2097,7 +2093,7 @@ components:
 required: true
 filterByVisibility:
 name: "visibility"
-description: "Filter by visibility"
+description: "Filter by visibility."
 in: "query"
 style: "form"
 explode: false
@@ -2107,7 +2103,7 @@ components:
 $ref: "#/components/schemas/Visibility"
 filterByCreatedFrom:
 name: "createdFrom"
-description: "Filter connectors created from this date (inclusive). Format: YYYY-MM-DD."
+description: "Filter by creation date, not older than this date. Format: YYYY-MM-DD."
 in: "query"
 style: "form"
 explode: false
@@ -2116,7 +2112,7 @@ components:
 format: "date"
 filterByCreatedTo:
 name: "createdTo"
-description: "Filter connectors created until this date (inclusive). Format: YYYY-MM-DD."
+description: "Filter by creation date, not younger than this date. Format: YYYY-MM-DD."
 in: "query"
 style: "form"
 explode: false
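
The reworded `createdFrom` / `createdTo` descriptions read as an inclusive date window: `createdFrom` keeps resources not older than the date, `createdTo` keeps resources not younger than it. The following is a minimal Python sketch of that reading, not the service's implementation. The helper names `build_filter_query` and `in_creation_window` are hypothetical; only the parameter names `searchTerm`, `createdFrom`, and `createdTo` come from the specification.

```python
from datetime import date
from urllib.parse import urlencode


def build_filter_query(search_term=None, created_from=None, created_to=None):
    """Build a query string using the parameter names from the spec.

    Hypothetical helper: dates are serialized as YYYY-MM-DD, as the
    parameter descriptions require.
    """
    params = {}
    if search_term:
        params["searchTerm"] = search_term
    if created_from:
        params["createdFrom"] = created_from.isoformat()
    if created_to:
        params["createdTo"] = created_to.isoformat()
    return urlencode(params)


def in_creation_window(created, created_from=None, created_to=None):
    """Assumed semantics: both bounds are inclusive.

    "Not older than createdFrom" is read as created >= created_from,
    "not younger than createdTo" as created <= created_to.
    """
    if created_from is not None and created < created_from:
        return False
    if created_to is not None and created > created_to:
        return False
    return True


query = build_filter_query("sales", date(2024, 1, 1), date(2024, 12, 31))
```

Whether the server treats the bounds as inclusive is an assumption here. The previous descriptions said "(inclusive)" explicitly and the new wording no longer does.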
@@ -4679,33 +4675,47 @@ components:
 description: |
 Specifies the maximum allowable epsilon value. If the training process exceeds this threshold, it will be terminated early. Only model checkpoints with epsilon values below this limit will be retained.
 If not provided, the training will proceed without early termination based on epsilon constraints.
+default: 10.0
 minimum: 0.0
+exclusiveMinimum: true
 maximum: 10000.0
+delta:
+type: "number"
+format: "double"
+description: |
+The delta value for differential privacy. It is the probability of the privacy guarantee not holding.
+The smaller the delta, the more confident you can be that the privacy guarantee holds.
+This delta will be equally distributed between the analysis and the training phase.
+default: 1e-5
+minimum: 0.0
+exclusiveMinimum: true
+maximum: 1.0
 noiseMultiplier:
 type: "number"
 format: "double"
 description: |
-The ratio of the standard deviation of the Gaussian noise to the L2-sensitivity of the function to which the noise is added (How much noise to add).
+Determines how much noise while training the model with differential privacy. This is the ratio of the standard deviation of the Gaussian noise to the L2-sensitivity of the function to which the noise is added.
 default: 1.5
 minimum: 0.0
 maximum: 10000.0
 maxGradNorm:
 type: "number"
 format: "double"
 description: |
-The maximum norm of the per-sample gradients for training the model with differential privacy.
+Determines the maximum impact of a single sample on updating the model weights during training with differential privacy. This is the maximum norm of the per-sample gradients.
 default: 1.0
 minimum: 0.0
 maximum: 10000.0
-delta:
+valueProtectionEpsilon:
 type: "number"
 format: "double"
 description: |
-The delta value for differential privacy. It is the probability of the privacy guarantee not holding.
-The smaller the delta, the more confident you can be that the privacy guarantee holds.
-default: 1e-5
+The DP epsilon of the privacy budget for determining the value ranges, which are gathered prior to the model training during the analysis step. Only applicable if value protection is True.
+Privacy budget will be equally distributed between the columns. For categorical we calculate noisy histograms and use a noisy threshold. For numeric and datetime we calculate bounds based on noisy histograms.
+default: 1.0
 minimum: 0.0
-maximum: 1.0
+exclusiveMinimum: true
+maximum: 10000.0
 
 #################
 ## mostlyai-qa ##
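
The new descriptions state that `delta` is split equally between the analysis and training phases, and that `valueProtectionEpsilon` is split equally between the columns. The Python sketch below illustrates that arithmetic together with the declared bounds (`minimum: 0.0` with `exclusiveMinimum: true`, plus the respective `maximum`). The function names are hypothetical and the code is only an illustration of the documented split, not the platform's implementation.

```python
def split_delta(delta=1e-5):
    """Split delta equally between the analysis and the training phase.

    Bounds follow the spec: 0 < delta <= 1 (exclusive minimum).
    Returns (analysis_delta, training_delta).
    """
    if not (0.0 < delta <= 1.0):
        raise ValueError("delta must satisfy 0 < delta <= 1")
    half = delta / 2.0
    return half, half


def per_column_epsilon(value_protection_epsilon=1.0, n_columns=1):
    """Distribute the value-protection budget equally between columns.

    Bounds follow the spec: 0 < epsilon <= 10000 (exclusive minimum).
    """
    if not (0.0 < value_protection_epsilon <= 10000.0):
        raise ValueError("valueProtectionEpsilon must satisfy 0 < eps <= 10000")
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    return value_protection_epsilon / n_columns


analysis_delta, training_delta = split_delta(1e-5)
eps_per_column = per_column_epsilon(1.0, n_columns=4)
```

With the defaults from the diff, each phase would receive a delta of 5e-6, and a four-column table would spend an epsilon of 0.25 per column on value-range estimation.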
@@ -4721,7 +4731,7 @@ components:
 2. **Similarity**: Metrics regarding the similarity of the full joint distributions of samples within an embedding
 space.
 3. **Distances**: Metrics regarding the nearest neighbor distances between training, holdout, and synthetic samples
-in an embedding space. Useful for assessing the novelty / privacy of synthetic data.
+in an numeric encoding space. Useful for assessing the novelty / privacy of synthetic data.
 
 The quality of synthetic data is assessed by comparing these metrics to the same metrics of a holdout dataset.
 The holdout dataset is a subset of the original training data, that was not used for training the synthetic data
@@ -4738,20 +4748,21 @@ components:
 description: |
 Metrics regarding the accuracy of synthetic data, measured as the closeness of discretized lower dimensional
 marginal distributions.
-
+
 1. **Univariate Accuracy**: The accuracy of the univariate distributions for all target columns.
 2. **Bivariate Accuracy**: The accuracy of all pair-wise distributions for target columns, as well as for target
 columns with respect to the context columns.
-3. **Coherence Accuracy**: The accuracy of the auto-correlation for all target columns.
-
+3. **Trivariate Accuracy**: The accuracy of all three-way distributions for target columns.
+4. **Coherence Accuracy**: The accuracy of the auto-correlation for all target columns.
+
 Accuracy is defined as 100% - [Total Variation Distance](https://en.wikipedia.org/wiki/Total_variation_distance_of_probability_measures) (TVD),
 whereas TVD is half the sum of the absolute differences of the relative frequencies of the corresponding
 distributions.
-
+
 These accuracies are calculated for all discretized univariate, and bivariate distributions. In case of sequential
 data, also for all coherence distributions. Overall metrics are then calculated as the average across these
 accuracies.
-
+
 All metrics can be compared against a theoretical maximum accuracy, which is calculated for a same-sized holdout.
 The accuracy metrics shall be as close as possible to the theoretical maximum, but not significantly higher, as
 this would indicate overfitting.
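
The accuracy definition in this hunk (100% minus TVD, with TVD being half the sum of absolute differences of relative frequencies) can be sketched in Python as follows. This is a minimal illustration on already-discretized values, not the mostlyai-qa implementation. The function names are hypothetical, and accuracy is returned as a fraction in [0, 1] to match the `minimum: 0.0` / `maximum: 1.0` bounds of the schema properties.

```python
from collections import Counter


def relative_frequencies(values):
    """Map each distinct (discretized) value to its relative frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def total_variation_distance(p, q):
    """Half the sum of absolute differences of relative frequencies."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def accuracy(original, synthetic):
    """Accuracy = 1 - TVD between the two discretized distributions."""
    return 1.0 - total_variation_distance(
        relative_frequencies(original), relative_frequencies(synthetic)
    )


# Univariate: one discretized column.
uni = accuracy(["a", "a", "b", "b"], ["a", "a", "a", "b"])

# Trivariate: the same function applied to triples of bin labels.
orig_triples = list(zip(["a", "a", "b"], ["x", "y", "x"], [0, 1, 0]))
syn_triples = list(zip(["a", "a", "b"], ["x", "y", "x"], [0, 1, 0]))
tri = accuracy(orig_triples, syn_triples)
```

Bivariate and trivariate accuracies follow from the same function by treating each pair or triple of bin labels as a single category. How the columns are binned, and how the per-combination accuracies are averaged into the overall `trivariate` value, is not shown in the diff and is left out here.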
@@ -4777,6 +4788,13 @@ components:
 format: "double"
 minimum: 0.0
 maximum: 1.0
+trivariate:
+description: |
+Average accuracy of discretized trivariate distributions.
+type: "number"
+format: "double"
+minimum: 0.0
+maximum: 1.0
 coherence:
 description: |
 Average accuracy of discretized coherence distributions. Only applicable for sequential data.
@@ -4805,6 +4823,13 @@ components:
 format: "double"
 minimum: 0.0
 maximum: 1.0
+trivariateMax:
+description: |
+Expected trivariate accuracy of a same-sized holdout. Serves as a reference for `trivariate`.
+type: "number"
+format: "double"
+minimum: 0.0
+maximum: 1.0
 coherenceMax:
 description: |
 Expected coherence accuracy of a same-sized holdout. Serves as a reference for `coherence`.
@@ -4864,20 +4889,20 @@ components:
 Distances:
 type: "object"
 description: |
-Metrics regarding the nearest neighbor distances between training, holdout, and synthetic samples in an embedding
-space. Useful for assessing the novelty / privacy of synthetic data.
-
+Metrics regarding the nearest neighbor distances between training, holdout, and synthetic samples in an numerically
+encoded space. Useful for assessing the novelty / privacy of synthetic data.
+
 The provided data is first down-sampled, so that the number of samples match across all datasets. Note, that for
 an optimal sensitivity of this privacy assessment it is recommended to use a 50/50 split between training and
 holdout data, and then generate synthetic data of the same size.
-
-The embeddings of these samples are then computed, and the nearest neighbor distances are calculated for each
+
+The numerical encodings of these samples are then computed, and the nearest neighbor distances are calculated for each
 synthetic sample to the training and holdout samples. Based on these nearest neighbor distances the following
 metrics are calculated:
-- Identical Match Share (IMS): The share of synthetic samples that are identical to a training or holdout sample.
-- Distance to Closest Record (DCR): The average distance of synthetic to training or holdout samples.
-- Nearest Neighbor Distance Ratio (NNDR): The 10-th smallest ratio of the distance to nearest and second nearest neighbor.
-
+- Identical Match Share (IMS): The share of synthetic samples that are identical to a training or holdout sample.
+- Distance to Closest Record (DCR): The average distance of synthetic to training or holdout samples.
+- Nearest Neighbor Distance Ratio (NNDR): The 10-th smallest ratio of the distance to nearest and second nearest neighbor.
+
 For privacy-safe synthetic data we expect to see about as many identical matches, and about the same distances
 for synthetic samples to training, as we see for synthetic samples to holdout.
 properties:
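
The three distance metrics described in this hunk can be sketched with a brute-force nearest-neighbor search. The Python below is an illustration under stated assumptions, not the mostlyai-qa implementation: samples are assumed to be already numerically encoded as equal-length tuples, the distance is assumed to be Euclidean, ratios with a zero second-nearest distance are skipped, and the last available ratio is used when fewer than ten exist. The function names are hypothetical.

```python
import math


def nearest_two(sample, reference):
    """Distances from `sample` to its nearest and second-nearest reference point."""
    distances = sorted(math.dist(sample, ref) for ref in reference)
    return distances[0], distances[1]


def distance_metrics(synthetic, reference, k=10):
    """Return (IMS, DCR, NNDR) of synthetic samples against a reference set.

    IMS:  share of synthetic samples with distance 0 to a reference sample.
    DCR:  average distance to the closest reference sample.
    NNDR: k-th smallest ratio of nearest to second-nearest distance
          (k=10 per the description; None if no ratio can be computed).
    """
    pairs = [nearest_two(s, reference) for s in synthetic]
    closest = [d1 for d1, _ in pairs]
    ims = sum(1 for d in closest if d == 0.0) / len(closest)
    dcr = sum(closest) / len(closest)
    ratios = sorted(d1 / d2 for d1, d2 in pairs if d2 > 0.0)
    nndr = ratios[min(k, len(ratios)) - 1] if ratios else None
    return ims, dcr, nndr


# Toy data: the reference set needs at least two samples.
training = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
synthetic = [(0.0, 0.0), (1.0, 0.0)]
ims, dcr, nndr = distance_metrics(synthetic, training)
```

For the privacy assessment described above, the same function would be called once with the training samples and once with the holdout samples as `reference`, and the two result tuples compared. Privacy-safe synthetic data is expected to give similar values in both cases.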
