
Model Artifact Configuration

Each model artifact has an associated JSON structure which describes some basic information about the model such as name and version, as well as technical metadata such as format, precision and quantization. This content is referred to as Model Artifact Configuration and is identified by the media type application/vnd.cncf.model.config.v1+json.

This section defines application/vnd.cncf.model.config.v1+json media type.

Terminology

The following terms are used in this section:

  • Layer

    A blob of the model artifact's content, packaged as a tar archive.

  • Layer DiffID

    A layer DiffID is the hash of the layer's uncompressed tar archive.
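For illustration, a DiffID can be computed by hashing the uncompressed tar bytes directly. A sketch in Python (the file name and contents of the layer are hypothetical):

```python
import hashlib
import io
import tarfile

def layer_diff_id(uncompressed_tar: bytes) -> str:
    """Compute a layer DiffID: the SHA-256 digest of the
    layer's uncompressed tar archive."""
    return "sha256:" + hashlib.sha256(uncompressed_tar).hexdigest()

# Build a tiny in-memory tar archive standing in for a model layer.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    data = b"fake model weights"
    info = tarfile.TarInfo(name="model.safetensors")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

diff_id = layer_diff_id(buf.getvalue())
print(diff_id)  # digest depends on the tar's metadata
```

Note that the hash is taken over the uncompressed archive, so the DiffID is stable regardless of which compression (if any) is applied to the distributed blob.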

Properties

  • descriptor object, REQUIRED

    Contains the general information about the model.

    • createdAt string, OPTIONAL

      The date and time at which the model was created, formatted as defined by RFC 3339, section 5.6.

    • authors array of strings, OPTIONAL

      A list of contact details for the individuals or organizations responsible for the model (freeform strings).

    • vendor string, OPTIONAL

      The name of the organization or company distributing the model.

    • family string, OPTIONAL

      The model family or lineage, such as "llama3", "gpt2", or "qwen2".

    • name string, OPTIONAL

      The name of the model.

    • version string, OPTIONAL

      The version of the model.

    • title string, OPTIONAL

      A human-readable title for the model.

    • description string, OPTIONAL

      A human-readable description of the model.

    • docURL string, OPTIONAL

      A URL to get more information or details about the model.

    • sourceURL string, OPTIONAL

      A URL to get the source code or resources needed to build or understand the model's implementation.

    • datasetsURL array of strings, OPTIONAL

      A list of links or references to the datasets that the model was trained on.

    • revision string, OPTIONAL

      The source control revision identifier for the model.

    • licenses array of string, OPTIONAL

      A list of licenses under which the model is distributed, represented as SPDX License Expressions.

  • config object, REQUIRED

    Contains the technical metadata for the model.

    • architecture string, OPTIONAL

      The architecture of the model, such as "transformer", "cnn", or "rnn".

    • format string, OPTIONAL

      The format of the model, such as "onnx", "safetensors", "gguf", or "pt" (PyTorch).

    • paramSize string, OPTIONAL

      The total number of parameters in the model, represented as a decimal count followed by a single-letter scale prefix in the format <count><scale-prefix>.

      • count: A numeric value giving the parameter count before scaling. It may include at most one digit after the decimal point, for example 6.7.

      • scale-prefix: A single letter indicating the order of magnitude multiplier applied to the count. The prefix is case-insensitive and must be one of the following:

        • Q or q (Quadrillion)
        • T or t (Trillion)
        • B or b (Billion)
        • M or m (Million)
        • K or k (Thousand)

      Some examples: 6.7B (6.7 billion parameters), 1.0t (1 trillion parameters), 100m (100 million parameters).
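The grammar above can be parsed with a small helper. A sketch in Python (the function name is illustrative; Decimal is used so counts like 6.7 scale without float rounding error):

```python
import re
from decimal import Decimal

# Multiplier for each scale prefix (matched case-insensitively).
SCALE = {"q": 10**15, "t": 10**12, "b": 10**9, "m": 10**6, "k": 10**3}

def parse_param_size(value: str) -> int:
    """Parse a paramSize string like '6.7B' into a total parameter count.

    The count allows at most one digit after the decimal point, followed
    by exactly one scale-prefix letter."""
    match = re.fullmatch(r"(\d+(?:\.\d)?)([qtbmkQTBMK])", value)
    if match is None:
        raise ValueError(f"invalid paramSize: {value!r}")
    count, prefix = match.groups()
    return int(Decimal(count) * SCALE[prefix.lower()])

print(parse_param_size("6.7B"))  # 6700000000
print(parse_param_size("100m"))  # 100000000
```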

    • precision string, OPTIONAL

      The computational precision of the model. Supported values include:

      Precision Description
      "float32" 32-bit floating point
      "float64" 64-bit floating point
      "float16" 16-bit floating point. Uses 1 sign, 5 exponent, and 10 significand bits.
      "bfloat16" 16-bit brain floating point. Uses 1 sign, 8 exponent and 7 significand bits.
      "float8_e4m3" 8-bit floating point, e4m3 format. Uses 1 sign, 4 exponent, and 3 significand bits.
      "float8_e5m2" 8-bit floating point, e5m2 format. Uses 1 sign, 5 exponent, and 2 significand bits.
      "complex32" 32-bit complex
      "complex64" 64-bit complex
      "complex128" 128-bit complex
      "int8" 8-bit signed integer
      "int16" 16-bit signed integer
      "int32" 32-bit signed integer
      "int64" 64-bit signed integer
      "uint8" 8-bit unsigned integer
      "uint16" 16-bit unsigned integer
      "uint32" 32-bit unsigned integer
      "uint64" 64-bit unsigned integer
      "bool" Boolean

      If multiple precisions are used, they should be separated by commas. For example, if the model uses float16 and float8_e4m3, the precision should be set to "float16,float8_e4m3".
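A producer can check a precision string against the table above before writing the configuration. A minimal sketch (the set and function names are illustrative):

```python
# The precision values listed in the table above.
KNOWN_PRECISIONS = {
    "float32", "float64", "float16", "bfloat16",
    "float8_e4m3", "float8_e5m2",
    "complex32", "complex64", "complex128",
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "bool",
}

def validate_precision(value: str) -> bool:
    """Check that every comma-separated entry is a known precision value."""
    return all(part in KNOWN_PRECISIONS for part in value.split(","))

print(validate_precision("float16,float8_e4m3"))  # True
print(validate_precision("fp16"))                 # False
```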

    • quantization string, OPTIONAL

      Quantization technique applied to the model, such as "awq", or "gptq".

    • transformerConfig object, OPTIONAL

      Transformer-specific architectural parameters. Should only be populated when architecture is "transformer".

      • attentionType string, OPTIONAL

        The attention mechanism variant. Supported values:

        Value Description
        "mha" Multi-Head Attention — standard attention with one KV head per query head
        "gqa" Grouped-Query Attention — fewer KV heads than query heads, reducing KV cache size (e.g. LLaMA 3, Mistral)
        "mla" Multi-Latent Attention — low-rank KV compression for minimal KV cache (e.g. DeepSeek-V2)
      • mlpType string, OPTIONAL

        The feed-forward / MLP layer variant. Supported values:

        Value Description
        "dense" Standard dense feed-forward layer
        "moe" Mixture-of-Experts — tokens are routed to a subset of expert FFN layers (e.g. Mixtral, DeepSeek-V3)
      • numLayers integer, OPTIONAL

        Total number of transformer layers (blocks).

      • numAttentionHeads integer, OPTIONAL

        Number of query attention heads.

      • numKVHeads integer, OPTIONAL

        Number of key/value heads. For GQA this is smaller than numAttentionHeads. Omitting this field or setting it equal to numAttentionHeads implies standard MHA.

      • hiddenSize integer, OPTIONAL

        The model's hidden dimension size (d_model).

      • intermediateSize integer, OPTIONAL

        The inner dimension of the feed-forward layer.
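These fields together let a consumer estimate memory requirements. As one sketch of why numKVHeads matters: the per-token KV-cache size is proportional to the number of KV heads, assuming the head dimension is hiddenSize / numAttentionHeads and one key and one value vector per KV head per layer (2 bytes per element here, i.e. float16 — an assumption, not part of the configuration):

```python
def kv_cache_bytes_per_token(num_layers: int, num_attention_heads: int,
                             num_kv_heads: int, hidden_size: int,
                             bytes_per_element: int = 2) -> int:
    """Estimate KV-cache bytes per token: one key and one value vector
    (each of size head_dim) per KV head, per layer."""
    head_dim = hidden_size // num_attention_heads
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_element

# GQA with 8 KV heads vs. standard MHA with 32, for a 32-layer,
# 4096-hidden model: a 4x smaller KV cache.
gqa = kv_cache_bytes_per_token(32, 32, 8, 4096)
mha = kv_cache_bytes_per_token(32, 32, 32, 4096)
print(gqa, mha, mha // gqa)  # 131072 524288 4
```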

    • capabilities object, OPTIONAL

      Special capabilities that the model supports, such as reasoning or tool usage.

      • inputTypes array of string, OPTIONAL

        An array of strings specifying the data types that the model can accept as input. The allowed values are: "text", "image", "audio", "video", or "embedding". For input types that are not explicitly defined, the value "other" should be used.

      • outputTypes array of string, OPTIONAL

        An array of strings specifying the data types that the model can produce as output. The allowed values are: "text", "image", "audio", "video", or "embedding". For output types that are not explicitly defined, the value "other" should be used.

      • knowledgeCutoff string, OPTIONAL

        The knowledge cutoff of the model: the date and time up to which the model's training data extends, formatted as defined by RFC 3339, section 5.6.

      • reasoning boolean, OPTIONAL

        Whether the model can perform reasoning tasks.

      • toolUsage boolean, OPTIONAL

        Whether the model can use external tools or APIs to perform tasks.

      • reward boolean, OPTIONAL

        Whether the model is a reward model.

      • languages array of string, OPTIONAL

        The natural languages the model supports, encoded as ISO 639-1 two-letter codes.

  • modelfs object, REQUIRED

    Contains hashes of each uncompressed layer's content.

    • type string, REQUIRED

      Must be set to "layers".

    • diffIds array of strings, REQUIRED

      An array of layer content hashes (DiffIDs), in order from first to last.
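A consumer can check the three REQUIRED top-level properties before trusting a document. A minimal structural sketch, not a full schema validation (the function name is illustrative):

```python
def check_model_config(doc: dict) -> list:
    """Return a list of problems with the REQUIRED parts of a model
    artifact configuration; an empty list means the checks passed."""
    problems = []
    for key in ("descriptor", "config", "modelfs"):
        if not isinstance(doc.get(key), dict):
            problems.append(f"missing required object: {key}")
    modelfs = doc.get("modelfs")
    if isinstance(modelfs, dict):
        if modelfs.get("type") != "layers":
            problems.append('modelfs.type must be "layers"')
        diff_ids = modelfs.get("diffIds")
        if not (isinstance(diff_ids, list) and
                all(isinstance(d, str) for d in diff_ids)):
            problems.append("modelfs.diffIds must be an array of strings")
    return problems

ok = {"descriptor": {}, "config": {},
      "modelfs": {"type": "layers", "diffIds": ["sha256:abc"]}}
print(check_model_config(ok))  # []
print(check_model_config({}))  # three "missing required object" problems
```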

Example

Here is an example model artifact configuration JSON document:

{
  "descriptor": {
    "createdAt": "2025-01-01T00:00:00Z",
    "authors": [
      "xyz@xyz.com"
    ],
    "vendor": "XYZ Corp.",
    "family": "xyz3",
    "name": "xyz-3-8B-Instruct",
    "version": "3.1",
    "title": "XYZ 3 8B Instruct",
    "description": "xyz is a large language model.",
    "docURL": "https://www.xyz.com/get-started/",
    "sourceURL": "https://github.com/xyz/xyz3",
    "datasetsURL": ["https://www.xyz.com/datasets/"],
    "revision": "1234567890",
    "licenses": [
      "Apache-2.0"
    ]
  },
  "config": {
    "architecture": "transformer",
    "format": "pt",
    "paramSize": "8b",
    "precision": "float16",
    "quantization": "gptq",
    "transformerConfig": {
      "attentionType": "gqa",
      "mlpType": "dense",
      "numLayers": 32,
      "numAttentionHeads": 32,
      "numKVHeads": 8,
      "hiddenSize": 4096,
      "intermediateSize": 14336
    },
    "capabilities": {
      "inputTypes": [
        "text"
      ],
      "outputTypes": [
        "text",
        "image"
      ],
      "knowledgeCutoff": "2024-05-21T00:00:00Z",
      "reasoning": true,
      "toolUsage": false,
      "reward": false,
      "languages": ["en", "zh"]
    }
  },
  "modelfs": {
    "type": "layers",
    "diffIds": [
      "sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef",
      "sha256:abcdef1234567890abcdef1234567890abcdef1234567890abcdef1234567890"
    ]
  }
}