diff --git a/docs/_data/toc.yaml b/docs/_data/toc.yaml index 881c5523e154e..e8a6eefc570b6 100644 --- a/docs/_data/toc.yaml +++ b/docs/_data/toc.yaml @@ -77,6 +77,8 @@ url: clustering/discovery-in-the-cloud - title: Network Configuration url: clustering/network-configuration + - title: Multi Data Center Deployment + url: clustering/multi-data-center - title: Connecting Client Nodes url: clustering/connect-client-nodes - title: Baseline Topology diff --git a/docs/_docs/clustering/clustering.adoc b/docs/_docs/clustering/clustering.adoc index 8496a3c7799ac..08259bbcdfa7a 100644 --- a/docs/_docs/clustering/clustering.adoc +++ b/docs/_docs/clustering/clustering.adoc @@ -24,6 +24,9 @@ Client nodes join the topology as regular nodes but they do not store data. Clie To form a cluster, each node must be able to connect to all other nodes. To ensure that, a proper <> must be configured. +Ignite can also run a single cluster across multiple data centers. +See link:clustering/multi-data-center[Multi Data Center Deployment] for configuration details. + NOTE: In addition to client nodes, you can use Thin Clients to define and manipulate data in the cluster. Learn more about the thin clients in the link:thin-clients/getting-started-with-thin-clients[Thin Clients] section. diff --git a/docs/_docs/clustering/multi-data-center.adoc b/docs/_docs/clustering/multi-data-center.adoc new file mode 100644 index 0000000000000..e37dca248573d --- /dev/null +++ b/docs/_docs/clustering/multi-data-center.adoc @@ -0,0 +1,254 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. += Multi Data Center Deployment + +== Overview + +Ignite can run a single cluster across multiple data centers. +In this mode, every server node is assigned a data center ID. +Ignite uses this ID to group nodes by data center, reduce cross-data-center traffic when possible, +choose local data copies for reads, and validate cache topology during data center failures. + +[NOTE] +==== +Multi data center support is an experimental feature. +APIs and behavior can change in later releases. +==== + +The feature is based on the following components: + +* `IGNITE_DATA_CENTER_ID` - system property or user attribute that assigns a node to a data center. +* `ClusterNode#dataCenterId()` - node API that exposes the configured data center ID. +* `MdcTopologyValidator` - cache topology validator that can block updates when not enough data centers are visible. +* `MdcAffinityBackupFilter` - affinity backup filter that spreads partition copies evenly across data centers. +* Thin client data center awareness - routing that prefers nodes from the client's data center. + +== Configuring Data Center ID + +Set the same data center ID on all server nodes located in the same data center. +Nodes from different data centers must use different IDs. + +You can set the ID with the `IGNITE_DATA_CENTER_ID` system property: + +[source,shell] +---- +bin/ignite.sh -J-DIGNITE_DATA_CENTER_ID=DC1 +---- + +You can also set the same value as a user attribute in `IgniteConfiguration`: + +[source,java] +---- +IgniteConfiguration cfg = new IgniteConfiguration(); + +cfg.setUserAttributes(Collections.singletonMap( + IgniteSystemProperties.IGNITE_DATA_CENTER_ID, "DC1")); +---- + +Server nodes in the same cluster must be configured consistently: + +* If one server node has a data center ID, all server nodes must have a data center ID. +* Server nodes without a data center ID cannot join a cluster where server nodes have data center IDs. +* Server nodes with a data center ID cannot join a cluster where server nodes do not have data center IDs. +* Client nodes can join with or without a data center ID. + +== Discovery + +`TcpDiscoverySpi` uses data center IDs to keep nodes from the same data center close to each other in +the discovery ring. +When a data center becomes unavailable, discovery can check remote data centers in parallel and close the ring +through the local data center if the remote data center does not respond. + +Client nodes also use the data center ID during discovery. +If a client has a data center ID, it tries to connect through a server node from the same data center. +If no server node from the same data center is available, the client connects through any available server node. + +== Cache Topology Validation + +Use `MdcTopologyValidator` to protect caches from updates when the visible topology does not contain enough data centers. +The validator ignores client nodes and checks only server nodes. + +The validator supports two modes: + +* Majority-based validation - for an odd number of data centers, specify the full set of data centers. +Updates are allowed only when the visible topology contains a majority of the configured data centers. +* Main data center validation - for an even number of data centers, specify the main data center. +Updates are allowed while the main data center is visible. + +[tabs] +-- +tab:XML[] +[source,xml] +---- + + + + + + + DC1 + DC2 + DC3 + + + + + +---- + +tab:Java[] +[source,java] +---- +MdcTopologyValidator validator = new MdcTopologyValidator(); + +validator.setDatacenters(Set.of("DC1", "DC2", "DC3")); + +CacheConfiguration cacheCfg = + new CacheConfiguration("accounts") + .setTopologyValidator(validator); +---- +-- + +For an even number of data centers, configure a main data center: + +[tabs] +-- +tab:XML[] +[source,xml] +---- + + + + + + + + +---- + +tab:Java[] +[source,java] +---- +MdcTopologyValidator validator = new MdcTopologyValidator(); + +validator.setMainDatacenter("DC1"); + +CacheConfiguration cacheCfg = + new CacheConfiguration("accounts") + .setTopologyValidator(validator); +---- +-- + +The validator configuration is checked when the cache is created. +The following configurations are invalid: + +* Neither `datacenters` nor `mainDatacenter` is specified. +* `datacenters` is specified as an empty set. +* `mainDatacenter` is specified together with an odd number of data centers. + +== Placing Backup Copies Across Data Centers + +Use `MdcAffinityBackupFilter` with `RendezvousAffinityFunction` to distribute partition copies across data centers. +The filter ensures that each partition has the same number of copies in each data center. + +For example, a cache with `backups=3` has four partition copies: one primary and three backups. +In a two-data-center cluster, `MdcAffinityBackupFilter(2, 3)` places two copies in each data center. + +[tabs] +-- +tab:XML[] +[source,xml] +---- + + + + + + + + + + + + + + +---- + +tab:Java[] +[source,java] +---- +CacheConfiguration cacheCfg = + new CacheConfiguration("accounts") + .setBackups(3) + .setAffinity(new RendezvousAffinityFunction() + .setAffinityBackupFilter(new MdcAffinityBackupFilter(2, 3))); +---- +-- + +`MdcAffinityBackupFilter` has the following requirements: + +* The number of data centers must be at least 2. +* `(backups + 1)` must be evenly divisible by the number of data centers. +* All server nodes must have a data center ID. + +If these requirements are not met, the filter either rejects the configuration or cannot place copies predictably. + +For other backup placement strategies, see link:configuring-caches/configuring-backups#affinity-backup-filters[Affinity Backup Filters]. + +== Reading from Local Data Center + +When `CacheConfiguration#readFromBackup` is `true`, Ignite can read from a backup copy instead of the primary copy. +In a multi data center topology, this allows Ignite to use a backup located in the same data center as the +requester when such a backup is available. + +The Java thin client also uses the data center ID for partition awareness. +Set the client's data center ID with the `IGNITE_DATA_CENTER_ID` system property or in +`ClientConfiguration#setUserAttributes(...)`. +If both are configured, the system property takes precedence. +For partition-aware read operations, the client sends requests to a node from its own data center when the cache has +`readFromBackup=true`, its write synchronization mode is not `PRIMARY_SYNC`, and the local data center has a +partition copy. +Writes are still routed to the primary node. + +[source,java] +---- +ClientConfiguration cfg = new ClientConfiguration() + .setAddresses("dc1-node1:10800", "dc1-node2:10800", "dc2-node1:10800") + .setUserAttributes(Collections.singletonMap( + IgniteSystemProperties.IGNITE_DATA_CENTER_ID, "DC1")); +---- + +The same value can be set with the JVM system property: + +[source,shell] +---- +java -DIGNITE_DATA_CENTER_ID=DC1 ... +---- + +If no nodes are available in the client's data center, the client uses other available nodes. + +== Rebalancing + +During rebalancing, Ignite prefers nodes from the same data center when such suppliers are available. +If the required partition data cannot be obtained from the same data center, Ignite can rebalance from a +remote data center. +This behavior applies to both full and historical rebalance. + +== Monitoring + +The data center ID is exposed in the `NODES` system view as part of node information. +For client nodes, the `io.discovery.ClientRouterNodeId` metric exposes the ID of the server node used as +the discovery router. diff --git a/docs/_docs/configuring-caches/configuration-overview.adoc b/docs/_docs/configuring-caches/configuration-overview.adoc index 00541fd95a473..52521a5266b35 100644 --- a/docs/_docs/configuring-caches/configuration-overview.adoc +++ b/docs/_docs/configuring-caches/configuration-overview.adoc @@ -79,7 +79,10 @@ a| This parameter controls the way the rebalancing process is performed. Possibl | `IGNORE` |`readFromBackup` -| [[readfrombackup]] Read requested cache entry from the backup partition if it is available on the local node instead of making a request to the primary partition (which can be located on the remote nodes). +| [[readfrombackup]] Read requested cache entry from the backup partition if it is available on the local node +instead of making a request to the primary partition (which can be located on the remote nodes). +In link:clustering/multi-data-center[multi data center deployments], this setting allows Ignite to read from a backup +located in the same data center when such a backup is available. | `true` |`queryPrallelism` | The number of threads in a single node to process a SQL query executed on the cache. Refer to the link:SQL/sql-tuning#query-parallelism[Query Parallelism] section in the Performance guide for more information. diff --git a/docs/_docs/configuring-caches/configuring-backups.adoc b/docs/_docs/configuring-caches/configuring-backups.adoc index 30535eba25596..3d359fe2309d3 100644 --- a/docs/_docs/configuring-caches/configuring-backups.adoc +++ b/docs/_docs/configuring-caches/configuring-backups.adoc @@ -89,4 +89,34 @@ The write synchronization mode can be set to the following values: //NOTE: Regardless of write synchronization mode, cache data will always remain fully consistent across all participating nodes when using transactions. +== Affinity Backup Filters +`RendezvousAffinityFunction` can use an affinity backup filter to control where backup copies are placed. +Backup filters are useful when cache copies must follow hardware, availability zone, rack, cell, or data center boundaries. + +Ignite provides the following built-in filters: + +[cols="1,3",opts="header"] +|=== +| Filter | Description +| `ClusterNodeAttributeAffinityBackupFilter` +| Places backup copies on nodes that have different values of the configured node attributes. +Use it to spread partition copies across availability zones, racks, or other failure domains. + +| `ClusterNodeAttributeColocatedBackupFilter` +| Places backup copies on nodes that have the same value of the configured node attribute as the primary node. +Use it to keep all copies of a partition inside the same cell or another co-location group. +For example, if nodes are grouped by a `CELL` attribute, the primary and backups for one partition are placed +on nodes from the same cell, while other partitions can be assigned to other cells. + +| `MdcAffinityBackupFilter` +| Places partition copies evenly across data centers. +Use it in link:clustering/multi-data-center[multi data center deployments]. +|=== + +[NOTE] +==== +`ClusterNodeAttributeColocatedBackupFilter` intentionally co-locates partition copies in the same attribute group. +It is useful for cell-based layouts where the goal is to limit the failure impact to a cell. +It should not be used when the goal is to spread copies across data centers or availability zones. +==== diff --git a/docs/_docs/monitoring-metrics/new-metrics.adoc b/docs/_docs/monitoring-metrics/new-metrics.adoc index 0850d9ae33116..030b6253d659c 100644 --- a/docs/_docs/monitoring-metrics/new-metrics.adoc +++ b/docs/_docs/monitoring-metrics/new-metrics.adoc @@ -368,6 +368,7 @@ Register name: `io.discovery` [cols="2,1,3",opts="header"] |=== |Name| Type| Description +|ClientRouterNodeId| String| Client router node ID (metric is exported only from client nodes). |CoordinatorSince| long| Timestamp since which the local node became the coordinator (metric is exported only from server nodes). |Coordinator| UUID| Coordinator ID (metric is exported only from server nodes). |CurrentTopologyVersion| long| Current topology version. diff --git a/docs/_docs/monitoring-metrics/system-views.adoc b/docs/_docs/monitoring-metrics/system-views.adoc index 93c72f3672d01..799ea3e2c3c84 100644 --- a/docs/_docs/monitoring-metrics/system-views.adoc +++ b/docs/_docs/monitoring-metrics/system-views.adoc @@ -314,6 +314,7 @@ The NODES view contains information about the cluster nodes. |IS_CLIENT |BOOLEAN |Indicates whether the node is a client. |NODE_ID |UUID |Node ID. |NODE_ORDER |INT |Node order within the topology. +|DATA_CENTER_ID |VARCHAR |Node data center ID. |VERSION |VARCHAR |Node version. |=== diff --git a/docs/_docs/thin-clients/java-thin-client.adoc b/docs/_docs/thin-clients/java-thin-client.adoc index 33387aaaf6b5d..20783b5535aec 100644 --- a/docs/_docs/thin-clients/java-thin-client.adoc +++ b/docs/_docs/thin-clients/java-thin-client.adoc @@ -119,6 +119,13 @@ The following code sample illustrates how to use the partition awareness feature include::{sourceCodeFile}[tag=partition-awareness,indent=0] ---- +In link:clustering/multi-data-center[multi data center deployments], the Java thin client can prefer server nodes +located in the same data center as the client. +Set the `IGNITE_DATA_CENTER_ID` system property or set the same key in `ClientConfiguration#setUserAttributes(...)` +to enable data-center-aware routing. The system property takes precedence over the client user attribute. +For partition-aware reads from local backup nodes, the cache must have `readFromBackup=true` and a write +synchronization mode other than `PRIMARY_SYNC`. + The code sample below shows how to use a custom cache key to partition mapping function to enable affinity awareness on a thin client side if the cache already exists in a cluster or/and was created with custom AffinityFunction or AffinityKeyMapper.