1 change: 1 addition & 0 deletions modules/ROOT/nav.adoc
@@ -88,6 +88,7 @@ include::third-party:partial$nav.adoc[]
**** xref:learn:clusters-and-availability/xdcr-conflict-logging-feature.adoc[XDCR Conflict Logging]
***** xref:learn:clusters-and-availability/xdcr-viewing-conflict-logs.adoc[Viewing Conflict Logs]
**** xref:learn:clusters-and-availability/xdcr-active-active-sgw.adoc[XDCR Active-Active with Sync Gateway]
**** xref:xdcr-reference:xdcr-lowering-memory-footprint.adoc[]
*** xref:learn:clusters-and-availability/groups.adoc[Server Group Awareness]
* xref:learn:security/security-overview.adoc[Security]
** xref:learn:security/authentication.adoc[Authentication]
63 changes: 52 additions & 11 deletions modules/rest-api/pages/rest-xdcr-adv-settings.adoc
@@ -20,15 +20,15 @@ GET /settings/replications/<settings_URI>
[#description]
== Description

Used with the `POST` method, the URIs respectively change global settings for _all_ replications; and for _a specific_ replication, which is referenced by its `settings_URI`.
The `settings_URI` comprises the _id_ for the replication, and can be retrieved by means of the `GET /pools/default/tasks` method and URI: see xref:rest-api:rest-get-cluster-tasks.adoc[Getting Cluster Tasks].
Used with the `POST` method, the URIs respectively change global settings for all replications; and for a specific replication, which is referenced by its `settings_URI`.
The `settings_URI` comprises the id for the replication, and can be retrieved by means of the `GET /pools/default/tasks` method and URI: see xref:rest-api:rest-get-cluster-tasks.adoc[Getting Cluster Tasks].

The global settings are the default values used if settings are not specified during the creation of a particular replication.
If settings are specified for a particular replication, the specified settings overwrite the global settings.

If the global settings are themselves changed, existing replications are not affected: only replications created _after_ the change made to the global settings receive the updated global settings.
If the global settings are themselves changed, existing replications are not affected: only replications created after the change made to the global settings receive the updated global settings.

Used with the `GET` method, the URIs respectively retrieve global settings for _all_ replications; and for _a specific_ replication, which is referenced by its `settings_URI`.
Used with the `GET` method, the URIs respectively retrieve global settings for all replications; and for a specific replication, which is referenced by its `settings_URI`.



@@ -126,15 +126,18 @@ curl -u Administrator:password -X GET http://localhost:8091/settings/replication

If successful, the call returns an object similar to the following:

[source,jsonlines]
----
{
"cLogConnPoolGCIntervalMs": 60000,
"cLogConnPoolLimit": 30,
"cLogConnPoolReapIntervalMs": 120000,
"cLogErrorTimeWindowMs": 120000,
"cLogMaxErrorCount": 10,
"cLogMonitorDuration": 0,
"cLogNetworkRetryCount": 5,
"cLogNetworkRetryIntervalMs": 2000,
"cLogPauseReplThreshold": 0,
"cLogPoolGetTimeoutMs": 5000,
"cLogQueueCapacity": 6000,
"cLogReattemptDurationMs": 600000,
@@ -143,28 +146,36 @@ If successful, the call returns an object similar to the following:
"casDriftThresholdSecs": 3900,
"checkpointInterval": 600,
"ckptSvcCacheEnabled": true,
"cngConnCount": 2,
"cngQueueSize": 1000,
"cngRPCDeadlineMs": 5000,
"cngWorkerCount": 500,
"collectionsOSOMode": true,
"componentEventsChanLength": 10000,
"compressionType": "Auto",
"conflictLogging": {},
"dcpEnablePurgeRollback": false,
"dcpFlowControlThrottle": 100,
"desiredLatency": 50,
"devReplOpts": "",
"disableHlvBasedShortCircuit": false,
"docBatchSizeKb": 2048,
"failureRestartInterval": 10,
"filterBinary": false,
"filterBypassExpiry": false,
"filterBypassUncommittedTxn": false,
"filterDeletion": false,
"filterDeletionsWithExpression": false,
"filterExpiration": false,
"filterExpirationsWithExpression": false,
"genericServicesLogLevel": {
< ... diagnostic items cut out due to length ... >
},
// < ... diagnostic items cut out due to length ... >
},
"goGC": 100,
"goMaxProcs": 4,
"jsFunctionTimeoutMs": 20000,
"logLevel": "Info",
"mergeFunctionMapping": {},
"mobile": "Off",
"minHLVHistoryLenForMobile": 5,
"networkUsageLimit": 0,
"optimisticReplicationThreshold": 256,
"preCheckCasDriftThresholdHours": 8760,
@@ -176,7 +187,6 @@ If successful, the call returns an object similar to the following:
"retryOnRemoteAuthErrMaxWaitSec": 360,
"skipReplSpecAutoGc": false,
"sourceNozzlePerNode": 2,
"targetTopologyLogFrequency": 1800,
"statsInterval": 1000,
"targetNozzlePerNode": 2,
"targetTopologyLogFrequency": 1800,
@@ -423,7 +433,7 @@ curl -X POST -u Administrator:password http://localhost:8091/<settings_URI> -d m

For information about _XDCR with Sync Gateway mobile clusters in a bi-directional, active-active replication_, see xref:learn:clusters-and-availability/xdcr-active-active-sgw.adoc[XDCR Active-Active with Sync Gateway].

===== Change Settings for XDCR Generic Services Log Levels
=== Change Settings for XDCR Generic Services Log Levels

The following example modifies the log levels for XDCR Generic Services, for a specific replication.
Usually, you modify the log levels only when requested by Couchbase Support.
@@ -519,6 +529,21 @@ This setting can only be established for and retrieved from an individual replic
| Whether the replication optimizes performance; by streaming, from a source bucket, mutations that could be out of order, in terms of sequence-number.
Default is true.

| `componentEventsChanLength`
| Integer (1000 to 10000)
a| Default: 10000

Sets the default length of the channels that event listeners use to buffer internal events before processing.

NOTE: Decreasing this value throttles the replication throughput in favor of a lower memory footprint.
See xref:xdcr-reference:xdcr-lowering-memory-footprint.adoc[] for more information.



This setting can be established and retrieved either for an individual replication or globally.


| `compressionType`
| String
| Default: `Auto`.
@@ -533,7 +558,23 @@ This setting can be established and retrieved either for an individual replicati

This configuration setting defines objects/parameters and options used to control how conflicts are logged during an XDCR replication.

For more information, see xref:learn:clusters-and-availability/xdcr-conflict-logging-feature.adoc#configure-conflictlogging-settings[Enabling and Configuring Conflict Logging].
For more information, see xref:learn:clusters-and-availability/xdcr-conflict-logging-feature.adoc#configure-conflictlogging-settings[Enabling and Configuring Conflict Logging].

| `dcpFlowControlThrottle`

| Integer (5 to 100)
a| Default: 100

The percentage applied to the default value for both:

. The `connection_buffer_size` set by XDCR replication for the DCP connection to KV (1024*1024 bytes by default)
. The length of the channel used by XDCR replication to buffer incoming items (over the DCP connection) from KV (20000 by default)

NOTE: Decreasing this percentage throttles the replication throughput in favor of a lower memory footprint.
See xref:xdcr-reference:xdcr-lowering-memory-footprint.adoc[] for more information.

This setting can be established and retrieved either for an individual replication or globally.


| `desiredLatency`
| Integer
63 changes: 63 additions & 0 deletions modules/xdcr-reference/pages/xdcr-lowering-memory-footprint.adoc
@@ -0,0 +1,63 @@
= Lowering the Memory Footprint in XDCR
:description: Using new global settings (introduced in Couchbase Server 8.1) to lower the memory footprint of Cross Data Center Replication.
:stem: asciimath

[abstract]
{description}

XDCR has two global replication settings that can be used to lower the memory footprint of a running replication:

dcpFlowControlThrottle::
This is a percentage applied to the default value of both:
+
--
* The `connection_buffer_size` set by XDCR replication for the DCP connection to KV (stem:[1024 times 1024] bytes by default)
* The length of the channel used by XDCR replication to buffer incoming items (via the DCP connection) from KV (20000 by default)
--
+
The effective value of each of these parameters becomes:
+
--
stem:[("<default value> " times " dcpFlowControlThrottle") / 100]
--
+
Reducing this setting from its default value of 100 decreases the maximum number of documents that can fit into XDCR's memory buffer.
The smaller buffer lowers the memory footprint of the replication, at the cost of reduced replication throughput.

componentEventsChanLength::
This sets the default length of the channels that event listeners use to buffer internal events before processing.
+
Reducing this setting from its default value of 10,000 lowers the number of buffer slots available to each of the event-listeners, thereby reducing the memory allocated.
However, it may also cause a drop in replication throughput.

NOTE: Changing these replication settings for an ongoing replication causes it to restart from the last checkpoint sequence number
so that the new values can take effect.
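
As a worked illustration of the effective-value formula above, the following shell sketch computes both parameters for a `dcpFlowControlThrottle` of 50.
The `effective_value` helper is hypothetical, and the integer arithmetic is an assumption: the formula does not state how fractional results are rounded.

```shell
# Effective value = <default value> * dcpFlowControlThrottle / 100
# Assumption: integer arithmetic; rounding of fractional results is not documented.
effective_value() {
  # $1 = default value, $2 = dcpFlowControlThrottle percentage (5 to 100)
  echo $(( $1 * $2 / 100 ))
}

throttle=50
buffer_default=$((1024 * 1024))   # default connection_buffer_size, in bytes
channel_default=20000             # default length of the DCP item channel

echo "connection_buffer_size: $(effective_value "$buffer_default" "$throttle") bytes"
echo "DCP item channel length: $(effective_value "$channel_default" "$throttle")"
```

With the documented defaults, a value of 50 halves both parameters: 524288 bytes for `connection_buffer_size`, and 10000 for the channel length.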

== Monitoring the Health of a Throttled Replication

Lowering either of the above two settings may reduce the throughput of the replication, so take care not to over-throttle it.

To gauge the health of a throttled replication, graph the following Prometheus expression:

[source, text]
----
clamp_min(xdcr_changes_left_total{pipelineType="Main", sourceBucketName="<A>", targetBucketName="<B>", targetClusterUUID="<C>"} - ignoring(name) rate(xdcr_docs_processed_total[1m]), 0)
----

When the replication is healthy, the graph of the above expression has a negative slope, or hovers around zero.

If the graph has a consistently positive slope, or plateaus without ever touching the X-axis, the replication cannot keep up with the rate of mutations on the source bucket.
In such a situation, any throttling imposed on the replication needs to be reversed.
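
The rule of thumb above can be sketched as a small shell function that classifies a series of values read from the graph.
The `replication_health` function, its near-zero threshold, and the first-versus-last comparison are illustrative assumptions, not a check provided by Couchbase Server; the samples are assumed to be rounded to integers.

```shell
# Classify graph samples (oldest first) as "healthy" or "falling behind".
# $1 = near-zero threshold (hypothetical tolerance); remaining arguments = samples.
replication_health() {
  threshold=$1
  shift
  first=$1
  last=$1
  for sample in "$@"; do
    last=$sample
  done
  # Healthy: the series ends near zero, or ends lower than it started.
  if [ "$last" -le "$threshold" ] || [ "$last" -lt "$first" ]; then
    echo "healthy"
  else
    echo "falling behind"
  fi
}

replication_health 10 900 400 50 0       # draining towards zero
replication_health 10 100 450 900 1600   # consistently positive slope
replication_health 10 500 500 500        # plateau that never touches zero
```

Only the first series is reported as healthy.
A real check would look at a longer window of samples before deciding to reverse the throttling.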

If a source cluster node running a throttled replication is running out of disk space, it's advisable to reverse the throttling.
This alleviates any excess disk usage that the reduced `connection_buffer_size` may be causing in KV.

== Proposed Values for these Settings

There is no exact formula for choosing a value for *dcpFlowControlThrottle*, because it depends on the conditions of the source bucket (rate of mutations, size of documents, and so on).
Qualitatively, a replication whose source bucket has a lower mutation rate can be throttled more than a replication with a higher mutation rate.

In practice, reduce the *dcpFlowControlThrottle* percentage incrementally from the default value of 100, and check the effect of each new value on the replication health graph described in the previous section.
For example, decrease the value in steps (100 → 75 → 50 → …) while monitoring the graph.
If the graph starts diverging from zero (specifically, if it has a consistently positive slope), revert the setting to a previous value that did not cause this behavior.

Similar incremental changes are advised for the *componentEventsChanLength* setting.
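
The incremental approach can be sketched with the REST API described in xref:rest-api:rest-xdcr-adv-settings.adoc[].
The following dry-run sketch prints, rather than runs, the `curl` command for each step.
The `throttle_command` helper is hypothetical; the host, port, and credentials are placeholders taken from the examples in that reference, and the global `/settings/replications` URI is assumed as the target.

```shell
# Print (do not run) the curl command that applies one throttle step.
# Assumptions: placeholder host and credentials; global settings URI.
throttle_command() {
  # $1 = dcpFlowControlThrottle percentage to apply (5 to 100)
  echo "curl -X POST -u Administrator:password http://localhost:8091/settings/replications -d dcpFlowControlThrottle=$1"
}

# One command per step of the incremental decrease.
for pct in 100 75 50; do
  throttle_command "$pct"
done
```

Run one command at a time, and check the health graph before moving to the next step: each change restarts the replication from its last checkpoint.
To throttle a single replication instead, replace `settings/replications` with the `<settings_URI>` of that replication.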