feat(cluster): disconnect replication connections to orphan slaves#2850
feat(cluster): disconnect replication connections to orphan slaves#2850RiversJin wants to merge 19 commits intoapache:unstablefrom
Conversation
|
|
||
| Status Execute([[maybe_unused]] engine::Context &ctx, [[maybe_unused]] Server *srv, Connection *conn, | ||
| std::string *output) override { | ||
| if (port_ != 0) { |
There was a problem hiding this comment.
Based on the replconf command definition, it's impossible to reach this code path without port_ being set, so I've removed the validation check for port_.
|
Thank you for your contribution! Could you add some test cases for it? |
Of course, but save the boring tasks for weekdays :P. |
@PragmaTwice After a closer look, this PR doesn't introduce any new changes other than disconnecting data replication for nodes removed from the cluster. I added a small snippet of assertion code in cluster_test.go for this scenario. Do you think there are any other areas that might need more testing? |
| ClusterNode::ClusterNode(std::string id, std::string host, int port, int role, std::string master_id, | ||
| const std::bitset<kClusterSlots> &slots) | ||
| : id(std::move(id)), host(std::move(host)), port(port), role(role), master_id(std::move(master_id)), slots(slots) {} | ||
| ClusterNode::ClusterNode(std::string &&id, std::string &&host, int port, int role, std::string &&master_id, |
There was a problem hiding this comment.
Some notes:
- for
std::bitset, in most impl it is stored on the stack memory instead heap so move ctor has no difference from the copy ctor. usually you don't need to move it. - for
ClassA(std::string x) : x(x) {}, you can pass either lval or rval to it. for rval, it involves two move ctors. And forClassA(std::string &&x) : x(x) {}, you can only pass rval to it, and also the performance difference is quite little or none.
So these changes are not so useful.
| if (version_ >= 0 && nodes_->count(node_id) > 0) { | ||
| myself_ = nodes_->at(node_id); |
There was a problem hiding this comment.
if the performance is important, why not use find() and iterator based operations? Which could make it just search once
| auto it = nodes_->find(node_id); | ||
| if (it == nodes_->end()) { | ||
| return {Status::NotOK, "No this node in the cluster"}; | ||
| } | ||
|
|
||
| auto to_assign_node = it->second; |
There was a problem hiding this comment.
| auto it = nodes_->find(node_id); | |
| if (it == nodes_->end()) { | |
| return {Status::NotOK, "No this node in the cluster"}; | |
| } | |
| auto to_assign_node = it->second; | |
| std::shared_ptr<ClusterNode> to_assign_node; | |
| if (auto it = nodes_.find(); it != nodes_.end()) { | |
| to_assign_node = .. | |
| } else { | |
| return {Status::NotOK, "No this node in the cluster"}; | |
| } |
| auto it = nodes_->find(myid_); | ||
| if (it != nodes_->end()) { |
| if (!is_slave) { | ||
| srv_->CleanupOrphanSlaves(version_, *nodes_); | ||
| } |
There was a problem hiding this comment.
So this is what this patch actually does?
| std::string_view ip_; | ||
| std::string_view addr_; |
There was a problem hiding this comment.
This looks really unsafe..Who would owns memory for them?
| const auto peer_info = slave_thread->GetConn()->GetPeerInfo(); | ||
| auto peer_version = peer_info->GetPeerVersion(); | ||
| if (peer_version < 0 || peer_version > version) { | ||
| // The peer version is greater than the current version, |
There was a problem hiding this comment.
what does peer_version < 0 means?
| std::lock_guard<std::mutex> lg(slave_threads_mu_); | ||
|
|
||
| for (auto &slave_thread : slave_threads_) { | ||
| const auto peer_info = slave_thread->GetConn()->GetPeerInfo(); |
There was a problem hiding this comment.
do we need check GetPeerInfo != nullptr?
| SetPeerInfo(std::make_unique<PeerInfo>(ip_, port_, "", -1)); | ||
| return peer_info_.get(); |
… update GetPeerInfo to return by reference
7bddf14 to
eaec818
Compare
| inline constexpr const char *errClusterNoInitialized = "The cluster is not initialized"; | ||
| inline constexpr const char *errInvalidClusterNodeInfo = "Invalid cluster nodes info"; | ||
| inline constexpr const char *errInvalidImportState = "Invalid import state"; | ||
| inline constexpr const char *errYouAreFired = "You are fired"; |
There was a problem hiding this comment.
I think it's not a right wording. Could be something like disconnected due to the topology change or others.
|
@RiversJin, I found that this PR introduces many code styles and fixes, except for the disconnecting feature. Would you mind separating them into a few dedicated PRs that would be easy to review and clarify the context? |
Co-authored-by: hulk <hulk.website@gmail.com>
251ff86 to
19e81f3
Compare
@git-hulk Sure! The unrelated tweaks were legacy from our internal codebase. I've removed the obvious ones, but the iterator-based index optimization in Cluster::nodes_ (avoiding double lookups) is interleaved in some places. It's technically out of scope, but actually improves performance. but need some time to remove those... |
3b0cb8f to
14a7750
Compare
|
|
Yeah these changes in |



it solves #2841