
[Bug] FE Observer node fails to join cluster in cloud mode with Docker host network #61536

@zhangdong1015

Description


Search before asking

  • I have searched the issues and found no similar issues.

Version

Doris 4.0.3, 4.0.4-slim

What's Wrong?

In Doris cloud mode, Observer nodes fail to join the cluster when the FE cluster is deployed with Docker host networking and each FE node therefore uses a different http_port.

Error log:
WARN [Env.getFeNodeTypeAndNameFromHelpers():1520] failed to get fe node type from helper node: HostInfo{host='127.0.0.1', port=9010}.
java.net.ConnectException: Connection refused
WARN [Env.getClusterIdAndRole():1342] current node HostInfo{host='127.0.0.1', port=9011} is not added to the group. please add it first.

Root cause:

In the Env.getFeNodeTypeAndNameFromHelpers() method:

    String url = "http://" + NetUtils.getHostPortInAccessibleFormat(
            helperNode.getHost(), // "127.0.0.1" (correct)
            Config.http_port)     // uses the current node's http_port, NOT the helper's!
            + "/role?host=...";

The code uses Config.http_port (the current node's port) instead of the helper node's http_port. This implicitly assumes that all FE nodes use the same http_port, which does not hold when:

  • Docker host network mode is used (the containers share the host's network stack, so each FE must listen on a different port)
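The mismatch can be reproduced in isolation. The following self-contained sketch is not Doris code: HelperNode, buildRoleUrlCurrent, and buildRoleUrlFixed are hypothetical names, the /role query string (elided above) is omitted, and no IPv6 address formatting is done. It requires Java 16+ for records.

```java
public class RoleUrlDemo {
    // Hypothetical stand-in for a helper node whose HTTP port is known.
    record HelperNode(String host, int editLogPort, int httpPort) {}

    // Mirrors the current behavior: the joining node's own http_port is used.
    static String buildRoleUrlCurrent(HelperNode helper, int localHttpPort) {
        return "http://" + helper.host() + ":" + localHttpPort + "/role";
    }

    // Intended behavior: the helper node's own http_port is used.
    static String buildRoleUrlFixed(HelperNode helper) {
        return "http://" + helper.host() + ":" + helper.httpPort() + "/role";
    }

    public static void main(String[] args) {
        HelperNode fe1 = new HelperNode("127.0.0.1", 9010, 8030);
        int fe2HttpPort = 8031; // FE-2's Config.http_port
        System.out.println(buildRoleUrlCurrent(fe1, fe2HttpPort)); // http://127.0.0.1:8031/role
        System.out.println(buildRoleUrlFixed(fe1));                // http://127.0.0.1:8030/role
    }
}
```

The fix only works if the joining node can actually learn the helper's http_port, which is what the suggested fixes below address.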

What You Expected?

  1. FE-2 should connect to FE-1's HTTP endpoint (port 8030) and join the cluster as an Observer node.
  2. When constructing the HTTP URL, the code should use the helper node's http_port instead of the current node's Config.http_port.

Current behavior (wrong):
FE-2 tries:  http://127.0.0.1:8031/role (FE-2's own http_port; nothing is listening there)
Should be:   http://127.0.0.1:8030/role (FE-1's http_port)

Suggested fix:

  • Store each FE node's http_port in Meta Service when the node is registered, so that a joining node can look it up
  • Or allow the helper node's http_port to be specified in the --helper parameter (e.g., --helper host:http_port:edit_log_port)
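The second option could be parsed roughly as follows. This is a minimal sketch of the proposed format only; HelperArgParser, Helper, and the fallback behavior are hypothetical and not an existing Doris option. It accepts the proposed host:http_port:edit_log_port form and keeps the existing host:edit_log_port form working by falling back to a caller-supplied port (today that would be the local Config.http_port). Bracketed IPv6 literals are not handled.

```java
public class HelperArgParser {
    // Hypothetical parsed form of a single --helper entry.
    record Helper(String host, int httpPort, int editLogPort) {}

    // Parses one --helper value:
    //   host:http_port:edit_log_port  -> proposed form, helper's http_port is explicit
    //   host:edit_log_port            -> existing form, falls back to fallbackHttpPort
    // Bracketed IPv6 literals are not supported in this sketch.
    static Helper parse(String value, int fallbackHttpPort) {
        String[] parts = value.trim().split(":");
        if (parts.length == 3) {
            return new Helper(parts[0], Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
        }
        if (parts.length == 2) {
            return new Helper(parts[0], fallbackHttpPort, Integer.parseInt(parts[1]));
        }
        throw new IllegalArgumentException(
                "expected host:http_port:edit_log_port or host:edit_log_port, got: " + value);
    }

    public static void main(String[] args) {
        // FE-2 (local http_port 8031) pointing at FE-1.
        System.out.println(parse("127.0.0.1:8030:9010", 8031)); // Helper[host=127.0.0.1, httpPort=8030, editLogPort=9010]
        System.out.println(parse("127.0.0.1:9010", 8031));      // Helper[host=127.0.0.1, httpPort=8031, editLogPort=9010]
    }
}
```

Keeping the two-part form valid means existing deployments that use a uniform http_port behave as before; only nodes whose helper listens on a different HTTP port would need the three-part form.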

How to Reproduce?

  1. Deploy Doris in cloud mode with Meta Service and FoundationDB, running the FE containers on the Docker host network
  2. Configure FE-1 (Master) with http_port=8030, edit_log_port=9010
  3. Configure FE-2 (Observer) with http_port=8031, edit_log_port=9011
  4. Start FE-1 successfully
  5. Start FE-2 with --helper 127.0.0.1:9010
  6. FE-2 fails to join with a "Connection refused" error

Configuration example:

┌──────┬──────────┬───────────────┬───────────┐
│ Node │ Role     │ edit_log_port │ http_port │
├──────┼──────────┼───────────────┼───────────┤
│ FE-1 │ Master   │ 9010          │ 8030      │
├──────┼──────────┼───────────────┼───────────┤
│ FE-2 │ Observer │ 9011          │ 8031      │
└──────┴──────────┴───────────────┴───────────┘
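To confirm the diagnosis on the host, a plain TCP probe of the two http_port values from the configuration table above can help: per the root-cause analysis, 8030 (FE-1) should accept connections while 8031 (FE-2) is where the failing request goes. This sketch uses only java.net.Socket; the class and method names are hypothetical, and a successful connect only shows that something is listening, not that it is a healthy FE HTTP server.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    // A false result corresponds to failures such as "Connection refused" or a timeout.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // http_port values of FE-1 and FE-2 from the configuration table.
        for (int port : new int[] {8030, 8031}) {
            System.out.println("127.0.0.1:" + port + " listening: "
                    + isListening("127.0.0.1", port, 1000));
        }
    }
}
```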

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
