237 changes: 237 additions & 0 deletions README-INFRA-METRICS.md
@@ -0,0 +1,237 @@
# TRON Prometheus Infrastructure Metrics

**PR for [java-tron Issue #6590](https://github.com/tronprotocol/java-tron/issues/6590)** |
**Penn Blockchain Conference Hackathon 2026 — TRON Bounty 2**

---

## What This PR Does

Implements Prometheus metrics for empty block detection and SR set change monitoring, addressing critical operational blind spots in java-tron's monitoring infrastructure.

## New Metrics Reference

| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `tron:block_transaction_count` | Histogram | `miner` | Distribution of transaction counts per block |
| `tron:sr_set_change_total` | Counter | `witness`, `change_type` | SR set changes (added/removed) |

### Empty Blocks via Histogram

Query empty blocks using the histogram's `le="0.0"` bucket:

```promql
# Empty blocks count by miner
tron:block_transaction_count_bucket{le="0.0"}

# Empty block ratio
rate(tron:block_transaction_count_bucket{le="0.0"}[1h]) / ignoring(le) rate(tron:block_transaction_count_count[1h])
```

### Histogram Buckets

`[0, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000]`
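
Prometheus histogram buckets are cumulative: each `le` bucket counts every observation less than or equal to its bound, and an implicit `+Inf` bucket counts all of them. The sketch below reproduces that bookkeeping in plain Java with the bounds listed above. The class name `TxCountBuckets` and the sample values are illustrative only; this is not the Prometheus client and not code from this PR.

```java
import java.util.Arrays;

/** Illustrative stand-in for a cumulative histogram; not the Prometheus client. */
class TxCountBuckets {
  // Upper bounds from this README; an implicit +Inf bucket follows the last one.
  static final double[] BOUNDS = {0, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000};

  // counts[i] = observations <= BOUNDS[i]; counts[BOUNDS.length] = +Inf bucket.
  final long[] counts = new long[BOUNDS.length + 1];

  void observe(double txCount) {
    for (int i = 0; i < BOUNDS.length; i++) {
      if (txCount <= BOUNDS[i]) {
        counts[i]++;
      }
    }
    counts[BOUNDS.length]++; // the +Inf bucket counts every observation
  }

  long emptyBlocks() {
    return counts[0]; // le="0.0": only blocks with zero transactions land here
  }

  long totalBlocks() {
    return counts[BOUNDS.length];
  }

  public static void main(String[] args) {
    TxCountBuckets h = new TxCountBuckets();
    for (double tx : new double[] {0, 0, 7, 120, 15000}) {
      h.observe(tx);
    }
    System.out.println(Arrays.toString(h.counts));
    System.out.println("empty=" + h.emptyBlocks() + " total=" + h.totalBlocks());
  }
}
```

Because transaction counts are never negative, the lowest bucket (`le="0.0"`) holds exactly the empty blocks, which is why no separate counter is needed.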

### Label Values for `tron:sr_set_change_total`

| Label | Value | Description |
|-------|-------|-------------|
| `change_type` | `added` | A new SR entered the active set |
| `change_type` | `removed` | An existing SR left the active set |
| `witness` | hex-encoded address | The SR address affected (the current code emits it via `Hex.toHexString`, unlike the base58 `miner` label) |

## Setup Instructions

### Enable Prometheus Metrics

In your node's `config.conf`:

```hocon
node {
  metrics {
    prometheus {
      enable = true
      port = 9527
    }
  }
}
```

Or via CLI flag:

```bash
java -jar FullNode.jar --metrics-prometheus-enable
```

### Prometheus Endpoint

When enabled, metrics are available at:

```
http://localhost:9527/metrics
```
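
The endpoint serves the Prometheus text exposition format, so a quick sanity check does not need a Prometheus server. The sketch below pulls the per-miner empty-block bucket out of a response body. It assumes bucket lines look like `tron:block_transaction_count_bucket{miner="...",le="0.0",} 2.0` with no trailing timestamp; the class name `EmptyBlockScrape` is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for eyeballing the new histogram without Prometheus. */
class EmptyBlockScrape {
  // Captures the label block and the sample value of a bucket line.
  private static final Pattern LINE = Pattern.compile(
      "^tron:block_transaction_count_bucket\\{(.*)\\}\\s+(\\S+)$");
  private static final Pattern MINER = Pattern.compile("miner=\"([^\"]*)\"");

  /** Returns miner -> cumulative empty-block count from a /metrics response body. */
  static Map<String, Double> emptyBlocksByMiner(String body) {
    Map<String, Double> result = new LinkedHashMap<>();
    for (String line : body.split("\n")) {
      Matcher m = LINE.matcher(line.trim());
      // Keep only the lowest bucket, which holds the empty blocks.
      if (!m.matches() || !m.group(1).contains("le=\"0.0\"")) {
        continue;
      }
      Matcher miner = MINER.matcher(m.group(1));
      if (miner.find()) {
        result.put(miner.group(1), Double.parseDouble(m.group(2)));
      }
    }
    return result;
  }

  /** Fetches the raw exposition text from whatever URL your node exposes. */
  static String fetch(String url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setConnectTimeout(2000);
    conn.setReadTimeout(2000);
    StringBuilder sb = new StringBuilder();
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        sb.append(line).append('\n');
      }
    } finally {
      conn.disconnect();
    }
    return sb.toString();
  }
}
```

A typical call would be `EmptyBlockScrape.emptyBlocksByMiner(EmptyBlockScrape.fetch("http://localhost:9527/metrics"))`, assuming the node is running locally with metrics enabled.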

### Prometheus Configuration

Add to your `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'tron-node'
    static_configs:
      - targets: ['localhost:9527']
    metrics_path: '/metrics'
```

## PromQL Example Queries

### Empty Block Rate (per second, averaged over 1 minute)

```promql
rate(tron:block_transaction_count_bucket{le="0.0"}[1m])
```

### Empty Block Ratio (last hour)

```promql
rate(tron:block_transaction_count_bucket{le="0.0"}[1h]) / ignoring(le) rate(tron:block_transaction_count_count[1h])
```

### Total Empty Blocks by Miner

```promql
tron:block_transaction_count_bucket{le="0.0"}
```

### Transaction Count Distribution

```promql
# Blocks with 1-10 transactions (buckets are cumulative, so subtract the empty-block bucket)
tron:block_transaction_count_bucket{le="10.0"} - ignoring(le) tron:block_transaction_count_bucket{le="0.0"}

# Average transactions per block
rate(tron:block_transaction_count_sum[5m]) / rate(tron:block_transaction_count_count[5m])
```
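
Both ratio-style queries above divide one counter delta by another. The sketch below shows the same arithmetic on two scrapes of a single miner's series. It is a simplification: PromQL's `rate()` also handles counter resets and window extrapolation, which are ignored here. The class and field names are hypothetical.

```java
/** Hypothetical illustration of the arithmetic behind the ratio queries. */
class HistogramDeltas {
  /** One scrape of the three series a histogram exposes for a single miner. */
  static final class Sample {
    final double emptyBucket; // ..._bucket{le="0.0"}
    final double count;       // ..._count
    final double sum;         // ..._sum

    Sample(double emptyBucket, double count, double sum) {
      this.emptyBucket = emptyBucket;
      this.count = count;
      this.sum = sum;
    }
  }

  /** Share of blocks produced between the two scrapes that were empty. */
  static double emptyRatio(Sample earlier, Sample later) {
    double blocks = later.count - earlier.count;
    return blocks == 0 ? Double.NaN : (later.emptyBucket - earlier.emptyBucket) / blocks;
  }

  /** Mean transactions per block between the two scrapes. */
  static double avgTxPerBlock(Sample earlier, Sample later) {
    double blocks = later.count - earlier.count;
    return blocks == 0 ? Double.NaN : (later.sum - earlier.sum) / blocks;
  }
}
```

With no new blocks in the window both helpers return `NaN`, which mirrors the division by zero that the PromQL expressions produce when a miner was idle.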

### SR Set Changes Over Time

```promql
rate(tron:sr_set_change_total[5m])
```

### SRs Added vs Removed

```promql
# Added
sum by (change_type) (tron:sr_set_change_total{change_type="added"})

# Removed
sum by (change_type) (tron:sr_set_change_total{change_type="removed"})
```

### Alert: High Empty Block Rate

```promql
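# rate() is per second; with one block roughly every 3 s it stays below ~0.33,
# so a "> 10" threshold on rate() can never fire. Count empty blocks over the
# window instead (the threshold of 10 is illustrative; tune it for your network).
sum(increase(tron:block_transaction_count_bucket{le="0.0"}[5m])) > 10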
```

### Alert: SR Set Changed

```promql
increase(tron:sr_set_change_total[1h]) > 0
```

## Files Modified

| File | Change |
|------|--------|
| `common/src/main/java/org/tron/common/prometheus/MetricKeys.java` | Added `SR_SET_CHANGE` counter constant and `BLOCK_TRANSACTION_COUNT` histogram constant |
| `MetricLabels.java` | Added `SR_ADDED` and `SR_REMOVED` label constants (plus `BLOCK_EMPTY`, which nothing else in this diff references) |
| `common/src/main/java/org/tron/common/prometheus/MetricsCounter.java` | Registered the `SR_SET_CHANGE` counter with `witness` and `change_type` labels |
| `common/src/main/java/org/tron/common/prometheus/MetricsHistogram.java` | Added overloaded `init()` for custom buckets, registered `BLOCK_TRANSACTION_COUNT` |
| `framework/src/main/java/org/tron/core/metrics/blockchain/BlockChainMetricManager.java` | Added `histogramObserve()` for every block and SR set change detection |
| `framework/src/test/java/org/tron/core/metrics/prometheus/PrometheusApiServiceTest.java` | Updated tests for histogram bucket queries |

## Build & Test Commands

```bash
# Build without tests (fast)
./gradlew clean build -x test

# Compile modified source
./gradlew :framework:compileJava :common:compileJava

# Run Prometheus metric tests only
./gradlew :framework:test --tests \
"org.tron.core.metrics.prometheus.PrometheusApiServiceTest"

# Run all metrics tests
./gradlew :framework:test --tests "org.tron.core.metrics.*"

# Full test suite
./gradlew test

# Coverage report
./gradlew :framework:jacocoTestReport
# Report: framework/build/reports/jacoco/test/html/index.html
```

## Implementation Details

### Block Transaction Count Histogram

Records transaction count for **all blocks** (including empty blocks):

```java
int txCount = block.getTransactions().size();
Metrics.histogramObserve(MetricKeys.Histogram.BLOCK_TRANSACTION_COUNT, txCount,
    StringUtil.encode58Check(address));
```

Benefits over a simple counter:
- **Rich insights**: Tracks the full distribution of transaction counts
- **Flexible queries**: Percentiles, trends, and specific ranges
- **Empty block detection**: Via the `le="0.0"` bucket
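
To make the "percentiles" point concrete, the sketch below estimates a quantile from cumulative bucket counts using linear interpolation inside the matching bucket, which is the approach Prometheus documents for `histogram_quantile`. It is a simplified illustration rather than the Prometheus implementation, and the class name `QuantileEstimate` is hypothetical.

```java
/** Simplified, illustrative quantile estimate over cumulative histogram buckets. */
class QuantileEstimate {
  /**
   * @param q          quantile in (0, 1]
   * @param bounds     finite bucket upper bounds, ascending
   * @param cumulative cumulative counts, one per bound plus a final +Inf entry
   */
  static double estimate(double q, double[] bounds, long[] cumulative) {
    if (q <= 0 || q > 1) {
      throw new IllegalArgumentException("q must be in (0, 1]");
    }
    long total = cumulative[cumulative.length - 1];
    if (total == 0) {
      return Double.NaN;
    }
    double rank = q * total;
    // Find the first bucket whose cumulative count reaches the rank.
    int i = 0;
    while (i < cumulative.length - 1 && cumulative[i] < rank) {
      i++;
    }
    if (i == bounds.length) {
      // Rank falls in the +Inf bucket: report the highest finite bound.
      return bounds[bounds.length - 1];
    }
    double upper = bounds[i];
    if (i == 0 && upper <= 0) {
      return upper; // lowest bucket with a non-positive bound: no interpolation
    }
    double lower = i == 0 ? 0 : bounds[i - 1];
    long below = i == 0 ? 0 : cumulative[i - 1];
    long inBucket = cumulative[i] - below;
    return lower + (upper - lower) * (rank - below) / inBucket;
  }
}
```

The estimate is only as fine as the bucket layout: with bounds such as 0, 10, 50, 100, a median that falls between 10 and 50 is interpolated across that whole range.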

### SR Set Change Detection

```java
List<ByteString> currentSrList =
    chainBaseManager.getWitnessScheduleStore().getActiveWitnesses();
Set<String> currentSrSet = currentSrList.stream()
    .map(bs -> Hex.toHexString(bs.toByteArray()))
    .collect(Collectors.toSet());

if (!previousSrSet.isEmpty() && !currentSrSet.equals(previousSrSet)) {
  for (String sr : Sets.difference(currentSrSet, previousSrSet)) {
    Metrics.counterInc(MetricKeys.Counter.SR_SET_CHANGE, 1,
        sr, MetricLabels.Counter.SR_ADDED);
  }
  for (String sr : Sets.difference(previousSrSet, currentSrSet)) {
    Metrics.counterInc(MetricKeys.Counter.SR_SET_CHANGE, 1,
        sr, MetricLabels.Counter.SR_REMOVED);
  }
}
previousSrSet = currentSrSet;
```
```
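
The diffing step can be exercised in isolation with plain JDK collections. The sketch below keeps the previous set as an immutable snapshot in a `volatile` field, which is one possible way to make writes visible to other threads; it assumes a single writer, as with a call path that runs under one lock. `SrSetTracker` and `Change` are hypothetical names, and this is not the code in this PR.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical single-writer tracker for SR set changes. */
class SrSetTracker {
  /** Result of one comparison; both sets are empty when nothing changed. */
  static final class Change {
    final Set<String> added;
    final Set<String> removed;

    Change(Set<String> added, Set<String> removed) {
      this.added = added;
      this.removed = removed;
    }

    boolean isEmpty() {
      return added.isEmpty() && removed.isEmpty();
    }
  }

  // volatile reference to an immutable snapshot: readers on other threads
  // always see a complete set, never a partially updated one.
  private volatile Set<String> previous = Collections.emptySet();

  /** Compares the current SR set with the last one seen and stores it. */
  Change update(Collection<String> current) {
    Set<String> now = Collections.unmodifiableSet(new HashSet<>(current));
    Set<String> before = previous;
    previous = now;
    // The first call only establishes a baseline, like the isEmpty() guard above.
    if (before.isEmpty() || before.equals(now)) {
      return new Change(Collections.<String>emptySet(), Collections.<String>emptySet());
    }
    Set<String> added = new TreeSet<>(now);
    added.removeAll(before);
    Set<String> removed = new TreeSet<>(before);
    removed.removeAll(now);
    return new Change(added, removed);
  }
}
```

A caller would increment the counter once per entry in `added` and `removed`. Calling `update()` only when a maintenance period completes, rather than on every block, would avoid the per-block set comparison.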

### Code Style

- Purely additive — zero protocol changes, zero API changes, zero backward compatibility issues
- Uses existing `Metrics.histogramObserve()` pattern for histogram
- Uses existing `Metrics.counterInc()` pattern for counter
- All constants defined in `MetricKeys.java` (no hardcoded strings)
- Java 8 compatible
- No new Gradle dependencies

## Why Histogram for Empty Blocks?

The histogram approach (as suggested by Sunny6889) provides richer insights:

| Approach | Pros |
|----------|------|
| Counter | Simple, single-purpose |
| **Histogram** | Tracks distribution, enables ratio queries, supports percentiles |

Example queries enabled by histogram:
- Empty block ratio over any time window
- Transaction distribution patterns
- Block capacity utilization

## Related Issues

- [java-tron #6590](https://github.com/tronprotocol/java-tron/issues/6590) — Prometheus metrics for empty blocks and SR changes
@@ -17,6 +17,7 @@ public static class Counter {
    public static final String P2P_ERROR = "tron:p2p_error";
    public static final String P2P_DISCONNECT = "tron:p2p_disconnect";
    public static final String INTERNAL_SERVICE_FAIL = "tron:internal_service_fail";
    public static final String SR_SET_CHANGE = "tron:sr_set_change_total";

    private Counter() {
      throw new IllegalStateException("Counter");
@@ -62,6 +63,7 @@ public static class Histogram {
    public static final String MESSAGE_PROCESS_LATENCY = "tron:message_process_latency_seconds";
    public static final String BLOCK_FETCH_LATENCY = "tron:block_fetch_latency_seconds";
    public static final String BLOCK_RECEIVE_DELAY = "tron:block_receive_delay_seconds";
    public static final String BLOCK_TRANSACTION_COUNT = "tron:block_transaction_count";

    private Histogram() {
      throw new IllegalStateException("Histogram");
@@ -31,6 +31,9 @@ public static class Counter {
    public static final String TXS_FAIL_SIG = "sig";
    public static final String TXS_FAIL_TAPOS = "tapos";
    public static final String TXS_FAIL_DUP = "dup";
    public static final String BLOCK_EMPTY = "empty";
    public static final String SR_ADDED = "added";
    public static final String SR_REMOVED = "removed";

    private Counter() {
      throw new IllegalStateException("Counter");
@@ -18,6 +18,9 @@ class MetricsCounter {
    init(MetricKeys.Counter.P2P_DISCONNECT, "tron p2p disconnect .", "type");
    init(MetricKeys.Counter.INTERNAL_SERVICE_FAIL, "internal Service fail.",
        "class", "method");
    init(MetricKeys.Counter.SR_SET_CHANGE,
        "Total SR set changes during maintenance periods.",
        "witness", "change_type");
  }

  private MetricsCounter() {
@@ -48,6 +48,10 @@ public class MetricsHistogram {
    init(MetricKeys.Histogram.BLOCK_FETCH_LATENCY, "fetch block latency.");
    init(MetricKeys.Histogram.BLOCK_RECEIVE_DELAY,
        "receive block delay time, receiveTime - blockTime.");
    init(MetricKeys.Histogram.BLOCK_TRANSACTION_COUNT,
        "Distribution of transaction counts per block.",
        new double[]{0, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000},
        "miner");
  }

  private MetricsHistogram() {
@@ -62,6 +66,17 @@ private static void init(String name, String help, String... labels) {
        .register());
  }

  private static void init(String name, String help, double[] buckets, String... labels) {
    Histogram.Builder builder = Histogram.build()
        .name(name)
        .help(help)
        .labelNames(labels);
    if (buckets != null && buckets.length > 0) {
      builder.buckets(buckets);
    }
    container.put(name, builder.register());
  }

  static Histogram.Timer startTimer(String key, String... labels) {
    if (Metrics.enabled()) {
      Histogram histogram = container.get(key);
@@ -1,9 +1,13 @@
package org.tron.core.metrics.blockchain;

import com.codahale.metrics.Counter;
import com.google.common.collect.Sets;
import com.google.protobuf.ByteString;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.Map;
import java.util.SortedMap;
import java.util.concurrent.ConcurrentHashMap;
@@ -42,6 +46,7 @@ public class BlockChainMetricManager {
  private long failProcessBlockNum = 0;
  @Setter
  private String failProcessBlockReason = "";
  private Set<String> previousSrSet = new HashSet<>();

Review comment (Collaborator):

Even if the `applyBlock()` call path is currently single-threaded (protected by the Manager lock), writes to `previousSrSet` are not guaranteed to be visible to other threads (e.g., metrics query threads) without proper synchronization.


  public BlockChainInfo getBlockChainInfo() {
    BlockChainInfo blockChainInfo = new BlockChainInfo();
@@ -169,6 +174,30 @@ public void applyBlock(BlockCapsule block) {
      Metrics.counterInc(MetricKeys.Counter.TXS, block.getTransactions().size(),
          MetricLabels.Counter.TXS_SUCCESS, MetricLabels.Counter.TXS_SUCCESS);
    }

    // Record transaction count distribution for all blocks (including empty blocks)
    int txCount = block.getTransactions().size();
    Metrics.histogramObserve(MetricKeys.Histogram.BLOCK_TRANSACTION_COUNT, txCount,
        StringUtil.encode58Check(address));

    // SR set change detection
    List<ByteString> currentSrList =

Review comment by @warku123 (Mar 26, 2026):

Hi @ToXMon, thanks for the contribution! I've identified a few issues that need attention:

1. SR Set Change Detection Logic Issue
The current implementation checks for SR changes on every applyBlock() call without considering the maintenance period:

// Current: runs on EVERY block
List<ByteString> currentSrList = chainBaseManager.getWitnessScheduleStore().getActiveWitnesses();

However, SR set changes only happen during maintenance periods. This could cause:

  • Unnecessary performance overhead (set comparison on every block)
  • False positives if there are temporary witness list fluctuations

2. Label Naming Inconsistency
The Issue #6590 specification suggests using action (add/remove) for the label key, but this PR uses change_type (added/removed). While both work, aligning with the original issue specification would be better for consistency.

3. GitHub CI Check
It looks like the PR may not pass all CI checks yet. Please ensure:

  • All tests pass (./gradlew test)
  • Checkstyle passes (./gradlew checkstyleMain checkstyleTest)
  • No compilation warnings

        chainBaseManager.getWitnessScheduleStore().getActiveWitnesses();
    Set<String> currentSrSet = currentSrList.stream()
        .map(bs -> Hex.toHexString(bs.toByteArray()))
        .collect(Collectors.toSet());

    if (!previousSrSet.isEmpty() && !currentSrSet.equals(previousSrSet)) {
      for (String sr : Sets.difference(currentSrSet, previousSrSet)) {
        Metrics.counterInc(MetricKeys.Counter.SR_SET_CHANGE, 1,
            sr, MetricLabels.Counter.SR_ADDED);
      }
      for (String sr : Sets.difference(previousSrSet, currentSrSet)) {
        Metrics.counterInc(MetricKeys.Counter.SR_SET_CHANGE, 1,
            sr, MetricLabels.Counter.SR_REMOVED);
      }
    }
    previousSrSet = currentSrSet;
  }

  private List<WitnessInfo> getSrList() {