MDEV-21423 - lock-free trx_sys get performance regression cause by lf_find and ut_delay by svoj · Pull Request #5043 · MariaDB/server

svoj · 2026-05-05T22:10:51Z

TBD

gemini-code-assist

Code Review

This pull request introduces a new rw_trx_ids_t class to manage read-write transaction IDs and serialization numbers, moving them from the hash elements into a centralized vector within the transaction system. This involves significant updates to transaction registration, deregistration, and snapshotting logic across InnoDB. Feedback highlights a critical thread-safety issue where the ids vector is accessed without a read lock during potential reallocations, which could lead to memory corruption. Additionally, the use of memset to initialize a synchronization primitive was identified as unsafe and should be removed in favor of the standard initialization call.

…_find and ut_delay TBD

dr-m · 2026-05-06T08:56:08Z

+  {
+    ut_ad(trx->rw_trx_ids_slot != std::numeric_limits<uint32_t>::max());
+    ut_ad(trx->id < trx->no);
+    latch.rd_lock(SRW_LOCK_CALL);


Can we avoid adding PERFORMANCE_SCHEMA instrumentation for this? That would remove some conditional branches in frequently executed code. It could make a measurable performance impact.

#4784 is normally using the type srw_lock_low for log_sys.latch. Given that this latch is private, I don’t think there is a need to use a different type in debug builds.

dr-m · 2026-05-06T09:03:36Z

+  trx_id_t snapshot_ids(trx_ids_t &view_ids,
+                        const trx_id_t max_trx_id) const noexcept
+  {
+    trx_id_t min_trx_no{max_trx_id};
+    view_ids.clear();
+    latch.rd_lock(SRW_LOCK_CALL);
+    view_ids.reserve(ids.size());


std::vector::reserve can throw std::bad_alloc. This would not be the only place where we’d happily crash when running out of memory. I’m not familiar with what would happen when a noexcept function fails to handle an exception. I guess we’d just crash. But would the diagnostics for that be acceptable? Could you test this with an artificially bloated allocation and a tight ulimit -m?

The parameters and the return value are not documented with Doxygen comments.

dr-m · 2026-05-06T09:10:34Z

+  void deregister_rw(const trx_t *trx) noexcept
+  {
+    ut_ad(trx->rw_trx_ids_slot != std::numeric_limits<uint32_t>::max());
+    latch.rd_lock(SRW_LOCK_CALL);
+    ut_ad(ids[trx->rw_trx_ids_slot].id != TRX_ID_MAX);
+    ut_ad(ids[trx->rw_trx_ids_slot].id == trx->id);
+    ut_ad(ids[trx->rw_trx_ids_slot].no == trx->no);
+    ids[trx->rw_trx_ids_slot].id= TRX_ID_MAX;
+    ids[trx->rw_trx_ids_slot].no= TRX_ID_MAX;
+    latch.rd_unlock();
+  }


When I reviewed the code that GCC 16 -O2 generated for an earlier version of this patch, the address calculation for the assignments was being duplicated. I would suggest to rewrite this as follows:

void deregister_rw(const trx_t *trx) noexcept { latch.rd_lock(); ut_ad(trx->rw_trx_ids_slot < ids.size()); rw_trx_id &trx_ids= ids[trx->rw_trx_ids_slot]; ut_ad(trx_ids.id == trx->id); ut_ad(trx_ids.no == trx->no); trx_ids.id= TRX_ID_MAX; trx_ids.no= TRX_ID_MAX; latch.rd_unlock(); }

dr-m · 2026-05-06T09:13:55Z

+  struct rw_trx_id
+  {
+    Atomic_relaxed<trx_id_t> id{TRX_ID_MAX};
+    Atomic_relaxed<trx_id_t> no{TRX_ID_MAX};
+    trx_t *trx;
+    rw_trx_id(trx_t *t): trx(t) {}
+  };


The data members need to be documented with Doxygen comments. Do we strictly need all 3 fields, or are some of them a cache that allows us to avoid dereferencing the trx? The comment should explain when trx is dereferenced.

trx is needed for deregister_trx() to efficiently update trx->rw_trx_ids_slot of moved transaction, that's it. I will document it separately.

dr-m · 2026-05-06T09:17:02Z

+  void register_rw(const trx_t *trx) noexcept
+  {
+    ut_ad(trx->rw_trx_ids_slot != std::numeric_limits<uint32_t>::max());
+    ut_ad(trx->no == TRX_ID_MAX);
+    latch.rd_lock(SRW_LOCK_CALL);
+    ut_ad(ids[trx->rw_trx_ids_slot].id == TRX_ID_MAX);
+    ut_ad(ids[trx->rw_trx_ids_slot].no == TRX_ID_MAX);
+    ids[trx->rw_trx_ids_slot].id= trx->id;


Could we replace the first assertion with a tighter one, inside the critical section?

ut_ad(trx->rw_trx_ids_slot < ids.size());

Can we also assert the following inside the critical section?

ut_ad(&ids[trx->rw_trx_ids_slot] == trx);

dr-m · 2026-05-06T09:20:18Z

  void assign_new_trx_no(trx_t *trx)
  {
-    trx->rw_trx_hash_element->no= get_new_trx_id_no_refresh();
+    trx->no= get_new_trx_id_no_refresh();
+    rw_trx_ids.assign_new_trx_no(trx);
    refresh_rw_trx_hash_version();
  }


As far as I can tell, the field trx_t::no is duplicating a field in an element of rw_trx_ids.ids. We can just pass the data directly by value, not via a redundant data member of trx:

rw_trx_ids.assign_new_trx_no(trx, get_new_trx_id_no_refresh());

In addition to this, trx_t::write_serialisation_history() would have to be modified so that it’d pass the commit number to trx_purge_add_undo_to_history() by value.

dr-m · 2026-05-06T09:23:18Z

-    while ((arg.m_id= get_rw_trx_hash_version()) != get_max_trx_id())
+    while ((max_trx_id= get_rw_trx_hash_version()) != get_max_trx_id())
      ut_delay(1);


ut_delay() is always being invoked with the parameter 1. Could we merge it with LF_BACKOFF()? For some reason, the latter is missing HMT_low() and HMT_medium(). The impact of this would be good to test on POWER.

LF_BACKOFF() and ut_delay() are a mix of a warm and a soft. I'd say ut_delay() is more correct on POWER, because MY_RELAX_CPU() on POWER is a regular __builtin_ppc_get_timebase() instruction, which isn't anywhere similar to Intel PAUSE.

A good POWER implementation would probably make use of Hardware Transactional Memory (HTM) surrounded with HMT_low() / HMT_medium().

Less good, but still alright would be:

HMT_low(); while ((max_trx_id= get_rw_trx_hash_version()) != get_max_trx_id()) /* no-op */; HMT_medium();

Or even:

if ((max_trx_id= get_rw_trx_hash_version()) != get_max_trx_id()) { HMT_low(); while ((max_trx_id= get_rw_trx_hash_version()) != get_max_trx_id()) /* no-op */; HMT_medium(); }

It is evident that both LF_BACKOFF() and ut_delay() were not properly tested on POWER. I don't like performing another blind attempt within the scope of this task.

dr-m · 2026-05-06T09:24:35Z

-  srw_spin_mutex mutex;
+  mutable srw_spin_mutex mutex;
 #ifdef UNIV_DEBUG
  /** The owner of mutex (0 if none); protected by mutex */
-  std::atomic<pthread_t> mutex_owner{0};
+  mutable std::atomic<pthread_t> mutex_owner{0};
 #endif /* UNIV_DEBUG */
 public:
  void mutex_init() { mutex.init(); }
  void mutex_destroy() { mutex.destroy(); }

  /** Acquire the mutex */
-  void mutex_lock()
+  void mutex_lock() const
  {
    ut_ad(!mutex_is_owner());
    mutex.wr_lock();
    assert(!mutex_owner.exchange(pthread_self(),
                                 std::memory_order_relaxed));
  }
  /** Release the mutex */
-  void mutex_unlock()
+  void mutex_unlock() const


These trx_t member functions are no longer being used by the patch. Hence, these modifications could be removed, right?

svoj requested a review from dr-m May 5, 2026 22:10

gemini-code-assist Bot reviewed May 5, 2026

View reviewed changes

Comment thread storage/innobase/include/trx0sys.h

Comment thread storage/innobase/include/trx0sys.h Outdated

dr-m reviewed May 6, 2026

View reviewed changes

Comment thread storage/innobase/include/trx0sys.h Outdated

Comment thread storage/innobase/include/trx0sys.h Outdated

Comment thread storage/innobase/include/trx0sys.h Outdated

Comment thread storage/innobase/log/log0log.cc Outdated

MDEV-21423 - lock-free trx_sys get performance regression cause by lf…

6d2f280

…_find and ut_delay TBD

svoj force-pushed the pr-main-MDEV-21423 branch from 3b998f0 to 6d2f280 Compare May 6, 2026 07:11

dr-m reviewed May 6, 2026

View reviewed changes

Uh oh!

Conversation

svoj commented May 5, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants