Bitmap.Clone() data race on copy-on-write bitmaps (no writers) #528

@olga676767

Package: github.com/RoaringBitmap/roaring/v2
Version: v2.18.0
Go: 1.26.2

Summary

Bitmap.Clone() is not safe for concurrent use on a copy-on-write bitmap, even when there are no writers. Two goroutines that each only call src.Clone() on the same shared bitmap race with each other.

When copyOnWrite is enabled, (*roaringArray).clone() calls ra.markAllAsNeedingCopyOnWrite(), which writes the source bitmap's needCopyOnWrite slice (sets every element to true) on every clone. Concurrent clones of one source therefore perform unsynchronized writes to the same slice, which is a data race.

This is surprising because:

  • Clone() reads like a read-only operation on the source.
  • A COW bitmap is a natural candidate for a shared, immutable snapshot that many goroutines clone concurrently (e.g. a decoded index block held in a cache).

Relevant source

roaringarray.go:

func (ra *roaringArray) clone() *roaringArray {
	sa := roaringArray{}
	sa.copyOnWrite = ra.copyOnWrite

	// this is where copyOnWrite is used.
	if ra.copyOnWrite {
		sa.keys = make([]uint16, len(ra.keys))
		copy(sa.keys, ra.keys)
		sa.containers = make([]container, len(ra.containers))
		copy(sa.containers, ra.containers)
		sa.needCopyOnWrite = make([]bool, len(ra.needCopyOnWrite))

		ra.markAllAsNeedingCopyOnWrite()   // <-- WRITES THE SOURCE
		sa.markAllAsNeedingCopyOnWrite()

		// sa.needCopyOnWrite is shared
	} else {
		// make a full copy
		...
	}
	return &sa
}

func (ra *roaringArray) markAllAsNeedingCopyOnWrite() {
	for i := range ra.needCopyOnWrite {
		ra.needCopyOnWrite[i] = true   // <-- racy write under concurrent Clone()
	}
}
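
To make the side effect easier to see in isolation, here is a minimal single-goroutine sketch using only the standard library. The type toyArray and its markAll method are hypothetical stand-ins that model only the copy-on-write bookkeeping quoted above; they are not the library's code, and the non-COW branch is not modeled.

```go
package main

import "fmt"

// toyArray is a hypothetical, simplified stand-in for roaringArray.
// Only the copy-on-write flags are modeled, not keys or containers.
type toyArray struct {
	copyOnWrite     bool
	needCopyOnWrite []bool
}

// clone mirrors the copy-on-write branch quoted above: it marks every
// entry as needing a copy in both the clone and the source.
func (ra *toyArray) clone() *toyArray {
	sa := &toyArray{copyOnWrite: ra.copyOnWrite}
	if ra.copyOnWrite {
		sa.needCopyOnWrite = make([]bool, len(ra.needCopyOnWrite))
		ra.markAll() // writes the source's flags
		sa.markAll()
	}
	return sa
}

// markAll sets every flag to true, like markAllAsNeedingCopyOnWrite.
func (ra *toyArray) markAll() {
	for i := range ra.needCopyOnWrite {
		ra.needCopyOnWrite[i] = true
	}
}

func main() {
	src := &toyArray{copyOnWrite: true, needCopyOnWrite: make([]bool, 3)}
	fmt.Println("source flags before clone:", src.needCopyOnWrite)
	_ = src.clone()
	fmt.Println("source flags after clone:", src.needCopyOnWrite)
}
```

In this model the source's flags go from all false to all true after a single clone, which is the write that two concurrent Clone() calls would both perform on the shared slice.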

Reproducer

Standalone module that depends only on roaring/v2.

go.mod:

module roaringracerepro

go 1.22

require github.com/RoaringBitmap/roaring/v2 v2.18.0

race_test.go:

package roaringracerepro

import (
	"sync"
	"testing"

	"github.com/RoaringBitmap/roaring/v2"
)

// TestConcurrentCloneOfCOWBitmapRaces shows that Clone() of a single
// copy-on-write bitmap is NOT safe to call concurrently, even though every
// goroutine only "reads" the shared source by cloning it.
//
// Root cause: with copy-on-write enabled, (*Bitmap).Clone() ->
// roaringArray.clone() calls ra.markAllAsNeedingCopyOnWrite(), which WRITES
// the *source* bitmap's needCopyOnWrite slice (sets every element to true).
// Two goroutines cloning the same source therefore write the same slice
// concurrently -> data race.
//
// Run: go test -race -run TestConcurrentCloneOfCOWBitmapRaces
func TestConcurrentCloneOfCOWBitmapRaces(t *testing.T) {
	src := roaring.New()
	// Add enough values across many high-16-bit groups so the bitmap holds
	// multiple containers (i.e. needCopyOnWrite has several entries to scribble on).
	for i := uint32(0); i < 1<<20; i += 977 {
		src.Add(i)
	}
	src.SetCopyOnWrite(true)

	const (
		goroutines = 8
		iterations = 2000
	)
	start := make(chan struct{})
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start // line everyone up so the clones overlap
			for i := 0; i < iterations; i++ {
				_ = src.Clone()
			}
		}()
	}
	close(start)
	wg.Wait()
}

Run:

go test -race -run TestConcurrentCloneOfCOWBitmapRaces

Race detector output

==================
WARNING: DATA RACE
Write at 0x00c000012490 by goroutine 10:
  github.com/RoaringBitmap/roaring/v2.(*roaringArray).markAllAsNeedingCopyOnWrite()
      .../roaring/v2@v2.18.0/roaringarray.go:776 +0x52c
  github.com/RoaringBitmap/roaring/v2.(*roaringArray).clone()
      .../roaring/v2@v2.18.0/roaringarray.go:270 +0x284
  github.com/RoaringBitmap/roaring/v2.(*Bitmap).Clone()
      .../roaring/v2@v2.18.0/roaring.go:1038 +0xc0
  roaringracerepro.TestConcurrentCloneOfCOWBitmapRaces.func1()
      race_test.go:42 +0x9c

Previous write at 0x00c000012490 by goroutine 12:
  github.com/RoaringBitmap/roaring/v2.(*roaringArray).markAllAsNeedingCopyOnWrite()
      .../roaring/v2@v2.18.0/roaringarray.go:776 +0x52c
  github.com/RoaringBitmap/roaring/v2.(*roaringArray).clone()
      .../roaring/v2@v2.18.0/roaringarray.go:270 +0x284
  github.com/RoaringBitmap/roaring/v2.(*Bitmap).Clone()
      .../roaring/v2@v2.18.0/roaring.go:1038 +0xc0
  roaringracerepro.TestConcurrentCloneOfCOWBitmapRaces.func1()
      race_test.go:42 +0x9c
==================
--- FAIL: TestConcurrentCloneOfCOWBitmapRaces (0.04s)
    testing.go:1712: race detected during execution of test
FAIL

Both goroutines write the same address (0x00c000012490), i.e. an element of the source bitmap's needCopyOnWrite slice.
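
Until this is addressed in the library, one possible caller-side workaround is to serialize Clone() calls on the shared source with a mutex. The sketch below is an assumption on my part, not something the library provides: lockedSource, newLockedSource, and toyBitmap are hypothetical names, and toyBitmap only imitates the flag write so the example runs with the standard library alone. Since *roaring.Bitmap has a Clone() method returning *roaring.Bitmap, it should satisfy the same constraint, but I have not verified this wrapper against the real type here.

```go
package main

import (
	"fmt"
	"sync"
)

// cloner is satisfied by any type whose Clone method returns its own type.
type cloner[T any] interface {
	Clone() T
}

// lockedSource (hypothetical helper) serializes Clone calls on one shared source.
type lockedSource[T cloner[T]] struct {
	mu  sync.Mutex
	src T
}

func newLockedSource[T cloner[T]](src T) *lockedSource[T] {
	return &lockedSource[T]{src: src}
}

// Clone holds the mutex for the duration of the underlying Clone call,
// so writes made to the source while cloning never overlap.
func (s *lockedSource[T]) Clone() T {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.src.Clone()
}

// toyBitmap is a stand-in that imitates the source-side flag write.
type toyBitmap struct {
	needCopyOnWrite []bool
}

func (b *toyBitmap) Clone() *toyBitmap {
	c := &toyBitmap{needCopyOnWrite: make([]bool, len(b.needCopyOnWrite))}
	for i := range b.needCopyOnWrite {
		b.needCopyOnWrite[i] = true // write to the shared source
		c.needCopyOnWrite[i] = true
	}
	return c
}

func main() {
	shared := newLockedSource(&toyBitmap{needCopyOnWrite: make([]bool, 16)})
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				_ = shared.Clone()
			}
		}()
	}
	wg.Wait()
	fmt.Println("all clones finished")
}
```

This only covers Clone() versus Clone(). It adds contention on a hot path and says nothing about other operations on the source running alongside, so it is a stopgap rather than a fix for the underlying write in clone().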
