Lock Free Snowflake

A novel, lock-free, multithreaded implementation of a unique, roughly time-ordered, 64-bit id generator for use in distributed server and database systems. Header-only library.

Terminlogy

Snowfake ID - a unique, roughly time-ordered, 64-bit unsigned integer identifier for use in database systems.

Output Sequence - a sequence of Snowflake IDs that are produced by the algorithm.

Server - a multithreaded machine (or collection of such) running one or more server application instances

Process - an instance of a server application running on a server

Timestamp - time in milliseconds from UNIX epoch (or custom epoch).

Machine Process ID - or MPID is a unique ID for a machine the server process it is running. A machine may run more than one server process as long as each server process has a unique MPID. All threads on a process share the same MPID.

Sequence number - a unique per-millisecond number that is shared between threads on a single process. This number is reset to 0 on the start of a new millisecond.

Abstract

Snowflake IDs are a 64-bit unique number that can be used to represent any object within distributed server and database systems. A snowflake is generally comprised of three parts: a timestamp, a machine-process ID, and a per-millisecond sequence number.

Common methods of generating snowflakes utilize a single daemon running a snowflake algorithm. Other server threads may query the generator daemon at any time for a new id without worrying about uniqueness.

Using this approach, however, involves a significant overhead with the interprocess communication for requesting and transfering Snowflake IDs.

My solution to this problem is to run the generation algorithm on each thread by making clever use of atomic synchronization operations. This solution achieves low overhead, high throughput, while maintaining the desired behavior.

Design Constraints

Every generated Snowflake must be unique among all process among all servers
No repeat process IDs, all servers and their process must have unique process IDs. Threads within a single process must have shared the same process IDs.
No repeat sequence numbers within the same process within the same millisecond
The sequence number must reset on the edge of each millisecond
No use of locks (mutexes, semaphores, or otherwise)
All operations on the shared sequence number must be atomically synchonized between threads on the same process.

Implementation

My implementation uses the same format as Twitter's Snowflake implementation:

# Snowflake is stored in the following format:
|--1 bit--|--41 bit timestamp--|--10 bit MPID--|--12 bit sequence number--|
# 64 bit snowflake (frontmost bit not used)
|----------------------------64 bit snowflake-----------------------------|

The sequence number is a 12 bit number within the range [0, 4,095]

The Machine-Process ID (MPID) is a 10 bit number within the range [0, 1,023]

The Timestamp (in milliseconds) is a 41 bit number that resets in about 70 years.

The last bit is unused because database systems may treat an unsigned 64 number incorrectly as a signed 64 bit number.

Usage

To use the library, you will need to include the following file from the /include dir in the project dir:

#include <lfsnowflake/lockfree.h>

To use the library, call the lf::get function with your chosen machine-process ID (MPID):

#include <lfsnowflake/lockfree.h>

int main() {
  using u64 = std::uint64_t;
  // set the machine-process ID to 0 (ie. machine = 0, process = 0)
  u64 const kMpid = 0ull;
  // create a unique snowflake
  u64 const snowflake = lf::getSnowflake(kMpid);
  assert((snowflake != 0ull) && "Received invalid snowflake");
  // do something with the snowflake...
  return 0;
}

The library provides utilities to decompose each snowflake ID into its components if desired:

#include <lfsnowflake/lockfree.h>

int main() {
  using u64 = std::uint64_t;
  // set the machine-process ID to 0 (ie. machine = 0, process = 0)
  u64 const kMpid = 0ull;
  // create a unique snowflake
  u64 const snowflake = lf::getSnowflake(kMpid);
  assert((snowflake != 0ull) && "Received invalid snowflake");
  // decompose the snowflake: extract the timestamp
  u64 const timestamp = lf::utils::getTimestamp(snowflake);
  // extract the
  u64 const mpid = lf::utils::getMpid(snowflake);
  assert((mpid == kMpid) && "Snowflake mpid must equal the input");
  // extract the sequence number
  u64 const sequenceNumber = lf::utils::getSequence(snowflake);
}

Performance

This library contains a number of lockfree algorithms that were tested for multithreaded use for t=1 to t=16. Tests of generating 4,096,000 ids total were run for each algorithm. The results of the tests are shown below with IDs per millisecond on the y-axis (higher is better) vs thread count on the x-axis.*

The table below shows the results from the testing.

From these tests, it is clear that the lockfree::v4d algorithm is the superior algorithm, meeting all design constraints with the highest snowflake ID throughput for most use cases. This algorithm is provided by default with the library for use in your program.

*algorithms that failed to meet the design constraints are omitted from the results.

Building Tests

Clone the repo to your machine and pull the required submodules

cd path/to/cwd
git clone https://github.com/khelmka25/lock-free-snowflake
cd lock-free-snowflake
git submodule update --init --recursive

Create and open the build folder

mkdir build
cd build

Using cmake and make, build the program (you will need cmake, make (or otherwise), and a C++20 compatible compiler)

cmake ..
make

After building the program, it may be used with the following commands:

./snowflake_test PARAMS
# where PARAMS is one or more of the following:
-h          # print help screen and exit
-t <n>      # number of threads to run the tests with
-i <n>      # number of ids to generate per thread
-I <n>      # number of total ids to generate
-lf         # test the lockfree algorithms

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
deps		deps
images		images
include/lfsnowflake		include/lfsnowflake
src		src
.clang-format		.clang-format
.gitignore		.gitignore
.gitmodules		.gitmodules
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md
analysis.py		analysis.py
main.cc		main.cc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lock Free Snowflake

Contents

Terminlogy

Abstract

Design Constraints

Implementation

Usage

Performance

Building Tests

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Lock Free Snowflake

Contents

Terminlogy

Abstract

Design Constraints

Implementation

Usage

Performance

Building Tests

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages