Adding "networkit" flavour to scanpy.tl.leiden for parallel community detection#4170

Draft
amalia-k510 wants to merge 1 commit into
scverse:mainfrom
amalia-k510:networkit_gve

@amalia-k510

Contributor

This PR adds NetworKit's ParallelLeiden as a new flavor option for scanpy.tl.leiden, enabling multithreaded Leiden community detection via scanpy.tl.leiden(adata, flavor="networkit"). The motivation is that scanpy's current Leiden backends (igraph, leidenalg) are single-threaded, so clustering becomes a bottleneck for atlas-scale datasets (500k+ cells). NetworKit's ParallelLeiden is a parallel C++ implementation that can be installed via pip with no additional compilation, making it the lowest-friction path to parallel Leiden in scanpy.

@codecov

codecov Bot commented Jun 22, 2026

⚠️ JUnit XML file not found

The CLI was unable to find any JUnit XML files to upload.
For more help, visit our troubleshooting guide.

@ilan-gold ilan-gold left a comment

Contributor

Nice! Would be best to run the installation check :)

Also please add an extra to our pyproject.toml.

Let's try to use a heuristic for warning users about large inputs to igraph and pointing them to the faster alternative.
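
One way such a heuristic could look is sketched below. This is only an illustration: the function name `warn_if_large_igraph_input`, the constant `LARGE_INPUT_THRESHOLD`, and the cutoff value are all hypothetical (the 500k figure is borrowed from the PR description, not an agreed threshold), and the warning text is a placeholder.

```python
import warnings

# Hypothetical cutoff: the PR description calls 500k+ cells "atlas-scale",
# but no threshold has been agreed on in this thread.
LARGE_INPUT_THRESHOLD = 500_000


def warn_if_large_igraph_input(
    n_obs: int, flavor: str, *, threshold: int = LARGE_INPUT_THRESHOLD
) -> bool:
    """Warn when the single-threaded igraph flavor is used on a large input.

    Returns True if a warning was emitted, False otherwise.
    """
    if flavor == "igraph" and n_obs >= threshold:
        msg = (
            f"Clustering {n_obs} observations with flavor='igraph' may be slow; "
            "consider flavor='networkit' for multithreaded Leiden."
        )
        # stacklevel=2 points the warning at the caller rather than this helper.
        warnings.warn(msg, UserWarning, stacklevel=2)
        return True
    return False
```

Returning a boolean is a choice made here so the helper is easy to test; the real check could just as well return nothing.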

raise ImportError(msg)


def ensure_network() -> None:

Where is this used?

Suggested change
- def ensure_network() -> None:
+ def ensure_networkit() -> None:



def ensure_network() -> None:
    if importlib.util.find_spec("netowrkit"):

Suggested change
-     if importlib.util.find_spec("netowrkit"):
+     if importlib.util.find_spec("networkit"):
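
For reference, a minimal sketch of how the renamed check might read with both suggestions applied. The `module_name` parameter and the error message are assumptions added here so the helper can be exercised without NetworKit installed; the function in the PR takes no arguments and its message is not shown in this thread.

```python
import importlib.util


def ensure_networkit(module_name: str = "networkit") -> None:
    """Raise ImportError unless `module_name` is importable.

    `module_name` is a hypothetical parameter used for illustration only.
    """
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is not None:
        return
    msg = f"Please install {module_name}, e.g. via `pip install {module_name}`."
    raise ImportError(msg)
```

With the misspelled name, `find_spec` would always return None, so the check would raise even when NetworKit is installed.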

Comment on lines +193 to +194
seed = int(rng.integers(np.iinfo(np.int64).max))
networkit.setSeed(seed, useThreadId=True)

Despite this, things aren't deterministic, right? Just making sure, but if that's the case, please comment why we do this (helps with determinism even if it's not perfect maybe?)
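
As a small illustration of what the seeding step does and does not pin down: the sketch below shows only the NumPy side, where the same generator state always yields the same integer seed. The helper name `derive_backend_seed` is hypothetical. Whether ParallelLeiden itself is reproducible across multithreaded runs once that seed is passed to `networkit.setSeed` is the open question above and is not answered by this sketch.

```python
import numpy as np


def derive_backend_seed(rng: np.random.Generator) -> int:
    """Draw one non-negative integer seed from a NumPy generator.

    Mirrors the expression in the diff; the result would be handed to
    networkit.setSeed(seed, useThreadId=True) in the PR.
    """
    return int(rng.integers(np.iinfo(np.int64).max))


# Two generators created from the same seed produce the same backend seed.
seed_a = derive_backend_seed(np.random.default_rng(0))
seed_b = derive_backend_seed(np.random.default_rng(0))
```

A code comment next to the `setSeed` call could state exactly this: the seed fixes the starting state of the backend's random number generators, while thread scheduling may still cause run-to-run differences.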

networkit.setSeed(seed, useThreadId=True)
# only undirected for Parallel Leiden
g = _utils.get_networkit_from_adjacency(adjacency, weighted=use_weights)
iterations = n_iterations if n_iterations > 0 else 3

Where did the number 3 come from?

@ilan-gold

Also, can you point this branch at networkit/networkit#1422 (then at main after it merges, followed by the correct minimum version once released) and use this method (also in your notebook) to make sure it works properly?
