Adding "networkit" flavour to scanpy.tl.leiden for parallel community detection #4170
amalia-k510 wants to merge 1 commit into
ilan-gold
left a comment
Nice! Would be best to run the installation check :)
Also please add an extra to our pyproject.toml.
Let's try to use a heuristic to warn users when their input is large for igraph and point them to the faster alternative.
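One possible shape for such a heuristic, as a minimal sketch rather than the PR's implementation: the function name `warn_if_large_input`, the constant `LARGE_INPUT_THRESHOLD`, and the cutoff value are all hypothetical. The 500k figure is borrowed from the atlas-scale number in the PR description and is not a measured crossover point.

```python
import warnings

# Hypothetical cutoff: 500k cells is the atlas scale named in the PR
# description, not a benchmarked threshold.
LARGE_INPUT_THRESHOLD = 500_000

# Flavors described in the PR as single-threaded.
SINGLE_THREADED_FLAVORS = frozenset({"igraph", "leidenalg"})


def warn_if_large_input(
    n_obs: int, flavor: str, threshold: int = LARGE_INPUT_THRESHOLD
) -> bool:
    """Warn when a single-threaded flavor is used on a large input.

    Returns True if a warning was emitted, False otherwise.
    """
    if flavor in SINGLE_THREADED_FLAVORS and n_obs >= threshold:
        msg = (
            f"Clustering {n_obs} observations with flavor={flavor!r} may be slow; "
            'consider flavor="networkit" for multithreaded Leiden.'
        )
        # stacklevel=2 points the warning at the caller, not this helper.
        warnings.warn(msg, UserWarning, stacklevel=2)
        return True
    return False
```

Returning a boolean keeps the helper easy to test; the real threshold would need benchmarking before being hard-coded.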
    raise ImportError(msg)


def ensure_network() -> None:
Where is this used?
Suggested change:
- def ensure_network() -> None:
+ def ensure_networkit() -> None:
def ensure_network() -> None:
    if importlib.util.find_spec("netowrkit"):
Suggested change:
-     if importlib.util.find_spec("netowrkit"):
+     if importlib.util.find_spec("networkit"):
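For context on why the spelling matters: `importlib.util.find_spec` returns `None` for any top-level name it cannot locate, so a misspelled package name is indistinguishable from a missing package. A minimal, self-contained sketch of this kind of optional-dependency check follows; the helper name `ensure_package` and its `install_hint` parameter are hypothetical and not part of the PR.

```python
from __future__ import annotations

import importlib.util


def ensure_package(name: str, *, install_hint: str | None = None) -> None:
    """Raise ImportError if the top-level package `name` cannot be found.

    Hypothetical helper for illustration; it only looks the package up
    and does not import it.
    """
    # find_spec returns None when no top-level module with this name exists,
    # which is also what happens when the name is misspelled.
    if importlib.util.find_spec(name) is not None:
        return
    hint = install_hint or f"pip install {name}"
    msg = f"Package {name!r} is required but was not found. Try: {hint}"
    raise ImportError(msg)
```

With a helper like this, calling `ensure_package("netowrkit")` would raise even when NetworKit is installed, which is the failure mode the suggestion above fixes.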
seed = int(rng.integers(np.iinfo(np.int64).max))
networkit.setSeed(seed, useThreadId=True)
Despite this, things aren't deterministic, right? Just making sure, but if that's the case, please add a comment explaining why we do this (maybe it helps with determinism even if it's not perfect?).
networkit.setSeed(seed, useThreadId=True)
# only undirected for Parallel Leiden
g = _utils.get_networkit_from_adjacency(adjacency, weighted=use_weights)
iterations = n_iterations if n_iterations > 0 else 3
Where did the number 3 come from?
Also can you point this branch at networkit/networkit#1422 (and then
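On the fallback to 3 questioned above: one way to make the choice visible is to pull it into a named, documented constant. This is a minimal sketch only. The names `DEFAULT_PARALLEL_LEIDEN_ITERATIONS` and `resolve_iterations` are hypothetical, and the value 3 is copied from the diff without knowing its origin.

```python
# Hypothetical named default. The value 3 is taken from the diff under
# review; where it comes from is the open question in this thread.
DEFAULT_PARALLEL_LEIDEN_ITERATIONS = 3


def resolve_iterations(
    n_iterations: int, default: int = DEFAULT_PARALLEL_LEIDEN_ITERATIONS
) -> int:
    """Return a positive iteration count for a fixed-iteration backend.

    Non-positive values (such as -1) fall back to `default`, mirroring
    `n_iterations if n_iterations > 0 else 3` from the diff.
    """
    return n_iterations if n_iterations > 0 else default
```

For example, `resolve_iterations(-1)` falls back to the default while `resolve_iterations(5)` passes 5 through unchanged.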
This PR adds NetworKit's ParallelLeiden as a new flavor option for scanpy.tl.leiden, enabling multithreaded Leiden community detection via scanpy.tl.leiden(adata, flavor="networkit"). The motivation is that scanpy's current Leiden backends (igraph, leidenalg) are single-threaded, so for atlas-scale datasets (500k+ cells) clustering becomes a bottleneck. NetworKit's ParallelLeiden is a C++ parallel implementation available via pip with no additional compilation, making it the lowest-friction path to parallel Leiden in scanpy.