Skip to content

Use random-shuffle list shuffling instead of random >= 1.3#169

Merged
mchav merged 6 commits intoDataHaskell:mainfrom
jisantuc:maint/js/downgrade-random
Feb 27, 2026
Merged

Use random-shuffle list shuffling instead of random >= 1.3#169
mchav merged 6 commits intoDataHaskell:mainfrom
jisantuc:maint/js/downgrade-random

Conversation

@jisantuc
Copy link
Contributor

Overview

This PR downgrades random from >= 1.3 to between 1.2 and 1.3. random 1.3+ plays poorly with some packages in the ecosystem.
Instead of uniformShuffleListM, it uses shuffle' from random-shuffle, which was already a test dependency.

@jisantuc
Copy link
Contributor Author

Per the contributing guidelines, I tried to add a label (I assumed that was what "A tag (usually feat, documentation, refactor etc)" meant?) but it seems like I'm not allowed to do that 🤔


shuffledIndices :: (RandomGen g) => g -> Int -> VU.Vector Int
shuffledIndices pureGen k = VU.fromList (fst (uniformShuffleList [0 .. (k - 1)] pureGen))
shuffledIndices pureGen k = VU.fromList (shuffle' [0 .. (k - 1)] k pureGen)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty unclear to me how to test this. I thought about a shuffle/un-shuffle identity test, but unshuffle didn't get me very far in search results 😅

Any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple of things I would test then.

  1. test that the shuffling doesn't do anything else than shuffle. So basically sort the shuffled and unshuffled and see if it's equal to the same thing. This ensures that shuffling isn't doing anything else than permuting the indices.

  2. check that shuffling with equivalent seeds result in the same shuffle.

That's about all I can think of

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. No need to round trip. Checking that shuffling preserves length (even when there are duplicates) is probably important. Plus that different seeds are different shuffle orders.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also on second thought the intermediate list allocation is wasteful. I'll add it as a GSOC task to implement fisher yates here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that task shouldn't be GSOC, it should be anyone! Also mwc-random does that I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? It seems simple and self contained enough since it's reading the algorithm and implementing it through.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there could be some task pipeline for people who are not interested only in GSOC, but more interested generally in contributing!

@jisantuc
Copy link
Contributor Author

I used this branch's package in the plot survey branch I started with no problems other than needing to roll my own list generator with replicateM and state jisantuc/goofing-off@7c9b310#diff-206b9ce276ab5971a2489d75eb1b12999d4bf3843b7988cbe8d687cfde61dea0R4

@mchav
Copy link
Member

mchav commented Feb 27, 2026

@jisantuc i think there's an issue with the test module name. Once that's fixed this will be good to go.

@jisantuc
Copy link
Contributor Author

@mchav yeah I forgot to rename the module after I understood the test naming convention a little better. Fixed now though

@mchav mchav merged commit 03e34ab into DataHaskell:main Feb 27, 2026
7 checks passed
@jisantuc
Copy link
Contributor Author

Per the contributing guidelines, I tried to add a label (I assumed that was what "A tag (usually feat, documentation, refactor etc)" meant?) but it seems like I'm not allowed to do that 🤔

🤦🏻 @mchav I just noticed other commits on main -- you meant a tag in the commit header itself, like in the conventional commits style?

@jisantuc jisantuc deleted the maint/js/downgrade-random branch February 27, 2026 05:30
@Ai-Ya-Ya
Copy link

Ai-Ya-Ya commented Feb 27, 2026

@mchav What's your timeline on the next Hackage release? nixpkgs CI should pull from there, the hope being it'll be unbroken.

@mchav
Copy link
Member

mchav commented Feb 27, 2026

@Ai-Ya-Ya released: https://hackage.haskell.org/package/dataframe-0.5.0.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants