Conversation
```python
invalid_categories = {
    cat for cat in harm_categories if cat not in self.HARM_CATEGORIES}
if invalid_categories:
    raise ValueError(
```
I think we likely still want to load these; should we use a default harm category here?
I agree. The current implementation will just crash dataset loading if any invalid categories are detected, though, so I think we have three options when we find invalid categories:
Option 1: Load them anyway, along with whatever else is in the dataset.
Option 2: Don't load them. Just pass them over and return whatever is in an accepted harm category.
Option 3: Set a default harm category, and if we find anything invalid, only return what matches the default harm category.
I feel like option 3 is the best one, and it's the one you were describing, so I'm going to implement that unless you have any objections.
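A minimal sketch of Option 3, assuming hypothetical names (`DEFAULT_HARM_CATEGORY`, `resolve_harm_categories`, and the example category set) that aren't part of the actual PyRIT code:

```python
# Hypothetical sketch of Option 3: fall back to a default harm category
# instead of raising when any requested category is invalid.
HARM_CATEGORIES = {"cybercrime", "illegal", "harmful", "harassment"}  # assumed set
DEFAULT_HARM_CATEGORY = "harmful"  # assumed default


def resolve_harm_categories(harm_categories: list[str]) -> list[str]:
    invalid_categories = {cat for cat in harm_categories if cat not in HARM_CATEGORIES}
    if invalid_categories:
        # Option 3: only return what matches the default harm category.
        return [DEFAULT_HARM_CATEGORY]
    return harm_categories
```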
```python
modalities: list[SeedDatasetModality] = [SeedDatasetModality.TEXT]
size: SeedDatasetSize = SeedDatasetSize.LARGE  # 504 seeds
# "default" means included in curated set
tags: set[str] = {"default", "safety"}
```
I don't like mixing the pieces with tags. Would "default" actually be part of "ranks"?
Agreed, this needs a refactor.
```python
class SeedDatasetSize(Enum):
    """Ordinal size (by bucket) of the dataset."""

    TINY = "tiny"  # < 10
```
I recommend adjusting these values; some are truly gigantic. One thing to note, though, is that this isn't necessarily correlated with the number of datasets. WDYT if we write a script that bases this on time? That way we can bucket datasets by how long they take to load.
So we could change this to `SeedDatasetApproximateLoadTime`.
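A rough sketch of what such a script could measure; the threshold values and the `bucket_by_load_time` helper are made up for illustration:

```python
import time


def time_dataset_load(load_fn) -> float:
    """Measure the wall-clock time of a single dataset load."""
    start = time.perf_counter()
    load_fn()
    return time.perf_counter() - start


def bucket_by_load_time(seconds: float) -> str:
    """Bucket a load time into coarse ranks (thresholds are illustrative)."""
    if seconds < 1.0:
        return "fast"
    if seconds < 10.0:
        return "medium"
    return "slow"
```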
There are two concerns I think we should separate, in line with my other comments: the literal size, which is more useful for dataset filtering, setting up `AtomicAttack`s, etc., and the performance, which is more useful for runtime concerns.
I actually think reviving my original idea of a `SeedDatasetMetadataUtilities` class that holds the logic for performance and metadata population would be preferable to a script, but it should be a follow-up because there are a lot of other design choices involved that aren't relevant to the metadata structuring or filtering logic.
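To make the separation concrete, here's a hedged sketch of the two enums side by side; the `SeedDatasetLoadingRank` member names (`FAST`/`MEDIUM`/`SLOW`) are guesses, since the PR doesn't show them:

```python
from enum import Enum


class SeedDatasetSize(Enum):
    """Literal seed-count bucket: useful for filtering and attack setup."""

    TINY = "tiny"  # < 10 seeds
    HUGE = "huge"  # >= 5000 seeds


class SeedDatasetLoadingRank(Enum):
    """Relative load-time bucket: useful for runtime concerns.

    Member names are illustrative; rank need not correlate with size
    (e.g. a LARGE dataset can still load fast).
    """

    FAST = "fast"
    MEDIUM = "medium"
    SLOW = "slow"
```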
```python
    HUGE = "huge"  # >= 5000


class SeedDatasetLoadingRank(Enum):
```
What do you think of this being the one class attribute that isn't optional?
It makes a lot of sense to me, but I also want to make sure new datasets can be added easily without users spending too much time adding metadata. Suggestion: `loading_rank: SeedDatasetLoadingRank` is included in the `SeedDatasetProvider` base class with the slowest/lowest default value?
Also, a thought that just came to me: that suggestion lets us make rank non-optional while scoping the actual determination of which datasets are faster/slower to a follow-up PR.
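The suggestion could look roughly like this; the `SLOW` member name as the slowest/lowest default is an assumption:

```python
from enum import Enum


class SeedDatasetLoadingRank(Enum):
    FAST = "fast"
    SLOW = "slow"  # assumed slowest/lowest bucket


class SeedDatasetProvider:
    # Non-optional, but defaulted to the slowest rank so new datasets
    # can be added without spending time on metadata up front.
    loading_rank: SeedDatasetLoadingRank = SeedDatasetLoadingRank.SLOW


class MyNewDatasetProvider(SeedDatasetProvider):
    pass  # inherits the conservative default
```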
```python
# Metadata
harm_categories: list[str] = ["cybercrime", "illegal", "harmful", "chemical_biological", "harassment"]
modalities: list[SeedDatasetModality] = [SeedDatasetModality.TEXT]
size: SeedDatasetSize = SeedDatasetSize.LARGE  # 504 seeds
```
I mention this in another comment, but HarmBench actually loads super fast, which makes me think we may want a better measure.
This is the role I imagined for `SeedDatasetLoadingRank`, since it's more of a performance measurement than a literal description of the dataset. I'll change it a bit to make that more obvious. What do you think of using `SeedDatasetLoadingRank`?
Description
Features:
- `filters` argument to `get_all_dataset_names`, which rejects datasets that don't meet filter criteria. `filters` has type `SeedDatasetFilters`.
- `SeedDatasetProvider`s have two options for storing static metadata (dynamic metadata, like derived attributes at runtime, has been scoped out of this PR): declared as class attributes (e.g. `harm_categories`), using types like `SeedDatasetFoobar`, or stored in the `*.prompt` file as tags and extracted from it.
- `SeedDatasetMetadata` acts as a unified schema and ground truth for logic related to parsing metadata, and we expect only a few class attributes to count as metadata.
- The `SeedDatasetMetadata` dataclass contains: `tags: set[str]; size: SeedDatasetSize; modalities: list[SeedDatasetModality]; source: SeedDatasetSourceType; rank: SeedDatasetLoadingRank; harm_categories: list[str]`.
- If `tags = {"all"}`, all filtering logic is bypassed.

Issues: reviewers, please provide your opinion on these.
- Filtering requires a lot of imports (`import SeedDatasetMetadata, SeedDatasetSize, ...`). This could be fixed by adding constructors that build the actual metadata enum types from primitive types, but this adds a layer of indirection.
- The filtering logic could be moved out of `SeedDatasetProvider`, since the logic of filtering is different from the filtering fields themselves, but this may be more confusing.
- `SeedDatasetLoadingRank` and `SeedDatasetSourceType` seem like they could be excluded.

Potential Follow-Up PRs:
- A `SeedDatasetMetadataUtilities` class that could parse the datasets in PyRIT and extract metadata from them programmatically, but this feels like it could be worth a second PR given that the parsing logic makes a lot of implicit decisions.
- Connect `SeedDatasetProvider` to `CentralMemory` to allow for complex operations across datasets. For example, consider a user that wants all text prompts containing the string "harm" from two datasets. Something like a SQL `JOIN` would be ideal in this situation.
- Unifying `filters` and `CentralMemory` queries.

Tests and Documentation
- `test_seed_dataset_metadata.py` under unit tests.
- `test_seed_dataset_provider_integration.py` in `tests.integration.datasets`, updated to account for local and remote metadata population.
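For reference, the `SeedDatasetMetadata` schema and the `{"all"}` bypass from the Features list could be sketched as below. This is a simplified stand-in, not the PR's code: the enum-typed fields are stubbed as strings, and treating `{"all"}` as a filter-side wildcard is one plausible reading of the bypass rule.

```python
from dataclasses import dataclass, field


@dataclass
class SeedDatasetMetadata:
    # Simplified sketch: the real fields use enum types
    # (SeedDatasetSize, SeedDatasetModality, ...) rather than strings.
    tags: set[str] = field(default_factory=set)
    size: str = "large"
    modalities: list[str] = field(default_factory=lambda: ["text"])
    source: str = "local"
    rank: str = "slow"
    harm_categories: list[str] = field(default_factory=list)


def matches_tags(metadata: SeedDatasetMetadata, filter_tags: set[str]) -> bool:
    """If the filter asks for {"all"}, bypass tag filtering entirely."""
    if filter_tags == {"all"}:
        return True
    return filter_tags <= metadata.tags
```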