
Commit f071ea7

egrace479 and gwtaylor committed
Clarify license and citation guidance for Hugging Face
Pull from Collab Guide [PR 51](Imageomics/Collaborative-distributed-science-guide#51)

* Clarify license section: include a link to the policy and clear up confusion over the need for a license file or the use of the MIT license for datasets
* Remove "Imageomics" in front of "policy"; there is no need for that specification here
* Add a pro tip to use the data/model card checklists
* Clarify license recommendations/references in the templates to align with the repo guide page clarification
* Add a citation clarification in a note under standard files
* Add the Choose A License link back in; it is still a good reference for both datasets and models
* Clarify that a license file is not supported, since this is about how the system works; also include links to the repo card templates
* fix: correct broken link and abbreviation in dataset card template

  Remove extra opening parenthesis in the Digital Products Release and Licensing Policy link that broke the markdown rendering. Also fix "eg." to "e.g." on the same line for correctness.

---------

Co-authored-by: Graham Taylor <gwtaylor@gmail.com>
1 parent 8abc799 commit f071ea7

4 files changed

Lines changed: 23 additions & 10 deletions


docs/wiki-guide/HF_DatasetCard_Template_ABC.md

Lines changed: 2 additions & 3 deletions
@@ -17,9 +17,8 @@ description: # Add a short description (summary) of your dataset, this will rend
 
 NOTE: Add more tags (your particular animal, type of model and use-case, etc.).
 
-As with your GitHub Project repo, it is important to choose an appropriate license for your dataset. The default license is [CC0](https://creativecommons.org/publicdomain/zero/1.0/) (public domain dedication, see [Dryad's explanation of why to use CC0](https://blog.datadryad.org/2023/05/30/good-data-practices-removing-barriers-to-data-reuse-with-cc0-licensing/)). Alongside the appropriate stakeholders (eg., your PI, co-authors), select a license that is [Open Source Initiative](https://opensource.org/licenses) (OSI) compliant.
-For more information on how to choose a license and why it matters, see [Choose A License](https://choosealicense.com) and [A Quick Guide to Software Licensing for the Scientist-Programmer](https://doi.org/10.1371/journal.pcbi.1002598) by A. Morin, et al.
-See the [ABC Global Center policy for licensing](https://docs.google.com/document/d/1SlITG-r7kdJB6C8f4FCJ9Z7o7ccwldZoSRJKjhRAWVA/edit#heading=h.c1sxg0wsiqru) for more information.
+As with your GitHub Project repo, it is important to choose an appropriate license for your dataset. The default license is [CC0](https://creativecommons.org/publicdomain/zero/1.0/) (public domain dedication, see [Dryad's explanation of why to use CC0](https://blog.datadryad.org/2023/05/30/good-data-practices-removing-barriers-to-data-reuse-with-cc0-licensing/)). Alongside the appropriate stakeholders (eg., your PI, co-authors), select a license that is following the guidelines set forth in the [ABC Digital Products Release and Licensing Policy](https://ABC-Center.github.io/ABC-guide/wiki-guide/Digital-products-release-licensing-policy/). For Datasets, this would be public domain or terms no more restrictive than requiring attribution (e.g., [CC-BY](https://creativecommons.org/licenses/by/4.0/)).
+For more information on how to choose a license and why it matters, see [Choose A License](https://choosealicense.com). List of [HF license identifiers](https://huggingface.co/docs/hub/en/repositories-licenses) (for yaml).
 
 See more options for the above information by clicking "edit dataset card" on your repo.
 
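The template change above points readers to the HF license identifiers used in the card's YAML metadata. As a rough sketch, the identifier can be read back out of a `README.md` with only the standard library. This is a simplified, line-based reader written for illustration (an assumption, not a Hugging Face API or a full YAML parser): it only handles a top-level, single-line `license: <identifier>` entry, and the example card text is made up.

```python
# Minimal sketch: read the top-level `license` field from the YAML front matter
# of a Hugging Face Dataset/Model card (README.md). Simplified, line-based
# parsing for illustration only; this is NOT a full YAML parser.

from typing import Optional

# Made-up example card; real cards carry many more metadata fields.
EXAMPLE_CARD = """---
license: cc0-1.0
language:
- en
pretty_name: Example Dataset
---

# Dataset Card for Example Dataset
"""


def read_license(card_text: str) -> Optional[str]:
    """Return the license identifier from the card's front matter, or None."""
    lines = card_text.splitlines()
    # The metadata block is expected to open on the very first line with '---'.
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter: end of the front matter block
        key, sep, value = line.partition(":")
        # Only match unindented (top-level) keys named exactly "license".
        if sep and not line[:1].isspace() and key.strip() == "license":
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("'\"") or None
    return None


if __name__ == "__main__":
    print(read_license(EXAMPLE_CARD))  # prints: cc0-1.0
```

Cards with list-valued or multi-line license entries would need a real YAML parser; the sketch deliberately returns `None` rather than guessing in those cases.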

docs/wiki-guide/HF_DatasetCard_Template_mkdocs.md

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,9 @@
 
 Below is the Dataset Card template for ABC. You can download or copy the dataset card content and paste it into a new Markdown file to create a README for your dataset.
 
+!!! tip "Pro tip"
+    Use the [Data Card Checklist](Data-Checklist.md) to help keep track of your progress.
+
 <details open>
 <summary>ABC</summary>
 </br>

docs/wiki-guide/HF_ModelCard_Template_mkdocs.md

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,9 @@
 
 Below is the Model Card template for ABC. You can download or copy the model card content and paste it into a new Markdown file to create a README for your model repo.
 
+!!! tip "Pro tip"
+    Use the [Model Card Checklist](Model-Checklist.md) to help keep track of your progress.
+
 <details open>
 <summary>ABC</summary>
 </br>

docs/wiki-guide/Hugging-Face-Repo-Guide.md

Lines changed: 15 additions & 7 deletions
@@ -6,33 +6,41 @@ Need a repository to store your data or model? You've come to the right place! B
 
 ### Standard Files
 
-For each repository, include the following files in the root directory as soon as possible; a license can (and should) be instantiated when you create a new repository, and the standard `.gitattributes` will be generated for you. On the [ABC HF](https://huggingface.co/ABC-Center) select `New` and pick which type of repository you need.
+For each repository, include the following files and metadata in the root directory as soon as possible; a license can (and should) be instantiated with the Dataset or Model card (`README.md`), and the standard `.gitattributes` will be generated for you. On the [ABC HF](https://huggingface.co/ABC-Center) select `New` and pick which type of repository you need.
 
 - [README.md](#readme)
-- [LICENSE.md](#license)
+- [License](#license)
 - [.gitignore](#gitignore)
 - [.gitattributes](#gitattributes)
 
+!!! note
+    Hugging Face does not support the use of a `CITATION.cff`. Instead, citation guidance is provided in the Citation Section of the [Dataset](HF_DatasetCard_Template_mkdocs.md) or [Model](HF_ModelCard_Template_mkdocs.md) card. When [generating a DOI on Hugging Face](DOI-Generation.md##1-generate-a-doi-on-hugging-face), author names must be added manually in the intended order for them to be displayed in the DOI "Cite this dataset" link.
+
 #### README
 
 The README.md file is generally referred to as either a Dataset or Model Card and is what everyone will notice first when they open your repository on Hugging Face. Choose the appropriate ABC-specific HF template ([model](HF_ModelCard_Template_mkdocs.md) or [dataset](HF_DatasetCard_Template_mkdocs.md)) to get started. Be sure to include a brief description and as much information as possible at the beginning. You can update this file as you go, so don't remove the recommended sections prior to completion. The templates include descriptions of many fields, ABC grant information, citation formatting, and some notes on HF-flavored markdown to get you started.
 
 Once you've created your repo, populate your README (you can do this online by selecting "Create Dataset/Model Card" and pasting in the appropriate ABC HF template, then filling in your info). Editing your README in the browser allows you to preview the formatting of the file before committing changes.
 
-#### LICENSE
+#### License
 
 ##### 1. Select a license
 
-Alongside the appropriate stakeholders, select a license that is [Open Source Initiative](https://opensource.org/licenses) (OSI) compliant.
+Alongside the appropriate stakeholders, select a license following the guidelines set forth in the [Digital Products Release and Licensing Policy](Digital-products-release-licensing-policy.md). For Datasets, this would be public domain or terms no more restrictive than requiring attribution (e.g., [CC-BY](https://creativecommons.org/licenses/by/4.0/)); Models and Spaces should be released under a license that is [Open Source Initiative](https://opensource.org/licenses) (OSI) compliant.
 
 !!! note "Remember"
     A public repository on Hugging Face with no license can be viewed and accessed by others, but unless the author associates a license, it is unclear what others are allowed to do with it legally. Adding an OSI license can help others feel comfortable building off your work!
 
-For more information on how to choose a license and why it matters, see [Choose A License](https://choosealicense.com) and [A Quick Guide to Software Licensing for the Scientist-Programmer](https://doi.org/10.1371/journal.pcbi.1002598) by A. Morin, et al.
+For more information on how to choose a license and why it matters, see [Choose A License](https://choosealicense.com). Keep in mind that your available license options may also be limited by your data sources or base model. Data should not be republished where not explicitly warranted or required.[^1]
+
+[^1]: For instance, when working with images aggregated from multiple sources, a catalog of all images used with URLs to access the images and download instructions ([cautious-robot](Helpful-Tools-for-your-Workflow.md#cautious-robot) can help with this) respects the original source data producers interests. However, if you have processed the images in a resource-intensive pipeline and the image licenses allow, the _processed_ images should be published for ease of re-use. In this case, it is important to provide the citation for the source data as well.
+
+##### 2. Add a license to the repository
 
-##### 2. Add LICENSE.md to the repository
+Once a license has been chosen (if not initialized with one), add the appropriate license identifier in the `yaml` portion of the README (the web UI generates a dropdown of recommendations under "Edit dataset/model card", [license identifiers](https://huggingface.co/docs/hub/en/repositories-licenses)).
 
-Once a license has been chosen (if not initialized with one), add the appropriate license label in the `yaml` portion of the README (the web UI generates a dropdown of recommendations under "Edit dataset/model card").
+!!! note
+    Unlike in GitHub, a `LICENSE.md` file is not supported. Instead, the license for the digital object is added through the `yaml` (for ease of API access) and further clarifications can be included in the License Section of the [Dataset](HF_DatasetCard_Template_mkdocs.md) or [Model](HF_ModelCard_Template_mkdocs.md) card.
 
 #### gitignore
 
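The updated repo guide splits the license guidance by repository type: datasets get public domain or attribution-only terms (e.g., CC-BY), while models and Spaces get an OSI-compliant license. A hypothetical check encoding that split might look like the sketch below. The allow-lists are a small illustrative subset of HF license identifiers chosen for this example (an assumption, not the policy's own list), and `license_fits_guidance` is a made-up helper name.

```python
# Hypothetical sketch of the license split described in the repo guide:
#   dataset        -> public domain or attribution-only terms
#   model / space  -> an OSI-compliant license
# The identifier sets are ILLUSTRATIVE ONLY (a small subset of Hugging Face
# license identifiers); check the policy and the HF identifier list instead.

DATASET_LICENSES = {"cc0-1.0", "cc-by-4.0"}           # CC0 and CC-BY, as named in the guide
OSI_LICENSES = {"mit", "apache-2.0", "bsd-3-clause"}  # a few OSI-approved examples


def license_fits_guidance(repo_type: str, license_id: str) -> bool:
    """Return True if `license_id` is in the illustrative allow-list for `repo_type`."""
    # HF license identifiers are lowercase, so normalize before comparing.
    license_id = license_id.strip().lower()
    if repo_type == "dataset":
        return license_id in DATASET_LICENSES
    if repo_type in ("model", "space"):
        return license_id in OSI_LICENSES
    raise ValueError(f"unknown repo type: {repo_type!r}")
```

Raising on an unknown repository type, rather than returning `False`, keeps a typo in `repo_type` from being mistaken for a license problem.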
