`doc/release-notes/6.10-release-notes.md`

Highlights for Dataverse 6.10 include:
- Optionally require embargo reason
- Harvesting improvements
- Croissant support now built in
- Archiving, OAI-ORE, and BagIt export improvements
- Support for REFI-QDA Codebook and Project files
- Review datasets
- New and improved APIs
See also #11254, #12123, #12130, and #12191.
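With Croissant support now built in, both it and the Schema.org JSON-LD format can be retrieved through the standard metadata export API. A minimal sketch: the server URL and persistent ID below are placeholders, `schema.org` is the exporter name documented in the Dataverse guides, and `croissant` is assumed here as the exporter name for the newly built-in format.

```shell
# Placeholder server and dataset persistent identifier.
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/FOSG5Q

# Schema.org JSON-LD export (exporter name per the Dataverse API guide).
curl "$SERVER_URL/api/datasets/export?exporter=schema.org&persistentId=$PERSISTENT_ID"

# Croissant export ("croissant" exporter name is an assumption).
curl "$SERVER_URL/api/datasets/export?exporter=croissant&persistentId=$PERSISTENT_ID"
```

Both commands require a running Dataverse installation with a published dataset.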
### Archiving, OAI-ORE, and BagIt Export Improvements

This release includes multiple updates to the OAI-ORE metadata export and the process of creating archival bags, improving performance, fixing bugs, and adding significant new functionality. See #12144, #12129, #12122, #12104, #12103, #12101, and #12213.
#### General Archiving Improvements

- Multiple performance and scaling improvements have been made for creating archival bags for large datasets, including:
  - The duration of archiving tasks triggered from the version table or the API is no longer limited by the transaction time limit.
  - Temporary storage space requirements have increased by `1/:BagGeneratorThreads` of the zipped bag size (often by half, because the default value for `:BagGeneratorThreads` is 2). This is a consequence of changes made to avoid timeout errors on larger files and datasets.
- The size of individual data files, and the total dataset size, that will be included in an archival bag can now be limited. Admins can choose whether files above these limits are transferred along with, but outside, the zipped bag (creating a complete archival copy) or are just referenced (using the concept of a "holey" bag: the oversized files, and the Dataverse URLs from which they can be retrieved, are listed in a `fetch.txt` file). In the holey bag case, an active service on the archiving platform must retrieve the oversized files (using appropriate credentials as needed) to make a complete copy.
- Superusers can now see a pending status in the dataset version table while archiving is active.
- Workflows are now triggered outside the transactions related to publication, ensuring that workflow locks and status updates are always recorded.
- Potential conflicts between archiving/workflows, indexing, and metadata exports after publication have been resolved, avoiding cases where the status and last-update times for these actions were not recorded.
- A bug has been fixed where superusers would incorrectly see the "Submit" button to launch archiving from the dataset page version table.
- The local, S3, and Google archivers have been updated to support deleting the existing archival files for a version, allowing the bag for that version to be re-created.
- For archivers that support file deletion, it is now possible to re-create an archival bag after "Update Current Version" has been used (replacing the original bag). By default, Dataverse will mark the current version's archive as out of date but will not automatically re-archive it.
- A new "obsolete" status has been added to indicate when an archival bag exists for a version but was created prior to an "Update Current Version" change.
- Improvements have been made to file retrieval for bagging, including retries on errors and when download requests are being throttled.
- A bug causing `:BagGeneratorThreads` to be ignored has been fixed, and the default has been reduced to 2.
- Retrieval of files for inclusion in an archival bag is no longer counted as a download.
- It is now possible to require that all previous versions have been successfully archived before archiving of a newly published version can succeed. This is intended to support use cases where files are deduplicated between dataset versions, and it is a step toward supporting the Oxford Common File Layout (OCFL).
- The pending status now uses the same JSON format as other statuses.
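As a sketch of how the settings above are managed, `:BagGeneratorThreads` is a database setting controlled through the standard admin settings API, and archiving for a single version can be triggered via the admin API. The commands below assume a local installation with the admin API unblocked; the dataset ID and version are placeholders.

```shell
# Set the number of threads used when zipping archival bags
# (2 is the new default mentioned above).
curl -X PUT -d 2 http://localhost:8080/api/admin/settings/:BagGeneratorThreads

# Check the current value.
curl http://localhost:8080/api/admin/settings/:BagGeneratorThreads

# Trigger archiving of one dataset version (placeholder ID and version);
# superuser API token required.
curl -X POST -H "X-Dataverse-key: $API_TOKEN" \
  "http://localhost:8080/api/admin/submitDatasetVersionToArchive/42/1.0"
```

These commands require a running Dataverse installation and a configured archiver.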
#### OAI-ORE Export Updates

- The export now uses URIs for checksum algorithms, conforming with JSON-LD requirements.
- A bug causing failures with deaccessioned versions has been fixed. It occurred when the deaccession note ("Deaccession Reason" in the UI) was null, which is permissible via the API.
- The value of `https://schema.org/additionalType` has been updated to "Dataverse OREMap Format v1.0.2" to reflect the format changes.
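The updated OAI-ORE export can be inspected through the standard metadata export API. The server URL and persistent ID below are placeholders:

```shell
# Placeholder server and dataset persistent identifier.
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/FOSG5Q

# Retrieve the OAI-ORE export to see the checksum-algorithm URIs
# and the updated additionalType value.
curl "$SERVER_URL/api/datasets/export?exporter=OAI_ORE&persistentId=$PERSISTENT_ID"
```

This requires a running Dataverse installation with a published dataset.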
#### Archival Bag (BagIt) Updates

- The `bag-info.txt` file now correctly includes information for dataset contacts, fixing a bug where nothing was included when multiple contacts were defined. (Multiple contacts were always included in the OAI-ORE file in the bag; only `bag-info.txt` was affected.)
- Values used in the `bag-info.txt` file that may be multi-line (i.e., with embedded CR or LF characters) are now properly indented and wrapped per the BagIt specification (`Internal-Sender-Identifier`, `External-Description`, `Source-Organization`, `Organization-Address`).
- The dataset name is no longer used as a subdirectory within the `data/` directory, reducing issues with unzipping long paths on some filesystems.
- For dataset versions with no files, the empty `manifest-<alg>.txt` file now uses the algorithm from the `:FileFixityChecksumAlgorithm` setting instead of defaulting to MD5.
- A new key, `Dataverse-Bag-Version`, has been added to `bag-info.txt` with the value "1.0" to allow for tracking changes to Dataverse's archival bag generation over time.
- When the "holey" bag option discussed above is used, the required `fetch.txt` file will be included.
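For reference, each line of a `fetch.txt` file in a holey bag gives a URL, a length in bytes (or `-` if unknown), and the file's path within the bag, per the BagIt specification (RFC 8493). The URLs, file IDs, sizes, and paths below are purely illustrative:

```
https://demo.dataverse.org/api/access/datafile/42 1073741824 data/large-file-1.tar
https://demo.dataverse.org/api/access/datafile/43 - data/large-file-2.tar
```

A harvesting or archiving service on the receiving end would fetch these URLs to complete the bag.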
### Support for REFI-QDA Codebook and Project Files

.qdc and .qdpx files are now detected as [REFI-QDA standard](https://www.qdasoftware.org) Codebook and Project files, respectively, for qualitative data analysis, which allows them to be used with the new REFI-QDA previewers. See https://github.com/gdcc/dataverse-previewers/pull/137 for screenshots.
Generally speaking, see the [API Changelog](https://guides.dataverse.org/en/latest/api/changelog.html) for a list of backward-incompatible API changes.
### Archival Zip Filename Change

The filename of the archival zipped bag produced by the `LocalSubmitToArchiveCommand` archiver now includes a "." character before the "v" (for the version number) to mirror the filenames used by other archivers. For example, the filename will now look like

`doi-10-5072-fk2-fosg5q.v1.0.zip`

rather than

`doi-10-5072-fk2-fosg5qv1.0.zip`.
### Dataset Types Must Be Allowed, Per-Collection, Before Use

In previous releases of Dataverse, as soon as additional dataset types were added (such as "software", "workflow", etc.), they could be used by all users when creating datasets (via API only). As of this release, superusers must allow these dataset types on a per-collection basis before they can be used. See #12115 and #11753.
### New JVM Options (MicroProfile Config Settings)