docs: document the backup/restore ZIP archive format #604
irfanuddinahmad wants to merge 3 commits into
Adds a reference page describing the TOML-based ZIP format produced by `create_zip_file` / `lp_dump` and consumed by `load_learning_package` / `lp_load`. Covers the full archive layout, every TOML file schema with field-level descriptions and annotated examples drawn from the test fixtures, the XBlock XML placement convention, and quick-start usage snippets for both the management commands and the Python API.

Closes openedx#492

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
This PR adds official documentation for the ZIP-based learning-package backup/restore format used by the `backup_restore` applet, and links it into the `openedx_content` docs section so operators and developers can understand and inspect archives produced/consumed by `lp_dump` / `lp_load`.
Changes:
- Add a new reference page documenting the archive layout and TOML/XML schemas used in backup ZIPs.
- Include export/restore quick-start examples for both management commands and the Python API.
- Link the new page from the `docs/openedx_content` index.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| docs/openedx_content/index.rst | Adds the new backup/restore format page to the openedx_content docs toctree. |
| docs/openedx_content/backup_restore.rst | New documentation page describing the backup ZIP layout and file formats. |
- Overview: clarify only draft+published versions exported, not full history
- origin_server: free-form string, not validated hostname
- [learning_package] heading: note key may be overridden, updated not restored
- updated field: mark as reference-only, not applied during restore
- [entity.published]: always present (empty table with comment when unpublished)
- [[version]]: at most 2 entries, draft first, then published if different
- Example: fix version order to draft (v5) first, then published (v4)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
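The `[[version]]` ordering rule in this commit message can be sketched as a small pure function. The function name and the handling of a missing draft are assumptions made for illustration; this is not the code `create_zip_file` actually runs.

```python
from typing import Optional


def versions_to_export(draft: Optional[int], published: Optional[int]) -> list[int]:
    """Return version numbers in archive order: draft first, then published if different."""
    ordered: list[int] = []
    if draft is not None:
        ordered.append(draft)
    # The published version is listed only when it is not already covered by the draft.
    if published is not None and published != draft:
        ordered.append(published)
    return ordered


print(versions_to_export(draft=5, published=4))
print(versions_to_export(draft=4, published=4))
```

With a draft at v5 and a published v4, the result matches the corrected example order in the docs: draft first, then published.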
@ormsbee did you say that you had a pending review on this, or should I do a close read?
ormsbee left a comment
Thank you for your patience on this review.
    Backup / Restore Format
    =======================

    The ``backup_restore`` applet lets you export a learning package (V2 content

Suggested change:

    - The ``backup_restore`` applet lets you export a learning package (V2 content
    + The ``backup_restore`` applet lets you back up a learning package (V2 content
We're intentionally using "backup/restore" to distinguish it from the incremental import/export functionality that we plan to add in the future.
Updated to use "back up" consistently. That distinction from future incremental import/export will matter.
    published versions are exported — the full version history is not preserved.

    The archive uses `TOML <https://toml.io>`_ for all metadata files and keeps the
    actual XBlock content as XML (the same ``block.xml`` format Studio has always

Suggested change:

    - actual XBlock content as XML (the same ``block.xml`` format Studio has always
    + component XBlock content as XML (the same OLX format Studio has always
In modulestore, the XML files are not named block.xml. Also, the old XML format is being kept for components (e.g. problems, videos), but not for structural container types like units and subsections.
Also, it's probably worth noting that the naming is different: in courses, each component would be exported with its block_id as the name of the file. That's usually a machine-generated ID (since that's the default in Split), but sometimes it's a meaningful identifier when authored by hand. For our export format, the OLX file is always named block.xml, and it's the metadata in the parent TOML file that gives the identifier.
I'll add a note in the block.xml section clarifying that unlike the old modulestore OLX export (where each component file was named by its block_id), this format always uses block.xml with the identifier recorded in the parent TOML. That should help readers familiar with the old format understand the difference.
Applied — "OLX format" is more precise and the "component" qualifier correctly limits the claim to XBlocks, not structural containers.
    --------

    A backup ZIP is a self-contained snapshot of one learning package. It captures
    every component, collection, container (sections / subsections / units), and

Suggested change:

    - every component, collection, container (sections / subsections / units), and
    + every component, collection, container (section / subsection / unit), and
    Overview
    --------

    A backup ZIP is a self-contained snapshot of one learning package. It captures
We should clarify the difference between a Learning Package and a Library. Namely, that a Library has one and only one Learning Package where it stores its content, but Learning Packages can also stand alone. The restore process creates a temporary Learning Package that can be reviewed by the user, and then later associates that Learning Package with a newly created Library.
The doc was using the two interchangeably — I'll add a note to the Overview explaining: a Library holds exactly one Learning Package; Learning Packages can also exist independently. The restore flow reflects this — it first creates a standalone Learning Package for inspection, then the user associates it with a new Library.
    When provided it overrides the ``key`` stored in ``package.toml``, which
    is useful when importing a library under a new reference.
We should use stronger language here. It's really dangerous to trust the archive for either the package_ref or the user, and callers should explicitly pass those to load_learning_package unless they really, really know what they're doing.
Updated — I'll add an explicit warning that callers should always pass package_ref rather than relying on the key in the archive, since trusting untrusted archive content is a security risk.
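One way a caller might enforce this rule is with a small guard in front of the restore call. The helper below is purely hypothetical: it is not part of `load_learning_package`, and its name, parameters, and `trust_archive` flag are made up for this sketch.

```python
from typing import Optional


def resolve_package_ref(
    explicit_ref: Optional[str],
    archive_key: Optional[str],
    *,
    trust_archive: bool = False,
) -> str:
    """Pick the reference to restore under, preferring the caller's explicit value."""
    if explicit_ref:
        return explicit_ref
    # Falling back to the archive's own key is opt-in, because the archive
    # contents may come from an untrusted source.
    if trust_archive and archive_key:
        return archive_key
    raise ValueError("pass package_ref explicitly; refusing to trust the key in the archive")


print(resolve_package_ref("lib:mine:restored", "lib:from:archive"))
```

The default refuses to proceed without an explicit reference, so trusting the archive has to be a deliberate choice by the caller.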
    title = "Text"
    version_num = 4

    Container entity TOML (``entities/<slug>.toml``)
We should explain what a <slug> is: This is the last part of the entity_ref if there is no collision, but if the last parts of the entity_ref collide (e.g. a Unit and an HTMLBlock that are both "intro"), then a short hash gets appended.
I'll add an explanation: <slug> is derived from the last segment of the entity_ref; when two entities share the same last segment (e.g. a Unit and an HTMLBlock both named "intro"), a short hash is appended to keep filenames unique.
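The slug rule described here could look roughly like the sketch below. The `:` separator in `entity_ref`, the choice of SHA-1, the five-character suffix, the `_` joiner, and appending the hash to every colliding entity are all assumptions for illustration; the actual implementation may differ on each point.

```python
import hashlib
from collections import Counter


def last_segment(entity_ref: str) -> str:
    """Return the final segment of an entity_ref (assumes ':' separates segments)."""
    return entity_ref.rsplit(":", 1)[-1]


def assign_slugs(entity_refs: list[str]) -> dict[str, str]:
    """Map each entity_ref to a filename slug, adding a short hash on collisions."""
    counts = Counter(last_segment(ref) for ref in entity_refs)
    slugs: dict[str, str] = {}
    for ref in entity_refs:
        base = last_segment(ref)
        if counts[base] == 1:
            slugs[ref] = base
        else:
            # Hash the full entity_ref so colliding entities get distinct suffixes.
            digest = hashlib.sha1(ref.encode("utf-8")).hexdigest()[:5]
            slugs[ref] = f"{base}_{digest}"
    return slugs


slugs = assign_slugs(["unit:intro", "html:intro", "problem:quiz"])
for ref, slug in slugs.items():
    print(f"{ref} -> entities/{slug}.toml")
```

In this sketch the Unit and the HTMLBlock named "intro" both get a suffix, while "quiz" keeps its plain slug.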
    XBlock content (``component_versions/v<N>/block.xml``)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Standard XBlock XML, identical to what Studio stores internally. Static assets
There is a difference in HTMLBlock storage. Namely, we don't currently support storing a separate HTML file, so we inline the HTML with CDATA. In courses, we'd have a tiny XML file for the HTMLBlock that pointed to the HTML file.
This is a limitation of our XBlock serialization, but one I hope we can fix before Willow.
I'll add a caveat noting that HTMLBlock content is currently serialized inline (CDATA in the XML) rather than as a separate .html file, which differs from old course OLX exports. I'll flag it as a known limitation to be addressed.
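To show what the inline form means for anyone reading a `block.xml`, here is a small sketch that parses an HTMLBlock whose body is wrapped in CDATA. The sample XML, including the `display_name` attribute, is an assumed shape for illustration and was not copied from a real backup.

```python
import xml.etree.ElementTree as ET

# Assumed shape of an inlined HTMLBlock; real archives may carry more attributes.
BLOCK_XML = (
    '<html display_name="Intro">'
    "<![CDATA[<p>Hello, <strong>world</strong>!</p>]]>"
    "</html>"
)

root = ET.fromstring(BLOCK_XML)

# The parser hands CDATA back as plain character data, so the markup inside it
# is available as text on the element rather than as child elements.
inline_html = root.text
print(root.tag, root.get("display_name"))
print(inline_html)
```

Because the markup lives inside CDATA, the parsed element has no children. That is the practical difference from a course OLX export, where the small XML file would point to a separate HTML file.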
- Use "back up" consistently to distinguish from future import/export
- Fix "OLX format" and "component" qualifier (containers don't use OLX)
- Clarify Library vs Learning Package relationship in Overview
- Add security warning: always pass package_ref explicitly, don't trust archive
- Explain <slug> derivation and hash-collision disambiguation
- Note modulestore naming difference (block_id vs block.xml + parent TOML)
- Note HTMLBlock CDATA limitation vs separate .html file in old course OLX
- Fix singular: section / subsection / unit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- `docs/openedx_content/backup_restore.rst`: a full reference page for the TOML-based ZIP format produced by `lp_dump` / `create_zip_file` and consumed by `lp_load` / `load_learning_package`.
- `docs/openedx_content/index.rst`.

Closes #492
Test plan
- `cd docs && make html` (or `make dirhtml`): confirms RST renders without Sphinx warnings
- `tests/openedx_content/applets/backup_restore/fixtures/library_backup/`
- Run `lp_dump` on a real library and compare the output ZIP layout to the documented structure

🤖 Generated with Claude Code