Documentation: dataset terminology consistency #1411

Draft
zhen0427 wants to merge 4 commits into main from feature/documentation-consistency

@zhen0427 (Member)

This PR updates and aligns dataset terminology across the documentation and user-facing API comments.

Main changes:

  • introducing consistent terminology for:
    • buffer type (row-based / columnar)
    • buffer representation (dense / sparse)
    • component data uniformity (uniform / non-uniform)
    • serialization representation (compact_list / named_map)
  • removing legacy terminology such as:
    • scenario homogeneity
    • attribute homogeneity
    • IDs match
  • updating Serialization.md
  • updating dataset terminology documentation
  • updating Python API docstrings for dense/sparse batch representations and indptr behavior

This PR only contains documentation and comment updates. No functional code changes are included.
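
As a rough illustration of the dense / sparse distinction and the role of `indptr` mentioned above, here is a minimal sketch. It is not taken from the PR: plain Python lists stand in for the library's arrays, and the helper names `build_sparse_batch` and `scenario_slice` are hypothetical. The assumption is that a sparse batch stores one flat `data` sequence plus an `indptr` sequence of length `n_scenarios + 1`, so that scenario `i` owns `data[indptr[i]:indptr[i + 1]]`.

```python
# Illustrative sketch only: plain lists stand in for the library's arrays,
# and both helper names are hypothetical.

def build_sparse_batch(scenarios):
    """Flatten per-scenario lists into a flat data list plus an indptr list."""
    data, indptr = [], [0]
    for scenario in scenarios:
        data.extend(scenario)
        indptr.append(len(data))  # end offset of this scenario
    return data, indptr


def scenario_slice(data, indptr, i):
    """Return the elements that belong to scenario i."""
    return data[indptr[i]:indptr[i + 1]]


# Three scenarios that update 2, 0 and 1 components respectively.
# A dense batch could not hold this directly, because every scenario
# in a dense batch has the same number of components.
scenarios = [
    [{"id": 1, "p_specified": 10.0}, {"id": 2, "p_specified": 20.0}],
    [],
    [{"id": 1, "p_specified": 15.0}],
]
data, indptr = build_sparse_batch(scenarios)
```

An empty scenario shows up as two equal consecutive entries in `indptr`, which is why `indptr` has one more entry than there are scenarios.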

Signed-off-by: zhen0427 <Zhen.Wang@alliander.com>
@zhen0427 added the documentation label on May 28, 2026
@mgovers (Member) left a comment

good progress. here's a preliminary review because i noticed the (in)homogeneous thingy

Comment thread docs/user_manual/dataset-terminology.md
Comment on lines +45 to 48
It is required when a component uses `DenseComponentData`, since dense representation relies on a fixed attribute order.

It may be omitted for components that only use `SparseComponentData`.

Member


this is not correct. Both dense and sparse component data may or may not use Attributes. Instead, the Attributes section provides the list of attributes that is used when using use_compact_list=True (see also https://github.com/PowerGridModel/power-grid-model/pull/1411/changes#r3318722900 ).

Homogeneous / inhomogeneous is a different distinction, independent of sparse vs dense
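
To make the point above concrete, a minimal sketch of the two serialization shapes follows. It is an illustration under stated assumptions, not the library's implementation: the helper names `to_compact_list` and `to_named_map` are hypothetical, and the example node attributes are chosen only for demonstration. It assumes that the compact form stores each component as a positional list of values, so a separate attribute list is needed to interpret it, while the named form stores each component as a name-to-value map.

```python
# Illustrative sketch only: hypothetical helpers, not the library's API.

def to_compact_list(components, attributes):
    """Encode each component as a list of values in the given attribute order."""
    return [[component[name] for name in attributes] for component in components]


def to_named_map(rows, attributes):
    """Decode positional rows back into attribute-name -> value maps."""
    return [dict(zip(attributes, row)) for row in rows]


nodes = [{"id": 1, "u_rated": 10500.0}, {"id": 2, "u_rated": 400.0}]
node_attributes = ["id", "u_rated"]  # plays the role of the Attributes section

compact = to_compact_list(nodes, node_attributes)   # positional rows
restored = to_named_map(compact, node_attributes)   # back to named maps
```

In this sketch the attribute list is only needed for the compact form. Nothing in it depends on whether the batch buffer is dense or sparse, which matches the comment that the two distinctions are independent.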

Comment on lines -127 to +133
A [`ComponentData`](#json-schema-component-data-object) object is either a
[`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object) object or an
[`InhomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object) object
A [`ComponentData`](#json-schema-component-data-object) represents the data of a single component instance.

It can be stored in either dense or sparse representation:

- [`ComponentData`](#json-schema-component-data-object):
[`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object) |
[`InhomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object)
- [`DenseComponentData`](#json-schema-component-data-object-dense-representation)
- [`SparseComponentData`](#json-schema-component-data-object-sparse-representation)

#### JSON schema homogeneous component data object
#### JSON schema component data object (dense representation)
Member


here and below, same as before: this is a different way of slicing.

Comment thread src/power_grid_model/_core/power_grid_model.py
Co-authored-by: Martijn Govers <martijn.govers@alliander.com>
Signed-off-by: Zhen Wang <Zhen.Wang@alliander.com>