- Field-level `outputSchema` override to keep the generated JSON Schema in sync when `callback` changes the output type/shape.
- Field-level `callback` hook (typed schema via `Field(callback=...)` and raw spec via `callback: ...`) to post-process the extracted field value.
- Object and `array<object>` fields now support `transform` (applied after nested field extraction).
- `callback` validation now raises an `ExtractError` when a non-callable value is provided (instead of silently ignoring it).
- Fixed object extraction with `css` selectors and scalar extraction when no nodes match (avoids unbound local errors).
- Scalar text extraction now includes element tail text (text after child elements), matching DOM `textContent` / jQuery `.text()` semantics more closely.
- Added `attr: "ownText"` to extract only the current node's direct text (excluding descendant text).
- `attr: "innerHTML"` for HTML content extraction (replaces the legacy `html: true` flag).
- `type` is now required for every field.
- Arrays must be declared with a typed generic: `type: "array<string>"`, `type: "array<integer>"`, `type: "array<number>"`, `type: "array<boolean>"`, or `type: "array<object>"` (plain `type: "array"` is rejected).
- String outputs are stripped automatically; `innerHTML` is never stripped.
- Numeric/boolean coercion is driven by `type` (no conversion transforms).
- `defaultValue` is a field-level key (same level as `css`/`type`/`transform`), not part of transforms.
- Removed legacy field flags `text: true` and `html: true`.
- Removed `list: true` and `items: { type: ... }` in favor of `type: "array<...>"`.
- Removed `options.defaults` / `text_transform`.
- Removed transforms: `strip`, `to_int`, `to_float`, and `{ default: ... }`.
- Python typed spec: removed `Field.list` and the `Default` transform helper.
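
The difference between full text extraction (with tail text) and `ownText` can be illustrated with a small standalone sketch. It uses Python's standard `xml.etree.ElementTree` rather than this library, and the helper names `text_content`, `own_text`, and `leading_text` are hypothetical. It also assumes that "direct text" means every text node directly under the element, including text that follows a child element; the library's exact `ownText` behavior may differ.

```python
import xml.etree.ElementTree as ET


def text_content(el):
    """All text under el, descendants included (DOM textContent-like)."""
    # itertext() yields el.text, then each descendant's text and tail in order.
    return "".join(el.itertext())


def own_text(el):
    """Only el's direct text nodes; descendant text is excluded (assumed semantics)."""
    parts = [el.text or ""]
    # A child's tail is the text that follows it, which still belongs to el.
    parts.extend(child.tail or "" for child in el)
    return "".join(parts)


def leading_text(el):
    """Text before the first child only; tail text is dropped."""
    return el.text or ""


node = ET.fromstring("<p>Hello <b>big</b> world</p>")

print(repr(text_content(node)))  # 'Hello big world'
print(repr(own_text(node)))      # 'Hello  world'
print(repr(leading_text(node)))  # 'Hello '
```

The last helper shows what is lost when tail text is ignored: the trailing `" world"` never reaches the output.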
INIT