Skip to content

Consider a clearer optional-functionality API #97

@PeterJCLaw

Description

@PeterJCLaw

Currently ASTTokens serves a number of different roles depending on what arguments its passed. With #93 this is growing further and the workarounds to type checking being added in that PR perhaps suggest that the limits of the current approach are being reached. #95 will add another construction, though not another role.
I understand the goals of these different roles are around avoiding redundant parsing/tree-building work where not needed, which is a reasonable goal. I'm not suggesting a removal of functionality here, rather that the functionality be rearranged for clarity and to aid type checking.

There are at least the following modes of use:

atok = ASTTokens(source, parse=True)
assert atok.tree, "Placate type check even though we know this is always valid"

atok = ASTTokens(source, tree=existing_tree)
assert atok.tree, "Placate type check even though we know this is always valid"

There is also currently ASTTokens(source, tree=None, parse=False), though it's not clear what use that has (#94 (comment)).

With #93 there's also this case:

atok = ASTTokens(source, init_tokens=False)
atok.get_token_from_offset(42)  # Fail at runtime, but no type-check warnings

These cases are somewhat "obvious", however if the construction of the ASTTokens instance is far from the point of use they could easily be very non-obvious.

We can observe that ASTTokens contains broadly three classes of functionality:

  • tokens-only
  • tree-only
  • both together

With the goal of enabling efficient use of the related functionalities I can see a couple of paths forwards:

  • lazy evaluation of the relevant structured data from the source
  • failures when using functionality with missing data

For the former I would advocate for moving all the public-member types to being non-optional and using something like https://pypi.org/project/cached-property/ (or functools.cached_property in Python 3.8+) to hide the evaluations. This would still support users passing in pre-computed values if they want but means that all usages will "just work":

atok = ASTTokens(source)
tree: ast.AST = atok.tree  # no error now that `.tree` is non-optional

atok = ASTTokens(source, init_tokens=False)  # request lazy tokens initialisation
atok.get_token_from_offset(42)  # works due to lazy parsing

I can see that this approach may be undesirable in some cases, perhaps where there are reasons to prefer not doing the parsing at all.

This leads to the other approach (errors for missing functionality), for which I propose using separate classes:

# ASTSource is a thin wrapper around LineNumbers. (Very open to better names.)
# Does not take a tree or tokens at all, but works with un-marked AST nodes
asrc = ASTSource(source)
asrc.tree  # type check error: no such member
asrc.get_text(node)  # type check error: no such member
asrc.get_token_from_offset(42)  # type check error: no such member
asrc.get_text_unmarked(node)  # works fine

# TokensSource is a new type which contains the tokens/source utils currently in ASTTokens
# It wraps LineNumbers similarly to ASTSource, but is otherwise unrelated
# It does not have a tree
tsrc = TokensSource(source)  # parses tokens if not passed
tsrc.tree  # type check error: no such member
tsrc.get_text(node)  # type check error: no such member
tsrc.get_token_from_offset(42)  # works fine
tsrc.get_text_unmarked(node)  # type check error: no such member

# ASTTokens now always has a tree and tokens, creating them both if not required
# It probably inherits from TokensSource and drops the `get_text*_unmarked` methods as they no longer make sense here
atok = ASTTokens(source)
tree: ast.AST = atok.tree  # no error now that `.tree` is non-optional
atok.get_text(node)  # works fine
atok.get_token_from_offset(42)  # also works fine
atok.get_text_unmarked(node)  # type check error: no such member

Aside from the question of whether ASTTokens keeps its get_text*_unmarked methods, the usages here are essentially compatible with existing code. The constructor signature might change, though keeping the existing arguments for compatibility is probably possible; if so then ASTTokens(source, tree=None, parse=False) would now be at least a runtime error (with @overloads it could be a type error too).

Note also that this approach is compatible with another class, perhaps LazyASTTokens which looks like ASTTokens but has lazy evaluation. This could easily be added later (or not at all) depending on whether it ends up being useful.

The main gain here is that each of these specialised classes addresses a specific use-case and data-requirements, which remove the need for conditional presence checks (or asserts) throughout both the library code and client code. This will make the code clearer, easier to maintain and (due to the type checker support) more robust as well as offering the performance gains desired.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions