Consider a clearer optional-functionality API

Currently `ASTTokens` serves a number of different roles depending on what arguments its passed. With https://github.com/gristlabs/asttokens/pull/93 this is growing further and the workarounds to type checking being added in that PR perhaps suggest that the limits of the current approach are being reached. https://github.com/gristlabs/asttokens/pull/95 will add another construction, though not another role.
I understand the goals of these different roles are around avoiding redundant parsing/tree-building work where not needed, which is a reasonable goal. I'm not suggesting a removal of functionality here, rather that the functionality be rearranged for clarity and to aid type checking.

There are at least the following modes of use:
``` python
atok = ASTTokens(source, parse=True)
assert atok.tree, "Placate type check even though we know this is always valid"

atok = ASTTokens(source, tree=existing_tree)
assert atok.tree, "Placate type check even though we know this is always valid"
```

There is also currently `ASTTokens(source, tree=None, parse=False)`, though it's not clear what use that has (https://github.com/gristlabs/asttokens/issues/94#issuecomment-1252862533).

With #93 there's also this case:
``` python
atok = ASTTokens(source, init_tokens=False)
atok.get_token_from_offset(42)  # Fail at runtime, but no type-check warnings
```

These cases are somewhat "obvious", however if the construction of the `ASTTokens` instance is far from the point of use they could easily be very non-obvious.

We can observe that `ASTTokens` contains broadly three classes of functionality:
- `tokens`-only
- `tree`-only
- both together

With the goal of enabling efficient use of the related functionalities I can see a couple of paths forwards:
- lazy evaluation of the relevant structured data from the source
- failures when using functionality with missing data

For the former I would advocate for moving all the public-member types to being non-optional and using something like https://pypi.org/project/cached-property/ (or [`functools.cached_property`](https://docs.python.org/3/library/functools.html#functools.cached_property) in Python 3.8+) to hide the evaluations. This would still support users passing in pre-computed values if they want but means that all usages will "just work":
``` python
atok = ASTTokens(source)
tree: ast.AST = atok.tree  # no error now that `.tree` is non-optional

atok = ASTTokens(source, init_tokens=False)  # request lazy tokens initialisation
atok.get_token_from_offset(42)  # works due to lazy parsing
```

I can see that this approach may be undesirable in some cases, perhaps where there are reasons to prefer not doing the parsing _at all_.

This leads to the other approach (errors for missing functionality), for which I propose using separate classes:

``` python
# ASTSource is a thin wrapper around LineNumbers. (Very open to better names.)
# Does not take a tree or tokens at all, but works with un-marked AST nodes
asrc = ASTSource(source)
asrc.tree  # type check error: no such member
asrc.get_text(node)  # type check error: no such member
asrc.get_token_from_offset(42)  # type check error: no such member
asrc.get_text_unmarked(node)  # works fine

# TokensSource is a new type which contains the tokens/source utils currently in ASTTokens
# It wraps LineNumbers similarly to ASTSource, but is otherwise unrelated
# It does not have a tree
tsrc = TokensSource(source)  # parses tokens if not passed
tsrc.tree  # type check error: no such member
tsrc.get_text(node)  # type check error: no such member
tsrc.get_token_from_offset(42)  # works fine
tsrc.get_text_unmarked(node)  # type check error: no such member

# ASTTokens now always has a tree and tokens, creating them both if not required
# It probably inherits from TokensSource and drops the `get_text*_unmarked` methods as they no longer make sense here
atok = ASTTokens(source)
tree: ast.AST = atok.tree  # no error now that `.tree` is non-optional
atok.get_text(node)  # works fine
atok.get_token_from_offset(42)  # also works fine
atok.get_text_unmarked(node)  # type check error: no such member
```

Aside from the question of whether `ASTTokens` keeps its `get_text*_unmarked` methods, the usages here are essentially compatible with existing code. The constructor signature might change, though keeping the existing arguments for compatibility is probably possible; if so then `ASTTokens(source, tree=None, parse=False)` would now be at least a runtime error (with `@overload`s it could be a type error too).

Note also that this approach is compatible with another class, perhaps `LazyASTTokens` which looks like `ASTTokens` but has lazy evaluation. This could easily be added later (or not at all) depending on whether it ends up being useful.

The main gain here is that each of these specialised classes addresses a specific use-case and data-requirements, which remove the need for conditional presence checks (or asserts) throughout both the library code and client code. This will make the code clearer, easier to maintain and (due to the type checker support) more robust as well as offering the performance gains desired.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider a clearer optional-functionality API #97

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Consider a clearer optional-functionality API #97

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions