
Add access to the source bytes of the Tree #180

@Emmeral

Description


Currently, when you parse some source code, you can only provide it as a String. Parser#parse then converts it to a byte[] before passing it to tree-sitter. The Tree class likewise only exposes the underlying source code as a String: it stores the content as a byte[], but every call to Tree#getText converts it to a String.

These conversions create computational overhead, because Java stores its strings in UTF-16 while the input encoding for the source code in the byte array can be UTF-8.
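To make the overhead concrete, here is a minimal sketch of the round trip using only the JDK. The class and method names (EncodingRoundTrip, encode, decode) are made up for illustration and are not part of the bindings. Each direction allocates a fresh copy and transcodes, and char offsets in the String stop matching byte offsets in the array as soon as a non-ASCII character appears.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the bindings: it only illustrates the
// String <-> byte[] conversions described above.
final class EncodingRoundTrip {
    // String -> byte[]: allocates a new array and transcodes to UTF-8.
    static byte[] encode(String source) {
        return source.getBytes(StandardCharsets.UTF_8);
    }

    // byte[] -> String: allocates a new String and decodes the byte range
    // [startByte, endByte) from UTF-8.
    static String decode(byte[] utf8, int startByte, int endByte) {
        return new String(utf8, startByte, endByte - startByte, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String source = "x = \"é\"";
        byte[] utf8 = encode(source);
        // The two lengths differ because 'é' is one char but two UTF-8 bytes.
        System.out.println("chars: " + source.length() + ", bytes: " + utf8.length);
        // Byte offsets 4..8 cover the quoted literal, including both quotes.
        System.out.println(decode(utf8, 4, 8));
    }
}
```

Anything that edits by byte offset therefore has to work on the byte[] rather than on the String, which is why a second copy of the input is needed today.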

In my use case, I parse some text and then want to edit it. I check for some nodes and remove their contents from the text (e.g. by replacing them with whitespace). To avoid those conversions, I currently need to store the input text twice (once in the Tree instance and once separately for the byte-level editing). If one could access the byte[] of the tree, this could be avoided. I understand that we do not want the internal byte[] to be edited, so it could instead be exposed as a read-only ByteBuffer using ByteBuffer#asReadOnlyBuffer.
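A rough sketch of what I have in mind, using only the JDK. SourceHolder stands in for the part of Tree that holds the source bytes, and sourceBytes() is a name I made up, not an existing method. ByteLevelEdit.blankRange is likewise hypothetical and takes plain byte offsets where the real code would use node byte ranges.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical stand-in for the part of Tree that stores the source bytes.
final class SourceHolder {
    private final byte[] source;

    SourceHolder(byte[] source) {
        this.source = source;
    }

    // Assumed accessor name. Returns a read-only view over the internal
    // array without copying it; writes through the view throw
    // ReadOnlyBufferException.
    ByteBuffer sourceBytes() {
        return ByteBuffer.wrap(source).asReadOnlyBuffer();
    }
}

final class ByteLevelEdit {
    // Copies the viewed bytes once and overwrites [startByte, endByte)
    // with spaces. Offsets are relative to the start of the view, which
    // is assumed to be at position 0.
    static byte[] blankRange(ByteBuffer view, int startByte, int endByte) {
        ByteBuffer copy = view.duplicate(); // independent position, same content
        byte[] edited = new byte[copy.remaining()];
        copy.get(edited);
        Arrays.fill(edited, startByte, endByte, (byte) ' ');
        return edited;
    }
}
```

With something like this the caller only allocates a copy at the moment it actually edits, and the bytes held by the tree stay untouched.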

I already proposed a similar change in #65, but it was closed. In fact, I've been maintaining a private fork of this repo since then to have the functionality available. I'd really like to have this possibility in the official bindings, as I think it could also benefit others.
