Currently, when you parse some source code, you can only provide it as a `String`. The `Parser#parse` function then converts it to a `byte[]` before passing it to tree-sitter. Likewise, the `Tree` class only exposes the underlying source code as a `String`: it stores the content as a `byte[]`, but every call to `Tree#getText` converts it back to a `String`.
These conversions create computational overhead, because Java stores its strings in UTF-16, while the source code in the byte array can be encoded as UTF-8.
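To make the round trip concrete, here is a small stand-alone sketch that uses only the JDK (no tree-sitter classes). The class `ConversionDemo` and its methods are hypothetical: `encode` mirrors what `Parser#parse` is described as doing above, and `decode` mirrors what each `Tree#getText` call does. A single non-ASCII character is enough for the UTF-16 and UTF-8 lengths to differ, and every decode allocates a fresh `String`.

```java
import java.nio.charset.StandardCharsets;

public class ConversionDemo {
    // Conceptually what happens on parse: UTF-16 String -> UTF-8 bytes (new array)
    static byte[] encode(String source) {
        return source.getBytes(StandardCharsets.UTF_8);
    }

    // Conceptually what happens on each getText call: UTF-8 bytes -> new String
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String source = "let caf\u00e9 = 1;"; // contains one non-ASCII char
        byte[] utf8 = encode(source);
        System.out.println(source.length()); // 13 UTF-16 chars
        System.out.println(utf8.length);     // 14 UTF-8 bytes
        // Two decodes of the same bytes yield two distinct String objects
        System.out.println(decode(utf8) == decode(utf8)); // false
    }
}
```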
In my use case, I parse some text and then want to edit it: I look for certain nodes and remove their contents from the text (e.g. by replacing them with whitespace). To avoid those conversions, I currently need to store the input text twice (once in the `Tree` instance and once separately for the byte-level editing). If one could access the `byte[]` of the tree, this could be avoided. I understand that we do not want the internal `byte[]` to be edited, so it could instead be exposed as a read-only `ByteBuffer` using `ByteBuffer#asReadOnlyBuffer`.
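A minimal sketch of what the proposal could look like, assuming a hypothetical `ReadOnlySourceTree` class standing in for `Tree` and a hypothetical accessor `getSourceBytes()` (neither exists in the current bindings). The accessor wraps the stored `byte[]` without copying and returns a read-only view, so callers cannot modify the internal array. The hypothetical helper `blankRange` then covers the editing use case: it copies the bytes once and overwrites a node's byte range with spaces, where the range `[11, 18)` is simply assumed to be a comment node.

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for Tree; not part of the official bindings.
public class ReadOnlySourceTree {
    private final byte[] source; // UTF-8 bytes as handed to tree-sitter

    public ReadOnlySourceTree(byte[] source) {
        this.source = source;
    }

    // Hypothetical accessor: zero-copy, no decoding, writes are rejected.
    public ByteBuffer getSourceBytes() {
        return ByteBuffer.wrap(source).asReadOnlyBuffer();
    }

    // Copies the viewed bytes and replaces [startByte, endByte) with ASCII spaces.
    public static byte[] blankRange(ByteBuffer view, int startByte, int endByte) {
        byte[] copy = new byte[view.remaining()];
        view.duplicate().get(copy); // duplicate() leaves the caller's position unchanged
        Arrays.fill(copy, startByte, endByte, (byte) ' ');
        return copy;
    }

    public static void main(String[] args) {
        byte[] utf8 = "int x = 1; // note".getBytes(StandardCharsets.UTF_8);
        ByteBuffer view = new ReadOnlySourceTree(utf8).getSourceBytes();
        try {
            view.put(0, (byte) 'X'); // absolute put on a read-only buffer
        } catch (ReadOnlyBufferException e) {
            System.out.println("internal bytes are protected");
        }
        // Assume a comment node spans bytes [11, 18)
        byte[] edited = blankRange(view, 11, 18);
        System.out.println("[" + new String(edited, StandardCharsets.UTF_8) + "]");
    }
}
```

With this shape the tree keeps a single copy of the source, and only the caller that actually wants to edit pays for one copy of the bytes, with no UTF-16 round trip.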
I already proposed a similar change in #65, but it was closed. In fact, I have been maintaining a private fork of this repo since then to have the functionality available. I'd really like to have this possibility in the official bindings, as I think it could benefit others as well.