30 Mar 2023

More on tree-sitter and consult

Here's a few things that I've gotten help with figuring out during the last few days. Both things are related to my playing with tree-sitter that I've written about earlier, here and here.

You might also be interested in the two repositories where the full code is. (I've linked to the specific commits as of this writing.)

Anonymous nodes and matching in tree-sitter

In the grammar for Cabal I have a rule for sections that like this

sections: $ => repeat1(choice(
    $.benchmark,
    $.common,
    $.executable,
    $.flag,
    $.library,
    $.source_repository,
    $.test_suite,
)),

where each section followed this pattern

benchmark: $ => seq(
    repeat($.comment),
    'benchmark',
    field('name', $.section_name),
    field('properties', $.property_block),
),

This made it a little bit difficult to capture the relevant parts of each section to implement consult-cabal. I thought a pattern like this ought to work

(cabal
 (sections
  (_ _ @type
     name: (section_name)? @name)))

but it didn't; I got way too many things captured in type. Clearly I had misunderstood something about the wildcards, or the query syntax. I attempted to add a field name to the anonymous node, i.e. change the sections rules like this

benchmark: $ => seq(
    repeat($.comment),
    field('type', 'benchmark'),
    field('name', $.section_name),
    field('properties', $.property_block),
),

It was accepted by tree-sitter generate, but the field type was nowhere to be found in the parse tree.

Then I changed the query to list the anonymous nodes explicitly, like this

(cabal
 (sections
  (_ ["benchmark" "common" "executable" ...] @type
     name: (section_name)? @name)))

That worked, but listing all the sections like that in the query didn't sit right with me.

Luckily there's a discussions area in tree-sitters GitHub so a fairly short discussion later I had answers to why my query behaved like it did and a solution that would allow me to not list all the section types in the query. The trick is to wrap the string in a call to alias to make it a named node. After that it works to add a field name to it as well, of course. The section rules now look like this

benchmark: $ => seq(
    repeat($.comment),
    field('type', alias('benchmark', $.section_type)),
    field('name', $.section_name),
    field('properties', $.property_block),
),

and the final query looks like this

(cabal
 (sections
  (_
   type: (section_type) @type
   name: (section_name)? @name)))

With that in place I could improve on the function that collects all the items for consult-cabal so it now show the section's type and name instead of the string representation of the tree-sitter node.

State in a consult source for preview of lines in a buffer

I was struggling with figuring out how to make a good state function in order to preview the items in consult-cabal. The GitHub repo for consult doesn't have discussions enabled, but after a discussion in an issue I'd arrived at a state function that works very well.

The state function makes use of functions in consult and looks like this

(defun consult-cabal--state ()
  "Create a state function for previewing sections."
  (let ((state (consult--jump-state)))
    (lambda (action cand)
      (when cand
        (let ((pos (get-text-property 0 'section-pos cand)))
          (funcall state action pos))))))

The trick here was to figure out how the function returned by consult--jump-state actually works. On the surface it looks like it takes an action and a candidate, (lambda (action cand) ...). However, the argument cand shouldn't be the currently selected item, but rather a postion (ideally a marker), so I had to attach another text property on the items (section-pos, which is fetched in the inner lambda). This position is then what's passed to the function returned by consult--jump-state.

In hindsight it seems so easy, but I was struggling with this for an entire evening before finally asking the question the morning after.

Tags: consult emacs tree-sitter
Comment here.