Advanced Usage

The topics presented here are less often needed but are still very useful.

Locate a Node

Since Baron produces a tree, a path is sufficient to locate a node unambiguously. A common task involving paths is translating a position in a file (a line and a column) into a node of the FST.

Baron provides two helper functions for that:

  • position_to_node(fst, line, column)
  • position_to_path(fst, line, column)

Both take an FST as first argument, followed by the position to look up. Line and column numbers start at 1, like in a text editor. Note that the examples below pass the position as a single (line, column) tuple.
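Some tools report a flat character offset rather than a line and a column. As a minimal sketch of the 1-based convention described above, the following converts such an offset into a (line, column) pair. The offset_to_position helper is hypothetical and not part of Baron; returning None for an out-of-range offset is a choice made for this illustration.

```python
def offset_to_position(source, offset):
    """Convert a 0-based character offset into a 1-based (line, column).

    Hypothetical helper for illustration only; not part of Baron.
    Returns None when the offset falls outside the source text.
    """
    if not 0 <= offset < len(source):
        return None
    # Every newline before the offset pushes the position one line down.
    line = source.count("\n", 0, offset) + 1
    # The column is the distance from the start of the current line, plus 1.
    line_start = source.rfind("\n", 0, offset) + 1
    column = offset - line_start + 1
    return line, column


source = "a = 1\nb = 2\n"
first = offset_to_position(source, 0)                   # the "a"
second = offset_to_position(source, source.index("b"))  # the "b"
third = offset_to_position(source, source.index("2"))   # the "2"
print(first, second, third)
```

The resulting pair can then be handed to the position helpers in whatever form your Baron version expects.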

position_to_node returns an FST node. That is enough if you only want to know which node sits at that position, but not enough to locate the node in the tree, since the tree can contain multiple identical nodes.

That’s where position_to_path is useful. It returns a list of ints and strings, where each element is either a key to take in a Node or an index in a ListNode. For example: ["target", "value", 0]

Let’s first see the difference between the two functions:

In [1]: from baron import parse

In [2]: from baron.path import position_to_node, position_to_path

In [3]: from baron.helpers import show_node

In [4]: some_code = """from baron import parse\nfrom baron.helpers import show_node\nfst = parse("a = 1")\nshow_node(fst)"""

In [5]: print(some_code)
from baron import parse
from baron.helpers import show_node
fst = parse("a = 1")
show_node(fst)

In [6]: tree = parse(some_code)

In [7]: node = position_to_node(tree, (3, 8))

In [8]: show_node(node)
"parse"

In [9]: path = position_to_path(tree, (3, 8))

In [10]: path
Out[10]: [4, 'value', 'value', 0, 'value']

The first one gives the node and the second one the node’s path in the tree. The path tells you that to reach the node, you take index 4 of the root ListNode, then the “value” key twice (first on the “assignment” Node, then on the “atomtrailers” Node), then index 0 of the resulting ListNode. The final “value” key selects the string held by that “name” node. Stopping one step earlier gives the “name” node itself:

In [11]: show_node(tree[4]["value"]["value"][0])
{
    "type": "name", 
    "value": "parse"
}

Neat. This is so common that there is a function to do that:

In [12]: from baron.path import path_to_node

In [13]: show_node(path_to_node(tree, path))
"parse"

With the two above, that’s a total of three functions to locate a node.
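To make the idea of a path concrete, here is a minimal sketch of what following one amounts to on plain nested lists and dicts. The follow_path helper is hypothetical and written for illustration; it is not Baron's path_to_node, and it assumes every path element is a real key or index, so it would not handle "constant" elements such as "(". The tree is a simplified, hand-written FST for a(1) with the formatting keys omitted.

```python
def follow_path(tree, path):
    """Walk `tree` one step per path element (dict key or list index).

    Hypothetical helper for illustration only; not Baron's path_to_node.
    """
    node = tree
    for step in path:
        node = node[step]
    return node


# Simplified, hand-written FST for the source "a(1)" (formatting keys omitted).
tree = [
    {
        "type": "atomtrailers",
        "value": [
            {"type": "name", "value": "a"},
            {
                "type": "call",
                "value": [
                    {
                        "type": "call_argument",
                        "target": {},
                        "value": {"type": "int", "value": "1"},
                    }
                ],
            },
        ],
    }
]

# Stopping at the list index yields the "name" node itself.
name_node = follow_path(tree, [0, "value", 0])
# One more "value" step yields the string stored in that node.
name_string = follow_path(tree, [0, "value", 0, "value"])
# A longer path reaches the string of the call argument.
argument = follow_path(tree, [0, "value", 1, "value", 0, "value", "value"])
print(name_node, name_string, argument)
```

An empty path returns the tree itself, which is why a path alone is enough to identify any node, including the root.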

You can also easily locate a “constant” node, such as the left parenthesis of a call node:

In [14]: from baron.path import position_to_path

In [15]: fst = parse("a(1)")

In [16]: position_to_path(fst, (1, 1))
Out[16]: [0, 'value', 0, 'value']

In [17]: position_to_path(fst, (1, 2))
Out[17]: [0, 'value', 1, '(']

In [18]: position_to_path(fst, (1, 3))
Out[18]: [0, 'value', 1, 'value', 0, 'value', 'value']

In [19]: position_to_path(fst, (1, 4))
Out[19]: [0, 'value', 1, ')']

By the way, out-of-bounds positions are handled gracefully:

In [20]: print(position_to_node(fst, (-1, 1)))
None

In [21]: print(position_to_node(fst, (1, 0)))
None

In [22]: print(position_to_node(fst, (1, 5)))
None

In [23]: print(position_to_node(fst, (2, 4)))
None

Bounding Box

Sometimes you want to know the top-left and bottom-right positions of a rendered node, or of part of it. This is not trivial, since the length of each rendered line is not readily available. That’s why Baron provides two helpers:

  • node_to_bounding_box(fst)
  • path_to_bounding_box(fst, path)

An example is worth a thousand words:

In [24]: from baron.path import node_to_bounding_box, path_to_bounding_box

In [25]: from baron import dumps

In [26]: fst = parse("a(1)\nb(2)")

In [27]: fst
Out[27]: 
[{'type': 'atomtrailers',
  'value': [{'type': 'name', 'value': 'a'},
   {'first_formatting': [],
    'fourth_formatting': [],
    'second_formatting': [],
    'third_formatting': [],
    'type': 'call',
    'value': [{'first_formatting': [],
      'second_formatting': [],
      'target': {},
      'type': 'call_argument',
      'value': {'section': 'number', 'type': 'int', 'value': '1'}}]}]},
 {'formatting': [], 'indent': '', 'type': 'endl', 'value': '\n'},
 {'type': 'atomtrailers',
  'value': [{'type': 'name', 'value': 'b'},
   {'first_formatting': [],
    'fourth_formatting': [],
    'second_formatting': [],
    'third_formatting': [],
    'type': 'call',
    'value': [{'first_formatting': [],
      'second_formatting': [],
      'target': {},
      'type': 'call_argument',
      'value': {'section': 'number', 'type': 'int', 'value': '2'}}]}]}]

In [28]: print(dumps(fst))
a(1)
b(2)

In [29]: node_to_bounding_box(fst)
Out[29]: BoundingBox (Position (1, 1), Position (2, 4))

In [30]: path_to_bounding_box(fst, [])
Out[30]: BoundingBox (Position (1, 1), Position (2, 4))

In [31]: fst[0]
Out[31]: 
{'type': 'atomtrailers',
 'value': [{'type': 'name', 'value': 'a'},
  {'first_formatting': [],
   'fourth_formatting': [],
   'second_formatting': [],
   'third_formatting': [],
   'type': 'call',
   'value': [{'first_formatting': [],
     'second_formatting': [],
     'target': {},
     'type': 'call_argument',
     'value': {'section': 'number', 'type': 'int', 'value': '1'}}]}]}

In [32]: print(dumps(fst[0]))
a(1)

In [33]: node_to_bounding_box(fst[0])
Out[33]: BoundingBox (Position (1, 1), Position (1, 4))

In [34]: path_to_bounding_box(fst, [0])
Out[34]: BoundingBox (Position (1, 1), Position (1, 4))

In [35]: fst[0]["value"]
Out[35]: 
[{'type': 'name', 'value': 'a'},
 {'first_formatting': [],
  'fourth_formatting': [],
  'second_formatting': [],
  'third_formatting': [],
  'type': 'call',
  'value': [{'first_formatting': [],
    'second_formatting': [],
    'target': {},
    'type': 'call_argument',
    'value': {'section': 'number', 'type': 'int', 'value': '1'}}]}]

In [36]: print(dumps(fst[0]["value"]))
a(1)

In [37]: node_to_bounding_box(fst[1])
Out[37]: BoundingBox (Position (1, 1), Position (2, 0))

In [38]: path_to_bounding_box(fst, [1])
Out[38]: BoundingBox (Position (1, 5), Position (2, 0))

In [39]: fst[0]["value"][1]
Out[39]: 
{'first_formatting': [],
 'fourth_formatting': [],
 'second_formatting': [],
 'third_formatting': [],
 'type': 'call',
 'value': [{'first_formatting': [],
   'second_formatting': [],
   'target': {},
   'type': 'call_argument',
   'value': {'section': 'number', 'type': 'int', 'value': '1'}}]}

In [40]: print(dumps(fst[0]["value"][1]))
(1)

In [41]: node_to_bounding_box(fst[0]["value"][1])
Out[41]: BoundingBox (Position (1, 1), Position (1, 3))

In [42]: path_to_bounding_box(fst, [0, "value", 1])
Out[42]: BoundingBox (Position (1, 2), Position (1, 4))

In [43]: fst[0]["value"][1]["value"]
Out[43]: 
[{'first_formatting': [],
  'second_formatting': [],
  'target': {},
  'type': 'call_argument',
  'value': {'section': 'number', 'type': 'int', 'value': '1'}}]

In [44]: print(dumps(fst[0]["value"][1]["value"]))
1

In [45]: node_to_bounding_box(fst[0]["value"][1]["value"])
Out[45]: BoundingBox (Position (1, 1), Position (1, 1))

In [46]: path_to_bounding_box(fst, [0, "value", 1, "value"])
Out[46]: BoundingBox (Position (1, 3), Position (1, 3))

The bounding box’s top_left and bottom_right positions follow the same convention as node location: lines and columns start at 1.

As you can see, the major difference between the two functions is that node_to_bounding_box always gives a top-left position of (1, 1), since it computes the bounding box of the node rendered on its own, while path_to_bounding_box takes the location of the node within the FST into account.
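The difference can be sketched on rendered strings alone. The following is an illustration under stated assumptions, not Baron's implementation: the hypothetical bounding_box function takes the rendered text of a node and, optionally, the rendered text that precedes it in the file. With no prefix it mimics the node-only view; with a prefix it mimics the view that accounts for the node's location.

```python
def bounding_box(text, prefix=""):
    """Return ((top_line, left_col), (bottom_line, right_col)), 1-based.

    Hypothetical helper for illustration only; not part of Baron.
    `text` is the rendered node, `prefix` is whatever is rendered before it.
    """
    # Where the first character of `text` lands once `prefix` is rendered.
    start_line = 1 + prefix.count("\n")
    start_col = len(prefix) - (prefix.rfind("\n") + 1) + 1
    # Where the last character of `text` lands.
    newlines = text.count("\n")
    end_line = start_line + newlines
    if newlines == 0:
        end_col = start_col + len(text) - 1
    else:
        # After a newline the column restarts from the line beginning.
        end_col = len(text) - (text.rfind("\n") + 1)
    return (start_line, start_col), (end_line, end_col)


whole = bounding_box("a(1)\nb(2)")          # the full source
call_alone = bounding_box("(1)")            # the call rendered on its own
call_in_place = bounding_box("(1)", "a")    # the call after the name "a"
endl_alone = bounding_box("\n")             # the newline rendered on its own
endl_in_place = bounding_box("\n", "a(1)")  # the newline after "a(1)"
print(whole, call_alone, call_in_place, endl_alone, endl_in_place)
```

A text ending in a newline reports column 0 on the following line, which matches the (2, 0) values shown in the session above.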