Advanced Usage
The topics presented here are less often needed but are still very useful.
Locate a Node
Since Baron produces a tree, a path is enough to identify a node in that tree unambiguously. A common task that involves a path is translating a position in a file (a line and a column) into a node of the FST.
Baron provides two helper functions for that:
position_to_node(fst, line, column)
position_to_path(fst, line, column)
Both take an FST as their first argument, followed by the line number and the column number. Line and column numbers start at 1, as in a text editor. Note that the examples below pass the position as a single (line, column) tuple.
position_to_node returns an FST node. This is fine if you only want to know which node it is, but it is not enough to locate the node in the tree, since there can be multiple identical nodes within the tree.
That’s where position_to_path is useful. It returns a list of ints and strings, each of which is either the key to take in a Node or the index to take in a ListNode. For example: ["target", "value", 0]
Let’s first see the difference between the two functions:
In [1]: from baron import parse
In [2]: from baron.path import position_to_node, position_to_path
In [3]: from baron.helpers import show_node
In [4]: some_code = """from baron import parse\nfrom baron.helpers import show_node\nfst = parse("a = 1")\nshow_node(fst)"""
In [5]: print(some_code)
from baron import parse
from baron.helpers import show_node
fst = parse("a = 1")
show_node(fst)
In [6]: tree = parse(some_code)
In [7]: node = position_to_node(tree, (3, 8))
In [8]: show_node(node)
"parse"
In [9]: path = position_to_path(tree, (3, 8))
In [10]: path
Out[10]: [4, 'value', 'value', 0, 'value']
The first function returns the node and the second returns the node’s path in the tree. The path says that, to reach the node, you take index 4 of the root ListNode, then the “value” key twice (first on the “assignment” Node, then on the “atomtrailers” Node), and then index 0 of the resulting ListNode. That leads to the “name” Node shown below; the final “value” key of the path selects its string, "parse":
In [11]: show_node(tree[4]["value"]["value"][0])
{
    "type": "name",
    "value": "parse"
}
Neat. This is so common that there is a function to do that:
In [12]: from baron.path import path_to_node
In [13]: show_node(path_to_node(tree, path))
"parse"
With the two above, that’s a total of three functions to locate a node.
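To make the path mechanics concrete, here is a minimal sketch of what following a path involves: each element is applied in turn, as a key on a dict-like Node or as an index on a list. The helper `follow_path` and the hand-written `mini_tree` are illustrative assumptions and not Baron’s implementation; in real code, use `path_to_node`.

```python
def follow_path(tree, path):
    """Walk a nested list/dict structure one path element at a time.

    Hypothetical helper for illustration only; Baron ships path_to_node.
    """
    node = tree
    for step in path:
        # An int indexes into a list, a str selects a key of a dict.
        node = node[step]
    return node


# Hand-written stand-in for the FST of `fst = parse("a = 1")`, trimmed to
# the keys the path needs (a real FST carries more keys per node).
mini_tree = [
    {
        "type": "assignment",
        "value": {
            "type": "atomtrailers",
            "value": [{"type": "name", "value": "parse"}],
        },
    }
]

# Stopping one step early yields the "name" node itself.
print(follow_path(mini_tree, [0, "value", "value", 0]))
# The full path ends on the node's string value.
print(follow_path(mini_tree, [0, "value", "value", 0, "value"]))
```

An empty path returns the tree itself, which is consistent with a path being relative to the root of the FST.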
You can also easily locate a “constant” node, such as the left parenthesis of a call node:
In [14]: from baron.path import position_to_path
In [15]: fst = parse("a(1)")
In [16]: position_to_path(fst, (1, 1))
Out[16]: [0, 'value', 0, 'value']
In [17]: position_to_path(fst, (1, 2))
Out[17]: [0, 'value', 1, '(']
In [18]: position_to_path(fst, (1, 3))
Out[18]: [0, 'value', 1, 'value', 0, 'value', 'value']
In [19]: position_to_path(fst, (1, 4))
Out[19]: [0, 'value', 1, ')']
By the way, out-of-bounds positions are handled gracefully:
In [20]: print(position_to_node(fst, (-1, 1)))
None
In [21]: print(position_to_node(fst, (1, 0)))
None
In [22]: print(position_to_node(fst, (1, 5)))
None
In [23]: print(position_to_node(fst, (2, 4)))
None
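The 1-based convention and the out-of-bounds behaviour can be pictured with a plain-text analogue. The sketch below is not how Baron locates nodes: `char_at` is a hypothetical helper that maps a (line, column) pair to a character of the source string and returns None for positions outside the text, mirroring the results above for "a(1)".

```python
def char_at(source, line, column):
    """Return the character at a 1-based (line, column), or None if outside.

    Hypothetical helper for illustration; it works on raw text, not on an FST.
    """
    lines = source.split("\n")
    if not 1 <= line <= len(lines):
        return None
    text = lines[line - 1]
    if not 1 <= column <= len(text):
        return None
    return text[column - 1]


source = "a(1)"
print(char_at(source, 1, 1))   # a
print(char_at(source, 1, 2))   # (
print(char_at(source, 1, 5))   # None
print(char_at(source, 2, 4))   # None
```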
Bounding Box
Sometimes you want to know the left-most and right-most positions of a rendered node, or of part of it. This is not a trivial task, because the length of each rendered line is not readily available. That’s why Baron provides two helpers:
node_to_bounding_box(fst)
path_to_bounding_box(fst, path)
Examples are worth a thousand words so:
In [24]: from baron.path import node_to_bounding_box, path_to_bounding_box
In [25]: from baron import dumps
In [26]: fst = parse("a(1)\nb(2)")
In [27]: fst
Out[27]:
[{'type': 'atomtrailers',
  'value': [{'type': 'name', 'value': 'a'},
            {'first_formatting': [],
             'fourth_formatting': [],
             'second_formatting': [],
             'third_formatting': [],
             'type': 'call',
             'value': [{'first_formatting': [],
                        'second_formatting': [],
                        'target': {},
                        'type': 'call_argument',
                        'value': {'section': 'number', 'type': 'int', 'value': '1'}}]}]},
 {'formatting': [], 'indent': '', 'type': 'endl', 'value': '\n'},
 {'type': 'atomtrailers',
  'value': [{'type': 'name', 'value': 'b'},
            {'first_formatting': [],
             'fourth_formatting': [],
             'second_formatting': [],
             'third_formatting': [],
             'type': 'call',
             'value': [{'first_formatting': [],
                        'second_formatting': [],
                        'target': {},
                        'type': 'call_argument',
                        'value': {'section': 'number', 'type': 'int', 'value': '2'}}]}]}]
In [28]: print(dumps(fst))
a(1)
b(2)
In [29]: node_to_bounding_box(fst)
Out[29]: BoundingBox (Position (1, 1), Position (2, 4))
In [30]: path_to_bounding_box(fst, [])
Out[30]: BoundingBox (Position (1, 1), Position (2, 4))
In [31]: fst[0]
Out[31]:
{'type': 'atomtrailers',
 'value': [{'type': 'name', 'value': 'a'},
           {'first_formatting': [],
            'fourth_formatting': [],
            'second_formatting': [],
            'third_formatting': [],
            'type': 'call',
            'value': [{'first_formatting': [],
                       'second_formatting': [],
                       'target': {},
                       'type': 'call_argument',
                       'value': {'section': 'number', 'type': 'int', 'value': '1'}}]}]}
In [32]: print(dumps(fst[0]))
a(1)
In [33]: node_to_bounding_box(fst[0])
Out[33]: BoundingBox (Position (1, 1), Position (1, 4))
In [34]: path_to_bounding_box(fst, [0])
Out[34]: BoundingBox (Position (1, 1), Position (1, 4))
In [35]: fst[0]["value"]
Out[35]:
[{'type': 'name', 'value': 'a'},
 {'first_formatting': [],
  'fourth_formatting': [],
  'second_formatting': [],
  'third_formatting': [],
  'type': 'call',
  'value': [{'first_formatting': [],
             'second_formatting': [],
             'target': {},
             'type': 'call_argument',
             'value': {'section': 'number', 'type': 'int', 'value': '1'}}]}]
In [36]: print(dumps(fst[0]["value"]))
a(1)
In [37]: node_to_bounding_box(fst[1])
Out[37]: BoundingBox (Position (1, 1), Position (2, 0))
In [38]: path_to_bounding_box(fst, [1])
Out[38]: BoundingBox (Position (1, 5), Position (2, 0))
In [39]: fst[0]["value"][1]
Out[39]:
{'first_formatting': [],
 'fourth_formatting': [],
 'second_formatting': [],
 'third_formatting': [],
 'type': 'call',
 'value': [{'first_formatting': [],
            'second_formatting': [],
            'target': {},
            'type': 'call_argument',
            'value': {'section': 'number', 'type': 'int', 'value': '1'}}]}
In [40]: print(dumps(fst[0]["value"][1]))
(1)
In [41]: node_to_bounding_box(fst[0]["value"][1])
Out[41]: BoundingBox (Position (1, 1), Position (1, 3))
In [42]: path_to_bounding_box(fst, [0, "value", 1])
Out[42]: BoundingBox (Position (1, 2), Position (1, 4))
In [43]: fst[0]["value"][1]["value"]
Out[43]:
[{'first_formatting': [],
  'second_formatting': [],
  'target': {},
  'type': 'call_argument',
  'value': {'section': 'number', 'type': 'int', 'value': '1'}}]
In [44]: print(dumps(fst[0]["value"][1]["value"]))
1
In [45]: node_to_bounding_box(fst[0]["value"][1]["value"])
Out[45]: BoundingBox (Position (1, 1), Position (1, 1))
In [46]: path_to_bounding_box(fst, [0, "value", 1, "value"])
Out[46]: BoundingBox (Position (1, 3), Position (1, 3))
The bounding box’s top_left and bottom_right positions follow the same convention as when locating a node: lines and columns start at 1.
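As a rough illustration of that convention, the sketch below computes the bounding box of a rendered string taken on its own. `text_bounding_box` is a hypothetical helper working on plain text, not part of Baron; it assumes the bottom-right column is the length of the last rendered line, which is consistent with the outputs shown above (including `(2, 0)` for a lone newline).

```python
def text_bounding_box(rendered):
    """Return ((top, left), (bottom, right)) for a rendered string, 1-based.

    Hypothetical helper for illustration; Baron's own helper is
    node_to_bounding_box, which works on FST nodes rather than strings.
    """
    lines = rendered.split("\n")
    top_left = (1, 1)
    # The bottom-right column is the length of the last line; a trailing
    # newline leaves an empty last line, hence a column of 0.
    bottom_right = (len(lines), len(lines[-1]))
    return top_left, bottom_right


print(text_bounding_box("a(1)\nb(2)"))  # ((1, 1), (2, 4))
print(text_bounding_box("(1)"))         # ((1, 1), (1, 3))
print(text_bounding_box("\n"))          # ((1, 1), (2, 0))
```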
As you can see, the major difference between the two functions is that node_to_bounding_box always gives a top-left position of (1, 1), since it computes the bounding box of the node taken on its own, while path_to_bounding_box takes the location of the node in the FST into account.
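The relationship between the two results can be sketched as a translation: take the box of the node in isolation and shift it to where the node starts in the whole source. `shift_box` below is a hypothetical helper, not a Baron function, and it assumes that only positions on the box’s first line need a column offset.

```python
def shift_box(box, start):
    """Translate a box computed in isolation so that it begins at `start`.

    Hypothetical helper for illustration only; positions are 1-based
    (line, column) tuples.
    """
    start_line, start_column = start

    def shift(position):
        line, column = position
        if line == 1:
            # Same line as the start: columns are offset by the start column.
            return (start_line, start_column + column - 1)
        # Later lines begin at column 1, so only the line number moves.
        return (start_line + line - 1, column)

    top_left, bottom_right = box
    return shift(top_left), shift(bottom_right)


# "(1)" in isolation spans (1, 1)-(1, 3); in "a(1)" it starts at column 2.
print(shift_box(((1, 1), (1, 3)), (1, 2)))  # ((1, 2), (1, 4))
# The newline after "a(1)" starts at column 5 and ends on line 2, column 0.
print(shift_box(((1, 1), (2, 0)), (1, 5)))  # ((1, 5), (2, 0))
```

Both printed values match the path_to_bounding_box outputs in the session above, given the corresponding node_to_bounding_box outputs as input.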