Python’s Unparser

The source for Python includes an unparser, which can render ASTs back to source code.

Rendering

The code provides an Unparser class, which contains a method for each type of node. Each method renders the code for its own node, and calls the methods for its sub-nodes, passing in the respective nodes.
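The method-per-node idea can be illustrated with a toy renderer. This is only a sketch in the spirit of the description above, not the code in unparse.py: the class name MiniUnparser, the render method and the handful of node types covered are all made up for illustration, and it assumes Python 3.8 or later (where literals parse to ast.Constant).

```python
import ast


class MiniUnparser:
    """Toy renderer (hypothetical): one method per node type."""

    def render(self, node):
        # Pick the method named after the node's class, e.g. _BinOp.
        method = getattr(self, "_" + type(node).__name__)
        return method(node)

    def _Module(self, node):
        # One rendered statement per line.
        return "\n".join(self.render(stmt) for stmt in node.body)

    def _Assign(self, node):
        targets = " = ".join(self.render(t) for t in node.targets)
        return targets + " = " + self.render(node.value)

    def _Name(self, node):
        return node.id

    def _Constant(self, node):
        # repr() decides the quoting, whatever the original source used.
        return repr(node.value)

    def _BinOp(self, node):
        # Always parenthesise, rather than reason about precedence.
        ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
        return "(%s %s %s)" % (
            self.render(node.left),
            ops[type(node.op)],
            self.render(node.right),
        )


source = 'x=1+2*y  # a comment\ns = "text"'
rendered = MiniUnparser().render(ast.parse(source))
print(rendered)
# x = (1 + (2 * y))
# s = 'text'
```

Even this small sketch drops the comment, changes the quotes and adds parentheses, simply because each method has to make some formatting choice that the tree does not record.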

Running the unparser on itself

Ideally the unparser should produce itself as output, given itself as input. Let’s see how it does.

Running

Run the unparser with the commands:

$ python2.7 unparse.py unparse.py > unparse.out.py
$ diff unparse.py unparse.out.py

Some differences are apparent:

  1. Comments are stripped

    This is expected. To my mind the problem here is that unparse.py includes any comments at all (but that may be a less common opinion).

  2. Docstrings are changed from double quotes to single quotes.

    Some docstrings in unparse.py use a single pair of double quotes, others use the triple double quotes recommended by PEP 257 (to which PEP 8 defers). In the output both have been reduced to single quotes.

    The same change is also apparent in other strings.

  3. Blank lines are not maintained

  4. Extraneous parentheses are introduced

    Some of these seem unwarranted, such as those introduced around the conditions of (some) if statements. Others seem harmless, or even helpful, such as introducing parentheses around a literal tuple. Introducing parentheses around expressions after a print statement is dubious, given that I used Python 2, not 3.

  5. Extra indentation is introduced at control structures

    This is particularly noticeable at if/else statements, where the original puts the body on the same line as both the if and the else, but the unparsed version splits each over two lines. Once again the unparsed version seems more correct.

  6. Re-formatting of a multi-line dictionary to a single line

  7. Re-formatting expressions to introduce spaces around operators

Most of the above issues seem to be matters of opinion and should be deferred to PEP 8. The unparser’s output does seem closer to PEP 8 than its own code is.
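Several of the differences listed above arise because the detail in question (comments, quote style, line layout) is not recorded in the AST at all, so comparing syntax trees may be a fairer check than diff. The following is a minimal sketch of that idea. It assumes Python 3.9 or later, where the standard library exposes an unparser as ast.unparse; that is not the unparse.py script run above, and its formatting choices may differ.

```python
import ast

original = (
    "# this comment is not part of the AST\n"
    "if x: y=1+2\n"
    'else: y = "two"\n'
)

tree = ast.parse(original)
regenerated = ast.unparse(tree)
print(regenerated)

# The text differs, but both versions should parse to the same tree.
# ast.dump() leaves out line and column numbers by default, so layout
# changes do not affect the comparison.
same_tree = ast.dump(tree) == ast.dump(ast.parse(regenerated))
print(same_tree)
```

If same_tree is true, the remaining textual differences are purely cosmetic, which is the sense in which they are matters of opinion.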

Pepping

To check that, the output can be run through the pep8 tool. I’ve already noticed that blank lines and spaces around some operators aren’t handled optimally, so I’ll ask pep8 to ignore errors E302 and E203:

$ pep8 --ignore=E302,E203 unparse.out.py
unparse.out.py:22:80: E501 line too long (150 > 79 characters)
unparse.out.py:25:80: E501 line too long (95 > 79 characters)
unparse.out.py:255:80: E501 line too long (86 > 79 characters)
unparse.out.py:383:80: E501 line too long (82 > 79 characters)
unparse.out.py:408:80: E501 line too long (180 > 79 characters)
unparse.out.py:416:80: E501 line too long (150 > 79 characters)
unparse.out.py:573:23: W292 no newline at end of file

This shows up two more problems:

  1. Some lines are too long
  2. The final line is not correctly terminated

Conclusion

This is not a test of the unparser, just an indication of where some of its strengths and weaknesses may lie. It did not show up any fatal flaws, but does suggest the unparser needs some (small) work to get its own code to meet PEP 8 standards.

This quick run-through also shows that “what code should look like” is a very opinionated subject. Python is lucky to have PEP 8 as a standard to refer to, but even that allows for variations (only “new code” need use spaces, for example). That shows up one further bug in the unparser: all its formatting choices are hard-coded, whereas we may need greater flexibility in such choices.