Malthe Borch’s sourcecodegen

sourcecodegen is a project by Malthe Borch which can generate source code from ASTs.

Summary

sourcecodegen is designed to work with older versions of Python, specifically “2.4 and below”, and is not compatible with the new ast module in Python 2.6.

It is released under the BSD license, so it can be copied for investigation.

Rendering

The code provides an ASTVisitor class, which contains a method for each type of node. Each method renders the code for its own node and calls the corresponding methods for its sub-nodes, passing in the respective nodes.

This concept of a visitor class which visits each node of a tree is a classic way of handling ASTs.
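
As an illustration of the visitor idea only, here is a minimal sketch that uses ast.NodeVisitor from the standard library of a modern Python 3, not sourcecodegen's own ASTVisitor or the older compiler module it targets. The names ExprRenderer, OPERATORS and render are invented for this example, and only a tiny subset of expression nodes is handled.

```python
import ast


class ExprRenderer(ast.NodeVisitor):
    """Render a tiny subset of expression nodes back to source text."""

    # Hypothetical operator table for this sketch; only two operators are covered.
    OPERATORS = {ast.Add: "+", ast.Mult: "*"}

    def visit_Expression(self, node):
        # The root of an "eval" mode tree simply wraps one expression.
        return self.visit(node.body)

    def visit_BinOp(self, node):
        # Render this node, delegating each sub-node to its own visit method.
        left = self.visit(node.left)
        right = self.visit(node.right)
        return "(%s %s %s)" % (left, self.OPERATORS[type(node.op)], right)

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        return repr(node.value)

    def generic_visit(self, node):
        # Anything outside the subset is rejected rather than silently dropped.
        raise NotImplementedError(type(node).__name__)


def render(source):
    """Parse a single expression and turn its tree back into source text."""
    tree = ast.parse(source, mode="eval")
    return ExprRenderer().visit(tree)


print(render("a + b * 2"))
```

Parenthesising every binary operation keeps this sketch correct without tracking operator precedence, at the cost of redundant parentheses in the output.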

Running sourcecodegen on itself

Ideally the unparser should produce itself as output, given itself as input. Let’s see how it does.

Running

Run the unparser, with the commands:

$ python2.5
>>> from compiler import parse
>>> tree = parse(file('visitor.py').read())
>>> from sourcecodegen import ModuleSourceCodeGenerator
>>> generator = ModuleSourceCodeGenerator(tree)
>>> visitor_out = file('visitor.py', 'w')
>>> print >> visitor_out, generator.getSourceCode()
>>> visitor_out.close()
>>> exit()
$ git commit -m "run sourcecodegen on itself" visitor.py

Some differences are apparent in the commit.

  1. Blank lines are not maintained
    In at least one case this has resulted in incorrect code, with two lines being incorrectly joined.
  2. Indentation is changed from spaces to tabs
  3. Extraneous parentheses are introduced
    Some of these seem unnecessary, such as those introduced around the conditions of (some) if statements. Some return statements also get extra parentheses, but others do not. Some introduced parentheses seem harmless, or even helpful, such as parentheses around a tuple.
  4. Re-formatting of multi-line arguments for method calls to single lines
  5. Re-formatting of strings to remove redundant escaping

Most of the above issues seem to be matters of opinion and should be deferred to PEP 8. The unparser’s output does seem closer to PEP 8 than the original code is.
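
One way to separate cosmetic differences from real ones is to compare the parsed trees rather than the text. The following is a rough sketch, assuming a modern Python 3 and its standard ast module (not the compiler module used above); same_ast and the two sample strings are made up for illustration and are not taken from visitor.py.

```python
import ast


def same_ast(source_a, source_b):
    """Return True if both sources parse to structurally identical trees."""
    # ast.dump leaves out line and column attributes by default,
    # so differences in layout alone do not affect the comparison.
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


# Hypothetical "before" text: spaces, a blank line, a bare tuple, an escaped quote.
original = (
    "if ready:\n"
    "    pair = 1, 2\n"
    "\n"
    "    text = 'it\\'s'\n"
)

# Hypothetical "after" text: tabs, no blank line, extra parentheses, no escaping.
regenerated = (
    "if (ready):\n"
    "\tpair = (1, 2)\n"
    "\ttext = \"it's\"\n"
)

print(same_ast(original, regenerated))
```

Output that no longer parses at all makes ast.parse raise SyntaxError, so a check along these lines would also flag generated code that is broken outright.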

Pepping

To check this, the output can be run through the pep8 tool. I’ve already noticed that blank lines are removed, so we can ignore errors E301 and E302, and also W191 because we know the indentation uses tabs:

$ pep8 --ignore=W191,E301,E302 visitor.py
visitor.py:8:80: E501 line too long (114 > 79 characters)
visitor.py:84:26: E712 comparison to False should be 'if cond is False:' or 'if not cond:'
visitor.py:108:80: E501 line too long (84 > 79 characters)
visitor.py:469:80: E501 line too long (85 > 79 characters)
visitor.py:479:80: E501 line too long (98 > 79 characters)
visitor.py:480:3: E112 expected an indented block
visitor.py:529:1: W391 blank line at end of file

This shows up some more problems:

  1. Some lines are too long
  2. The file ends with an extra blank line (W391)
  3. The code uses “== False”, but that was not introduced by sourcecodegen itself, as it was in the original.
  4. Some lines are joined; in particular, pep8 notices a problem (E112) resulting from the joining of the lines before line 480.
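
To make the first two problems concrete, here is a small sketch of what checks of that kind amount to. It is not how the pep8 tool is implemented; check_source and its messages are invented for this example, and it covers only over-long lines and a blank line at the end of the text.

```python
def check_source(text, max_length=79):
    """Return (line_number, message) pairs for two simple layout problems."""
    problems = []
    lines = text.split("\n")
    # Text ending in "\n" yields a final empty string from split; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    for number, line in enumerate(lines, start=1):
        if len(line) > max_length:
            message = "line too long (%d > %d characters)" % (len(line), max_length)
            problems.append((number, message))
    # Whatever remains last is the real final line; it should not be blank.
    if lines and lines[-1].strip() == "":
        problems.append((len(lines), "blank line at end of file"))
    return problems


# Hypothetical sample: one over-long line followed by a stray blank line.
sample = "x = 1\n" + "y = " + "1" * 80 + "\n\n"
for number, message in check_source(sample):
    print("%d: %s" % (number, message))
```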

Conclusion

This is not a test of sourcecodegen, just an indication of where some of its strengths and weaknesses may lie. It has one flaw (badly joined lines) which can result in unusable code, but otherwise seems robust.