Formatting for the Masses
Source code is read far more often than it is written, and reading is a lot easier when the code is well formatted. Line breaks and indentation guide the reader and make it obvious in which context things are happening. Formatting everything by hand is simply not an option, especially for larger changes. For that reason general purpose languages like Java have many different automatic code formatters, or “pretty printers”.
All of these implementations come with a default behaviour that works, but this is not enough. Formatting has always been a matter of taste, so users want to customize the settings to their liking. That is why professional formatting tools like Eclipse JDT offer a gazillion options.
These options can be stored as preferences and shared together with the code. Sounds cool, but it is still not enough for “everybody”. People want to adjust those settings, and when they reach the point where the tool does not offer the 'right' option, they fall back to tag comments in JDT. These make the formatter shut up for a specific area.
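To make the tag comments concrete, here is a minimal sketch of what they look like in a Java file. It assumes that the off/on tags feature is enabled in the JDT formatter profile and that the default tag names `@formatter:off` and `@formatter:on` are in use; the class, field and method names are made up for illustration.

```java
public class FormatterTagsDemo {

    // @formatter:off
    // Hand-aligned on purpose: with the off/on tags enabled, the JDT
    // formatter is expected to leave everything up to the "on" tag as it is.
    static final int[][] IDENTITY = {
        { 1, 0, 0 },
        { 0, 1, 0 },
        { 0, 0, 1 },
    };
    // @formatter:on

    // From here on the formatter is free to reformat the code again.
    static int trace(int[][] matrix) {
        int sum = 0;
        for (int i = 0; i < matrix.length; i++) {
            sum += matrix[i][i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(trace(IDENTITY));
    }
}
```

The tags are ordinary comments, so the compiler ignores them; they only matter to a formatter that has been configured to honour them.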
The formatter can be explicitly switched off and on so that a region can be formatted by hand. This is a workaround and not a solution. The problem is that “everybody” wants to format their code differently. Some language designers try to circumvent this problem by mandating the correct formatting at the syntax level, as Python does, but this is usually limited to indentation and does not cover the overall formatting rules.
For most general purpose languages there are a gazillion tools that format the code in different ways with different options, and the user has the choice: the choice of tool, the choice of options. Everything can be tailored to specific needs.
Formatting code for DSLs with Xtext
For domain specific languages (DSLs) nothing like that comes out of the box. With frameworks like Eclipse Xtext it is really a breeze to create languages, but you do not get a formatter for free. Of course, the framework offers a powerful API to define formatters. Decisions can be made on the grammar level for all instances of the DSL, and in addition you have access to the relevant part of the abstract syntax tree, so you can make decisions based on the structure and values of your model, too. Having the option to format code based on syntax and structure sounds really cool, and it actually is, but it’s a lot of code that needs to be written and it’s not “that” easy.
After you are satisfied with your implementation and your formatter works as you wanted, your users might see that differently. As already said, formatting is a matter of taste and they might come up with the need for options. An additional newline here, no space there... these options do add a lot of complexity to the formatter code. And that's certainly not for free. Time to define the canonical formatting for your DSL? Or can we do better? How about formatters that take examples and learn how to format similar documents? Intelligent formatters that nobody needs to write manually and that are driven by your coding style? Can this be done?
We started to ask ourselves the same questions and looked at the existing libraries out there. As we use ANTLR heavily in Xtext, we looked in that direction, too. And we found a very promising project named CodeBuff, driven by Terence Parr, the mastermind behind ANTLR. It aims to do formatting by example, and the only things you need are a grammar and a bunch of examples.
We have played around with it and it really looks promising. In our talk at EclipseCon Europe on Wednesday, October 25, 2017, from 02:45 pm to 03:20 pm, Sebastian and I will give you an overview of what we have figured out, how it works and how it might integrate with Xtext.
See you there!