Debugging Xtext grammars – what to do when your language is ambiguous
Xtext uses ANTLR to generate a lexer and parser out of your grammar. Technically an LL(*) parser gets generated. This means it cannot deal with left recursion and has an infinite lookahead. You might know what that means, but to make it easier you could think about LL(*) parsers like this: A parser gets an ordered list of things (called tokens) to collect in a labyrinth. When it’s not clear which way to go it stands still and tries to figure out the ends to all directions. As soon as it is obvious where to go, it continues walking and collecting.
There is no way back – so decisions should be correct. Sometimes this is not the case and the parser can't make a clear decision. In this situation it gets tricky to understand where the problem is and how to resolve it. Mostly shown errors and warnings are not that meaningful.
What do you normally do when the Xtext workflow reports warnings or errors while the parser is being generated? Obviously errors can’t be ignored since the parser will not be generated and the workflow fails – but what about warnings? Do you try to solve them by staring at the grammar and try to think like a parser? Or do you ignore them because it seems to work? Really?
In projects we have seen people dealing with such problems in various ways. Ignoring warnings is not a good idea, since ANTLR switches off alternatives and you do not know which one. We have seen people consequently ignoring such warnings, because they cannot figure out the real cause and things got complex. However ignoring those warnings should not be an option.
I have seen that a large group of Xtext users do not know about ANTLRWorks or that it can help here. So let’s make two trivial examples to see how to use the tool with Xtext.
Let’s make a trivial example where it is really obvious what the problem is:
We have two parser rules (Element1 and Element2) that look identically except that there are different parts optional. During the workflow runs it reports the following warnings:
warning(200): ../com.itemis.blog.antlrworks.dsl/src-gen/com/itemis/blog/antlrworks/parser/antlr/internal/InternalDsl.g:114:2: Decision can match input such as "'element' 'id' RULE_ID 'int' RULE_INT" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): ../com.itemis.blog.antlrworks.dsl.ide/src-gen/com/itemis/blog/antlrworks/ide/contentassist/antlr/internal/InternalDsl.g:156:1: Decision can match input such as "'element' 'id' RULE_ID 'int' RULE_INT" using multiple alternatives: 1, 2
In this case the parser gets generated, but it tells us that there were different alternatives for the same input and that ANTLR decided to disable 2 of them. It does not tell us which once. Let’s find out what the cause is.
ANTLRWorks comes as an executable jar – running it should not be a problem as long as java is installed. If you want to open a grammar it expects an *.g file. Xtext should have generated one in the src-gen folder. In our example there is a .g file located here: com.itemis.blog.antlrworks.dsl/src-gen/com/itemis/blog/antlrworks/parser/antlr/internal/InternalDsl.g
The grammar looks a bit strange and there is a lot of Java stuff in the grammar. ANTLRWorks will fail to compile the grammar… damn.
Ok, the generated ANTLR grammar cannot be directly used in ANTLRWorks since it is modified to the needs of Xtext. To generate a so-called debugable grammar you need to modify the workflow a bit like this.
Now you’ll find a .g file in src-gen that carries the name DebugInternal*.g. This file can be easily used with ANTLRWorks.
After you have started ANTLRWorks click on File->Open and select the DebugInternal*.g file.
ANTLRWorks will open the grammar and you’ll see the different rules. So far no warnings are shown. To let the tool do it’s job click on debug – the button looks like a bug. After doing that the ruleElement is marked read. By clicking on the rule you’ll see the problem that caused the warnings and the different alternatives. To really see what the disabled alternatives are you could enable them as shown in the next picture. The read arrows will show the disabled alternatives.
As already said this is a very trivial example and the tool just points out what we already know. In typical projects we have far more complex scenarios where it is nearly impossible to get the cause of a warning without ANTLRWorks. Especially when a lot of parts are optional it gets tricky. To really understand what the parser does there is the possibility to debug the grammar with a given input and see how the parsetree is constructed. Clicking on debug once more brings up a window to define the input.
Debugging in ANTLRWorks works similar like you know it from Eclipse and you can step forward and backward. At the end the parsetree will show that the parser went into the rule “Element1” instead of “Element2”. From a grammar point of view both rules would be valid but ANTLR switched of the alternative. Otherwise no clear decision could be made.
If we add a second line as an input for the debugger and leave out the intValue the parsetree looks like this:
As the intValue is mandatory in the ruleElement1 the parser will go into ruleElement2 for the second entry. This is the only case where “Element2” is picked. In this trivial example it’s not that hard to guess what the parser will do. In more complex example this debugging feature will really bring a big benefit to solve your ambiguities.
What about errors? Do you know what to do when the workflow reports that a rule has a non-LL(*) decision? What the hell is left-refactoring, syntactic predicates and why should I use backtracking – should I really? We’ll handle that in another blogpost, but for now let’s have a look at a simple language that has some expressions.
When we try to run the Xtext workflow the generator will report the following error:
error(211): ../com.itemis.blog.antlrworks.dsl/src-gen/com/itemis/blog/antlrworks/parser/antlr/internal/InternalDsl.g:114:2: [fatal] rule ruleExpression has non-LL(*) decision due to recursive rule invocations reachable from alts 1,3. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
Various exceptions are show below the error, but Xtext will generate the .g file anyway – so there is a chance to find out what the problem is. You might already know what’s wrong, but let’s try to use ANTLRWorks. The compiler in ANTLRWorks will show the very same error, but after ignoring that the rule element is marked red. The different alternatives are marked red.
In this case left-recursion is not our problem. ANTLRWorks shows us, that ruleBlockExpression and ruleListLiteral are the cause.
After having a closer look it is obvious that the syntax is equal if there is only one expression inside – that makes our grammar ambiguous. Do we really want a ListLiteral to exist on the same level as a BlockExpression? Do we really want a BlockExpression contain various other BlockExpressions?
After considering these questions a refactored grammar looks like this:
After doing that the workflow will run through and we are good to have a second look in ANTLRWorks to see the parsetree for a simple expression:
These examples are very trivial to make it obvious where the problem is. The intension was to let you know how to use ANTLRWorks with Xtext. Don't ignore warnings anymore – you might not know the implications. The parsetree might look different as you though.
Stay tuned for another post about syntactic predicates, left-recursion / left-refactoring and why backtracking is not an option. And if you've got any questions regarding Xtext – don't hesitate to contact us!