As a language engineer, you essentially deal with the parsing and semantic analysis of (programming) languages. You get your hands dirty carefully constructing compilers, interpreters, code generators, refactoring tools, and static code analyzers.
Powerful language workbenches, e.g., Xtext or MPS, will help you a lot with these tasks by offering the generation of important ingredients:

- parsers,
- abstract syntax trees (ASTs),
- editors, and
- code generators.
Hence, you are very well supported for all the necessary business that comes after parsing. But what do you do if you do not want to face the mammoth task of learning these tools? Or if you have to work in a technical space with limited resources?
This article showcases embedded parsing and evaluation, applicable to small embedded Domain Specific Languages (DSLs), by utilizing three concepts.
Firstly, a pure embedding in a powerful, statically typed host language: Scala. This allows for high interoperability not just with existing general-purpose Scala code but also with other languages running on the Java Virtual Machine (JVM). There is no need for a separate pre-processor or language extension.
Secondly, parser combinators. They allow for compact, language-integrated scanning and parsing. Furthermore, handy predefined parsers (e.g., for Java-style tokens or often-used regular expressions) are provided, the syntax with its very few parser operators is quite simple, and parsers can be extended easily via inheritance. On the downside, performance may not be top-notch due to the use of backtracking while parsing.
And last but not least, Reference Attribute Grammars (RAGs) are utilized. This venerable concept somehow got stuck in academia (as a nice introduction, the interested reader is advised to read, e.g., this paper) but is useful when trying to reason about languages. Attribute grammars extend context-free grammars with declarative equations that relate the values of attributes of grammar symbols to each other. Most attribute grammar systems translate the equations into an implementation written in a general-purpose programming language. The translation makes decisions about attribute evaluation order and storage, removing the need for manual tree-traversal techniques such as visitors. The developer is thus freed to focus on the language properties that the attributes represent. On the implementation side, Kiama is used. Kiama's attribute grammar library supports a range of RAG features including cached, uncached, higher-order, parameterized, and circular attributes. Kiama also implements techniques for dynamic extension and variation of attribute equations.
Overall, the Scala programming language is used because of its support for domain-specific notations and its emphasis on scalability. Unlike generators with specialized notation, Kiama attribute grammars use standard
Scala concepts such as case classes, pattern-matching functions for equations, traits and mixins for composition, and implicit parameters for forwarding.
In the following, the basic concepts for implementing an embedded parser using parser combinators and evaluating the parsed result are shown.
Simple mathematical expressions, e.g., `1 + 2 + 3 * 0`, should be correctly parsed and evaluated. Kiama utilizes plain Scala case classes for building the AST and representing the meta-model in one go:
```scala
sealed abstract class Exp extends Product

case class Num(i: Double) extends Exp
case class Add(l: Exp, r: Exp) extends Exp
case class Sub(l: Exp, r: Exp) extends Exp
case class Mul(l: Exp, r: Exp) extends Exp
case class Div(l: Exp, r: Exp) extends Exp
```
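For illustration, the expression `1 + 2 + 3 * 0` corresponds to the following tree (a hand-written sketch; the parser below constructs it automatically):

```scala
// * binds stronger than +, so 3 * 0 becomes the right child of the outer Add:
val ast: Exp = Add(Add(Num(1), Num(2)), Mul(Num(3), Num(0)))
```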
As a first step, parsing is now done with the following:
```scala
import org.bitbucket.inkytonik.kiama.parsing.{Error, Failure, Parsers, Success}
import org.bitbucket.inkytonik.kiama.util.{Positions, StringSource}

class MathParser extends Parsers(new Positions) {

  // The entry point. Construct an Exp out of a String
  def parse(in: String): Exp =
    parse(expr, StringSource(in)) match {
      case Success(e, _) => e
      case Failure(m, _) => throw new RuntimeException("Parsing failure: " + m)
      case Error(m, _)   => throw new RuntimeException("Parsing error: " + m)
    }

  lazy val expr: Parser[Exp] =
    term ~ rep("[+-]".r ~ term) ^^ {
      case t ~ ts =>
        ts.foldLeft(t) {
          case (t1, "+" ~ t2) => Add(t1, t2)
          case (t1, "-" ~ t2) => Sub(t1, t2)
        }
    }

  lazy val term =
    factor ~ rep("[*/]".r ~ factor) ^^ {
      case t ~ ts =>
        ts.foldLeft(t) {
          case (t1, "*" ~ t2) => Mul(t1, t2)
          case (t1, "/" ~ t2) => Div(t1, t2)
        }
    }

  lazy val factor = "(" ~> expr <~ ")" | num

  lazy val num = regex("[0-9]+".r) ^^ (s => Num(s.toDouble))
}
```
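A quick usage sketch; the resulting tree matches the one constructed by hand above:

```scala
val ast = new MathParser().parse("1+2+3*0")
// ast == Add(Add(Num(1.0), Num(2.0)), Mul(Num(3.0), Num(0.0)))
```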
Executable grammars are constructed from the composition of parsers. These compositions are described as functional mappings, and the compositional integrity of the parser types is guaranteed by implementing their classes as monads. With this, a combinator-based parser is typically a set of production rules that use the operators defined in the `Parsers` class to compose parser results, as shown in the following table:
Operator | Purpose | Example from above |
---|---|---|
`"..."` | Definition of a literal | `"+"` |
`"...".r` | Definition of a regular expression | `"[+-]".r` |
`P <~ Q` | Composition of parsers P and Q; only the result of P is kept | `expr <~ ")"` |
`P ~> Q` | Composition of parsers P and Q; only the result of Q is kept | `"(" ~> expr` |
`P ~ Q` | Composition of parsers P and Q; both results are kept | `"[*/]".r ~ factor` |
`P \| Q` | Alternative of parsers P and Q | `"(" ~> expr <~ ")" \| num` |
`P ^^ f` | Conversion of the result of parser P by applying function f | `regex("[0-9]+".r) ^^ (s => Num(s.toDouble))` |
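To see how these operators interlock, here is a small hypothetical parser, not part of the math DSL, that recognizes an assignment such as `x = 42`; it assumes definition inside a class extending Kiama's `Parsers` (like `MathParser` above):

```scala
// Hypothetical example: parse "name = number" into a case class.
case class Assign(name: String, value: Int)

lazy val assign: Parser[Assign] =
  "[a-z]+".r ~ ("=" ~> "[0-9]+".r) ^^ {
    case name ~ num => Assign(name, num.toInt)
  }
```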
As the second step, the evaluation is now pleasantly simple:
```scala
import org.bitbucket.inkytonik.kiama.attribution.Attribution

class MathExample extends Attribution {

  val value: CachedAttribute[Exp, Double] = attr {
    case Num(i)    => i
    case Add(l, r) => value(l) + value(r)
    case Sub(l, r) => value(l) - value(r)
    case Mul(l, r) => value(l) * value(r)
    case Div(l, r) => value(l) / value(r)
  }
}
```
The higher-order function `attr` from the Kiama library automatically caches the evaluation. Hence, computation-intensive calculations on the constructed AST run at most once for each element. Furthermore, the developer does not need to think about the details of AST traversal, reference resolution, and forwarding. This is all done by Kiama and the clever construction of the AST from case classes (supporting all types of collections as fields) with an overlay graph.
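A small sketch of this caching behavior, assuming the classes from above are in scope:

```scala
val math = new MathExample()
val ast = new MathParser().parse("1+2+3*0")

println(math.value(ast)) // computes and caches: 3.0
println(math.value(ast)) // same tree, so the result is served from the cache: 3.0
```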
Rewriting in the domain of RAGs permits model updates in terms of graph rewrites that in turn update analyses. For example, one may want to optimize expressions like `3 * 0` to `0` or `0 - 1` to `-1` before actually evaluating them. This can be achieved with the following extension to the class `MathExample` shown above:
```scala
import org.bitbucket.inkytonik.kiama.attribution.Attribution
import org.bitbucket.inkytonik.kiama.rewriting.Rewriter

class MathExample extends Attribution with Rewriter {

  val value: CachedAttribute[Exp, Double] = attr {
    case Num(i)    => i
    case Add(l, r) => value(l) + value(r)
    case Sub(l, r) => value(l) - value(r)
    case Mul(l, r) => value(l) * value(r)
    case Div(l, r) => value(l) / value(r)
  }

  def optimise(e: Exp): Exp = rewrite(optimiser)(e)

  lazy val optimiser = bottomup(attempt(simplifier))

  lazy val simplifier =
    rule[Exp] {
      case Add(Num(0), e)      => e
      case Add(e, Num(0))      => e
      case Sub(Num(0), Num(v)) => Num(-v)
      case Sub(e, Num(0))      => e
      case Mul(Num(1), e)      => e
      case Mul(e, Num(1))      => e
      case Mul(z @ Num(0), _)  => z
      case Mul(_, z @ Num(0))  => z
      case Div(e, Num(1))      => e
      case Div(_, Num(0))      => throw new IllegalArgumentException("Division by 0!")
    }
}
```
Different rewriting and traversal strategies (e.g., `bottomup`, `attempt`) can be utilized. This abstracts from actually having to think about traversing the DSL-specific AST.
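A quick sketch of the optimizer in action:

```scala
val math = new MathExample()

println(math.optimise(Mul(Num(3), Num(0)))) // Num(0.0)
println(math.optimise(Sub(Num(0), Num(1)))) // Num(-1.0)
```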
Set up a small Scala project with SBT and add the required library:
```scala
libraryDependencies += "org.bitbucket.inkytonik.kiama" % "kiama_2.12" % "2.2.0"
```
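A minimal `build.sbt` could look like this (project name, version, and the exact Scala 2.12 patch version are assumptions):

```scala
// build.sbt -- minimal sketch
name := "math-dsl"
version := "0.1.0"
scalaVersion := "2.12.8"

libraryDependencies += "org.bitbucket.inkytonik.kiama" % "kiama_2.12" % "2.2.0"
```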
Then, make it run:
```scala
val someMath = new MathExample()
val resultFunc: Exp => Double = someMath.value
val ast: Exp = new MathParser().parse("1+2+3*0")
println("AST: " + ast)

var result = resultFunc(ast)
println("Result before optimization: " + result)

val optimizedAst: Exp = someMath.optimise(ast)
println("Optimized AST: " + optimizedAst)

result = resultFunc(optimizedAst)
println("Result after optimization: " + result)
assert(3 == result)
```
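The expected console output, derived from the code above:

```
AST: Add(Add(Num(1.0),Num(2.0)),Mul(Num(3.0),Num(0.0)))
Result before optimization: 3.0
Optimized AST: Add(Num(1.0),Num(2.0))
Result after optimization: 3.0
```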
As an exercise, remove Kiama and all library calls to it for an even purer embedding. What is required to retain the caching of attribute evaluation and the rewriting? Which classic design patterns could be used, and what are their drawbacks?
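As a starting point, here is a minimal sketch of a hand-rolled replacement for `attr`, assuming identity-based caching per AST node is acceptable:

```scala
object PureMath {

  import java.util.IdentityHashMap

  // Hypothetical stand-in for Kiama's attr: memoizes results per AST node identity.
  def cachedAttr[T <: AnyRef, U](f: T => U): T => U = {
    val cache = new IdentityHashMap[T, U]()
    t =>
      if (cache.containsKey(t)) cache.get(t)
      else {
        val u = f(t)
        cache.put(t, u)
        u
      }
  }

  // Recursive equations still work because value is a lazy val.
  lazy val value: Exp => Double = cachedAttr[Exp, Double] {
    case Num(i)    => i
    case Add(l, r) => value(l) + value(r)
    case Sub(l, r) => value(l) - value(r)
    case Mul(l, r) => value(l) * value(r)
    case Div(l, r) => value(l) / value(r)
  }
}
```

Note one drawback compared to Kiama: the cache grows unboundedly and is not tied to a tree's lifetime, which is exactly the kind of detail attribute grammar libraries handle for you.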
Scala makes it easy to build embedded Domain Specific Languages. Reference Attribute Grammars are a handy tool to reason about the resulting model. Monad composition is awesome.
A while ago, during the application process for a German department store chain, I was given the task of writing a small log analyzer in Scala. It should consume the logs and produce statistics. A success log entry has the following structure:
```
HH:mm:ss:SS LEVEL - fetch http://www.server/endpoint.json took: timeInMilliseconds ms
```
A failure log entry has the following structure:
```
HH:mm:ss:SS LEVEL - error while fetching http://www.server/endpoint.json after timeInMilliseconds ms
```
The output should look like this:
```
http://srv1.host.de:  avg: 215ms  478x  success: 79%
    endpoints:
        products  avg: 217ms  count: 168x  success: 81% (32 of 168 failed)
        teaser    avg: 210ms  count: 154x  success: 76% (37 of 154 failed)
        profile   avg: 220ms  count: 156x  success: 81% (31 of 156 failed)
http://srv2.host.de:
...
```
I did not want to go with plain regular expressions and nested evaluation functions for the statistics, as this is neither readable, extensible, nor testable.
Hence, I came up with the following meta-model and AST:
```scala
object LogTree {

  case class Log(entries: List[LogEntry]) extends Product

  sealed abstract class LogEntry extends Product {
    val level: String
    val msg: String
    val date: String
    val server: String
    val endpoint: String
    val took: String
    val url: String = s"http://$server/$endpoint.json"

    override def toString: String = s"$date $level - $msg"
  }

  case class Info(date: String, server: String, endpoint: String, took: String) extends LogEntry {
    val level: String = "INFO"
    val msg: String = s"fetch $url took: $took ms"
  }

  case class Warn(date: String, server: String, endpoint: String, took: String) extends LogEntry {
    val level: String = "WARN"
    val msg: String = s"error while fetching $url after $took ms"
  }
}
```
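A quick sketch of what a single entry looks like:

```scala
import LogTree.Info

val entry = Info("10:23:12:17", "srv1.host.de", "products", "217")
println(entry) // 10:23:12:17 INFO - fetch http://srv1.host.de/products.json took: 217 ms
```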
```scala
import java.net.URL
import java.text.SimpleDateFormat

import org.bitbucket.inkytonik.kiama.parsing.{Error, Failure, Parsers, Success}
import org.bitbucket.inkytonik.kiama.util.{Positions, StringSource}

class LogParser extends Parsers(new Positions) {

  private val Info = "INFO"
  private val Warn = "WARN"
  private val Separator = "-"
  private val Fetch = "fetch"
  private val Took = "took:"
  private val ErrorWhileFetching = "error while fetching"
  private val After = "after"
  private val DateFormat = "HH:mm:ss:SS"
  private val Json = ".json"
  private val Url = "[a-zA-Z0-9:/\\.]*\\.json"
  private val Ms = "ms"
  private val Date = "[a-zA-Z0-9:]*"
  private val Number = "[0-9]*"

  import LogTree._

  def parseAll(in: List[String]): Log = Log(in.map(parse))

  private def parse(in: String): LogEntry =
    parse(expr, StringSource(in)) match {
      case Success(s, _) => s
      case Failure(f, _) => throw new RuntimeException("Parsing failure: " + f)
      case Error(e, _)   => throw new RuntimeException("Parsing error: " + e)
    }

  // The private vals Info and Warn shadow the imported case classes as *terms*,
  // so the patterns below match the literal strings; `new Info(...)` still
  // refers to the imported case class type.
  private lazy val expr: Parser[LogEntry] =
    date ~ level ~ separator ~ msg ^^ {
      case d ~ Info ~ Separator ~ m => new Info(d, m._1, m._2, m._3)
      case d ~ Warn ~ Separator ~ m => new Warn(d, m._1, m._2, m._3)
    }

  private lazy val separator: Parser[String] = whitespace ~> Separator <~ whitespace

  private lazy val level: Parser[String] = whitespace ~> (Warn | Info)

  // Yields (server, endpoint, took) regardless of success or failure format.
  private lazy val msg: Parser[(String, String, String)] =
    (fetchR | errR) ~ url ~ (tookR | afterR) ~ took ^^ {
      case Fetch ~ u ~ Took ~ t               => (u._1, u._2, t)
      case ErrorWhileFetching ~ u ~ After ~ t => (u._1, u._2, t)
    }

  private lazy val fetchR: Parser[String] = Fetch <~ whitespace
  private lazy val errR: Parser[String] = ErrorWhileFetching <~ whitespace
  private lazy val tookR: Parser[String] = whitespace ~> Took <~ whitespace
  private lazy val afterR: Parser[String] = whitespace ~> After <~ whitespace

  private lazy val took: Parser[String] = Number.r <~ whitespace ~ Ms

  // Splits a URL like http://srv1.host.de/products.json into (host, endpoint).
  private lazy val url: Parser[(String, String)] =
    Url.r ^^ (u => {
      val parsedUrl = new URL(u)
      val file = parsedUrl.getFile
      (parsedUrl.getHost, file.substring(1, file.lastIndexOf(Json)))
    })

  private lazy val formatter = new SimpleDateFormat(DateFormat)

  private lazy val date: Parser[String] =
    Date.r ^^ (s => formatter.format(formatter.parse(s)))
}
```
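A usage sketch with two hypothetical sample lines, one success and one failure:

```scala
val log = new LogParser().parseAll(List(
  "10:23:12:17 INFO - fetch http://srv1.host.de/products.json took: 217 ms",
  "10:23:13:02 WARN - error while fetching http://srv1.host.de/teaser.json after 210 ms"
))
// log.entries contains an Info and a Warn entry for srv1.host.de
```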
```scala
import LogTree.{Info, Log, LogEntry, Warn}
import org.bitbucket.inkytonik.kiama.attribution.Attribution

// The analyzer extends Attribution to get access to attr and paramAttr.
class LogTreeAnalyzer(log: Log) extends Attribution {

  def printAllStatistics(): Unit = {
    uniqueServers(log).foreach { serverName =>
      val avg = avgForServer(serverName)(log)
      val numEntries = numEntriesForServer(serverName)(log)
      val succ = succRateForServer(serverName)(log)
      println(s"$serverName: avg: ${avg}ms ${numEntries}x success: $succ%")
      println("\tendpoints:")
      uniqueEndpointsForServer(serverName)(log).foreach { endpointName =>
        val avgEP = avgForEndpoint((serverName, endpointName))(log)
        val numEntriesEP = numEntriesForEndpoint((serverName, endpointName))(log)
        val succEP = succRateForEndpoint((serverName, endpointName))(log)
        val numFailedEP = numFailuresForEndpoint((serverName, endpointName))(log)
        println(s"\t\t$endpointName avg: ${avgEP}ms count: ${numEntriesEP}x success: $succEP% ($numFailedEP of $numEntriesEP failed)")
      }
    }
  }

  val uniqueServers: Log => List[String] = attr { l =>
    l.entries.foldLeft(Nil: List[String]) { (acc, next) =>
      if (acc contains next.server) acc else next.server :: acc
    }
  }

  val entriesForServer: String => Log => List[LogEntry] = paramAttr { server => l =>
    l.entries.filter(_.server == server)
  }

  val avgForServer: String => Log => Int = paramAttr { server => l =>
    val entries = entriesForServer(server)(l)
    entries.map(_.took.toInt).sum / entries.length
  }

  val numEntriesForServer: String => Log => Int = paramAttr { server => l =>
    entriesForServer(server)(l).size
  }

  val succRateForServer: String => Log => Int = paramAttr { server => l =>
    val totalEntries = numEntriesForServer(server)(l)
    val successfulEntries = entriesForServer(server)(l).count {
      case _: Info => true
      case _: Warn => false
    }
    Math.round(successfulEntries * 100f / totalEntries)
  }

  val uniqueEndpointsForServer: String => Log => List[String] = paramAttr { server => l =>
    entriesForServer(server)(l).foldLeft(Nil: List[String]) { (acc, next) =>
      if (acc contains next.endpoint) acc else next.endpoint :: acc
    }
  }

  val avgForEndpoint: ((String, String)) => Log => Int = paramAttr { addr => l =>
    val (server, endpoint) = addr
    val entries = entriesForServer(server)(l).filter(_.endpoint == endpoint)
    entries.map(_.took.toInt).sum / entries.length
  }

  val numEntriesForEndpoint: ((String, String)) => Log => Int = paramAttr { addr => l =>
    val (server, endpoint) = addr
    entriesForServer(server)(l).count(_.endpoint == endpoint)
  }

  val numFailuresForEndpoint: ((String, String)) => Log => Int = paramAttr { addr => l =>
    val (server, endpoint) = addr
    entriesForServer(server)(l).count {
      case w: Warn if w.endpoint == endpoint => true
      case _                                 => false
    }
  }

  val succRateForEndpoint: ((String, String)) => Log => Int = paramAttr { addr => l =>
    val (server, endpoint) = addr
    val successfulEntries = entriesForServer(server)(l).count {
      case i: Info if i.endpoint == endpoint => true
      case _                                 => false
    }
    val totalEntries = numEntriesForEndpoint((server, endpoint))(l)
    Math.round(successfulEntries * 100f / totalEntries)
  }
}
```
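Putting it all together (reading from a file is an assumption; any source of log lines works):

```scala
import scala.io.Source

val lines = Source.fromFile("server.log").getLines().toList // hypothetical log file
val log = new LogParser().parseAll(lines)
new LogTreeAnalyzer(log).printAllStatistics()
```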
Dealing with parsing and the semantic analysis of languages requires the language engineer to carefully craft compilers, interpreters, code generators, refactoring tools, and static code analyzers. Language workbenches, e.g., Xtext or MPS, help you with these tasks by offering the generation of important ingredients (parsers, ASTs, editors, code generators).
If you do not have the time to learn these tools in depth or are technically restricted in some way (e.g., limited technical resources), embedded parsing and evaluation may be applicable to your embedded Domain Specific Language. This article showcased three powerful concepts to do so:

- a pure embedding in a statically typed host language (Scala),
- parser combinators for compact, language-integrated scanning and parsing, and
- Reference Attribute Grammars (implemented with Kiama) for evaluation and rewriting.