Writing Fast Parsers Fast in Scala
The FastParse parser-combinator scala library gives you the .rep(n)
'Repeat' method to allow you to create a new parser that attempts to parse the givenParser n
or more times. What's the canonical way to do this if I want exactly n
matches?
In my case, I want to parse a 40-character Git commit id - if it were longer than 40 characters, that's not a commit id, and it shouldn't be a match.
The closest example I've found in docs so far is:
val unicodeEscape = P( "u" ~ hexDigit ~ hexDigit ~ hexDigit ~ hexDigit )
...which matches 4 characters with simple repetition (verbose for a 40-character commit id).
These are parser-combinators, not regex, where the answer would be something like \p{XDigit}{40}
.
Source: (StackOverflow)
Consider
import util.parsing.combinator._
object TreeParser extends JavaTokenParsers {
lazy val expr: Parser[String] = decimalNumber | sum
//> expr: => TreeParser.Parser[String]
lazy val sum: Parser[String] = expr ~ "+" ~ expr ^^ {case a ~ plus ~ b => s"($a)+($b)"}
//> sum: => TreeParser.Parser[String]
println(parseAll(expr, "1 + 1")) //> TreeParser.ParseResult[String] = [1.3] failure: string matching regex
//| `\z' expected but `+' found
}
The same story with fastparse
import fastparse.all._
val expr: P[Any] = P("1" | sum)
val sum: P[Any] = expr ~ "+" ~ expr
val top = expr ~ End
println(top.parse("1+1")) // Failure(End:1:2 ..."+1")
Parsers are great to discover that taking the first literal is a bad idea but do not try to fall back to the sum production. Why?
I understand that parser takes the first branch that can successfully eat up a part of input string and exits. Here, "1" of expression matches the first input char and parsing completes. In order to grab more, we need to make sum the first alternative. However, plain stupid
lazy val expr: Parser[String] = sum | "1"
endы up with stack overflow. The library authors therefore approach it from another side
val sum: P[Any] = P( num ~ ("+".! ~/ num).rep )
val top: P[Any] = P( sum ~ End )
Here, we start sum with terminal, which is fine but this syntax is more verbose and, furthermore, it produces a terminal, followed by a list, which is good for a reduction operator, like sum, but is difficult to map to a series of binary operators.
What if your language defines expression, which admits a binary operator? You want to match every occurrence of expr op expr
and map it to a corresponding tree node
expr ~ "op" ~ expr ^^ {case a ~ _ ~ b => BinOp(a,b)"}
How do you do that?
Source: (StackOverflow)