EzDevInfo.com

fastparse

Writing Fast Parsers Fast in Scala

How to match exactly 'n' given characters with FastParse

The FastParse parser-combinator scala library gives you the .rep(n) 'Repeat' method to allow you to create a new parser that attempts to parse the givenParser n or more times. What's the canonical way to do this if I want exactly n matches?

In my case, I want to parse a 40-character Git commit id - if it were longer than 40 characters, that's not a commit id, and it shouldn't be a match.

The closest example I've found in docs so far is:

val unicodeEscape = P( "u" ~ hexDigit ~ hexDigit ~ hexDigit ~ hexDigit )

...which matches 4 characters with simple repetition (verbose for a 40-character commit id).

These are parser-combinators, not regex, where the answer would be something like \p{XDigit}{40}.

Source: (StackOverflow)

Why doesn't parser combinator backtrack?

Consider

import util.parsing.combinator._
object TreeParser extends JavaTokenParsers {

    lazy val expr: Parser[String] = decimalNumber | sum
                                                  //> expr: => TreeParser.Parser[String]
    lazy val sum: Parser[String] = expr ~ "+" ~ expr ^^ {case a ~ plus ~ b => s"($a)+($b)"}
                                                  //> sum: => TreeParser.Parser[String]
    println(parseAll(expr, "1 + 1"))                       //> TreeParser.ParseResult[String] = [1.3] failure: string matching regex 
                                              //| `\z' expected but `+' found
}

The same story with fastparse

import fastparse.all._
val expr: P[Any] = P("1" | sum)
val sum: P[Any] = expr ~ "+" ~ expr
val top = expr ~ End
println(top.parse("1+1")) // Failure(End:1:2 ..."+1")

Parsers are great to discover that taking the first literal is a bad idea but do not try to fall back to the sum production. Why?

I understand that parser takes the first branch that can successfully eat up a part of input string and exits. Here, "1" of expression matches the first input char and parsing completes. In order to grab more, we need to make sum the first alternative. However, plain stupid

lazy val expr: Parser[String] = sum | "1"

endы up with stack overflow. The library authors therefore approach it from another side

val sum: P[Any] = P( num ~ ("+".! ~/ num).rep )
val top: P[Any]   = P( sum ~ End )

Here, we start sum with terminal, which is fine but this syntax is more verbose and, furthermore, it produces a terminal, followed by a list, which is good for a reduction operator, like sum, but is difficult to map to a series of binary operators.

What if your language defines expression, which admits a binary operator? You want to match every occurrence of expr op expr and map it to a corresponding tree node

expr ~ "op" ~ expr ^^ {case a ~ _ ~ b => BinOp(a,b)"}

How do you do that?

Source: (StackOverflow)

P[Node].rep produces P[String] rather than array of nodes

I would expect the result for plusses to be some kind of array

case class Plus()
val plus: P[Plus] = P ("+") map {_ => Plus()}
val plusses: P[List[Plus]] = P ( plus.rep.! )  // type mismatch; found: Parser[String]  required: Parser[List[Plus]]

but compiler says

type mismatch; found : Parser[String] required: Parser[List[Plus]]

Source: (StackOverflow)