EzDevInfo.com

pegjs

PEG.js: Parser generator for JavaScript. PEG.js is a parser generator for JavaScript based on the parsing expression grammar formalism.

Peg left recursion removing

I have this pegjs grammar. How can I remove the left recursion from it?

atom   = term
    /  "^"
    /  "_"
    /  "\\"
    /  atom "."
    /  atom "." label
    /  atom ".(" labels ")"
term = [a-zA-Z0-9]+
labels = label ("|" label)*
label  = ("+" / "-")* [A-Za-z0-9]+

Source: (StackOverflow)

Generate TextMate language grammar from PEG.js grammar

Is there a tool that translates a PEG.js grammar to a TextMate grammar?

I am building my own language and would like to have syntax highlighting for it in my preferred editor, TextMate. The grammar of my language is built with PEG.js. According to the TextMate documentation for this use-case, I have to write the TextMate grammar in a form that is incompatible with PEG.js.

I started writing a new TextMate grammar, but I quickly noticed that it takes quite a while to translate the whole grammar, or even the subset relevant for an acceptable syntax highlighting. Since I am incredibly lazy and don't want to do all this tedious work, I thought about automating this task.

Can anyone give me any clues how to automate, or at least speed up, the generation of TextMate grammar from a PEG.js grammar?


Source: (StackOverflow)

How do I parse this with peg grammar?

I'm trying to make a parser using pegjs. I need to parse something like:

blah blah START Lorem ipsum 
dolor sit amet, consectetur 
adipiscing elit END foo bar 
etc.

I have trouble writing the rule to catch the text from "START" to "END".


Source: (StackOverflow)

Lambda expressions in PEG.js

I have a PEG grammar problem with lambda expressions. They work if I use this syntax:

x:{y:{x+y}}(20)(30)

which is equivalent to

(function(x) { return function(y) { return x+y; }; })(20)(30);

but this doesn't work:

f:{f(10)}(x:{x*x})

which is equivalent to:

(function(f) { return f(10); })(function(x) { return x*x; })

Is it possible to make that second function work with PEG.js?


Source: (StackOverflow)

Trouble with PEG.js end of input

I am trying to write a simple grammar for PEG.js that would match something like this:

some text;
arbitrary other text that can also have µnicode; different expression;
let's escape the \; semicolon, and \not recognized escapes are not a problem;
possibly last expression not ending with semicolon

So basically these are some texts separated by semicolons. My simplified grammar looks like this:

start
= flow:Flow

Flow
= instructions:Instruction*

Instruction
= Empty / Text

TextCharacter
= "\\;" /
.

Text
= text:TextCharacter+ ';' {return text.join('')}

Empty
= Semicolon

Semicolon "semicolon"
= ';'

The problem is that if I put anything other than a semicolon in the input, I get:

SyntaxError: Expected ";", "\\;" or any character but end of input found.

How to solve this? I've read that PEG.js is unable to match end of input.
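
A likely cause is that "." in TextCharacter also matches ";", so TextCharacter+ consumes the whole input and the ';' that Text requires afterwards can never match. The intended behaviour can be sketched in plain JavaScript, outside PEG.js. The helper name splitInstructions is hypothetical; in the grammar, the corresponding change would presumably be to exclude ";" from TextCharacter and make the closing ';' optional.

```javascript
// Hand-written sketch (not generated by PEG.js): split on unescaped
// semicolons and treat the final semicolon as optional.
function splitInstructions(input) {
  var parts = [];
  var current = "";
  for (var i = 0; i < input.length; i++) {
    var ch = input[i];
    if (ch === "\\" && input[i + 1] === ";") {
      // Escaped semicolon: keep both characters as text.
      current += "\\;";
      i++;
    } else if (ch === ";") {
      // Unescaped semicolon ends the current instruction
      // (empty instructions are kept as empty strings).
      parts.push(current);
      current = "";
    } else {
      // Any other character, including unrecognized escapes.
      current += ch;
    }
  }
  // A last instruction without a trailing semicolon is still accepted.
  if (current.length > 0) parts.push(current);
  return parts;
}

console.log(splitInstructions("some text;a \\; b;last"));
```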


Source: (StackOverflow)

Eliminate Left Recursion on this PEG.js grammar

(Note: I've read other questions like this, but I haven't been able to figure this out).

I wrote this grammar:

start = call

ident = [a-z]+
spaces = [ ]+

call = f:ident spaces g:(call / ident) {
    return f + "(" + g + ")";
}

With this input

a b c d

it returns

"a(b(c(d)))"

And I want

"a(b)(c)(d)"

I think this left recursive rule can give me something like that, but PEG.js doesn't support left recursion.

call = f:(call / ident) spaces g:ident {
    return f + "(" + g + ")";
}

How can I eliminate the left recursion in this case?

PS: You can test this on the online PEG.js demo
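
One common way around left recursion is to match the first identifier followed by a repeated tail, and then fold the tail from left to right in the action. The fold itself can be sketched in plain JavaScript. The name foldCalls is hypothetical, and the sketch assumes the identifiers have already been joined into strings and the spaces discarded.

```javascript
// Sketch of the left fold an action could perform for a rule shaped like
//   call = head:ident tail:(spaces ident)+
// Assumption: `head` is a string and `tail` is an array of strings.
function foldCalls(head, tail) {
  return tail.reduce(function (acc, arg) {
    // Each step wraps the next argument and appends it to the result so far.
    return acc + "(" + arg + ")";
  }, head);
}

console.log(foldCalls("a", ["b", "c", "d"])); // a(b)(c)(d)
```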


Source: (StackOverflow)

Peg.js in AngularJS webapp

I have an AngularJS web application.

I'd like to use peg.js in my application. I've just written a peg.js grammar: CriteriaValue.pegjs and generated the parser with the command line: pegjs CriteriaValue.pegjs, which generated CriteriaValue.js.

Could someone explain to me how to use the parser?

var result = parser.parse('my string'); doesn't work.

I've created a plunker: http://plnkr.co/edit/Ae05SeZAjKOQ75B3lvLc?p=preview


Source: (StackOverflow)

How do you build a left-associative operator tree using PEG.js?

How do you build an AST (Abstract Syntax Tree) for left-associative operators using PEG.js?

I've tried to write some code based on the information I found on the internet, but I seem to have made a mistake.

The code I wrote generates an incorrect AST for most expressions.

Expression

12-6-4-2*1-1

Expected AST

{
    "left": {
        "left": {
            "left": {
                "left": 12,
                "operator": "-",
                "right": 6
            },
            "operator": "-",
            "right": 4
        },
        "operator": "-",
        "right": {
            "left": 2,
            "operator": "*",
            "right": 1
        }
    },
    "operator": "-",
    "right": 1
}

Generated AST

{
   "left": {
      "left": {
         "left": 12,
         "operator": "-",
         "right": 6
      },
      "operator": "-",
      "right": 4
   },
   "operator": "-",
   "right": {
      "left": 2,
      "operator": "*",
      "right": {
         "left": 1,
         "operator": "-",
         "right": 1
      }
   }
}

Code

{

    function operator(first, rest) {
        if (rest.length === 0) return first;

        return { left: first, right: rest };
    };

    function makeOperator(left, operator, right) {
        return { left: left, operator: operator[0], right: clean(right[1]) };
    };

    function clean(expression) {
        if (!expression.right) return expression;

        var result = makeOperator(expression.left, expression.right[0], expression.right[0]);

        for (var counter = 1, len = expression.right.length; counter < len; counter++) {
            result = makeOperator(result, expression.right[counter], expression.right[counter]);
        }

        return result;
    };

}

Start = E

E
  = expression:E1

    { return clean(expression); }

E1
  = expression:E2 rest:(("+" / "-") E2)*

    { return operator(expression, rest); }

E2
  = expression:Value rest:(("*" / "/") E1)*

    { return operator(expression, rest); }


Value
  = Number
  / BracketedExpression

Number
  = [1-9][0-9]*

    { return parseInt(text(), 10); }

BracketedExpression
  = "(" expression:E1 ")"

    { return expression; }

I would really appreciate any help or example code on how to build ASTs for both left-associative and right-associative operators.

Edit: As @Bergi pointed out, the problem was that E2 used E1 as the expression for the rest of the operator list instead of Value. However, the code that Bergi wrote is much simpler than mine.
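
For reference, the two folding directions can be sketched in plain JavaScript. This is not Bergi's code; leftAssoc and rightAssoc are hypothetical helper names, and the sketch assumes a rule of the shape first:Operand rest:(Operator Operand)* so that rest is a list of [operator, operand] pairs.

```javascript
// Left-associative: fold the pairs from left to right, so the tree
// grows down its left side: ((12 - 6) - 4).
function leftAssoc(first, rest) {
  return rest.reduce(function (left, pair) {
    return { left: left, operator: pair[0], right: pair[1] };
  }, first);
}

// Right-associative: recurse into the remainder of the list, so the tree
// grows down its right side: (2 ^ (3 ^ 2)).
function rightAssoc(first, rest) {
  if (rest.length === 0) return first;
  return {
    left: first,
    operator: rest[0][0],
    right: rightAssoc(rest[0][1], rest.slice(1))
  };
}

console.log(JSON.stringify(leftAssoc(12, [["-", 6], ["-", 4]])));
console.log(JSON.stringify(rightAssoc(2, [["^", 3], ["^", 2]])));
```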


Source: (StackOverflow)

Parsing complete mathematical expressions with PEG.js

I'm trying to extend the example grammar of PEG.js for parsing mathematical expressions with all the 4 operators for my online BASIC interpreter experiment:

http://www.dantonag.it/basicjs/basicjs.html

but not all the expressions are parsed correctly.

This is my PEG grammar:

expression = additive

additive = left:multiplicative atag:("+" / "-") right:additive { return {tag: atag, left:left, right:right}; } / multiplicative

multiplicative = left:primary atag:("*" / "/") right:multiplicative { return {tag: atag, left:left, right:right}; } / primary

primary = number / "(" additive:additive ")" { return additive; }

number = digits:[0-9]+ { return parseInt(digits.join(""), 10); }

It correctly parses expressions like 2*3+1 (giving 7), but not an expression like 2-1-1, which gives 2 instead of 0.

Can you help me improve and debug this?

Thanks in advance.

Edit: I've added the "number" rule to the grammar. And yes, my grammar gives as output a recursive structure that is analogous to a parse tree.
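
The result 2 follows from the shape of the tree: with right:additive, the input 2-1-1 nests to the right, as 2-(1-1). A small evaluator in plain JavaScript makes the difference visible. The evaluate function and both tree literals are written by hand for illustration, using the {tag, left, right} shape from the grammar above.

```javascript
// Tree the grammar above builds for "2-1-1" (nests to the right).
var rightNested = { tag: "-", left: 2, right: { tag: "-", left: 1, right: 1 } };

// Tree a left-associative parse would build for the same input.
var leftNested = { tag: "-", left: { tag: "-", left: 2, right: 1 }, right: 1 };

// Minimal recursive evaluator for {tag, left, right} nodes and number leaves.
function evaluate(node) {
  if (typeof node === "number") return node;
  var l = evaluate(node.left);
  var r = evaluate(node.right);
  switch (node.tag) {
    case "+": return l + r;
    case "-": return l - r;
    case "*": return l * r;
    case "/": return l / r;
  }
  throw new Error("unknown operator: " + node.tag);
}

console.log(evaluate(rightNested)); // 2, i.e. 2 - (1 - 1)
console.log(evaluate(leftNested));  // 0, i.e. (2 - 1) - 1
```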


Source: (StackOverflow)

Parsing boolean expression without left hand recursion

I'm trying to match this

f(some_thing) == 'something else'
  • f(some_thing) is a function call, which is an expression
  • == is a boolean operator
  • 'something else' is a string, which also is an expression

so the boolean expression should be

expression operator expression

The problem is I can't figure out how to do that without left recursion. These are my rules:

expression 
  = 
  bool_expression
  / function_call
  / string
  / real_number
  / integer
  / identifier

bool_expression
  = l:expression space* op:bool_operator space* r:expression 
  { return ... }

Using grammar notation, I have

O := ==|<=|>=|<|>|!=  // operators
E := B|....           // expression, many non terminals
B := EOE

Because my grammar is EOE, I don't know how to apply the left-recursion elimination algorithm, which is

A := Ab|B
transforms into
A := BA'
A':= e|bA'

Where e is empty and b is a terminal


Source: (StackOverflow)

Simple parsing questions using PEG.js

I'm trying to wrap my head around PEG by entering simple grammars into the PEG.js playground.

Example 1:

  • Input: "abcdef1234567ghijklmn8901opqrs"
  • Desired output: ["abcdef", "1234567", "ghijklmn", "8901", "opqrs"]

  • Actual output: ["abcdef", ["1234567", ["ghijklmn", ["8901", ["opqrs", ""]]]]]

This example pretty much works, but can I get PEG.js to not nest the resulting array to a million levels? I assume the trick is to use concat() instead of join() somewhere, but I can't find the spot.

start
  = Text

Text
  = Numbers Text
  / Characters Text
  / EOF

Numbers
  = numbers: [0-9]+ {return numbers.join("")}

Characters
  = text: [a-z]+ {return text.join("")}

EOF
  = !.
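
The nesting comes from the recursive Text rule, where every match returns a [head, rest] pair. One option is to flatten that structure after the fact, sketched here in plain JavaScript. The name flattenPairs is hypothetical; inside the grammar, the same effect would presumably come from labelling the two parts and returning [head].concat(rest), or from using a repetition instead of recursion.

```javascript
// Walk the right-nested [head, rest] pairs and collect the heads.
// The walk stops at the innermost non-array value (the EOF result).
function flattenPairs(nested) {
  var out = [];
  var node = nested;
  while (Array.isArray(node)) {
    out.push(node[0]);
    node = node[1];
  }
  return out;
}

var actual = ["abcdef", ["1234567", ["ghijklmn", ["8901", ["opqrs", ""]]]]];
console.log(flattenPairs(actual));
```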

Example 2:

Same problem and code as Example 1, but change the Characters rule to the following, which I expected would produce the same result.

Characters
  = text: (!Numbers .)+ {return text.join("")}

The resulting output is:

[",a,b,c,d,e,f", ["1234567", [",g,h,i,j,k,l,m,n", ["8901", [",o,p,q,r,s", ""]]]]]

Why do I get all these empty matches?
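
The commas appear because each iteration of (!Numbers .) produces a two-element array, one slot for the predicate and one for the character, and join("") on the outer array stringifies each inner array with its own comma. This can be reproduced in plain JavaScript. The pairs literal is written by hand, and showing the predicate result as "" is an assumption about the PEG.js version in use.

```javascript
// Hand-written stand-in for what (!Numbers .)+ yields on "abc":
// one [predicateResult, character] pair per matched character.
var pairs = [["", "a"], ["", "b"], ["", "c"]];

// Joining the outer array converts each pair to the string ",a", ",b", ...
var naive = pairs.join("");

// Picking the character out of each pair first gives the intended text.
var fixed = pairs.map(function (pair) { return pair[1]; }).join("");

console.log(naive); // ,a,b,c
console.log(fixed); // abc
```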

Example 3:

Last question. This doesn't work at all. How can I make it work? And for bonus points, any pointers on efficiency? For example, should I avoid recursion if possible?

I'd also appreciate a link to a good PEG tutorial. I've read (http://www.codeproject.com/KB/recipes/grammar_support_1.aspx), but as you can see I need more help ...

  • Input: 'abcdefghijklmnop"qrstuvwxyz"abcdefg'
  • Desired output: ["abcdefghijklmnop", "qrstuvwxyz", "abcdefg"]
  • Actual output: "abcdefghijklmnop\"qrstuvwxyz\"abcdefg"

start
  = Words

Words
  = Quote
  / Text
  / EOF

Quote
  = quote: ('"' .* '"') Words {return quote.join("")}

Text
  = text: (!Quote . Words) {return text.join("")}

EOF
  = !.

Source: (StackOverflow)

Pegjs: Don't allow reserved keywords as a variable name

I am writing my language in Pegjs and as usual, my language has some keywords, like true, false, if, else and today for instance. Now, I want to declare a variable, but apparently, the variable name cannot be one of the reserved keywords. It can be any alpha followed by an alpha-numeric, with the exception of the language keywords.

I did the following (testable in Pegjs Online):

variable = c:(alpha alphanum*)
{
 var keywords = ["true", "false", "if", "else", "today"];

  var res = c[0]
  for (var i = 0; i<c[1].length; i++) {
    res=res+c[1][i]
  }

  if(keywords.indexOf(res)>=0) {
    return error('\'' + res + '\'' + ' is a keyword and cannot be used as a variable name.');
  }

  return { 'dataType' : 'variable', 'dataValue' : res };
}

alpha = [a-zA-Z]
alphanum = [a-zA-Z0-9_]

boolean = v: ("true" / "false")
{
  return { 'dataType' : 'boolean', 'dataValue': v};
}

Now true is illegal, but true1 is not. This is fine. However, since I have defined the boolean structure somewhere else in my language, is it not possible to re-use that definition instead of manually re-defining the non-allowed keywords inside my variable definition?

You can imagine why my solution is error-prone. I tried several things but they did not work.

Thanks for your help!
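
In PEG terms the usual idea is a negative lookahead: a variable is an identifier that is not a keyword followed by a non-identifier character. The same logic can be sketched in plain JavaScript with a regular expression built from a single keyword list. KEYWORDS and isVariableName are hypothetical names, and this is an illustration of the lookahead, not PEG.js syntax.

```javascript
// Single source of truth for reserved words.
var KEYWORDS = ["true", "false", "if", "else", "today"];

// Reject a keyword only when it is NOT followed by another identifier
// character, so "true" is rejected but "true1" is accepted.
var VARIABLE = new RegExp(
  "^(?!(?:" + KEYWORDS.join("|") + ")(?![a-zA-Z0-9_]))" +
  "[a-zA-Z][a-zA-Z0-9_]*$"
);

function isVariableName(name) {
  return VARIABLE.test(name);
}

console.log(isVariableName("true"));  // false
console.log(isVariableName("true1")); // true
console.log(isVariableName("iffy"));  // true
```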


Source: (StackOverflow)

Left Recursion Error in Peg.JS

I am currently making a programming language for a science fair.

This is my PEG.js grammar:

start
  = s:Statements
    { return ['Program', {}].concat(s); }
  / _

Statements
  = s:Statement ";"
    { return s; }
  / ss:Statements s:Statement ";"
    { return ss; ss.push(s); }
  / _

Statement
  = SetVar

SetVar
  = i:Ident "=" e:Expr
    { return ['SetVarStmt', {}, i, e]; }

Expr
  = Ident
  / Number

Number
  = n:[0-9]+
    { return ['Number', { val: parseInt(n.join(""), 10) }]; }

Ident
  = i:[a-zA-Z._]*
    { return ['Ident', { name: i.join("") }]; }

_ = [ \t\r\n]*

I get the following error: "Left recursion detected for rule 'Statements'." But I cannot figure out why this is occurring.


Source: (StackOverflow)

How do you parse nested comments in pegjs?

I was wondering how you parse comments (say, a la Haskell) in pegjs.

The goal:

{-
    This is a comment and should parse.
    Comments start with {- and end with -}.
    If you've noticed, I still included {- and -} in the comment.
    This means that comments should also nest
    {- even {- to -} arbitrary -} levels
    But they should be balanced
-}

For example, the following should not parse:

{- I am an unbalanced -} comment -}

But you should also have an escape mechanism:

{- I can escape comment \{- characters like this \-} -}

This sorta seems like parsing s-expressions, but with s-expressions, it's easy:

sExpression = "(" [^)]* ")"

Because the closing paren is just one character, I can "not" it with the caret. As an aside, I'm wondering how one can "not" something that is longer than a single character in pegjs.

Thanks for your help.
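
The rules described above (nesting, balance, backslash escapes) can be sketched as a depth counter in plain JavaScript. The name isBalancedComment is hypothetical and the function checks that the whole string is exactly one balanced comment. In a grammar, the recursive equivalent would presumably be a comment rule that refers to itself, with a negative lookahead such as !"-}" standing in for "not" on a multi-character sequence.

```javascript
// Returns true when `src` is exactly one balanced, possibly nested comment.
function isBalancedComment(src) {
  var depth = 0;
  var i = 0;
  while (i < src.length) {
    var two = src.slice(i, i + 2);
    if (two === "{-") {
      depth++;            // opening delimiter
      i += 2;
    } else if (depth === 0) {
      return false;       // text or a stray "-}" outside any comment
    } else if (src[i] === "\\") {
      i += 2;             // escape: skip the backslash and the next character
    } else if (two === "-}") {
      depth--;            // closing delimiter
      i += 2;
    } else {
      i++;                // ordinary comment text
    }
  }
  return depth === 0 && src.length > 0;
}

console.log(isBalancedComment("{- even {- to -} arbitrary -}"));       // true
console.log(isBalancedComment("{- I am an unbalanced -} comment -}")); // false
```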


Source: (StackOverflow)

PEGjs: Fallback (backtrack?) to string if floating point rule fail

I have an atom rule that tries to parse everything as either a number or a quoted string first; if that fails, it treats the thing as a string.

Everything parses fine except one particular case that is this very specific string:

DUD 123abc

This fails to parse with the error: Expected " ", "." or [0-9] but "a" found.

What I expect: it should parse successfully and return string "123abc" as a string atom. You can see several of my unsuccessful attempts commented out in the grammar content below.

Any help/tips/pointers/suggestions appreciated!


You can try the grammar on the online PEG.js version. I'm using node v0.8.23 and pegjs 0.7.0

Numbers that parse correctly:

  • 123
  • 0
  • 0.
  • 1.
  • .23
  • 0.23
  • 1.23
  • 0.000
  • . <--- as string, not number and not error

I want 123abc to be parsed as a string, is this possible?


This is my entire grammar file:

start = lines:line+ { return lines; }

// --------------------- LINE STRUCTURE
line = command:command eol { return command; }

command = action:atom args:(sep atom)*
{
  var i = 0, len = 0;

  for (var i = 0, len = args.length; i < len; i++) {
    // discard parsed separator tokens
    args[i] = args[i][1];
  }

  return [action, args];
}

sep = ' '+
eol = "\r" / "\n" / "\r\n"

atom = num:number { return num; }
     / str:string_quoted { return str; }
     / str:string { return str; }

// --------------------- COMMANDS

// TODO:

// --------------------- STRINGS
string = chars:([^" \r\n]+) { return chars.join(''); }

string_quoted = '"' chars:quoted_chars* '"' { return chars.join(''); }
quoted_chars = '\\"' { return '"'; }
             / char:[^"\r\n] { return char; }

// --------------------- NUMBERS
number = integral:('0' / [1-9][0-9]*) fraction:("." [0-9]*)?
{
  if (fraction && fraction.length) {
    fraction = fraction[0] + fraction[1].join('');
  } else {
    fraction = '';
  }

  integral = integral instanceof Array ?
    integral[0] + integral[1].join('') :
    '0';

  return parseFloat(integral + fraction);
}
        / ("." / "0.") fraction:[0-9]+
{
  return parseFloat("0." + fraction.join(''));
}

/*
float = integral:integer? fraction:fraction { return integral + fraction; }

fraction = '.' digits:[0-9]* { return parseFloat('0.' + digits.join('')); }

integer = digits:('0' / [1-9][0-9]*)
{
  if (digits === '0') return 0;
  return parseInt(digits[0] + digits[1].join(''), 10);
}

*/
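
The desired classification can be sketched in plain JavaScript: a token counts as a number only if the number pattern covers the whole token, and otherwise it falls back to a string. NUMBER and classifyAtom are hypothetical names. In the grammar, the analogous idea would presumably be a lookahead after the number rule that requires a separator or end of line, so that 123abc fails the number alternative and reaches the string alternative.

```javascript
// Whole-token number pattern mirroring the number rule above:
// integral part with optional fraction, or a bare fraction like ".23".
var NUMBER = /^(?:(?:0|[1-9][0-9]*)(?:\.[0-9]*)?|\.[0-9]+)$/;

// Numbers become JavaScript numbers; everything else stays a string.
function classifyAtom(token) {
  return NUMBER.test(token) ? parseFloat(token) : token;
}

console.log(classifyAtom("123"));    // 123
console.log(classifyAtom(".23"));    // 0.23
console.log(classifyAtom("."));      // "." stays a string
console.log(classifyAtom("123abc")); // "123abc" stays a string
```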

Source: (StackOverflow)