pegjs
PEG.js: Parser generator for JavaScript
PEG.js is a parser generator for JavaScript based on the parsing expression grammar formalism.
I have this PEG.js grammar.
How can I remove the left recursion from it?
atom = term
/ "^"
/ "_"
/ "\\"
/ atom "."
/ atom "." label
/ atom ".(" labels ")"
term = [a-zA-Z0-9]+
labels = label ("|" label)*
label = ("+" / "-")* [A-Za-z0-9]+
Source: (StackOverflow)
Is there a tool that translates a PEG.js grammar to a TextMate grammar?
I am building my own language and would like to have syntax highlighting for it in my preferred editor, TextMate. The grammar of my language is built with PEG.js. According to the TextMate documentation for this use-case, I have to write the TextMate grammar in a form that is incompatible with PEG.js.
I started writing a new TextMate grammar, but I quickly noticed that it takes quite a while to translate the whole grammar, or even the subset relevant for an acceptable syntax highlighting. Since I am incredibly lazy and don't want to do all this tedious work, I thought about automating this task.
Can anyone give me any clues how to automate, or at least speed up, the generation of TextMate grammar from a PEG.js grammar?
Source: (StackOverflow)
I'm trying to make a parser using pegjs. I need to parse something like:
blah blah START Lorem ipsum
dolor sit amet, consectetur
adipiscing elit END foo bar
etc.
I have trouble writing the rule to catch the text from "START" to "END".
Source: (StackOverflow)
I have a problem with lambda expressions in my PEG grammar. They work if I use this syntax:
x:{y:{x+y}}(20)(30)
which is equivalent to
(function(x) { return function(y) { return x+y; }; })(20)(30);
but this doesn't work:
f:{f(10)}(x:{x*x})
which is equivalent to:
(function(f) { return f(10); })(function(x) { return x*x; })
Is it possible to make that second function work with PEG.js?
Source: (StackOverflow)
I am trying to write a simple grammar for PEG.js that would match something like this:
some text;
arbitrary other text that can also have µnicode; different expression;
let's escape the \; semicolon, and \not recognized escapes are not a problem;
possibly last expression not ending with semicolon
So basically these are some texts separated by semicolons. My simplified grammar looks like this:
start
= flow:Flow
Flow
= instructions:Instruction*
Instruction
= Empty / Text
TextCharacter
= "\\;" /
.
Text
= text:TextCharacter+ ';' {return text.join('')}
Empty
= Semicolon
Semicolon "semicolon"
= ';'
The problem is that if I put anything other than a semicolon in the input, I get:
SyntaxError: Expected ";", "\\;" or any character but end of input found.
How can I solve this? I've read that PEG.js is unable to match end of input.
Source: (StackOverflow)
(Note: I've read other questions like this, but I haven't been able to figure this out).
I wrote this grammar:
start = call
ident = [a-z]+
spaces = [ ]+
call = f:ident spaces g:(call / ident) {
return f + "(" + g + ")";
}
With this input
a b c d
it returns
"a(b(c(d)))"
And I want
"a(b)(c)(d)"
I think this left recursive rule can give me something like that, but PEG.js doesn't support left recursion.
call = f:(call / ident) spaces g:ident {
return f + "(" + g + ")";
}
How can I eliminate the left recursion in this case?
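One common way around this, sketched here under assumptions rather than taken from an accepted answer, is to replace the left-recursive rule with a repetition and fold the matches from left to right in the action. The helper name `foldCalls` is hypothetical, the grammar rule is shown only as a comment, and the sketch assumes `ident` yields a string (for example via `join("")`).

```javascript
// Grammar sketch (PEG.js), kept as a comment so the helper runs on its own:
//   call = first:ident rest:(spaces ident)+ { return foldCalls(first, rest); }

// `rest` has the shape PEG.js produces for (spaces ident)+:
// an array of [spaces, ident] pairs. foldCalls is a hypothetical name.
function foldCalls(first, rest) {
  return rest.reduce(function (acc, pair) {
    // pair[0] is the matched whitespace, pair[1] is the next identifier
    return acc + "(" + pair[1] + ")";
  }, first);
}

// Simulated match result for the input "a b c d":
console.log(foldCalls("a", [[" ", "b"], [" ", "c"], [" ", "d"]])); // a(b)(c)(d)
```

Because the accumulator always becomes the left side of the next step, the result nests to the left even though the grammar itself only iterates.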
PS: You can test this on the online PEG.js demo
Source: (StackOverflow)
I have an AngularJS web application.
I'd like to use peg.js in my application.
I've just written a peg.js grammar, CriteriaValue.pegjs, and generated the parser with the command line:
pegjs CriteriaValue.pegjs
which generated CriteriaValue.js.
Could someone explain to me how to use the parser?
var result = parser.parse('my string'); doesn't work.
I've created a plunker:
http://plnkr.co/edit/Ae05SeZAjKOQ75B3lvLc?p=preview
Source: (StackOverflow)
How do you build an AST (Abstract Syntax Tree) for left-associative operators using PEG.js?
I've tried to write some code based on the information I found on the internet, but I seem to have made a mistake.
The code I wrote generates an incorrect AST for most expressions.
Expression
12-6-4-2*1-1
Expected AST
{
"left": {
"left": {
"left": {
"left": 12,
"operator": "-",
"right": 6
},
"operator": "-",
"right": 4
},
"operator": "-",
"right": {
"left": 2,
"operator": "*",
"right": 1
}
},
"operator": "-",
"right": 1
}
Generated AST
{
"left": {
"left": {
"left": 12,
"operator": "-",
"right": 6
},
"operator": "-",
"right": 4
},
"operator": "-",
"right": {
"left": 2,
"operator": "*",
"right": {
"left": 1,
"operator": "-",
"right": 1
}
}
}
Code
{
function operator(first, rest) {
if (rest.length === 0) return first;
return { left: first, right: rest };
};
function makeOperator(left, operator, right) {
return { left: left, operator: operator[0], right: clean(right[1]) };
};
function clean(expression) {
if (!expression.right) return expression;
var result = makeOperator(expression.left, expression.right[0], expression.right[0]);
for (var counter = 1, len = expression.right.length; counter < len; counter++) {
result = makeOperator(result, expression.right[counter], expression.right[counter]);
}
return result;
};
}
Start = E
E
= expression:E1
{ return clean(expression); }
E1
= expression:E2 rest:(("+" / "-") E2)*
{ return operator(expression, rest); }
E2
= expression:Value rest:(("*" / "/") E1)*
{ return operator(expression, rest); }
Value
= Number
/ BracketedExpression
Number
= [1-9][0-9]*
{ return parseInt(text(), 10); }
BracketedExpression
= "(" expression:E1 ")"
{ return expression; }
I would really appreciate any help or example code on how to build ASTs for both left-associative and right-associative operators.
Edit: As @Bergi pointed out, the problem was that E2 used E1 as the expression for the rest of the operator list instead of Value. However, the code that Bergi wrote is much simpler than mine.
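For reference, a minimal sketch of the usual folding approach follows. It is not Bergi's code, which is not reproduced here; the helper names `foldLeft` and `foldRight` are hypothetical, and the grammar rules appear only as comments. Each rule collects its operands with a repetition, and the action decides the associativity.

```javascript
// Grammar sketch (PEG.js), kept as comments:
//   E1 = first:E2    rest:(("+" / "-") E2)*    { return foldLeft(first, rest); }
//   E2 = first:Value rest:(("*" / "/") Value)* { return foldLeft(first, rest); }

// Left-associative: a - b - c becomes ((a - b) - c).
// `rest` is an array of [operator, operand] pairs.
function foldLeft(first, rest) {
  return rest.reduce(function (left, pair) {
    return { left: left, operator: pair[0], right: pair[1] };
  }, first);
}

// Right-associative: a ^ b ^ c becomes (a ^ (b ^ c)).
function foldRight(first, rest) {
  if (rest.length === 0) return first;
  return {
    left: first,
    operator: rest[0][0],
    right: foldRight(rest[0][1], rest.slice(1))
  };
}

// Simulated match results for 12-6-4-2*1-1:
const product = foldLeft(2, [["*", 1]]);
const ast = foldLeft(12, [["-", 6], ["-", 4], ["-", product], ["-", 1]]);
console.log(JSON.stringify(ast, null, 2));
```

With these simulated inputs, `ast` has the same shape as the expected AST listed in the question.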
Source: (StackOverflow)
I'm trying to extend the example grammar of PEG.js for parsing mathematical expressions with all the 4 operators for my online BASIC interpreter experiment:
http://www.dantonag.it/basicjs/basicjs.html
but not all the expressions are parsed correctly.
This is my PEG grammar:
expression = additive
additive = left:multiplicative atag:("+" / "-") right:additive { return {tag: atag, left:left, right:right}; } / multiplicative
multiplicative = left:primary atag:("*" / "/") right:multiplicative { return {tag: atag, left:left, right:right}; } / primary
primary = number / "(" additive:additive ")" { return additive; }
number = digits:[0-9]+ { return parseInt(digits.join(""), 10); }
It parses correctly expressions like 2*3+1 (giving 7), but not an expression like 2-1-1, that gives 2 instead of 0.
Can you help me improve and debug this?
Thanks in advance.
Edit: I've added the "number" rule to the grammar. And yes, my grammar gives as output a recursive structure that is analogous to a parse tree.
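The symptom can be reproduced without the parser. The sketch below assumes a straightforward recursive evaluator; the interpreter's real evaluator is not shown in the question, so `evaluate` is a hypothetical stand-in. Because `additive` recurses on its right operand, 2-1-1 is grouped as 2-(1-1).

```javascript
// Hypothetical evaluator over the { tag, left, right } nodes the grammar returns.
function evaluate(node) {
  if (typeof node === "number") return node;
  const l = evaluate(node.left);
  const r = evaluate(node.right);
  switch (node.tag) {
    case "+": return l + r;
    case "-": return l - r;
    case "*": return l * r;
    case "/": return l / r;
  }
  throw new Error("unknown tag: " + node.tag);
}

// Tree built by the right-recursive `additive` rule for 2-1-1:
const rightNested = { tag: "-", left: 2, right: { tag: "-", left: 1, right: 1 } };

// Tree a left fold over `multiplicative (("+" / "-") multiplicative)*` would build:
const leftNested = { tag: "-", left: { tag: "-", left: 2, right: 1 }, right: 1 };

console.log(evaluate(rightNested)); // 2
console.log(evaluate(leftNested));  // 0
```

The grouping only matters for operators that are not associative, which is why 2*3+1 happens to come out right while 2-1-1 does not.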
Source: (StackOverflow)
I'm trying to match this
f(some_thing) == 'something else'
- f(some_thing) is a function call, which is an expression
- == is a boolean operator
- 'something else' is a string, which also is an expression
so the boolean expression should be
expression operator expression
The problem is I can't figure out how to do that without left recursion
These are my rules
expression
=
bool_expression
/ function_call
/ string
/ real_number
/ integer
/ identifier
bool_expression
= l:expression space* op:bool_operator space* r:expression
{ return ... }
Using grammar notation, I have
O := ==|<=|>=|<|>|!= // operators
E := B|.... // expression, many non terminals
B := EOE
Because my grammar is EOE, I don't know how to apply the left-recursion elimination algorithm, which is
A := Ab|B
transforms into
A := BA'
A':= e|bA'
Where e is empty and b is a terminal
Source: (StackOverflow)
I'm trying to wrap my head around PEG by entering simple grammars into the PEG.js playground.
Example 1:
- Input:
"abcdef1234567ghijklmn8901opqrs"
- Desired output:
["abcdef", "1234567", "ghijklmn", "8901", "opqrs"]
- Actual output:
["abcdef", ["1234567", ["ghijklmn", ["8901", ["opqrs", ""]]]]]
This example pretty much works, but can I get PEG.js to not nest the resulting array to a million levels? I assume the trick is to use concat() instead of join() somewhere, but I can't find the spot.
start
= Text
Text
= Numbers Text
/ Characters Text
/ EOF
Numbers
= numbers: [0-9]+ {return numbers.join("")}
Characters
= text: [a-z]+ {return text.join("")}
EOF
= !.
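One place where concat() could go, sketched under assumptions: give the recursive alternatives of Text an action that prepends the head to the already flat tail, and let EOF return an empty array. The helper name `prepend` is hypothetical, and the grammar appears only as a comment.

```javascript
// Grammar sketch (PEG.js), kept as a comment:
//   Text = head:(Numbers / Characters) tail:Text { return prepend(head, tail); }
//        / EOF                                   { return []; }

// prepend is a hypothetical helper; `tail` is the flat array
// returned by the recursive Text match.
function prepend(head, tail) {
  return [head].concat(tail);
}

// Simulate the recursion bottoming out at EOF (empty array) and unwinding:
const eof = [];
const flat = prepend("abcdef",
  prepend("1234567",
    prepend("ghijklmn",
      prepend("8901",
        prepend("opqrs", eof)))));

console.log(JSON.stringify(flat));
```

Without an action, a sequence such as `Numbers Text` returns a two-element array, which is where the nesting comes from.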
Example 2:
Same problem and code as Example 1, but change the Characters rule to the following, which I expected would produce the same result.
Characters
= text: (!Numbers .)+ {return text.join("")}
The resulting output is:
[",a,b,c,d,e,f", ["1234567", [",g,h,i,j,k,l,m,n", ["8901", [",o,p,q,r,s", ""]]]]]
Why do I get all these empty matches?
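The commas can be reproduced in plain JavaScript. In the sketch below, the array mimics what `(!Numbers .)+` is assumed to return: one two-element array per iteration, holding the predicate's empty result (shown as `undefined`; the exact value may depend on the PEG.js version) and the matched character. Calling `join("")` on an array of arrays stringifies each inner array with commas.

```javascript
// Assumed shape of the match for "abc": one [predicateResult, character] pair per iteration.
const matched = [[undefined, "a"], [undefined, "b"], [undefined, "c"]];

// Each inner array is converted with toString(), which inserts a comma
// between its (empty) first element and the character.
console.log(matched.join("")); // ,a,b,c

// Picking the character out of each pair before joining avoids the commas.
const chars = matched.map(function (pair) { return pair[1]; }).join("");
console.log(chars); // abc
```

In the grammar, the same effect could be achieved by labelling the character inside the group, or by mapping over the pairs in the action as shown here.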
Example 3:
Last question. This doesn't work at all. How can I make it work? And for bonus points, any pointers on efficiency? For example, should I avoid recursion if possible?
I'd also appreciate a link to a good PEG tutorial. I've read (http://www.codeproject.com/KB/recipes/grammar_support_1.aspx), but as you can see I need more help ...
- Input:
'abcdefghijklmnop"qrstuvwxyz"abcdefg'
- Desired output:
["abcdefghijklmnop", "qrstuvwxyz", "abcdefg"]
- Actual output:
"abcdefghijklmnop\"qrstuvwxyz\"abcdefg"
start
= Words
Words
= Quote
/ Text
/ EOF
Quote
= quote: ('"' .* '"') Words {return quote.join("")}
Text
= text: (!Quote . Words) {return text.join("")}
EOF
= !.
Source: (StackOverflow)
I am writing my language in PEG.js and, as usual, my language has some keywords, like true, false, if, else and today, for instance. Now, I want to declare a variable, but obviously the variable name cannot be one of the reserved keywords. It can be any alpha followed by alphanumerics, with the exception of the language keywords.
I did the following (testable in Pegjs Online):
variable = c:(alpha alphanum*)
{
var keywords = ["true", "false", "if", "else", "today"];
var res = c[0]
for (var i = 0; i<c[1].length; i++) {
res=res+c[1][i]
}
if(keywords.indexOf(res)>=0) {
return error('\'' + res + '\'' + ' is a keyword and cannot be used as a variable name.');
}
return { 'dataType' : 'variable', 'dataValue' : res };
}
alpha = [a-zA-Z]
alphanum = [a-zA-Z0-9_]
boolean = v: ("true" / "false")
{
return { 'dataType' : 'boolean', 'dataValue': v};
}
Now true is illegal, but true1 is not. This is fine. However, since I have defined the boolean structure somewhere else in my language, is it not possible to reuse that definition instead of manually redefining the disallowed keywords inside my variable definition?
You can imagine why my solution is error-prone. I tried several things but they did not work.
Thanks for your help!
Source: (StackOverflow)
I am currently making a programming language for a science fair.
This is my PEG.js grammar:
start
= s:Statements
{ return ['Program', {}].concat(s); }
/ _
Statements
= s:Statement ";"
{ return s; }
/ ss:Statements s:Statement ";"
{ return ss; ss.push(s); }
/ _
Statement
= SetVar
SetVar
= i:Ident "=" e:Expr
{ return ['SetVarStmt', {}, i, e]; }
Expr
= Ident
/ Number
Number
= n:[0-9]+
{ return ['Number', { val: parseInt(n.join(""), 10) }]; }
Ident
= i:[a-zA-Z._]*
{ return ['Ident', { name: i.join("") }]; }
_ = [ \t\r\n]*
I get the following error: "Left recursion detected for rule 'Statements'."
But I cannot figure out why this is occurring.
Source: (StackOverflow)
I was wondering how you parse comments (say, à la Haskell) in PEG.js.
The goal:
{-
This is a comment and should parse.
Comments start with {- and end with -}.
If you've noticed, I still included {- and -} in the comment.
This means that comments should also nest
{- even {- to -} arbitrary -} levels
But they should be balanced
-}
For example, the following should not parse:
{- I am an unbalanced -} comment -}
But you should also have an escape mechanism:
{- I can escape comment \{- characters like this \-} -}
This sorta seems like parsing s-expressions, but with s-expressions, it's easy:
sExpression = "(" [^)]* ")"
Because the closing paren is just one character and I can "not" it with the caret. As an aside, I'm wondering how one can "not" something that is longer than a single character in PEG.js.
Thanks for your help.
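In PEG.js, the negative predicate `!` can be applied to any expression, so `!"-}" .` matches one character that does not begin the two-character closer. A rule along the lines of `comment = "{-" (comment / "\\" . / !"{-" !"-}" .)* "-}"` is one plausible shape; it has not been tested here. The block below is a hand-written JavaScript equivalent of that rule, not generated parser output, and the function names are hypothetical.

```javascript
// Returns the index just past the comment that starts at `pos`,
// or -1 if no balanced comment starts there.
function matchComment(input, pos) {
  if (!input.startsWith("{-", pos)) return -1;
  let i = pos + 2;
  while (i < input.length) {
    if (input.startsWith("-}", i)) return i + 2;   // closing delimiter
    if (input[i] === "\\" && i + 1 < input.length) {
      i += 2;                                      // escape: skip the next character
      continue;
    }
    if (input.startsWith("{-", i)) {               // nested comment
      i = matchComment(input, i);
      if (i === -1) return -1;
      continue;
    }
    i += 1;                                        // any other character
  }
  return -1;                                       // input ended before "-}"
}

// True only if the whole input is exactly one balanced comment.
function isBalancedComment(input) {
  return matchComment(input, 0) === input.length;
}

console.log(isBalancedComment("{- even {- to -} arbitrary -}"));
console.log(isBalancedComment("{- I am an unbalanced -} comment -}"));
console.log(isBalancedComment("{- escape \\{- like this \\-} -}"));
```

The recursion on the nested opener is what lets comments nest to arbitrary depth, and requiring the match to end exactly at the end of the input rejects the unbalanced example.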
Source: (StackOverflow)
I have an atom rule that tries to parse everything as either a number or a quoted string first; if that fails, it treats the thing as a string.
Everything parses fine except one particular case that is this very specific string:
DUD 123abc
Which fails to parse with the error Expected " ", "." or [0-9] but "a" found.
What I expect: it should parse successfully and return string "123abc" as a string atom. You can see several of my unsuccessful attempts commented out in the grammar content below.
Any help/tips/pointers/suggestions appreciated!
You can try the grammar on the online PEG.js version. I'm using node v0.8.23 and pegjs 0.7.0
Numbers that parse correctly:
- 123
- 0
- 0.
- 1.
- .23
- 0.23
- 1.23
- 0.000
- . <--- as string, not number and not error
I want 123abc to be parsed as a string. Is this possible?
This is my entire grammar file:
start = lines:line+ { return lines; }
// --------------------- LINE STRUCTURE
line = command:command eol { return command; }
command = action:atom args:(sep atom)*
{
var i = 0, len = 0;
for (var i = 0, len = args.length; i < len; i++) {
// discard parsed separator tokens
args[i] = args[i][1];
}
return [action, args];
}
sep = ' '+
eol = "\r" / "\n" / "\r\n"
atom = num:number { return num; }
/ str:string_quoted { return str; }
/ str:string { return str; }
// --------------------- COMMANDS
// TODO:
// --------------------- STRINGS
string = chars:([^" \r\n]+) { return chars.join(''); }
string_quoted = '"' chars:quoted_chars* '"' { return chars.join(''); }
quoted_chars = '\\"' { return '"'; }
/ char:[^"\r\n] { return char; }
// --------------------- NUMBERS
number = integral:('0' / [1-9][0-9]*) fraction:("." [0-9]*)?
{
if (fraction && fraction.length) {
fraction = fraction[0] + fraction[1].join('');
} else {
fraction = '';
}
integral = integral instanceof Array ?
integral[0] + integral[1].join('') :
'0';
return parseFloat(integral + fraction);
}
/ ("." / "0.") fraction:[0-9]+
{
return parseFloat("0." + fraction.join(''));
}
/*
float = integral:integer? fraction:fraction { return integral + fraction; }
fraction = '.' digits:[0-9]* { return parseFloat('0.' + digits.join('')); }
integer = digits:('0' / [1-9][0-9]*)
{
if (digits === '0') return 0;
return parseInt(digits[0] + digits[1].join(''), 10);
}
*/
Source: (StackOverflow)