jison

Bison in JavaScript. Jison

How to get Abstract Syntax Tree (AST) out of JISON parser?

So I have generated a parser via JISON:

// mygenerator.js
var Parser = require("jison").Parser;

// a grammar in JSON
var grammar = {
    "lex": {
        "rules": [
           ["\\s+", "/* skip whitespace */"],
           ["[a-f0-9]+", "return 'HEX';"]
        ]
    },

    "bnf": {
        "hex_strings" :[ "hex_strings HEX",
                         "HEX" ]
    }
};

// `grammar` can also be a string that uses jison's grammar format
var parser = new Parser(grammar);

// generate source, ready to be written to disk
var parserSource = parser.generate();

// you can also use the parser directly from memory

// returns true
parser.parse("adfe34bc e82a");

// throws lexical error
parser.parse("adfe34bc zxg");

My question is, how do I retrieve the AST now? I can see that I can run the parser against input, but it just returns true if it works or fails if not.

For the record, I am using JISON: http://zaach.github.com/jison/docs/
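One possible approach, offered as a sketch rather than a verified answer: Jison actions can build up a value with `$$`, and a `return` in the action of the top-level rule becomes the result of `parse()`. The `program` rule and the array-shaped "AST" below are assumptions introduced for illustration; they are not part of the original grammar.

```javascript
// Sketch: the question's grammar with semantic actions added.
// `program` is a hypothetical extra rule whose only job is to hand the
// collected value back to the caller of parse().
var grammar = {
    "lex": {
        "rules": [
            ["\\s+", "/* skip whitespace */"],
            ["[a-f0-9]+", "return 'HEX';"]
        ]
    },
    "bnf": {
        // The first rule listed is assumed to be the start symbol.
        "program": [
            ["hex_strings", "return $1;"]
        ],
        "hex_strings": [
            // Append each new HEX token to the list built so far.
            ["hex_strings HEX", "$$ = $1.concat([$2]);"],
            // The first HEX token starts the list.
            ["HEX", "$$ = [$1];"]
        ]
    }
};

// Assumed usage, mirroring the question:
//   var Parser = require("jison").Parser;
//   var ast = new Parser(grammar).parse("adfe34bc e82a");
```

With actions like these, `parse()` should hand back whatever the `return` produced instead of `true`. Real AST nodes could be built the same way by replacing the arrays with object literals or constructor calls.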


Source: (StackOverflow)

Looking for examples of Jison grammars that use indentation for block-structure

Has anyone got a simple example of how to define a grammar that parses python-like indentation for blocks using Jison?


Source: (StackOverflow)

Jison global variables

In previous versions of Jison, it was possible to have a Flex-like feature that allowed defining variables accessible in both the lexer and parser contexts, such as:

%{
var chars = 0;
var words = 0;
var lines = 0;
%}

%lex
%options flex

%%
\s
[^ \t\n\r\f\v]+ { words++; chars+= yytext.length; }
. { chars++; }
\n { chars++; lines++ }
/lex

%%
E : { console.log(lines + "\t" + words + "\t" + chars) ; };

Ref.: Flex like features?

However, in the latest version of Jison this no longer works: chars, words and lines cannot be reached from the parser context, and an error is generated.

Reading more about the new version, I found that this should be possible by defining output in the parser's context, inside %{ ... %}, but that doesn't work either, even though the same syntax is used for multi-line statements. I'm generating code from a source language to a target language, and I want to prettify that code by applying the correct indentation, controlled by the scope, generating it directly from the parser without building an AST.

How do global definitions currently work in Jison?


Source: (StackOverflow)

How is this grammar ambiguous?

I'm writing a simple expression parser in Jison. Here's my grammar:

{
    "operators": [
        ["left", "+", "-"],
        ["left", "*", "/", "%"]
    ],
    "bnf": {
        "program": [
            ["statement EOF", "return $1;"]
        ],
        "statement": [
            ["expression NEWLINE", "$$ = $1 + ';';"]
        ],
        "expression": [
            ["NUMBER",                       "$$ = yytext;"],
            ["expression binary expression", "$$ = $1 + $2 + $3;"]
        ],
        "binary": [
            ["+",              "$$ = ' + ';"],
            ["-",              "$$ = ' - ';"],
            ["*",              "$$ = ' * ';"],
            ["/",              "$$ = ' / ';"],
            ["%",              "$$ = ' % ';"],
            ["binary NEWLINE", "$$ = $1;"]
        ]
    }
}

When I try to run it, it gives me the following error:

Conflict in grammar: multiple actions possible when lookahead token is + in state 13
- reduce by rule: expression -> expression binary expression
- shift token (then go to state 8)
Conflict in grammar: multiple actions possible when lookahead token is - in state 13
- reduce by rule: expression -> expression binary expression
- shift token (then go to state 9)
Conflict in grammar: multiple actions possible when lookahead token is * in state 13
- reduce by rule: expression -> expression binary expression
- shift token (then go to state 10)
Conflict in grammar: multiple actions possible when lookahead token is / in state 13
- reduce by rule: expression -> expression binary expression
- shift token (then go to state 11)
Conflict in grammar: multiple actions possible when lookahead token is % in state 13
- reduce by rule: expression -> expression binary expression
- shift token (then go to state 12)

States with conflicts:
State 13
  expression -> expression binary expression . #lookaheads= NEWLINE + - * / %
  expression -> expression .binary expression
  binary -> .+
  binary -> .-
  binary -> .*
  binary -> ./
  binary -> .%
  binary -> .binary NEWLINE

However it still produces the correct output in the end. For example 2 + 3 * 5 / 7 % 11 is correctly translated to 2 + 3 * 5 / 7 % 11;.

The way I see it my grammar appears to be unambiguous, so why is Jison complaining?

Update: As @icktoofay explained, it's an operator associativity problem. When an operator is parsed as a non-terminal symbol, its precedence and associativity information is lost. Hence I solved the problem as follows:

{
    "operators": [
        ["left", "+", "-"],
        ["left", "*", "/", "%"]
    ],
    "bnf": {
        "program": [
            ["statement EOF", "return $1;"]
        ],
        "statement": [
            ["expression NEWLINE", "$$ = $1 + ';';"]
        ],
        "expression": [
            ["NUMBER",                          "$$ = yytext;"],
            ["expression + expression",         "$$ = $1 + ' + ' + $3;"],
            ["expression - expression",         "$$ = $1 + ' - ' + $3;"],
            ["expression * expression",         "$$ = $1 + ' * ' + $3;"],
            ["expression / expression",         "$$ = $1 + ' / ' + $3;"],
            ["expression % expression",         "$$ = $1 + ' % ' + $3;"],
            ["expression + NEWLINE expression", "$$ = $1 + ' + ' + $4;"],
            ["expression - NEWLINE expression", "$$ = $1 + ' - ' + $4;"],
            ["expression * NEWLINE expression", "$$ = $1 + ' * ' + $4;"],
            ["expression / NEWLINE expression", "$$ = $1 + ' / ' + $4;"],
            ["expression % NEWLINE expression", "$$ = $1 + ' % ' + $4;"]
        ]
    }
}

That being said, this grammar only allows one optional newline to follow a binary operator. How do I rewrite it to allow an arbitrary number of newlines after a binary operator? Also, there must be some way to avoid writing two rules for each operator.


Source: (StackOverflow)

Grammar spec resolving Shift/Reduce conflicts

I'm using Jison (Bison) to create a simple markup language. I'm clearly new to this, but slight variations are working very well. I just don't understand the source of the S/R conflict.

It doesn't seem to matter that 'Text' is returned by two lexer actions (with different start conditions), and I like this because it seems to allow the grammar to have fewer rules and because the error messages shown to the user are consistent. I've tried making the 'Text' rule common regardless of context, and I've also tried giving each token a different name, but neither has any effect on the S/R conflicts when it's all put together.

The parser is SUPPOSED to create a JSON object with plain text, sub-arrays, and various special nodes.

Specification:

/* lexical grammar */
%lex

%s bracketed

%%

<bracketed>(\\.|[^\\\,\[\]])+       { yytext = yytext.replace(/\\(.)/g, '$1'); return 'Text'; }
<INITIAL>(\\.|[^\\\[])+             { yytext = yytext.replace(/\\(.)/g, '$1'); return 'Text'; }
"["                                 { this.begin('bracketed'); return '['; }
"]"                                 { this.popState(); return ']'; }
","                                 return ','
<<EOF>>                             return 'END'

/lex

%start template

%%    

template
    : sentence END
    ;

sentence
    : /* empty */
    | sentence Text
    | sentence '[' ']'
    | sentence '[' dynamic ']'
    ;

dynamic
    : sentence
    /*| dynamic ',' sentence*/
    ;

Warnings:

Conflict in grammar: multiple actions possible when lookahead token is ] in state 5
- reduce by rule: sentence ->
- shift token (then go to state 6)

States with conflicts:
State 5
  sentence -> sentence [ .] #lookaheads= END Text [ ]
  sentence -> sentence [ .dynamic ] #lookaheads= END Text [ ]
  dynamic -> .sentence #lookaheads= ]
  sentence -> . #lookaheads= ] Text [
  sentence -> .sentence Text
  sentence -> .sentence [ ]
  sentence -> .sentence [ dynamic ]

Different generator algorithms have more or less trouble, but they all seem to have trouble.

Thanks!


Source: (StackOverflow)

Adding declarations to JISON

I have here an only slightly modified version of the JISON calculator example:

/* description: Parses and executes mathematical expressions. */

/* lexical grammar */
%lex
%%

\s+                   /* skip whitespace */
[0-9]+("."[0-9]+)?\b  return 'NUMBER'
"*"                   return '*'
"/"                   return '/'
"-"                   return '-'
"+"                   return '+'
"^"                   return '^'
"!"                   return '!'
"%"                   return '%'
"("                   return '('
")"                   return ')'
"PI"                  return 'PI'
"E"                   return 'E'
<<EOF>>               return 'EOF'
.                     return 'INVALID'

/lex

/* operator associations and precedence */

%left '+' '-'
%left '*' '/'
%left '^'
%right '!'
%right '%'
%left UMINUS

%start expressions

%% /* language grammar */

expressions
    : e EOF
        { typeof console !== 'undefined' ? console.log($1) : print($1);
          return $1; }
    ;

e
    : e '+' e
        {$$ = $1+$3;}
    | e '-' e
        {$$ = $1-$3;}
    | e '*' e
        {$$ = $1*$3;}
    | e '/' e
        {$$ = $1/$3;}
    | e '^' e
        {$$ = Math.pow($1, $3);}
    | e '!'
        {{
          $$ = fact($1);
        }}
    | e '%'
        {$$ = $1/100;}
    | '-' e %prec UMINUS
        {$$ = -$2;}
    | '(' e ')'
        {$$ = $2;}
    | NUMBER
        {$$ = Number(yytext);}
    | E
        {$$ = Math.E;}
    | PI
        {$$ = Math.PI;}
    ;

%%
/*why doesn't this work at runtime?
I see other examples defining declarations this way but I must be doing something wrong
I couldn't find a syntactically valid way of putting this declaration anywhere but here,
which is probably the issue*/
function fact(n) {
  var tot=1;
  for(var i=2;i<=n;++i) {
    tot*=i;
  }
  return tot;
}

Note the slight differences in the ! operator's definition. I'm trying to externally define the fact function rather than doing it inline.

As of now, it tells me at runtime fact is not defined. How can I fix this? Also, why does the calculator example use two braces around the factorial definition, {{ /*like so*/ }}?


Source: (StackOverflow)

How to get line number from an AST node (Jison)

I'm using Jison to build a simple calculator language that includes variables. I want these variables to work similarly to JavaScript's: you have to initialise a variable with the var keyword the first time. In my language, I want to show an error if a variable gets re-initialised.

var myVar = 4
var myVar = 3
// Error, cannot reinitialise variable myVar on line 2

My question is: how do I get the line number for an AST node? In my grammar file, I can pass the line number from the parser to my AssignVariable object, but I'm wondering whether there is a better way to do this.

stmt
    : 'PRINT' expr
        { $$ = new yy.Print($2) }
    | 'VAR' 'IDENTIFIER' 'ASSIGN' expr
        { $$ = new yy.AssignVariable($2, $4, $3); $$.lineNo = yylineno }
    | 'IDENTIFIER' 'ASSIGN' expr
        { $$ = new yy.SetVariable($1, $3, $2) }
    ;

I will also need the line number for other nodes in my compiler for other types of error checking.

A more high-level takeaway from this question could be: What's the best way to detect and handle compile time errors using Jison (or similar)?
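A rough sketch of one way to do this, assuming Jison's location tracking: inside an action, `@1` should refer to the location object of the first symbol (with fields such as `first_line`), so an action like `{ $$ = new yy.AssignVariable($2, $4, @1); }` could attach it to the node. The constructor signature and the `checkRedeclarations` pass below are hypothetical and differ from the constructor used in the question.

```javascript
// Hypothetical AST node that keeps the location object passed in by the
// grammar action, e.g.  { $$ = new yy.AssignVariable($2, $4, @1); }
function AssignVariable(name, value, loc) {
    this.name = name;
    this.value = value;
    // Assumed shape: { first_line, last_line, first_column, last_column }
    this.loc = loc;
}

// Hypothetical compile-time check: walk the statement list after parsing
// and collect an error message for every variable declared more than once.
function checkRedeclarations(statements) {
    var seen = {};
    var errors = [];
    statements.forEach(function (stmt) {
        if (!(stmt instanceof AssignVariable)) {
            return;
        }
        if (Object.prototype.hasOwnProperty.call(seen, stmt.name)) {
            errors.push("Cannot reinitialise variable " + stmt.name +
                        " on line " + stmt.loc.first_line);
        } else {
            seen[stmt.name] = stmt.loc.first_line;
        }
    });
    return errors;
}

// Example: the two declarations from the question, with hand-made locations.
var program = [
    new AssignVariable("myVar", 4, { first_line: 1 }),
    new AssignVariable("myVar", 3, { first_line: 2 })
];
console.log(checkRedeclarations(program));
```

Keeping the check in a separate pass over the finished tree, rather than inside the grammar actions, means every kind of node can carry its location the same way and errors can be collected instead of aborting at the first one.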


Source: (StackOverflow)

Jison: Binary operation grammar conflict

In trying to set up my Jison grammar I had:

%left 'OR' 'AND'

%%

Expression:
    Operation
;

Operation:
    Expression Operator Expression {$$ = new yy.LogicalExpression($2, $1, $3)}
;

Operator:
    'AND'
|   'OR'
;

But that resulted in the following conflict message:

Conflict in grammar: multiple actions possible when lookahead token is OR in state 6
- reduce by rule: Operation -> Expression Operator Expression
- shift token (then go to state 5)
Conflict in grammar: multiple actions possible when lookahead token is AND in state 6
- reduce by rule: Operation -> Expression Operator Expression
- shift token (then go to state 4)

States with conflicts:
State 6
  Operation -> Expression Operator Expression . #lookaheads= $end OR AND
  Operation -> Expression .Operator Expression
  Operator -> .AND
  Operator -> .OR

When I eliminate the Operator non-terminal and instead write out the expression patterns directly:

%left 'OR' 'AND'

%%

Expression:
    Operation
;


Operation:
    Expression 'AND' Expression {$$ = new yy.LogicalExpression($2, $1, $3)}
|   Expression 'OR' Expression {$$ = new yy.LogicalExpression($2, $1, $3)}
;

I get no such error. Why does the first grammar have a conflict but not the second? As far as I can tell, they are equivalent.

Thanks in advance!


Source: (StackOverflow)

How do you match zero or more tokens in Jison?

I'm writing a simple expression parser in Jison allowing an arbitrary number of newlines to follow a binary operator in an expression. This is my grammar so far:

{
    "operators": [
        ["left", "+", "-"],
        ["left", "*", "/", "%"]
    ],
    "bnf": {
        "program": [
            ["statement EOF", "return $1;"]
        ],
        "statement": [
            ["expression newlines", "$$ = $1 + ';';"]
        ],
        "expression": [
            ["NUMBER",                           "$$ = yytext;"],
            ["expression + expression",          "$$ = $1 + ' + ' + $3;"],
            ["expression - expression",          "$$ = $1 + ' - ' + $3;"],
            ["expression * expression",          "$$ = $1 + ' * ' + $3;"],
            ["expression / expression",          "$$ = $1 + ' / ' + $3;"],
            ["expression % expression",          "$$ = $1 + ' % ' + $3;"],
            ["expression + newlines expression", "$$ = $1 + ' + ' + $4;"],
            ["expression - newlines expression", "$$ = $1 + ' - ' + $4;"],
            ["expression * newlines expression", "$$ = $1 + ' * ' + $4;"],
            ["expression / newlines expression", "$$ = $1 + ' / ' + $4;"],
            ["expression % newlines expression", "$$ = $1 + ' % ' + $4;"]
        ],
        "newlines": [
            ["NEWLINE",          ""],
            ["newlines NEWLINE", ""]
        ]
    }
}

As you can see I'm writing two rules for every binary operator. That seems to me to be very redundant. I would rather have a production which matches zero or more NEWLINE tokens (Kleene star) instead of one or more tokens (Kleene plus). How would you do this in Jison?
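One way this might be expressed, as an untested sketch: add a separate non-terminal with an empty alternative (written as an empty string in the JSON format, which is an assumption about how Jison reads it) and use it after each operator. The name `optional_newlines` is hypothetical, and the rules are generated in a loop only to avoid repeating them by hand.

```javascript
// Operators that may be followed by any number of NEWLINE tokens.
var operators = ["+", "-", "*", "/", "%"];

// One rule per operator: expression OP optional_newlines expression.
// The right operand is the fourth symbol, hence $4 in the action.
var expressionRules = [["NUMBER", "$$ = yytext;"]].concat(
    operators.map(function (op) {
        return [
            "expression " + op + " optional_newlines expression",
            "$$ = $1 + ' " + op + " ' + $4;"
        ];
    })
);

var grammar = {
    "operators": [
        ["left", "+", "-"],
        ["left", "*", "/", "%"]
    ],
    "bnf": {
        "program": [["statement EOF", "return $1;"]],
        "statement": [["expression newlines", "$$ = $1 + ';';"]],
        "expression": expressionRules,
        // One or more NEWLINE tokens still terminate a statement.
        "newlines": [
            ["NEWLINE", ""],
            ["newlines NEWLINE", ""]
        ],
        // Zero or more NEWLINE tokens (hypothetical helper rule).
        "optional_newlines": [
            ["", ""],
            ["optional_newlines NEWLINE", ""]
        ]
    }
};
```

The statement terminator keeps the original one-or-more `newlines` rule so that a statement still has to end with at least one NEWLINE.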


Source: (StackOverflow)

Debugging in Jison

I'm using Jison to write a parser. This is my grammar:

{
    "program": [
        ["statements EOF", "return $1;"]
    ],
    "statements": [
        ["statement",            "$$ = $1;"],
        ["statements statement", "$$ = $1 + '\\n' + $2;"]
    ],
    "statement": [
        ["expression NEWLINE", "$$ = $1 + ';';"]
    ],
    "expression": [
        ["NUMBER",                "$$ = yytext;"],
        ["expression expression", "$$ = $1 + ', ' + $2;"]
    ]
}

When I run it however I get the following error message:

Conflict in grammar: multiple actions possible when lookahead token is NUMBER in state 9
- reduce by rule: expression -> expression expression
- shift token (then go to state 5)

States with conflicts:
State 9
  expression -> expression expression . #lookaheads= NEWLINE NUMBER
  expression -> expression .expression
  expression -> .NUMBER
  expression -> .expression expression

What am I supposed to make of this debug message? How would you explain this message in simple English? What does the period in expression -> expression expression . mean? What are .expression and .NUMBER? How are they different from expression and NUMBER respectively?
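For readers unfamiliar with the notation: each line under "State 9" is an LR item, that is, a production plus a dot marking how much of its right-hand side the parser has matched so far. The small helper below is only an illustration of that notation (the function name and argument layout are made up here); it reproduces the strings from the message.

```javascript
// Render an LR item: `dot` is the number of right-hand-side symbols that
// have already been matched. The dot is printed in front of the next
// expected symbol, or at the very end once the whole rule has been matched.
function formatItem(lhs, rhs, dot) {
    var parts = rhs.map(function (symbol, i) {
        return (i === dot ? "." : "") + symbol;
    });
    var text = lhs + " -> " + parts.join(" ");
    return dot === rhs.length ? text + " ." : text;
}

// The items of state 9 from the message above.
console.log(formatItem("expression", ["expression", "expression"], 2));
console.log(formatItem("expression", ["expression", "expression"], 1));
console.log(formatItem("expression", ["NUMBER"], 0));
console.log(formatItem("expression", ["expression", "expression"], 0));
```

Read this way, `.NUMBER` is the same NUMBER token, just with the dot in front of it: the parser is ready to shift a NUMBER. The first item has its dot at the end, so the parser could also reduce. Both actions are possible when the next token is NUMBER, which is the conflict being reported.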


Source: (StackOverflow)

How to avoid conflicts in grammar

I have a grammar file at https://github.com/itrelease/fubar-script/blob/jsast/src/grammar.js, but I get conflicts and I don't really know how to resolve them. It would be helpful if someone could explain this to me.

These rules produce conflicts:

ParamVar: [
  ['Identifier', '$$ = $Identifier;'],
  ['THIS', '$$ = new yy.ThisExpression();']
],

PrimaryExpression: [
  ['THIS', '$$ = new yy.ThisExpression();'],
  ['Literal', '$$ = $Literal;'],
  ['ArrayLiteral', '$$ = $ArrayLiteral;'],
  ['Identifier', '$$ = $Identifier;'],
  ['ObjectLiteral', '$$ = $ObjectLiteral;'],
  ['( Expression )', '$$ = $Expression;']
],

Source: (StackOverflow)

jison start conditions with json format

Despite a long search through documentation and forums, I still can't find the right syntax for Jison start conditions using the JSON format in node.js.

> Documentation at http://zaach.github.io/jison/docs/ says:
> // Using the JSON format, start conditions are defined with an array
> // before the rule’s matcher
> {rules:[
>     [['expect'], '[0-9]+"."[0-9]+', 'console.log( "found a float, = " + yytext );'
>     ]]}

Unfortunately, no one provides a full working sample.

I'm trying to exclude any text that lies between two tags. In lex I would use start conditions, and the Jison documentation says this should work. However, since Jison's error messages are not very intuitive, I would be pleased to find a working sample to move forward.

Does anyone have a solution?

var jison    = require("jison").Parser;

grammar = {  
    "lex": {
        "rules" : [ [" +" , "/* skip whitespace */"]
            ,[['mode1'], '[0-z]+\\b'        , "return 'INFO';"]
            ,[['mode1'], '<\\/extensions>'  , "this.popState(); return 'EXTEND';"]
            ,['<extensions>'                , "this.begin('mode1'); return 'EXTSTART';"]
            ,['$'                           , "return 'EOL';"]
        ]
    },  // end Lex rules

    "bnf": { // WARNING: only one space in between TOKEN ex: "STOP EOF"
        'data': [["EOL"      , "this.cmd='EMPTY'    ; return (this);"]           
           ,['EXTSTART INFO EXTEND EOL'  ,"this.cmd='EXTEN';this.value=$2;return (this);"]
           ]
    }};

  parser    = new jison(grammar);

  test= "\
    <extensions>\
      <opencpn:start></opencpn:start><opencpn:end></opencpn:end>\
      <opencpn:viz>1</opencpn:viz>\
     <opencpn:guid>714d1d6e-78be-46a0-af6e-2f3d0c505f6d</opencpn:guid>\
    </extensions>";

  data=parser.parse (test);

My current sample fails with:

/node_modules/jison/node_modules/jison-lex/regexp-lexer.js:42 startConditions[conditions[k]].rules.push(i);


Source: (StackOverflow)

How can I generate a parser with Jison which deals with grammar ambiguity?

I am trying to generate a parser in JavaScript via Jison for the language ChucK, and have got off to a good start except that there are ambiguities in the language which the generated parser is unable to handle. The original ChucK compiler is generated by Bison, and that must somehow be able to resolve these ambiguities.

For the purposes of this question I've simplified the problem to a contrived grammar which presents only one ambiguity. For reference I've put up a gist of all the involved files (including the generated parser). The project structure is as follows:

The grammar itself looks as follows:

grammar = {
    Program: [
        ['ProgramSection', '$$ = new yy.Program($1);']
    ],
    ProgramSection: [
        ['Expression SEMICOLON', '$$ = new yy.ExpressionStatement($1);']
    ],
    Expression: [
        ['DeclExpression', '$$ = $1;'],
        ['Expression OP DeclExpression', '$$ = new yy.ExpFromBinary($1, $2, $3);']
    ],
    DeclExpression: [
        ['TypeDecl VarDeclList', '$$ = new yy.DeclExp($1, $2, 0);'],
        ['PrimaryExpression', '$$ = $1;']
    ],
    VarDeclList: [
        ['VarDecl', '$$ = new yy.VarDeclList($1);']
    ],
    VarDecl: [
        ['ID', '$$ = new yy.VarDecl($1);']
    ],
    TypeDecl: [
        ['ID', '$$ = new yy.TypeDecl(new yy.IdList($1), 0);']
    ],
    PrimaryExpression: [
        ['ID', '$$ = new yy.ExpFromId($1);']
    ]
};

The ambiguity is that the non-terminal DeclExpression can match either TypeDecl VarDeclList or PrimaryExpression. This makes Jison emit the following warning:

States with conflicts:
State 7
  TypeDecl -> ID . #lookaheads= ID SEMICOLON OP
  PrimaryExpression -> ID . #lookaheads= ID SEMICOLON OP

And the generated parser fails to parse the test code (Type var => out;) like so:

Error: Parse error on line 1: Unexpected 'SEMICOLON'

To my understanding, it's the part after the => operator that the parser tries to match against the rule TypeDecl VarDeclList.

So, how can I generate a parser that is able to deal with this ambiguity?


Source: (StackOverflow)

Jison ignores one of my rules

I'm trying to use Jison.

Here's my grammar:

var grammar = {
lex:{
    rules:[
        ["\\s+",            ""],
        ["then",            "return 'newline';"],
        ["goto",            "return 'goto';"],
        ["http[^\\s]*",     "return 'url';"],
        ["search",          "return 'search';"],
        ["should_exist",    "return 'exist';"],
        //["#(.)*",           "return 'elementById';"],
        //["$",               "return 'EOF';"]
    ]
},
bnf:{
    lines:[
        ['lines line',          "console.log('big expression is ',$3);  return ['l2',$1, $2];"],
        ['line',                "console.log('expression is ',$1); return ['l1',$1]"],
    ],
    line:[
        ["line newline",        "console.log('line newline', $1); $$ = $1"],
        ["goto url",            "console.log('goto', $2); $$ = {cmd:'goto', url: $2 } "],
        ["elementById exist",   "$$ = {cmd:'assert', elId: $1} "]
    ]
}

};

When I try to parse goto http://www.google.com then goto http://www.bing.com I only ever get [ 'l1', { cmd: 'goto', url: 'http://www.google.com' } ] returned.

I'm expecting to get both goto commands returned.

Can anyone help me figure out what is wrong with my grammar?
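A possible explanation, offered as a sketch: a `return` inside a Jison action ends the parse at that point, so returning from the `line` alternative of `lines` would stop after the first command. The variant below accumulates with `$$` and returns only once, from a hypothetical top-level `program` rule that is not in the original grammar. It also avoids `$3`, which has no third symbol to refer to in a two-symbol rule.

```javascript
// Sketch of the bnf section only; the lex section is unchanged.
var bnf = {
    // Hypothetical start rule: the single place where a value is returned.
    program: [
        ["lines", "return $1;"]
    ],
    lines: [
        // Append each further line to the list collected so far.
        ["lines line", "$$ = $1.concat([$2]);"],
        // The first line starts the list.
        ["line", "$$ = [$1];"]
    ],
    line: [
        ["line newline", "$$ = $1;"],
        ["goto url", "$$ = { cmd: 'goto', url: $2 };"],
        ["elementById exist", "$$ = { cmd: 'assert', elId: $1 };"]
    ]
};
```

With this shape the result should be an array holding one object per command.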


Source: (StackOverflow)

jison grammar definition leads to wrong token recognition

I recently found the project jison and modified the calculator example from its website. (http://zaach.github.io/jison/demos/calc/)

/* lexical grammar */
%lex
%%

"a"                   return 'TOKEN1'
"b"                   return 'TOKEN2'
<<EOF>>               return 'EOF'
.                     return 'INVALID'

/lex

%start letters

%% /* language grammar */

letters
    :
    | letters letter
    ;

letter
    : 'TOKEN1'
    | 'TOKEN2'
    ;

Parsing the string "aaabbbaaba" with a parser generated by the above grammar definition results in

Parse error on line 1:
aaabbbaaba
^
Expecting 'TOKEN1', 'TOKEN2', got 'INVALID'

Unfortunately, I don't know why TOKEN1 isn't recognised correctly. With the INVALID token removed, I get the parse error

Unrecognized text.

I found a description of an associativity error that results in a similar error message in "Issue with a Jison Grammar, Strange error from generate dparser", but I couldn't find anything similar in my code.

What is a solution for this issue?
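A likely cause, stated as an assumption to verify against the Jison lexer documentation: unless `%options flex` is set, the lexer generator appears to add a word boundary (`\b`) to patterns that end in a word character, so `"a"` only matches an `a` that is not followed by another letter. The plain regular expressions below imitate that behaviour; they are an approximation, not the code Jison generates.

```javascript
var input = "aaabbbaaba";

// Approximation of what the rule "a" may be compiled to without
// `%options flex`: the literal followed by a word boundary.
var withBoundary = /^(?:a\b)/;

// Approximation of the same rule without the boundary added.
var withoutBoundary = /^(?:a)/;

// The first "a" is followed by another "a", so there is no word boundary
// and the rule does not match; the catch-all "." rule then wins.
console.log(withBoundary.test(input));    // false
console.log(withoutBoundary.test(input)); // true
```

If that is what is happening here, adding `%options flex` to the lexer section or writing the patterns as character classes (`[a]`, `[b]`) might avoid the extra boundary.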


Source: (StackOverflow)