EzDevInfo.com

treetop

A Ruby-based parsing DSL based on parsing expression grammars.

Whitespace in Treetop grammar

How explicit do I need to be when specifying where whitespace is or is not allowed? For instance, would these rules:

rule lambda
  'lambda' ( '(' params ')' )? block
end

rule params
  # ...
end

rule block
  '{' # ... '}'
end

be sufficient to match

lambda {
}

Basically, do I need to specify everywhere that optional whitespace may appear?
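To see why this matters, note that in a PEG a quoted literal such as 'lambda' must match exactly at the current input position, and Treetop does not skip whitespace on its own. The plain-Ruby sketch below (match_lambda and its skip_space flag are hypothetical, and the optional params are left out) mimics the sequence 'lambda' space* '{' space* '}' with StringScanner, where whitespace is skipped only where the code says so.

```ruby
require 'strscan'

# Hypothetical matcher mimicking the PEG sequence
#   'lambda' space* '{' space* '}'
# Each literal must match exactly at the scanner's current position;
# whitespace is skipped only when skip_space is true.
def match_lambda(input, skip_space:)
  s = StringScanner.new(input)
  space = -> { s.skip(/\s*/) if skip_space }

  return false unless s.skip(/lambda/)
  space.call
  return false unless s.skip(/\{/)
  space.call
  return false unless s.skip(/\}/)
  s.eos?
end

match_lambda("lambda {\n}", skip_space: false) # => false
match_lambda("lambda {\n}", skip_space: true)  # => true
```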


Source: (StackOverflow)

Non-greedy matching in Treetop/PEG?

How would I do something like this in Treetop?

/.+?;/

It seems like the only way is to do:

[^;]+ ';'

That is kind of ugly. Is there any other way? .+? doesn't seem to work.
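A common PEG idiom for "anything up to a terminator" is a negative-lookahead loop, written in Treetop roughly as (!';' .)+ ';'. It reads the same for multi-character terminators such as */, where a character class no longer works. The sketch below hand-rolls that loop in plain Ruby; lazy_until is a hypothetical helper, not a Treetop API.

```ruby
# Hand-rolled version of the PEG idiom (!terminator .)+ terminator :
# step over one character at a time as long as the terminator does not
# start at the current position, then require the terminator itself.
def lazy_until(input, terminator, pos = 0)
  start = pos
  while pos < input.length && input[pos, terminator.length] != terminator
    pos += 1                          # !terminator . : consume one char
  end
  return nil if pos == start          # '+' needs at least one character
  return nil if pos >= input.length   # terminator never found
  input[start, pos - start + terminator.length]
end

lazy_until("a = 1; b = 2;", ";")   # => "a = 1;"  (same as /\A.+?;/)
lazy_until("x /* c */ y */", "*/") # => "x /* c */"
```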


Source: (StackOverflow)


PEG for Python style indentation

How would you write a Parsing Expression Grammar in any of the following parser generators (PEG.js, Citrus, Treetop) that can handle Python/Haskell/CoffeeScript-style indentation?

Examples of a not-yet-existing programming language:

square x =
    x * x

cube x =
    x * square x

fib n =
  if n <= 1
    0
  else
    fib(n - 2) + fib(n - 1) # some cheating allowed here with brackets

Update: Don't try to write an interpreter for the examples above. I'm only interested in the indentation problem. Another example might be parsing the following:

foo
  bar = 1
  baz = 2
tap
  zap = 3

# should yield (ruby style hashmap):
# {:foo => { :bar => 1, :baz => 2}, :tap => { :zap => 3 } }
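A PEG has no built-in memory of the current indentation depth, so one common workaround is to keep an indentation stack, which is how Python's tokenizer decides when to emit INDENT and DEDENT tokens. The plain-Ruby sketch below is not a PEG; it only shows that bookkeeping on the foo/tap example. parse_indented is a hypothetical helper, and it assumes every line is either a bare key or key = integer.

```ruby
# Builds nested hashes from indentation with a stack of [indent, hash]
# pairs; popping the stack plays the role of a DEDENT token and pushing
# plays the role of INDENT. It does not validate inconsistent dedents.
def parse_indented(text)
  root = {}
  stack = [[-1, root]]                       # sentinel below any real indent
  text.each_line do |line|
    next if line.strip.empty?
    indent = line[/\A */].length
    stack.pop while stack.last[0] >= indent  # close blocks at >= this depth
    parent = stack.last[1]
    key, value = line.strip.split(/\s*=\s*/, 2)
    if value                                 # "bar = 1": a leaf entry
      parent[key.to_sym] = Integer(value)
    else                                     # "foo": opens a nested block
      child = {}
      parent[key.to_sym] = child
      stack.push([indent, child])
    end
  end
  root
end

source = <<~SRC
  foo
    bar = 1
    baz = 2
  tap
    zap = 3
SRC
parse_indented(source)
# same shape as {:foo => {:bar => 1, :baz => 2}, :tap => {:zap => 3}}
```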

Source: (StackOverflow)

Vim indenting file for Treetop (Ruby parser)

Has anyone seen a vim indent file for treetop, the Ruby parser/generator? I've found a vim syntax highlighting file, but haven't seen one for indentation.


Source: (StackOverflow)

Learning Treetop

I'm trying to teach myself Ruby's Treetop grammar generator. I am finding that not only is the documentation woefully sparse for the "best" one out there, but that it doesn't seem to work as intuitively as I'd hoped.

On a high level, I'd really love a better tutorial than the on-site docs or the video, if there is one.

On a lower level, here's a grammar I cannot get to work at all:

grammar SimpleTest

  rule num
    (float / integer)
  end

  rule float
   (
    (( '+' / '-')? plain_digits '.' plain_digits) /
    (( '+' / '-')? plain_digits ('E' / 'e') plain_digits ) /
    (( '+' / '-')? plain_digits '.') / 
    (( '+' / '-')? '.' plain_digits) 
   ) {
      def eval
        text_value.to_f
      end
   }
  end

  rule integer
    (( '+' / '-' )? plain_digits) {
      def eval
        text_value.to_i
      end
    }
  end

  rule plain_digits
    [0-9] [0-9]*      
  end

end

When I load it and run some assertions in a very simple test object, I find:

assert_equal 3.14, @parser.parse('3.14').eval

Works fine, while

assert_equal 3, @parser.parse('3').eval

raises the error: NoMethodError: private method `eval' called for #

If I reverse integer and float in the num rule, both integers and floats give me this error. I think this may be related to limited lookahead, but I cannot find anything in the docs that even covers the idea of evaluating in the "or" context.
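The wording of that error is a clue. Every Ruby object inherits a private Kernel#eval, so when a node has not been extended with the module generated from the inline { def eval ... } block, calling node.eval reaches the private Kernel method and Ruby reports a private-method error instead of an undefined-method error. A minimal plain-Ruby sketch, where FakeNode and IntegerEval are hypothetical stand-ins and no treetop gem is involved:

```ruby
# Hypothetical stand-in for Treetop::Runtime::SyntaxNode.
class FakeNode
  attr_reader :text_value

  def initialize(text_value)
    @text_value = text_value
  end
end

# Stands in for the module built from an inline { def eval ... } block.
module IntegerEval
  def eval
    text_value.to_i
  end
end

node = FakeNode.new("3")
error =
  begin
    node.eval   # no public #eval yet, so Ruby finds the private Kernel#eval
    nil
  rescue NoMethodError => e
    e
  end
# error.message mentions "private method"

node.extend(IntegerEval)  # mixing the module in gives the node a public #eval
node.eval                 # => 3
```

Naming the method something like value instead of eval would turn the confusing message into a plain undefined-method error.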

A bit more info that may help. Here's pp information for both those parse() blocks.

The float:

SyntaxNode+Float4+Float0 offset=0, "3.14" (eval,plain_digits):
  SyntaxNode offset=0, ""
  SyntaxNode+PlainDigits0 offset=0, "3":
    SyntaxNode offset=0, "3"
    SyntaxNode offset=1, ""
  SyntaxNode offset=1, "."
  SyntaxNode+PlainDigits0 offset=2, "14":
    SyntaxNode offset=2, "1"
    SyntaxNode offset=3, "4":
      SyntaxNode offset=3, "4"

The integer. Note that it seems to have matched the integer rule but did not pick up the eval() method:

SyntaxNode+Integer0 offset=0, "3" (plain_digits):
  SyntaxNode offset=0, ""
  SyntaxNode+PlainDigits0 offset=0, "3":
    SyntaxNode offset=0, "3"
    SyntaxNode offset=1, ""

Update:

I got my particular problem working, but I have no clue why:

  rule integer
    ( '+' / '-' )? plain_digits
     {
      def eval
        text_value.to_i
      end
    }
  end

This makes no sense with the docs that are present, but just removing the extra parentheses made the match include the Integer1 class as well as Integer0. Integer1 is apparently the class holding the eval() method. I have no idea why this is the case.

I'm still looking for more info about treetop.


Source: (StackOverflow)

recognize Ruby code in Treetop grammar

I'm trying to use Treetop to parse an ERB file. I need to be able to handle lines like the following:

<% ruby_code_here %>
<%= other_ruby_code %>

Since Treetop is written in Ruby, and you write Treetop grammars in Ruby, is there already some existing way in Treetop to say "hey, look for Ruby code here, and give me its breakdown" without me having to write out separate rules to handle all parts of the Ruby language? I'm looking for a way, in my .treetop grammar file, to have something like:

rule erb_tag
  "<%" ruby_code "%>" {
    def content
      ...
    end
  }
end

Where ruby_code is handled by some rules that Treetop provides.

Edit: someone else parsed ERB using Ruby-lex, but I got errors trying to reproduce what he did. The rlex program did not produce a full class when it generated the parser class.

Edit: right, so you lot are depressing, but thanks for the info. :) For my Master's project, I'm writing a test case generator that needs to work with ERB as input. Fortunately, for my purposes, I only need to recognize a few things in the ERB code, such as if statements and other conditionals as well as loops. I think I can come up with Treetop grammar to match that, with the caveat that it isn't complete for Ruby.
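Treetop itself does not ship Ruby-language rules, but Ruby's standard library includes Ripper, which can tokenize Ruby source. The sketch below is an alternative outside Treetop: a regex pulls the code out of each tag and Ripper.lex tokenizes it. erb_fragments and ERB_TAG are hypothetical names, and the regex deliberately ignores ERB details such as <%# comments %>, <%- -%> trimming, and a literal %> inside a Ruby string.

```ruby
require 'ripper'

# Matches <% code %> and <%= code %>; group 1 is "=" for output tags.
ERB_TAG = /<%(=?)(.*?)%>/m

def erb_fragments(template)
  template.scan(ERB_TAG).map do |eq, code|
    code = code.strip
    # Ripper.lex yields [position, event, token, ...]; keep event and token.
    tokens = Ripper.lex(code).map { |(_pos, type, tok)| [type, tok] }
    { output: eq == "=", code: code, tokens: tokens }
  end
end

frags = erb_fragments("<% if user %>Hi <%= user.name %><% end %>")
frags.map { |f| f[:code] }  # => ["if user", "user.name", "end"]
frags[1][:tokens]           # => [[:on_ident, "user"], [:on_period, "."], [:on_ident, "name"]]
```

Fragments such as "if user" are not complete Ruby programs, which is why the sketch only tokenizes them rather than building a full syntax tree.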


Source: (StackOverflow)

How to deal with Treetop left-recursion

I have a grammar file for a new general-purpose programming language I'm trying to build. I'm trying to make the language robust and natural to use (it is heavily inspired by Ruby, among others), and in doing so I have introduced some left-recursive rules.

I've seen some examples that seem to indicate the following left-recursive rule:

rule l_recurse
  l_recurse / 'something else'
end

can be made non-left-recursive by changing it to:

rule r_recurse
  'something else' / r_recurse
end

To me this looks like it would have a different problem and would still fail. Am I right, or will this "just work"?

The specific left-recursions I'm trying to (find and) eliminate are found in this grammar file. I'm not sure which rules are affected, but at least some were pointed out to have left-recursion. (By the way I have tried to eliminate the specific range issue he mentioned by tightening up range's rule.)
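For reference, the usual textbook rewrite replaces the self-call with repetition: A <- A x / y becomes A <- y x*. A rule that calls itself at the same input position, with nothing consumed first, recurses forever no matter how its alternatives are ordered. The plain-Ruby sketch below (parse_sub is a hypothetical example rule for subtraction) implements the repetition form and folds left to keep left associativity.

```ruby
# Left-recursive rule (never terminates in a PEG):
#   expr <- expr '-' num / num
# Equivalent repetition form, then a left fold for left associativity:
#   expr <- num ('-' num)*
def parse_sub(input)
  tokens = input.scan(/\d+|-/)
  value = Integer(tokens.shift)
  while tokens.first == '-'      # ('-' num)*
    tokens.shift
    value -= Integer(tokens.shift)
  end
  raise "trailing input: #{tokens.join}" unless tokens.empty?
  value
end

parse_sub("10 - 3 - 2")  # => 5, i.e. (10 - 3) - 2
```

A right-recursive rewrite such as expr <- num '-' expr / num also terminates, but it groups the same input as 10 - (3 - 2) = 9, which is why the repetition-plus-fold form is usually preferred for left-associative operators.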


Source: (StackOverflow)

Treetop grammar infinite loop

I have had some ideas for a new programming language floating around in my head, so I thought I'd take a shot at implementing it. A friend suggested I try using Treetop (the Ruby gem) to create a parser. Treetop's documentation is sparse, and I've never done this sort of thing before.

My parser is acting like it has an infinite loop in it, but with no stack traces it is proving difficult to track down. Can somebody point me in the direction of an entry-level parsing/AST guide? I really need something that lists rules, common usage, etc. for using tools like Treetop. My parser grammar is on GitHub, in case someone wishes to help me improve it.

class {
  initialize = lambda (name) {
    receiver.name = name
  }

  greet = lambda {
    IO.puts("Hello, #{receiver.name}!")
  }
}.new(:World).greet()
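One frequent cause of a PEG parser hanging without a stack trace is a repetition (* or +) over an expression that can succeed without consuming input, for example a rule ending in [a-z]* used inside another *. The plain-Ruby sketch below (repeat is a hypothetical helper) shows a PEG-style * loop with the guard that such a loop needs: it stops as soon as an iteration consumes nothing.

```ruby
require 'strscan'

# Applies `pattern` zero or more times, like a PEG `*`.
# If `pattern` can match the empty string (e.g. /[a-z]*/), a naive loop
# would "succeed" forever at the same position; this one stops as soon
# as an iteration consumes no input.
def repeat(scanner, pattern)
  matches = []
  loop do
    start = scanner.pos
    m = scanner.scan(pattern)
    break if m.nil? || scanner.pos == start
    matches << m
  end
  matches
end

s = StringScanner.new("abc 123")
repeat(s, /[a-z]*/)  # => ["abc"], stopping at the space instead of spinning
```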

Source: (StackOverflow)

simplest rules in treetop not working

I have a treetop grammar with only two rules:

grammar RCFAE
    rule num
        [0-9]+ <Num>
    end

    rule identifier
        [a-zA-Z] [a-zA-Z]* <ID>
    end
end

I'm trying to parse simple strings ("A" and "5"). "5" is recognized as a Num if I put that rule first, and returns nil if I put that rule second. Similarly, "A" is recognized as an ID if I put that rule first, and returns nil if I put that rule second. I can't understand how these two rules overlap in any way. It's driving me crazy!

Is there something I'm missing or don't understand about treetop or regular expressions? Thanks in advance for your help.
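Two Treetop defaults explain this pattern: the first rule in a grammar is the root that parse starts from, and by default the whole input must be consumed. The plain-Ruby model below (RULES and parse_with_root are hypothetical, with regexes standing in for the two rules) shows that the rules never overlap; the second one is simply never attempted unless a different root is chosen. In a real grammar, a first rule such as atom with body num / identifier, or setting the parser's root, would do the same job.

```ruby
# Hypothetical stand-in for the compiled RCFAE grammar. The first key
# plays the default root; \A ... \z models "consume all input".
RULES = {
  num:        /\A[0-9]+\z/,
  identifier: /\A[a-zA-Z][a-zA-Z]*\z/,
}.freeze

def parse_with_root(input, root: RULES.keys.first)
  input.match?(RULES.fetch(root)) ? root : nil
end

parse_with_root("5")                    # => :num
parse_with_root("A")                    # => nil, only the root rule was tried
parse_with_root("A", root: :identifier) # => :identifier
```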


Source: (StackOverflow)

Treetop SGF Parsing

I am currently trying to write a Treetop grammar to parse Smart Game Format (SGF) files, and have it mostly working so far. However, a few questions have come up.

  1. I am unsure how to actually access the structure Treetop generates after a parse.
  2. Is there a better way to handle capturing all characters than my chars rule?
  3. There is a case for comments that I can't seem to write correctly.

    C[player1 [4k\]: hi player2 [3k\]: hi!]

I can't wrap my head around how to deal with the nested structure of the C[] node with []'s inside them.

The following is my current progress.

sgf-grammar.treetop

grammar SgfGrammar
rule node
	'(' chunk* ')' {
		def value
			text_value
		end
	}
end

rule chunk
	';' property_set* {
		def value
			text_value
		end
	}
end

rule property_set
	property ('[' property_data ']')* / property '[' property_data ']' {
		def value
			text_value
		end
	}
end

rule property_data
	chars '[' (!'\]' . )* '\]' chars / chars / empty {
		def value
			text_value
		end
	}
end

rule property
	[A-Z]+ / [A-Z] {
		def value
			text_value
		end
	}
end

rule chars
	[a-zA-Z0-9_/\-:;|'"\\<>(){}!@#$%^&\*\+\-,\.\?!= \r\n\t]*
end

rule empty
	''
end
end

And my test case, currently excluding C[] nodes with the above mentioned nested bracket problem:

example.rb

require 'rubygems'
require 'treetop'
require 'sgf-grammar'

parser = SgfGrammarParser.new
parser.parse("(;GM[1]FF[4]CA[UTF-8]AP[CGoban:3]ST[2]
RU[Japanese]SZ[19]KM[0.50]TM[1800]OT[5x30 byo-yomi]
PW[stoic]PB[bojo]WR[3k]BR[4k]DT[2008-11-30]RE[B+2.50])")
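In SGF, a ] inside a property value is escaped with a backslash (as in [4k\]), so the C[] value is not really nested: only an unescaped ] ends it, and a [ inside the value is ordinary text. In PEG notation that is roughly '[' ('\\' . / !']' .)* ']'. The plain-Ruby sketch below hand-rolls that rule; sgf_value is a hypothetical helper that returns the unescaped text and the position just after the closing bracket.

```ruby
# Scans one SGF property value starting at '[' following the PEG rule
#   '[' ('\\' . / !']' .)* ']'
# A backslash escapes the next character; only an unescaped ']' ends it.
def sgf_value(input, pos)
  return nil unless input[pos] == '['
  pos += 1
  out = +""
  while pos < input.length
    ch = input[pos]
    if ch == '\\'            # '\\' . : keep the escaped character literally
      pos += 1
      out << input[pos].to_s
    elsif ch == ']'          # unescaped ']' closes the value
      return [out, pos + 1]
    else                     # !']' . : any other character
      out << ch
    end
    pos += 1
  end
  nil                        # unterminated value
end

comment = 'C[player1 [4k\]: hi player2 [3k\]: hi!]'
sgf_value(comment, 1)  # => ["player1 [4k]: hi player2 [3k]: hi!", 40]
```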

Source: (StackOverflow)

CSS/HSS Parser in Treetop and Nested Stylesheet Rules

I'm new to Treetop and attempting to write a CSS/HSS parser. HSS augments the basic functionality of CSS with nested styles, variables and a kind of mixin functionality.

I'm pretty close (the parser can handle CSS), but I fall down when it comes to implementing a style within a style, e.g.:

#rule #one {
  #two {
    color: red;
  }
  color: blue;
}

I've taken two shots at it, one which handles whitespace and one which doesn't. I can't quite get either to work. The treetop documentation is a little sparse and I really feel like I'm missing something fundamental. Hopefully someone can set me straight.

A:

grammar Stylesheet

  rule stylesheet
    space* style*
  end

  rule style
    selectors space* '{' space* properties? space* '}' space*
  end

  rule properties
    property space* (';' space* property)* ';'?
  end

  rule property
    property_name space* [:] space* property_value
  end

  rule property_name
    [^:;}]+
  end

  rule property_value
    [^:;}]+
  end

  rule space
    [\t ]
  end

  rule selectors
    selector space* ([,] space* selector)*
  end

  rule selector
    element (space+ ![{] element)*
  end

  rule element
    class / id
  end

  rule id
    [#] [a-zA-Z-]+
  end

  rule class
    [.] [a-zA-Z-]+
  end
end

B:

grammar Stylesheet

  rule stylesheet
   style*
  end

  rule style
    selectors closure
  end

  rule closure
    '{' ( style / property )* '}'
  end

  rule property
    property_name ':' property_value ';'
  end

  rule property_name
    [^:}]+
    <PropertyNode>
  end

  rule property_value
    [^;]+
    <PropertyNode>
  end

  rule selectors
    selector ( !closure ',' selector )*
    <SelectorNode>
  end

  rule selector
    element ( space+ !closure element )*
    <SelectorNode>
  end

  rule element
    class / id
  end

  rule id
    ('#' [a-zA-Z]+)
  end

  rule class
    ('.' [a-zA-Z]+)
  end

  rule space
    [\t ]
  end

end

Harness Code:

require 'rubygems'
require 'treetop'

class PropertyNode < Treetop::Runtime::SyntaxNode
  def value
    "property:(#{text_value})"
  end
end

class SelectorNode < Treetop::Runtime::SyntaxNode
  def value
    "--> #{text_value}"
  end
end

Treetop.load('css')

parser = StylesheetParser.new
parser.consume_all_input = false

string = <<EOS
#hello-there .my-friend {
  font-family:Verdana;
  font-size:12px;
}
.my-friend, #is-cool {
  font: 12px Verdana;
  #he .likes-jam, #very-much {asaads:there;}
  hello: there;
}
EOS

root_node = parser.parse(string)

def print_node(node, output = [])
  output << node.value if node.respond_to?(:value)
  node.elements.each {|element| print_node(element, output)} if node.elements
  output
end

puts print_node(root_node).join("\n") if root_node

#puts parser.methods.sort.join(',')
puts parser.input
puts string[0...parser.failure_index] + '<--'
puts parser.failure_reason
puts parser.terminal_failures
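Note that the space rule in both grammars matches only tabs and spaces, not newlines, while the nested example spans several lines. The hand-written recursive-descent sketch below is plain Ruby, not Treetop, and uses hypothetical names (parse_styles, parse_style, SELECTORS, PROPERTY). It mirrors grammar B's ordered choice (style / property) with backtracking and skips all whitespace, including newlines, explicitly between tokens.

```ruby
require 'strscan'

# Sketch of:  style    <- selectors '{' (style / property)* '}'
#             property <- name ':' value ';'?
# `style` is tried before `property`, with backtracking, like PEG `/`.
SELECTORS = /[#.][\w-]+(?:\s*,?\s*[#.][\w-]+)*/
PROPERTY  = /([\w-]+)\s*:\s*([^;}]+);?/

def parse_styles(src)
  s = StringScanner.new(src)
  s.skip(/\s*/)
  styles = []
  until s.eos?
    style = parse_style(s) or raise "expected a style at offset #{s.pos}"
    styles << style
  end
  styles
end

def parse_style(s)
  start = s.pos
  selectors = s.scan(SELECTORS)
  s.skip(/\s*/)
  unless selectors && s.skip(/\{/)
    s.pos = start                        # backtrack: this is not a style
    return nil
  end
  node = { selectors: selectors, properties: {}, children: [] }
  loop do
    s.skip(/\s*/)                        # newlines count as whitespace here
    break if s.skip(/\}/)
    if (child = parse_style(s))          # nested style first ...
      node[:children] << child
    elsif s.scan(PROPERTY)               # ... then a property
      node[:properties][s[1]] = s[2].strip
    else
      raise "unexpected input at offset #{s.pos}"
    end
  end
  s.skip(/\s*/)
  node
end

nested = "#rule #one {\n  #two {\n    color: red;\n  }\n  color: blue;\n}\n"
parse_styles(nested).first[:children].first[:properties]  # => {"color"=>"red"}
```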

Source: (StackOverflow)

What's wrong with my Treetop grammar?

I have the grammar file alexa_scrape.tt:

grammar AlexaScrape
  rule document
    category_listing*
  end
  rule category_listing
    category_line url_line*
  end
  rule category_line
    category "\n"
  end
  rule category
    ("/" [^/]+)+
  end
  rule url_line
    [0-9]+ ". " url "\n"
  end
  rule url
    [^\n]*
  end
end

I have a ruby file which attempts to make use of it:

#!/usr/bin/env ruby -I .
require 'rubygems'
require 'polyglot'
require 'treetop'
require 'alexa_scrape.tt'

parser = AlexaScrapeParser.new
p( parser.parse("") || parser.failure_reason )
p( parser.parse("/x\n") || parser.failure_reason )

But I'm not getting the results I expected:

SyntaxNode offset=0, ""
"Expected one of /, \n at line 2, column 1 (byte 4) after /x\n"

It parses the empty string properly (as the trivial match for document, zero category_listings), but fails to parse "/x\n" (as the document containing a single category_listing that itself has zero url_lines).

What am I doing wrong?
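The failure message fits how PEG repetition behaves: [^/]+ is greedy and never gives characters back, so in category it also consumes the trailing "\n" that category_line then expects, and the parser is left asking for another / or a \n. The plain-Ruby sketch below approximates that no-backtracking behaviour with atomic groups (?>...); the constant names are hypothetical, and the fix shown is excluding "\n" from the character class.

```ruby
# Atomic groups refuse to backtrack once they have matched, which
# approximates PEG repetition for `category "\n"` on the input "/x\n".
ORIGINAL     = %r{\A(?>(?:/[^/]+)+)\n}    # category: ("/" [^/]+)+
FIXED        = %r{\A(?>(?:/[^/\n]+)+)\n}  # category: ("/" [^/\n]+)+
BACKTRACKING = %r{\A(?:/[^/]+)+\n}        # ordinary regex, for contrast

ORIGINAL.match?("/x\n")      # => false: [^/]+ already ate the "\n"
FIXED.match?("/x\n")         # => true
BACKTRACKING.match?("/x\n")  # => true: a regex backtracks, a PEG does not
```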


Source: (StackOverflow)

How can I avoid left-recursion in treetop without backtracking?

I am having trouble avoiding left-recursion in this simple expression parser I'm working on. Essentially, I want to parse the equation 'f x y' into two expressions 'f x' and '(f x) y' (with implicit parentheses). How can I do this while avoiding left-recursion and backtracking? Does there have to be an intermediate step?

#!/usr/bin/env ruby
require 'rubygems'
require 'treetop'
Treetop.load_from_string DATA.read

parser = ExpressionParser.new

p parser.parse('f x y').value

__END__
grammar Expression
   rule equation
      expression (w+ expression)*
   end
   rule expression
      expression w+ atom
   end
   rule atom
      var / '(' w* expression w* ')'
   end
   rule var
      [a-z]
   end
   rule w
      [\s\n\t\r]
   end
end
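The standard way around the left-recursive expression rule is to express it as repetition, expression <- atom (w+ atom)*, and then fold the atoms left to right, which recovers the implicit grouping (f x) y without left recursion. The plain-Ruby sketch below implements that with a tiny recursive-descent parser; parse_app, parse_expression and parse_atom are hypothetical names, and tokenizing with scan silently drops whitespace (and any unexpected characters).

```ruby
# expression <- atom (w+ atom)*    folded left: f x y => [[f, x], y]
# atom       <- var / '(' expression ')'
def parse_app(src)
  tokens = src.scan(/[a-z]|[()]/)
  expr = parse_expression(tokens)
  raise "trailing input: #{tokens.join}" unless tokens.empty?
  expr
end

def parse_expression(tokens)
  result = parse_atom(tokens)
  # (w+ atom)* : keep applying while another atom starts here
  while tokens.first && tokens.first != ')'
    result = [result, parse_atom(tokens)]  # left fold builds ((f x) y)
  end
  result
end

def parse_atom(tokens)
  tok = tokens.shift
  case tok
  when '('
    inner = parse_expression(tokens)
    raise "expected )" unless tokens.shift == ')'
    inner
  when /\A[a-z]\z/ then tok
  else raise "unexpected #{tok.inspect}"
  end
end

parse_app("f x y")  # => [["f", "x"], "y"]
```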

Source: (StackOverflow)

Treetop parser : Function definition syntax - n arguments

Good morning everyone,

I'm currently trying to describe some basic Ruby grammar, but I'm now stuck on function definitions. I don't know how to handle n arguments. Here is the code I use to handle functions with 0 to 2 arguments:

  rule function_definition
    'def' space? identifier space? '(' space? expression? space? ','? expression? space? ')'
      block
    space? 'end' <FunctionDefinition>
  end  

How can I handle n arguments? Is there a recursive way to do that?

Thank you very much, have a nice day.

EDIT :

I wanted to highlight the fact that I need the arguments to be in the result tree. Like :

 Argument offset=42, "arg1"
 Argument offset=43, "arg2"
 Argument offset=44, "arg3"

So I need a custom SyntaxNode subclass declaration, just like the one I used for the function_definition rule.
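The usual PEG shape for "any number of items" is one item followed by a starred group of comma-prefixed items, e.g. argument (space? ',' space? argument)*, with the whole list made optional for zero arguments. No recursion is needed. The plain-Ruby sketch below follows that shape and records each argument with its offset, similar to the desired output; parse_params and the Argument struct are hypothetical, and identifiers stand in for full expressions.

```ruby
require 'strscan'

Argument = Struct.new(:offset, :text)

# Mirrors  '(' (argument (',' argument)*)? ')'  so any n >= 0 is accepted.
def parse_params(src)
  s = StringScanner.new(src)
  s.skip(/\(\s*/) or raise "expected ("
  args = []
  unless s.check(/\s*\)/)                # empty list: skip straight to ')'
    loop do
      offset = s.pos
      name = s.scan(/[a-z_]\w*/) or raise "expected argument at #{offset}"
      args << Argument.new(offset, name)
      s.skip(/\s*/)
      break unless s.skip(/,\s*/)        # (',' argument)* ends at no comma
    end
  end
  s.skip(/\s*\)/) or raise "expected ) at #{s.pos}"
  args
end

parse_params("(arg1, arg2, arg3)").map(&:text)  # => ["arg1", "arg2", "arg3"]
```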


Source: (StackOverflow)

Can I use Treetop to parse an IO?

I've got a file that I want to parse with Treetop. If I wanted to parse the entire thing, I'd use

rule document
  category_listing*
end

I don't really want to read the entire file into memory at once. I know I can set up the parser to parse one category_listing at a time (using #consume_all_input = false and #root = :category_listing), which is half the problem. However, it looks like #parse expects to be passed a String (and it certainly fails when I try to pass it a File), which makes the idea of reading and parsing category_listing by category_listing sound like a PITA.

Can Treetop only be used to parse Strings? I've been poking around the treetop docs, but haven't found anything definitive.
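One way to keep passing Strings while never holding the whole file in memory is to cut the IO into one listing's worth of text at a time and hand each chunk to the parser (for example parser.parse(chunk) with the root set to category_listing). The plain-Ruby sketch below assumes, as in the AlexaScrape grammar above, that each listing starts with a line beginning with "/"; each_listing is a hypothetical helper.

```ruby
require 'stringio'

# Lazily yields the text of one category_listing at a time, reading the
# IO line by line. Assumes each listing starts with a line beginning "/".
def each_listing(io)
  return enum_for(:each_listing, io) unless block_given?
  chunk = +""
  io.each_line do |line|
    if line.start_with?("/") && !chunk.empty?
      yield chunk                        # a new listing begins here
      chunk = +""
    end
    chunk << line
  end
  yield chunk unless chunk.empty?
end

io = StringIO.new("/a\n1. x\n/b\n2. y\n")
each_listing(io).to_a  # => ["/a\n1. x\n", "/b\n2. y\n"]
```

With a real file, File.open(path) { |f| each_listing(f) { |chunk| ... } } reads it incrementally in the same way.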


Source: (StackOverflow)