treetop
A Ruby-based parsing DSL based on parsing expression grammars.
How explicit do I need to be when specifying where whitespace is or is not allowed? For instance, would these rules:
rule lambda
  'lambda' ( '(' params ')' )? block
end
rule params
  # ...
end
rule block
  '{' # ... '}'
end
be sufficient to match
lambda {
}
Basically do I need to specify everywhere optional whitespace may appear?
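To make the question concrete: as far as I know, Treetop (like other PEG tools) has no separate lexer and skips nothing implicitly, so whitespace only matches where a rule names it. The sketch below illustrates that difference in plain Ruby with StringScanner rather than with Treetop itself. The helper names `strict_lambda?` and `spaced_lambda?` are hypothetical, and the parameter-list pattern is only a stand-in for the elided params rule.

```ruby
require 'strscan'

# Mirrors the rules as written: no whitespace is allowed anywhere.
def strict_lambda?(src)
  s = StringScanner.new(src)
  return false unless s.scan(/lambda/)
  s.scan(/\([^)]*\)/)                 # optional parameter list (stand-in)
  !!(s.scan(/\{/) && s.scan(/\}/) && s.eos?)
end

# Same shape, but optional whitespace is spelled out at every position.
def spaced_lambda?(src)
  s = StringScanner.new(src)
  return false unless s.scan(/lambda/)
  s.skip(/\s*/)
  s.scan(/\([^)]*\)/)                 # optional parameter list (stand-in)
  s.skip(/\s*/)
  return false unless s.scan(/\{/)
  s.skip(/\s*/)
  !!(s.scan(/\}/) && s.eos?)
end

input = "lambda {\n}"
puts strict_lambda?(input)   # the space before '{' is not accounted for
puts spaced_lambda?(input)
```

In a grammar, the equivalent would presumably be an explicit optional-whitespace rule referenced between the tokens.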
Source: (StackOverflow)
How would I do something like this in Treetop?
/.+?;/
It seems like the only way is to do:
[^;]+ ';'
Which is kind of ugly. Is there any other way? .+? doesn't seem to work.
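For context, PEG repetition is greedy and does not give characters back, so a non-greedy `.+?` has no direct counterpart. A commonly used PEG idiom is a negative lookahead inside the loop, roughly `(!';' .)+ ';'`. The plain-Ruby sketch below emulates that idiom by hand; `match_until_semicolon` is a hypothetical helper, not part of Treetop.

```ruby
# Emulates the PEG expression (!';' .)+ ';' starting at position pos.
# Returns the matched text including the semicolon, or nil on failure.
def match_until_semicolon(input, pos = 0)
  i = pos
  # (!';' .)+  : consume characters one at a time while the next is not ';'
  i += 1 while i < input.length && input[i] != ';'
  return nil if i == pos            # the loop must match at least once
  return nil if i >= input.length   # the closing ';' is required
  input[pos..i]
end

puts match_until_semicolon('abc;def;').inspect
puts match_until_semicolon('abc').inspect
```

Note that this is not identical to the regex `/.+?;/` in every case: the regex lets the first character be a semicolon, while the lookahead form does not.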
Source: (StackOverflow)
How would you write a Parsing Expression Grammar in any of the following parser generators (PEG.js, Citrus, Treetop) that can handle Python/Haskell/CoffeeScript-style indentation:
Examples of a not-yet-existing programming language:
square x =
    x * x
cube x =
    x * square x
fib n =
    if n <= 1
        0
    else
        fib(n - 2) + fib(n - 1) # some cheating allowed here with brackets
Update:
Don't try to write an interpreter for the examples above. I'm only interested in the indentation problem. Another example might be parsing the following:
foo
  bar = 1
  baz = 2
tap
  zap = 3
# should yield (ruby style hashmap):
# {:foo => { :bar => 1, :baz => 2}, :tap => { :zap => 3 } }
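One approach often suggested for indentation-sensitive input is to track indentation outside the grammar, for example with a stack of open levels, since a plain PEG has no notion of "the current indent". The sketch below is plain Ruby rather than PEG.js, Citrus, or Treetop, and only covers the second example (names and integer assignments). `parse_indented` is a hypothetical helper, and the two-space indent in the sample input is an assumption.

```ruby
# Builds a nested Hash from indentation-structured text using a stack
# of [indent width, hash] pairs. Only handles "name" and "name = int".
def parse_indented(text)
  root = {}
  stack = [[-1, root]]
  text.each_line do |line|
    next if line.strip.empty?
    indent = line[/\A */].size
    content = line.strip
    # Close every level that is at least as deep as the current line.
    stack.pop while stack.last[0] >= indent
    parent = stack.last[1]
    if content =~ /\A(\w+)\s*=\s*(\d+)\z/
      parent[$1.to_sym] = $2.to_i
    else
      child = {}
      parent[content.to_sym] = child
      stack.push([indent, child])
    end
  end
  root
end

sample = "foo\n  bar = 1\n  baz = 2\ntap\n  zap = 3\n"
p parse_indented(sample)
```

The same stack idea can be used to emit explicit indent and dedent markers, which a grammar can then treat like ordinary brackets.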
Source: (StackOverflow)
Has anyone seen a vim indent file for treetop, the Ruby parser/generator? I've found a vim syntax highlighting file, but haven't seen one for indentation.
Source: (StackOverflow)
I'm trying to teach myself Ruby's Treetop grammar generator. I am finding that not only is the documentation woefully sparse for the "best" one out there, but that it doesn't seem to work as intuitively as I'd hoped.
On a high level, I'd really love a better tutorial than the on-site docs or the video, if there is one.
On a lower level, here's a grammar I cannot get to work at all:
grammar SimpleTest
rule num
(float / integer)
end
rule float
(
(( '+' / '-')? plain_digits '.' plain_digits) /
(( '+' / '-')? plain_digits ('E' / 'e') plain_digits ) /
(( '+' / '-')? plain_digits '.') /
(( '+' / '-')? '.' plain_digits)
) {
def eval
text_value.to_f
end
}
end
rule integer
(( '+' / '-' )? plain_digits) {
def eval
text_value.to_i
end
}
end
rule plain_digits
[0-9] [0-9]*
end
end
When I load it and run some assertions in a very simple test object, I find:
assert_equal @parser.parse('3.14').eval,3.14
Works fine, while
assert_equal @parser.parse('3').eval,3
raises the error: NoMethodError: private method `eval' called for #
If I reverse integer and float in the grammar, both integers and floats give me this error. I think this may be related to limited lookahead, but I cannot find anything in the docs that even covers the idea of evaluating in the "or" context.
A bit more info that may help. Here's pp information for both those parse() blocks.
The float:
SyntaxNode+Float4+Float0 offset=0, "3.14" (eval,plain_digits):
SyntaxNode offset=0, ""
SyntaxNode+PlainDigits0 offset=0, "3":
SyntaxNode offset=0, "3"
SyntaxNode offset=1, ""
SyntaxNode offset=1, "."
SyntaxNode+PlainDigits0 offset=2, "14":
SyntaxNode offset=2, "1"
SyntaxNode offset=3, "4":
SyntaxNode offset=3, "4"
The Integer... note that it seems to have been defined to follow the integer rule, but not caught the eval() method:
SyntaxNode+Integer0 offset=0, "3" (plain_digits):
SyntaxNode offset=0, ""
SyntaxNode+PlainDigits0 offset=0, "3":
SyntaxNode offset=0, "3"
SyntaxNode offset=1, ""
Update:
I got my particular problem working, but I have no clue why:
rule integer
( '+' / '-' )? plain_digits
{
def eval
text_value.to_i
end
}
end
This makes no sense with the docs that are present, but just removing the extra parentheses made the match include the Integer1 class as well as Integer0. Integer1 is apparently the class holding the eval() method. I have no idea why this is the case.
I'm still looking for more info about treetop.
Source: (StackOverflow)
I'm trying to use Treetop to parse an ERB file. I need to be able to handle lines like the following:
<% ruby_code_here %>
<%= other_ruby_code %>
Since Treetop is written in Ruby, and you write Treetop grammars in Ruby, is there already some existing way in Treetop to say "hey, look for Ruby code here, and give me its breakdown" without me having to write out separate rules to handle all parts of the Ruby language? I'm looking for a way, in my .treetop grammar file, to have something like:
rule erb_tag
"<%" ruby_code "%>" {
def content
...
end
}
end
Where ruby_code is handled by some rules that Treetop provides.
Edit: someone else parsed ERB using Ruby-lex, but I got errors trying to reproduce what he did. The rlex program did not produce a full class when it generated the parser class.
Edit: right, so you lot are depressing, but thanks for the info. :) For my Master's project, I'm writing a test case generator that needs to work with ERB as input. Fortunately, for my purposes, I only need to recognize a few things in the ERB code, such as if statements and other conditionals as well as loops. I think I can come up with a Treetop grammar to match that, with the caveat that it isn't complete for Ruby.
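As a rough illustration of the reduced goal described in the last edit, the tag boundaries can be found without parsing the Ruby inside them. The sketch below uses a plain Ruby regex rather than Treetop; `extract_erb_tags` is a hypothetical helper, and it deliberately ignores ERB comments (`<%#`) and trim markers (`-%>`).

```ruby
# Returns [kind, code] pairs for each ERB tag, where kind is :output
# for <%= ... %> and :code for <% ... %>. The Ruby code is not parsed.
def extract_erb_tags(template)
  template.scan(/<%(=?)(.*?)%>/m).map do |eq, body|
    [eq == '=' ? :output : :code, body.strip]
  end
end

template = "<% if user %>Hello <%= user.name %><% end %>"
extract_erb_tags(template).each { |kind, code| puts "#{kind}: #{code}" }
```

Recognizing conditionals and loops could then start from the extracted code strings, for example by checking their leading keyword.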
Source: (StackOverflow)
I have a grammar file for a new general-purpose programming language I'm trying to build. I'm trying to make the language robust and natural to use (it is heavily inspired by Ruby, among others), and in doing so I have introduced some left-recursive rules.
I've seen some examples that seem to indicate the following left-recursive rule:
rule l_recurse
  l_recurse / 'something else'
end
can be made non-left-recursive by changing it to:
rule r_recurse
  'something else' / r_recurse
end
To me this looks like it would have a different problem and would still fail. Am I right, or will this "just work"?
The specific left-recursions I'm trying to (find and) eliminate are found in this grammar file. I'm not sure which rules are affected, but at least some were pointed out to have left-recursion. (By the way I have tried to eliminate the specific range issue he mentioned by tightening up range's rule.)
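The suspicion about r_recurse can be checked by emulating ordered choice by hand. In the plain-Ruby sketch below (not Treetop), the literal is tried first; when it fails, the second alternative calls the same rule at the same position without consuming anything, which is still left recursion. The `RecursionLimit` class and the depth guard of 50 are assumptions added only so the demonstration stops instead of overflowing the stack.

```ruby
class RecursionLimit < StandardError; end

# Emulates: rule r_recurse  'something else' / r_recurse  end
# Returns the position after the match.
def r_recurse(input, pos, depth = 0)
  raise RecursionLimit, "no input consumed after #{depth} calls" if depth > 50
  literal = 'something else'
  # First alternative: the literal.
  return pos + literal.length if input[pos, literal.length] == literal
  # Second alternative: the rule itself, at the very same position.
  r_recurse(input, pos, depth + 1)
end

puts r_recurse('something else', 0)
begin
  r_recurse('anything else', 0)
rescue RecursionLimit => e
  puts "gave up: #{e.message}"
end
```

So reordering helps only when the first alternative succeeds. A rule of this shape also matches nothing more than the literal alone, so the recursive alternative adds no new inputs.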
Source: (StackOverflow)
I have had some ideas for a new programming language floating around in my head, so I thought I'd take a shot at implementing it. A friend suggested I try using Treetop (the Ruby gem) to create a parser. Treetop's documentation is sparse, and I've never done this sort of thing before.
My parser is acting like it has an infinite loop in it, but with no stack traces it is proving difficult to track down. Can somebody point me in the direction of an entry-level parsing/AST guide? I really need something that lists rules, common usage, etc. for tools like Treetop. My parser grammar is on GitHub, in case someone wishes to help me improve it.
class {
  initialize = lambda (name) {
    receiver.name = name
  }
  greet = lambda {
    IO.puts("Hello, #{receiver.name}!")
  }
}.new(:World).greet()
Source: (StackOverflow)
I have a treetop grammar with only two rules:
grammar RCFAE
rule num
[0-9]+ <Num>
end
rule identifier
[a-zA-Z] [a-zA-Z]* <ID>
end
end
I'm trying to parse simple strings ("A" and "5"). The "5" is recognized as a Num if I put that rule first, and returns nil if I put that rule second. Similarly, "A" is recognized as an ID if I put that rule first, and returns nil if I put that rule second. I can't understand how these two rules overlap in any way. It's driving me crazy!
Is there something I'm missing or don't understand about treetop or regular expressions? Thanks in advance for your help.
Source: (StackOverflow)
I am currently trying to write a Treetop grammar to parse Simple Game Format files, and have it mostly working so far. However, there are a few questions that have come up.
- I am unsure how to actually access the structure Treetop generates after a parse.
- Is there a better way to handle capturing all characters than my chars rule?
There is a case for comments that I can't seem to write correctly.
C[player1 [4k\]: hi player2 [3k\]: hi!]
I can't wrap my head around how to deal with the nested structure of the C[] node with []'s inside them.
The following is my current progress.
sgf-grammar.treetop
grammar SgfGrammar
rule node
'(' chunk* ')' {
def value
text_value
end
}
end
rule chunk
';' property_set* {
def value
text_value
end
}
end
rule property_set
property ('[' property_data ']')* / property '[' property_data ']' {
def value
text_value
end
}
end
rule property_data
chars '[' (!'\]' . )* '\]' chars / chars / empty {
def value
text_value
end
}
end
rule property
[A-Z]+ / [A-Z] {
def value
text_value
end
}
end
rule chars
[a-zA-Z0-9_/\-:;|'"\\<>(){}!@#$%^&\*\+\-,\.\?!= \r\n\t]*
end
rule empty
''
end
end
And my test case, currently excluding C[] nodes with the above mentioned nested bracket problem:
example.rb
require 'rubygems'
require 'treetop'
require 'sgf-grammar'
parser = SgfGrammarParser.new
parser.parse("(;GM[1]FF[4]CA[UTF-8]AP[CGoban:3]ST[2]
RU[Japanese]SZ[19]KM[0.50]TM[1800]OT[5x30 byo-yomi]
PW[stoic]PB[bojo]WR[3k]BR[4k]DT[2008-11-30]RE[B+2.50])")
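For the comment case with escaped brackets, one possible approach is to treat a backslash plus the following character as a single unit inside a property value, so that only an unescaped `]` closes it. The sketch below shows the idea in plain Ruby with StringScanner instead of Treetop; `PROPERTY_VALUE` and `scan_property` are hypothetical names, and the rule that `[` needs no escaping is an assumption based on the example above.

```ruby
require 'strscan'

# '[' then any run of escape pairs (backslash + char) or characters
# other than ']' and backslash, then the closing ']'.
PROPERTY_VALUE = /\[((?:\\.|[^\]\\])*)\]/m

# Returns [identifier, [values...]] for one property, or nil.
def scan_property(src)
  s = StringScanner.new(src)
  ident = s.scan(/[A-Z]+/) or return nil
  values = []
  values << s[1] while s.scan(PROPERTY_VALUE)
  values.empty? ? nil : [ident, values]
end

p scan_property('C[player1 [4k\]: hi player2 [3k\]: hi!]')
p scan_property('AB[aa][bb]')
```

In PEG terms, the value body would be a repetition of an ordered choice, escape pair first and ordinary non-bracket character second, which could also replace a long explicit character class.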
Source: (StackOverflow)
I'm new to Treetop and attempting to write a CSS/HSS parser. HSS augments the basic functionality of CSS with nested styles, variables and a kind of mixin functionality.
I'm pretty close - the parser can handle CSS - but I fall down when it comes to implementing a style within a style. e.g:
#rule #one {
#two {
color: red;
}
color: blue;
}
I've taken two shots at it, one which handles whitespace and one which doesn't. I can't quite get either to work. The treetop documentation is a little sparse and I really feel like I'm missing something fundamental. Hopefully someone can set me straight.
A:
grammar Stylesheet
rule stylesheet
space* style*
end
rule style
selectors space* '{' space* properties? space* '}' space*
end
rule properties
property space* (';' space* property)* ';'?
end
rule property
property_name space* [:] space* property_value
end
rule property_name
[^:;}]+
end
rule property_value
[^:;}]+
end
rule space
[\t ]
end
rule selectors
selector space* ([,] space* selector)*
end
rule selector
element (space+ ![{] element)*
end
rule element
class / id
end
rule id
[#] [a-zA-Z-]+
end
rule class
[.] [a-zA-Z-]+
end
end
B:
grammar Stylesheet
rule stylesheet
style*
end
rule style
selectors closure
end
rule closure
'{' ( style / property )* '}'
end
rule property
property_name ':' property_value ';'
end
rule property_name
[^:}]+
<PropertyNode>
end
rule property_value
[^;]+
<PropertyNode>
end
rule selectors
selector ( !closure ',' selector )*
<SelectorNode>
end
rule selector
element ( space+ !closure element )*
<SelectorNode>
end
rule element
class / id
end
rule id
('#' [a-zA-Z]+)
end
rule class
('.' [a-zA-Z]+)
end
rule space
[\t ]
end
end
Harness Code:
require 'rubygems'
require 'treetop'
class PropertyNode < Treetop::Runtime::SyntaxNode
def value
"property:(#{text_value})"
end
end
class SelectorNode < Treetop::Runtime::SyntaxNode
def value
"--> #{text_value}"
end
end
Treetop.load('css')
parser = StylesheetParser.new
parser.consume_all_input = false
string = <<EOS
#hello-there .my-friend {
font-family:Verdana;
font-size:12px;
}
.my-friend, #is-cool {
font: 12px Verdana;
#he .likes-jam, #very-much {asaads:there;}
hello: there;
}
EOS
root_node = parser.parse(string)
def print_node(node, output = [])
output << node.value if node.respond_to?(:value)
node.elements.each {|element| print_node(element, output)} if node.elements
output
end
puts print_node(root_node).join("\n") if root_node
#puts parser.methods.sort.join(',')
puts parser.input
puts string[0...parser.failure_index] + '<--'
puts parser.failure_reason
puts parser.terminal_failures
Source: (StackOverflow)
I have the grammar file alexa_scrape.tt:
grammar AlexaScrape
rule document
category_listing*
end
rule category_listing
category_line url_line*
end
rule category_line
category "\n"
end
rule category
("/" [^/]+)+
end
rule url_line
[0-9]+ ". " url "\n"
end
rule url
[^\n]*
end
end
I have a ruby file which attempts to make use of it:
#!/usr/bin/env ruby -I .
require 'rubygems'
require 'polyglot'
require 'treetop'
require 'alexa_scrape.tt'
parser = AlexaScrapeParser.new
p( parser.parse("") || parser.failure_reason )
p( parser.parse("/x\n") || parser.failure_reason )
But I'm not getting the results I expected:
SyntaxNode offset=0, ""
"Expected one of /, \n at line 2, column 1 (byte 4) after /x\n"
It parses the empty string properly (as the trivial match for document, zero category_listings), but fails to parse "/x\n" (as the document containing a single category_listing that itself has zero url_lines).
What am I doing wrong?
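One possible explanation, offered as an assumption rather than a confirmed diagnosis: the class `[^/]+` also matches the newline, and PEG repetition does not give characters back, so category consumes "x\n" and nothing is left for the "\n" in category_line. That would agree with the reported failure position (byte 4, after "/x\n"). The plain-Ruby sketch below emulates that commit-and-continue behaviour with StringScanner; `category_line?` and the two pattern constants are hypothetical names.

```ruby
require 'strscan'

GREEDY_SEGMENT = %r{/[^/]+}     # as in the grammar: newline is not excluded
FIXED_SEGMENT  = %r{/[^/\n]+}   # newline excluded from the segment

# Emulates: category_line <- (segment)+ "\n", with no backtracking
# into the repetition once a segment has been consumed.
def category_line?(src, segment)
  s = StringScanner.new(src)
  count = 0
  count += 1 while s.scan(segment)
  count > 0 && !s.scan(/\n/).nil? && s.eos?
end

puts category_line?("/x\n", GREEDY_SEGMENT)
puts category_line?("/x\n", FIXED_SEGMENT)
```

If this is the cause, excluding the newline from the character class in the category rule would be the corresponding change in the grammar.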
Source: (StackOverflow)
I am having trouble avoiding left-recursion in this simple expression parser I'm working on. Essentially, I want to parse the equation 'f x y' into two expressions 'f x' and '(f x) y' (with implicit parentheses). How can I do this while avoiding left-recursion and backtracking? Does there have to be an intermediate step?
#!/usr/bin/env ruby
require 'rubygems'
require 'treetop'
Treetop.load_from_string DATA.read
parser = ExpressionParser.new
p parser.parse('f x y').value
__END__
grammar Expression
rule equation
expression (w+ expression)*
end
rule expression
expression w+ atom
end
rule atom
var / '(' w* expression w* ')'
end
rule var
[a-z]
end
rule w
[\s\n\t\r]
end
end
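A common way around left recursion for left-associative application is to parse a flat sequence, roughly `atom (w+ atom)*`, and rebuild the left-nested structure afterwards with a fold. The sketch below shows only that folding step in plain Ruby; `parse_application` is a hypothetical helper, and it simplifies by splitting on whitespace and ignoring parenthesised atoms.

```ruby
# Turns 'f x y' into the left-nested form [['f', 'x'], 'y'],
# i.e. the implicit parenthesisation ((f x) y).
def parse_application(src)
  atoms = src.split                       # stands in for: atom (w+ atom)*
  atoms.drop(1).inject(atoms.first) { |fn, arg| [fn, arg] }
end

p parse_application('f x y')
p parse_application('f')
```

In a grammar, the same fold could live in a method on the node for the flat rule, so the intermediate step stays inside the parse tree rather than in a separate pass.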
Source: (StackOverflow)
Good morning everyone,
I'm currently trying to describe some basic Ruby grammar, but I'm now stuck on function definitions: I don't know how to handle 'n' arguments. Here is the code I use to handle functions with 0 to 2 arguments:
rule function_definition
  'def' space? identifier space? '(' space? expression? space? ','? expression? space? ')'
  block
  space? 'end' <FunctionDefinition>
end
How could I handle 'n' arguments? Is there a recursive way to do that?
Thank you very much, have a nice day.
EDIT:
I wanted to highlight the fact that I need the arguments to be in the result tree, like this:
Argument offset=42, "arg1"
Argument offset=43, "arg2"
Argument offset=44, "arg3"
So I need to declare a custom SyntaxNode subclass, just like I did for the function_definition rule, for instance.
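The usual shape for a list of any length is "one item, then zero or more comma-plus-item groups", which in grammar terms would be something like an optional `expression (',' expression)*`. The sketch below mirrors that shape in plain Ruby with StringScanner rather than Treetop; `parse_args` is a hypothetical helper, and arguments are simplified to bare words instead of full expressions.

```ruby
require 'strscan'

# Parses '(a, b, c)' into ['a', 'b', 'c']; returns nil on malformed input.
def parse_args(src)
  s = StringScanner.new(src)
  args = []
  return nil unless s.scan(/\(\s*/)
  if (first = s.scan(/\w+/))              # optional first argument
    args << first
    while s.scan(/\s*,\s*/)               # zero or more ", argument" groups
      arg = s.scan(/\w+/) or return nil   # a comma must be followed by one
      args << arg
    end
  end
  s.scan(/\s*\)/) && s.eos? ? args : nil
end

p parse_args('(arg1, arg2, arg3)')
p parse_args('()')
```

With this shape, each argument is matched by the same sub-rule, so a single custom node class attached to that sub-rule would appear once per argument in the tree.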
Source: (StackOverflow)
I've got a file that I want to parse with Treetop. If I wanted to parse the entire thing, I'd use
rule document
category_listing*
end
I don't really want to read the entire file into memory at once. I know I can set up the parser to parse one category_listing at a time (using #consume_all_input = false and #root = :category_listing), which is half the problem. However, it looks like #parse expects to be passed a String (and it certainly fails when I try to pass it a File), which makes the idea of reading and parsing category_listing by category_listing sound like a PITA.
Can Treetop only be used to parse Strings? I've been poking around the Treetop docs, but haven't found anything definitive.
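One workaround, assuming each category_listing starts with a line beginning with "/" as in the alexa_scrape grammar elsewhere on this page, is to split the stream into one String per listing before parsing. The sketch below does only the splitting, in plain Ruby; `each_category_listing` is a hypothetical helper, and StringIO stands in for a File.

```ruby
require 'stringio'

# Yields one String per category listing, reading the IO line by line
# so that only the current listing is held in memory.
def each_category_listing(io)
  return enum_for(:each_category_listing, io) unless block_given?
  buffer = +''
  io.each_line do |line|
    # A new category line closes the listing collected so far.
    if line.start_with?('/') && !buffer.empty?
      yield buffer
      buffer = +''
    end
    buffer << line
  end
  yield buffer unless buffer.empty?
end

io = StringIO.new("/a\n1. x.com\n/b/c\n1. y.com\n2. z.com\n")
each_category_listing(io) { |chunk| p chunk }
```

Each yielded chunk could then be passed to the parser's parse method with the root set to category_listing.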
Source: (StackOverflow)