pegkit
'Parsing Expression Grammar' toolkit for Cocoa/Objective-C
I would like to use one grammar definition as an extension point for my own. With ANTLR you can import grammar files into your own grammar definition. Is it possible to do the same with PEGKit?
Thanks,
Anwar
Source: (StackOverflow)
I have a parser generated with PEGKit (example project here).
I want to pause parsing without blocking the main thread. Since PEGKit has infinite backtracking and knows where the cursor/head is in the input string, it should be possible to resume parsing.
This would be very helpful for building a step-by-step parser: the parser must wait for a UI action, such as a press of a UIButton.
How do I pause and then resume parsing?
As an example, I want to pause parsing when a certain symbol is reached. Here that would be after a ; (semicolon, or EXPRESSIONPARSER_TOKEN_KIND_SEMI_COLON).
So after the ; token it should save its state, so that I can return and continue parsing from that position.
Code:
- (void)start {
    [self main_];
    [self matchEOF:YES];
}
- (void)__main {
    while ([self speculate:^{ [self expression_]; }]) {
        [self expression_];
    }
    [self fireDelegateSelector:@selector(parser:didMatchMain:)];
}
- (void)__expression {
    if ([self speculate:...) {
        if ([self predicts:...) {
            [self _subExpression];
        } else {
            [self raise:@"No viable alternative found in rule 'expression'."];
        }
    }
    [self match:EXPRESSIONPARSER_TOKEN_KIND_SEMI_COLON discard:NO];
    [self fireDelegateSelector:@selector(parser:didMatchExpression:)];
}
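The "step by step" behaviour asked for here can be illustrated independently of PEGKit. The following is only a minimal sketch in Python, not PEGKit code: it assumes the input is already split into tokens, and the name parse_statements is hypothetical. A generator keeps the cursor position between calls, so the caller (for example a button handler) decides when parsing continues. How to get the same effect from a generated PEGKit parser is not shown.

```python
from typing import Iterator, List, Tuple


def parse_statements(tokens: List[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (next_position, statement_tokens) each time a ';' is consumed.

    The generator is suspended at every yield, so the cursor position
    survives until the caller asks for the next statement.
    """
    statement: List[str] = []
    for position, token in enumerate(tokens):
        statement.append(token)
        if token == ";":
            # Pause point: control returns to the caller here.
            yield position + 1, statement
            statement = []
    if statement:
        raise ValueError("input ended before ';'")


steps = parse_statements(["a", "+", "b", ";", "c", ";"])
first = next(steps)   # parse up to the first ';' and pause
second = next(steps)  # resume from the saved position
```

Each call to next() corresponds to one UI action; nothing runs between calls, so no thread is blocked while waiting.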
Source: (StackOverflow)
Is it possible to make the grammar match case-insensitively?
For example, a rule:
checkName = 'CHECK' Word;
would match check name as well as CHECK name.
Source: (StackOverflow)
I have actions that use custom objects. I'd like to avoid copying and pasting all the #imports each time I generate the parser. Is this possible with @begin or some other directive?
For example:
mycustomRule: word {
    PUSH([[MyCoolNewObject alloc] initWith:POP_STR()]);
};
It all generates perfectly, but when I try to compile, the generated file is obviously missing the #import "MyCoolNewObject.h".
Source: (StackOverflow)
Suppose I have a rule:
myCoolRule:
Word
| 'myCoolToken' Word otherRule
I supply myCoolToken something else now as input.
The parser greedily matches myCoolToken as a Word, then hits something and reports that it expected EOF. If I reorder the alternatives so that it tries to match 'myCoolToken' first, all is good and that input parses perfectly.
I am wondering whether it is possible to make it keep trying all the alternatives in that rule to see if any of them works: it matches Word, fails later, comes back, and then tries the next alternative.
Here are the actual grammar rules causing problems:
columnName = Word;
typeName = Word;
//accepts CAST and cast
cast = { MATCHES_IGNORE_CASE(LS(1), @"CAST") }? Word ;
checkConstraint = 'CHECK' '('! expr ')'!;
expr = requiredExp optionalExp*;
requiredExp = (columnName
| cast '(' expr as typeName ')'
... more but not important
optionalExp ...not important
The input CHECK( CAST( abcd as defy) ) causes it to fail, even though it is valid.
Is there a construct, or some other way, to make it try all alternatives before giving up?
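The behaviour described above is what ordered choice does in PEG-style parsers in general: the first alternative that succeeds locally wins, and later alternatives are not retried if the overall parse then fails. The sketch below reproduces this with a few hand-written combinators in Python. It is not PEGKit code; word, literal, sequence and ordered_choice are hypothetical helpers, and other_rule is a stand-in for otherRule that simply accepts two words.

```python
from typing import Callable, List, Optional

# A parser takes (tokens, position) and returns the new position, or None on failure.
Parser = Callable[[List[str], int], Optional[int]]


def word(tokens: List[str], pos: int) -> Optional[int]:
    """Match any purely alphabetic token."""
    if pos < len(tokens) and tokens[pos].isalpha():
        return pos + 1
    return None


def literal(text: str) -> Parser:
    """Match exactly one token equal to `text`."""
    def parse(tokens: List[str], pos: int) -> Optional[int]:
        if pos < len(tokens) and tokens[pos] == text:
            return pos + 1
        return None
    return parse


def sequence(*parsers: Parser) -> Parser:
    """Match all parsers one after another."""
    def parse(tokens: List[str], pos: int) -> Optional[int]:
        current: Optional[int] = pos
        for parser in parsers:
            current = parser(tokens, current)
            if current is None:
                return None
        return current
    return parse


def ordered_choice(*parsers: Parser) -> Parser:
    """Return the result of the FIRST alternative that succeeds."""
    def parse(tokens: List[str], pos: int) -> Optional[int]:
        for parser in parsers:
            end = parser(tokens, pos)
            if end is not None:
                return end  # committed: later alternatives are never tried
        return None
    return parse


def matches_all(rule: Parser, tokens: List[str]) -> bool:
    """True when the rule consumes the whole input (the EOF check)."""
    return rule(tokens, 0) == len(tokens)


other_rule = sequence(word, word)  # hypothetical stand-in for otherRule
long_alternative = sequence(literal("myCoolToken"), word, other_rule)

word_first = ordered_choice(word, long_alternative)
long_first = ordered_choice(long_alternative, word)

tokens = "myCoolToken something else now".split()
word_first_ok = matches_all(word_first, tokens)  # Word wins, EOF check fails
long_first_ok = matches_all(long_first, tokens)  # longer alternative wins
```

With word first, the choice commits after one token and the end-of-input check fails; with the longer alternative first, the whole input is consumed. Putting the more specific alternative first is the usual workaround in ordered-choice grammars.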
Source: (StackOverflow)
I am learning how to use PEGKit, but I am running into a problem creating a grammar for a script that parses lines, even when they are separated by multiple line break characters. I have reduced the problem to this grammar:
expr
@before {
PKTokenizer *t = self.tokenizer;
self.silentlyConsumesWhitespace = NO;
t.whitespaceState.reportsWhitespaceTokens = YES;
self.assembly.preservesWhitespaceTokens = YES;
}
= Word nl*;
nl = nl_char nl_char*;
nl_char = '\n'! | '\r'!;
As I read it, this simple grammar should allow one word per line, with as many line breaks as necessary. But it only allows one word with an optional line break. Does anybody know what's wrong here? Thank you.
Source: (StackOverflow)
I am trying to build a grammar that will match on substrings of a word, and I am not having much luck. That is, matching on the text 'an' succeeds, but matching on the first two letters of 'and' fails.
expr = phrase*;
phrase = an|text;
an = 'an';
text = Any;
I realize this is a basic example.
Source: (StackOverflow)
Is anyone aware of an existing grammar definition of the Mscgen syntax that will work with PEGKit? I had a look in the "res" folder, but most of those don't seem to work.
Here is a sample:
# MSC for some fictional process
msc {
hscale = "2";
a,b,c;
a->b [ label = "ab()" ] ;
b->c [ label = "bc(TRUE)"];
c=>c [ label = "process(1)" ];
c=>c [ label = "process(2)" ];
...;
c=>c [ label = "process(n)" ];
c=>c [ label = "process(END)" ];
a<<=c [ label = "callback()"];
--- [ label = "If more to run", ID="*" ];
a->a [ label = "next()"];
a->c [ label = "ac1()\nac2()"];
b<-c [ label = "cb(TRUE)"];
b->b [ label = "stalled(...)"];
a<-b [ label = "ab() = FALSE"];
}
Source: (StackOverflow)
I'm using PEGKit to generate a parser for an iOS app I am developing. To do so, I need to run a helper app (ParserGenApp) that is distributed with PEGKit to generate the parser source code. I've followed the instructions here:
https://github.com/itod/PEGKitMiniMathTutorial
But whenever I try to build/run the app, I get the error "No signing identity found!"
I have an iOS developer certificate, but not a Mac developer certificate. I'm able to build/run other sample Mac apps locally.
Source: (StackOverflow)
I need to parse something really simple. Imagine my string is @"\"one two three\" four five".
I want to get the strings @"one two three", @"four" and @"five".
If the string is @"\"invalid string", I want to get @"invalid" and @"string".
The PEGKit library seems to be great but complicated. Could anyone help me achieve that?
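To pin down the expected behaviour, here is a minimal hand-written sketch in Python. It is not PEGKit code and does not use PEGKit's tokenizer; split_quoted is a hypothetical name. It assumes that a quoted run becomes one string only when the closing quote exists, and that an unmatched quote is simply dropped.

```python
from typing import List


def split_quoted(text: str) -> List[str]:
    """Split on whitespace, keeping "..." runs together when they are closed."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1  # skip separators
        elif ch == '"':
            close = text.find('"', i + 1)
            if close == -1:
                # Unmatched quote: drop it and treat the rest as plain words.
                i += 1
            else:
                tokens.append(text[i + 1:close])
                i = close + 1
        else:
            start = i
            while i < len(text) and not text[i].isspace() and text[i] != '"':
                i += 1
            tokens.append(text[start:i])
    return tokens


valid = split_quoted('"one two three" four five')
invalid = split_quoted('"invalid string')
```

The two calls correspond to the two inputs in the question: the first yields the quoted phrase as a single string followed by the two words, the second falls back to plain words.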
Source: (StackOverflow)
Is it possible to generate .m and .h files for the rules of any grammar, so that during parsing the parser creates an object that represents each rule?
So a grammar rule such as
coolObjName = Word;
could generate a class named CoolObjName (or some variation) that has a field for the word, and generate the action:
coolObjName = Word {
    CoolObjName* newName = [[CoolObjName alloc] initWithWord:POP_STR()];
    PUSH(newName);
};
Then a higher level rule such as:
myhigherlevel = coolObjName Number;
would create a MyHigherLevel class that has a CoolObjName member and a number, and add the action:
myhigherlevel = coolObjName Number {
    double num = POP_DOUBLE();
    CoolObjName* name = POP();
    MyHigherLevel* higherLevel = [[MyHigherLevel alloc] init];
    higherLevel.number = num;
    higherLevel.name = name;
    PUSH(higherLevel);
};
Empty tags turn into empty objects, and * and + result in arrays.
Is there a tool that can do this, or where would I start if I wanted to create one? (It seems super useful and awesome.)
Source: (StackOverflow)
This is the second question related to Custom objects in ParseKit Actions
If I had a grammar rule such as:
qualifiedTableName = (databaseName '.')? tableName (('INDEXED' 'BY' indexName) | ('NOT' 'INDEXED'))?;
Is it correct to assume that the action would not be called until the rule had been matched? So in this case, when the action is called, the stack could look like:
possibly:
|'INDEXED'
|'NOT'
or:
|indexName (A custom object possibly)
|'BY'
|'INDEXED'
|tableName (for sure will be here)
and possibly these
|'.' (if this is here I know the database name must be here) if not push last one on?
|databaseName
--------------(perhaps more things from other rules)
Are these assessments correct? Is there any other documentation on actions? I know it is heavily based on ANTLR, but it's the subtle differences that can really get you in trouble.
Source: (StackOverflow)
I'm working on some code which uses PEGKit and I've hit something I'm not sure how to figure out. I have a syntax that looks like this (simplified):
expr = runtimeExpr | objectExpr;
runtimeExpr = is? runtimeObject;
objectExpr = runtimeObject keyPath;
runtimeObject = '[' string ']';
is = 'is';
keyPath = string;
I'm looking for the following results:
[abc] -> runtime expr.
is [abc] -> runtime expr.
[abc].def -> object expr.
However, the generated parser code looks like this:
if ([self predicts:STLOGEXPRESSIONPARSER_TOKEN_KIND_IS, 0]) {
    [self runtimeExpr_];
} else if ([self predicts:STLOGEXPRESSIONPARSER_TOKEN_KIND_OPEN_BRACKET, 0]) {
    [self objectExpr_];
}
This effectively says that in order to parse a runtime expr, the input has to start with 'is'. That means [abc] is being parsed as an object expr instead.
So what I need help with is understanding how to express this logic in the grammar syntax:
If the string starts with 'is' followed by a runtimeObject, or is only a runtimeObject, then process it as a runtimeExpr.
Otherwise process it as an objectExpr.
Source: (StackOverflow)
I am very intrigued by the ability to add actions to ParseKit grammars. There is surprisingly little documentation on what is available in those actions. Say I have two rules like:
databaseName = Word;
createTableStmt ='CREATE' ('TEMP'| 'TEMPORARY')? 'TABLE' 'IF NOT EXISTS'? databaseName;
This obviously isn't a whole grammar, but it will serve as an example. When parsing, I'd like to "return" a CreateTableStmt object that has certain properties. If I understand the tool correctly, I'd add an action to the rule, do stuff, and then push the result onto the assembly, which will carry it around for the next rule to deal with or use.
So, for example, it would look like:
createTableStmt ='CREATE' ('TEMP'| 'TEMPORARY')? 'TABLE' 'IF NOT EXISTS'? databaseName;
{
    AnotherObj* dbName = POP(); // gives me the topmost object
    CreateTableStmt* createTable = [[CreateTableStmt alloc] initWith:dbName];
    // set if it was temporary
    // set 'IF NOT EXISTS'
    PUSH(createTable); // push back on stack for next rule to use
}
Then, when everything is parsed, I can just get that root object off the stack, and it is a fully instantiated custom representation of the grammar. Somewhat like building an AST, if I remember correctly. I can then work with that representation much more easily than with the passed-in string.
My question is: how can I see whether it matched ('TEMP' | 'TEMPORARY') so I can set the value? Are those tokens on the stack? Is there a better way than popping back to 'CREATE' and seeing whether we passed it? Should I be popping back to the bottom of the stack on each match anyway?
Also, if my rule was instead
qualifiedTableName = (databaseName '.')? tableName (('INDEXED' 'BY' indexName) | ('NOT' 'INDEXED'))?;
Is it correct to assume that the action would not be called until the rule had been matched? So in this case, when the action is called, the stack could look like:
possibly:
|'INDEXED'
|'NOT'
or:
|indexName (A custom object possibly)
|'BY'
|'INDEXED'
|tableName (for sure will be here)
and possibly these
|'.' (if this is here I know the database name must be here) if not push last one on?
|databaseName
--------------(perhaps more things from other rules)
Are these assessments correct? Is there any other documentation on actions? I know it is heavily based on ANTLR, but it's the subtle differences that can really get you in trouble.
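The "pop back to 'CREATE'" idea can be sketched without PEGKit. The Python below is only an illustration: it assumes, as the question does, that every matched token is pushed onto a stack before the action runs, and both pop_above and the stack contents are hypothetical. Whether PEGKit's assembly actually holds these tokens at that point is exactly what the question asks, and the sketch does not answer it.

```python
from typing import List


def pop_above(stack: List[str], fence: str) -> List[str]:
    """Pop items down to and including the nearest `fence`.

    Returns the items that were above the fence, in the order they were
    pushed. If the fence is absent, the whole stack is consumed.
    """
    items: List[str] = []
    while stack:
        top = stack.pop()
        if top == fence:
            break
        items.append(top)
    items.reverse()
    return items


# Hypothetical stack contents after matching: CREATE TEMP TABLE mydb
stack = ["CREATE", "TEMP", "TABLE", "mydb"]
matched = pop_above(stack, "CREATE")
is_temporary = any(token in ("TEMP", "TEMPORARY") for token in matched)
```

Using the first token of the rule as a fence means the action only inspects what its own rule matched, and anything pushed by earlier rules stays on the stack.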
Source: (StackOverflow)