Question: I want to build a parser for analyzing a large input file, but I don’t need the entire input file, only some parts of it.
For example, the input file may look like this:
bla bla bla bla bla ...
EVENT: e1
type: t1
version: 1
additional-info: abc
EVENT: e2
type: t2
version: 1
uninteresting-info: def
blu blu blu blu blu ...
From this file, all I want is a map of event to type (e1=>t1, e2=>t2). All other information is of no interest to me.
How can I build a simple ANTLR grammar that does this?
Answer:
You can do that by introducing a boolean flag inside your lexer that keeps track of whether an EVENT: or type: keyword has been encountered on the current line. If it has, the lexer keeps the words that follow it; all other words are skipped. A line break resets the flag.
A small demo:
grammar T;
@lexer::members {
  // true: words are skipped; false: words are kept until the next line break
  private boolean ignoreWord = true;
}
parse
  :  event* EOF
  ;
event
  :  Event w1=Word Type w2=Word
     {System.out.println("event=" + $w1.text + ", type=" + $w2.text);}
  ;
Event
  :  'EVENT:' {ignoreWord=false;}
  ;
Type
  :  'type:' {ignoreWord=false;}
  ;
Word
  :  ('a'..'z' | 'A'..'Z' | '0'..'9')+ {if(ignoreWord) skip();}
  ;
// a line break switches word-skipping back on
NewLine
  :  ('\r'? '\n' | '\r') {ignoreWord=true; skip();}
  ;
// everything else (spaces, punctuation, ...) is discarded
Other
  :  . {skip();}
  ;
You can test the parser with the following class:
import org.antlr.runtime.*;
public class Main {
    public static void main(String[] args) throws Exception {
        // Sample input, including decoys ("prEVENT:", "EVENTs:") that must not match
        String src =
            "bla bla bla bla bla ... \n" +
            " \n" +
            "prEVENT: ... \n" +
            "EVENTs: ... \n" +
            " \n" +
            "EVENT: e1 \n" +
            "type: t1 \n" +
            "version: 1 \n" +
            "additional-info: abc \n" +
            " \n" +
            "EVENT: e2 \n" +
            "type: t2 \n" +
            "version: 1 \n" +
            "uninteresting-info: def \n" +
            " \n" +
            "blu blu blu blu blu ... \n";
        // Lex the source, feed the token stream to the parser and run the 'parse' rule
        TLexer lexer = new TLexer(new ANTLRStringStream(src));
        TParser parser = new TParser(new CommonTokenStream(lexer));
        parser.parse();
    }
}
Generate the lexer and parser, compile everything, and run the demo:
java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
This produces the following output:
event=e1, type=t1
event=e2, type=t2
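The question asks for a map (e1=>t1, e2=>t2) rather than printed lines. The following is a plain-Java sketch, not ANTLR-generated code, that imitates the same lexer idea by hand so the ignoreWord flag can be followed step by step: a keyword switches the flag off, a line break switches it back on, and only words seen while it is off are kept. It assumes Java 16+ (it uses a record), and the names EventTypeExtractor, lex and parse are hypothetical. Unlike the ANTLR parser, which reports a syntax error on unexpected tokens, this sketch silently skips them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hand-written approximation of T.g (not produced by ANTLR).
public class EventTypeExtractor {

    // Token kinds mirroring the Event, Type and Word rules of T.g
    enum Kind { EVENT, TYPE, WORD }

    record Token(Kind kind, String text) {}

    // Same character set as the Word rule: ('a'..'z' | 'A'..'Z' | '0'..'9')
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Lexer: keywords clear ignoreWord, a line break sets it again,
    // and a word is emitted only while ignoreWord is false.
    static List<Token> lex(String src) {
        List<Token> tokens = new ArrayList<>();
        boolean ignoreWord = true;
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (src.startsWith("EVENT:", i)) {
                tokens.add(new Token(Kind.EVENT, "EVENT:"));
                ignoreWord = false;
                i += 6;
            } else if (src.startsWith("type:", i)) {
                tokens.add(new Token(Kind.TYPE, "type:"));
                ignoreWord = false;
                i += 5;
            } else if (isWordChar(c)) {
                // Consume the whole word, so "prEVENT" or "EVENTs" never become keywords
                int start = i;
                while (i < src.length() && isWordChar(src.charAt(i))) {
                    i++;
                }
                if (!ignoreWord) {
                    tokens.add(new Token(Kind.WORD, src.substring(start, i)));
                }
            } else {
                if (c == '\r' || c == '\n') {
                    ignoreWord = true; // NewLine rule: reset the flag, then skip
                }
                i++;                   // Other rule: skip any remaining char
            }
        }
        return tokens;
    }

    // Parser: collects EVENT WORD TYPE WORD sequences (the 'event' rule) into a map,
    // skipping any token that does not start such a sequence.
    static Map<String, String> parse(List<Token> tokens) {
        Map<String, String> eventToType = new LinkedHashMap<>();
        int i = 0;
        while (i + 3 < tokens.size()) {
            if (tokens.get(i).kind() == Kind.EVENT
                    && tokens.get(i + 1).kind() == Kind.WORD
                    && tokens.get(i + 2).kind() == Kind.TYPE
                    && tokens.get(i + 3).kind() == Kind.WORD) {
                eventToType.put(tokens.get(i + 1).text(), tokens.get(i + 3).text());
                i += 4;
            } else {
                i++;
            }
        }
        return eventToType;
    }

    public static void main(String[] args) {
        String src = "bla bla bla bla bla ...\n"
                + "prEVENT: ...\n"
                + "EVENTs: ...\n"
                + "EVENT: e1\n"
                + "type: t1\n"
                + "version: 1\n"
                + "additional-info: abc\n"
                + "EVENT: e2\n"
                + "type: t2\n"
                + "version: 1\n"
                + "uninteresting-info: def\n"
                + "blu blu blu blu blu ...\n";
        System.out.println(parse(lex(src))); // prints {e1=t1, e2=t2}
    }
}
```

As in the grammar, the lexer alone decides which words survive; the parser only pairs up what is left, which keeps the filtering logic in one place.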
From page at http://stackoverflow.com/questions/8313722/skipping-parts-of-the-input-file-in-antlr
string literal
Also, your string rule would probably be better off looking like this:
STRING_LITERAL : '"' (~('"' | '\\' | '\r' | '\n') | '\\' ('"' | '\\'))* '"';
In other words, the contents of the string are zero or more of:
any char other than a quote, backslash or line break: ~('"' | '\\' | '\r' | '\n')
or
an escaped quote or backslash: '\\' ('"' | '\\')
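To probe which inputs STRING_LITERAL accepts without regenerating a lexer, the rule can be translated by hand into a java.util.regex pattern. This is a sketch under the assumption that the regex below is an equivalent rewrite of the ANTLR rule (it is not something ANTLR produces); the class name StringLiteralCheck is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical regex counterpart of the STRING_LITERAL lexer rule.
public class StringLiteralCheck {

    // Regex:  "(?:[^"\\\r\n]|\\["\\])*"
    // a quote, then zero or more of (any char except quote/backslash/CR/LF,
    // or a backslash followed by a quote or backslash), then a closing quote
    static final Pattern STRING_LITERAL =
            Pattern.compile("\"(?:[^\"\\\\\\r\\n]|\\\\[\"\\\\])*\"");

    static boolean isStringLiteral(String s) {
        return STRING_LITERAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isStringLiteral("\"plain\""));          // true
        System.out.println(isStringLiteral("\"say \\\"hi\\\"\"")); // true: escaped quotes
        System.out.println(isStringLiteral("\"a\\\\b\""));         // true: escaped backslash
        System.out.println(isStringLiteral("\"bad \\n escape\"")); // false: only \" and \\ are escapes
        System.out.println(isStringLiteral("\"line\nbreak\""));    // false: raw line break
    }
}
```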
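Or, alternatively, with a non-greedy loop (note that this version accepts only \" as an escape sequence, so a string containing an escaped backslash \\ will not match, and it does allow raw line breaks):
STRING : '"' (options {greedy=false;} : ( ~('\\' | '"') | ('\\' '"') ))* '"';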
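The same hand-translation trick works for the non-greedy STRING rule. The sketch below assumes the regex is an equivalent rewrite of that rule (with matches(), the reluctant *? behaves like ANTLR's greedy=false here, since no alternative can consume the closing quote); NonGreedyStringCheck is a hypothetical name. The checks show which escapes the rule actually knows.

```java
import java.util.regex.Pattern;

// Hypothetical regex counterpart of the non-greedy STRING lexer rule.
public class NonGreedyStringCheck {

    // Regex:  "(?:[^\\"]|\\")*?"
    // The only escape it knows is \" ; a backslash followed by anything
    // else (including a second backslash) cannot be matched.
    static final Pattern STRING = Pattern.compile("\"(?:[^\\\\\"]|\\\\\")*?\"");

    static boolean isString(String s) {
        return STRING.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isString("\"say \\\"hi\\\"\"")); // true: \" is accepted
        System.out.println(isString("\"a\\\\b\""));         // false: \\ is not an escape here
        System.out.println(isString("\"line\nbreak\""));    // true: raw line breaks are allowed
    }
}
```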