Datalog Parser
Note: Projects are to be completed by each student individually (not by groups of students).
Note: Projects must be passed-off on the pass-off website or with a TA to be given credit.
Write a parser that reads a datalog program from a text file, builds a data structure that represents the datalog program, and outputs the contents of the datalog program data structure.
Example Input
Schemes: snap(S,N,A,P) HasSameAddress(X,Y) Facts: snap('12345','C. Brown','12 Apple','555-1234'). snap('33333','Snoopy','12 Apple','555-1234'). Rules: HasSameAddress(X,Y) :- snap(A,X,B,C),snap(D,Y,B,E). Queries: HasSameAddress('Snoopy',Who)?
Example Output
Success! Schemes(2): snap(S,N,A,P) HasSameAddress(X,Y) Facts(2): snap('12345','C. Brown','12 Apple','555-1234'). snap('33333','Snoopy','12 Apple','555-1234'). Rules(1): HasSameAddress(X,Y) :- snap(A,X,B,C),snap(D,Y,B,E). Queries(1): HasSameAddress('Snoopy',Who)? Domain(6): '12 Apple' '12345' '33333' '555-1234' 'C. Brown' 'Snoopy'
Testing
Here are some ideas for tests.
- A program with extra tokens at the end of the file.
- A program that uses the wrong punctuation.
- A program that has missing punctuation.
- A program with empty fact or rule lists.
- A program with empty scheme or query lists.
- A program with schemes, facts, rules, queries in the wrong order.
Design
You will build a datalog interpreter in later projects. The datalog interpreter will need to access the datalog program data structure created by the parser. The data structures should be designed such that the datalog interpreter is able to easily get the lists of schemes, facts, rules and queries from the parser.
The Datalog Grammar
The Datalog Grammar defines valid datalog programs. Use the grammar to write a recursive-descent parser for datalog programs. Use the scanner from the previous project to provide input tokens for the parser.
The nonterminals in the grammar begin with a lower-case letter. Terminal symbols in the grammar are written in all upper-case letters. The word 'lambda' in the grammar represents the empty string.
Note that comments do not appear in the grammar because comments should be ignored. An easy way to ignore comments is to modify the scanner to stop outputting comments.
datalogProgram -> SCHEMES COLON scheme schemeList FACTS COLON factList RULES COLON ruleList QUERIES COLON query queryList END schemeList -> scheme schemeList | lambda factList -> fact factList | lambda ruleList -> rule ruleList | lambda queryList -> query queryList | lambda scheme -> ID LEFT_PAREN ID idList RIGHT_PAREN fact -> ID LEFT_PAREN STRING stringList RIGHT_PAREN PERIOD rule -> headPredicate COLON_DASH predicate predicateList PERIOD query -> predicate Q_MARK headPredicate -> ID LEFT_PAREN ID idList RIGHT_PAREN predicate -> ID LEFT_PAREN parameter parameterList RIGHT_PAREN predicateList -> COMMA predicate predicateList | lambda parameterList -> COMMA parameter parameterList | lambda stringList -> COMMA STRING stringList | lambda idList -> COMMA ID idList | lambda parameter -> STRING | ID
Datalog Program Data Structures
A datalog program consists of lists of schemes, facts, rules and queries. You are encouraged to use the 'vector' class from the standard library for these lists. Please do not write your own list class.
You need to store the information for the schemes, facts, rules, and queries in the lists. Because schemes, facts, and queries all have the same form as predicates, one class named Predicate can be used to represent all of these types of objects. You don't need classes named Scheme, Fact, and Query. Schemes, facts, and queries can (and should) be stored in Predicate objects. You will also find a Rule class helpful to store a head Predicate and a list of body Predicates.
How might you design the data members in the Predicate class? Here's an example of a predicate:
child('Ned', X)
What do you need to store in a Predicate object? A predicate has a name ('child' in the predicate above) and a list of parameters ('Ned' and 'X' in the predicate above).
What's an outline for the data members in a Predicate object?
class Predicate { ... string name; vector<Parameter> parameters; ... };
You are required to have the following classes: DatalogProgram, Rule, Predicate, and Parameter. You may also have others, but you must have these.
Write a 'toString' method for each of these classes so that you can easily print them out. Return a DatalogProgram object from the parser and then traverse this structure and use 'toString' as needed to print the required output.
To integrate this project with later projects, be sure to have a way to get the lists of schemes, facts, rules and queries out of the DatalogProgram.
Output Format
If the parse is successful, output 'Success!' followed by the lists of schemes, facts, rules, queries in the datalog program. Output the number of items in each list as shown in the example output. Indent each list item by 2 spaces. Output the schemes, facts, rules, and queries in the same order in which they appear in the input.
Also output the set of the domain values that appear in the datalog program. The domain values are the strings (surrounded by quotes) that appear in the facts. Don't include strings in the domain if they only appear in the rules or queries sections. The domain only includes strings found in the facts section. Note that the domain is a set of strings. (The same string is not printed more than once.) Output the domain strings in sorted order. (Sort the strings based on the characters inside the strings, not including the quotes.) You should use the 'set' class from the standard libary to hold the domain strings.
Syntax Errors
If the parse is unsuccessful, output 'Failure!' followed by the offending token. Output the type, value, and line number of the token surrounded by parentheses and indented by 2 spaces on a new line of output. Note that the parser stops after encountering the first offending token.
Example Input
Schemes: snap(S,N,A,P) NameHasID(N,S) Facts: snap('12345','C. Brown','12 Apple','555-1234'). snap('67890','L. Van Pelt','34 Pear','555-5678'). Rules: NameHasID(N,S) :- snap(S,N,A,P)? Queries: NameHasID('Snoopy',Id)?
Example Output
Failure! (Q_MARK,"?",10)
Implementation Requirements
- You must implement a deterministic top-down parser that chooses the rule to expand based on the current token.
- You must implement the parser using the recursive-descent approach.
- You must use the scanner from the previous project to read tokens from the input.
- You must create classes for DatalogProgram, Rule, Predicate, and Parameter to hold the internal representation of a datalog program. Each class must have a toString method and the output of the program must be formed by these toString methods (not by the parse methods).
- The parser must run to completion with a normal exit status for any input. Do not terminate with a non-zero exit status for any input, including inputs that have errors.
- What strings are included in the domain? The domain consists of all strings found in the facts. It does not include strings found only in the rules or queries.
Implementation Suggestions
-
What's a good way to organize your work on this project?
A good approach is to complete the project in three steps:
- First: write a parser that only checks syntax. (Does not build any data structures.)
- Second: write classes for data structures. (Rule, Predicate, Parameter, etc)
- Third: add code to the parser to create data structures. For example when a parameter is being parsed a Parameter object is created and added to the current Predicate.
-
What's a good way to handle syntax errors?
A good approach is to use exceptions to handle syntax errors. Throw an exception when the parser detects a syntax error. Catch the exception at the top level of the parser to report either success or failure. (Don't return a boolean from each parsing routine to indicate a syntax error because then each return value must be checked resulting in more complicated code.)
-
Why do you need the Parameter, Predicate, Rule, and Datalog Program classes, and what should be in these classes?
You need these classes because the next project builds on this project and these classes are needed for the next project.
To determine what needs to be in these classes, use the grammar as a guide.
For example, in the grammar a predicate is defined as: ID LEFT_PAREN parameter parameterList RIGHT_PAREN. This means a predicate object needs to store an ID and a list of parameters.
-
Should you store token types in the data structures created by the parser?
No, don't store token types in the data structures created by the parser. The data structures created by the parser should be decoupled from both the scanner and the parser.