The Relational Database

Note: Projects are to be completed by each student individually (not by groups of students).

Note: Projects must be passed-off on the pass-off website or with a TA to be given credit.

Write an interpreter that uses relational database operations to evaluate the queries in a Datalog Program. For this project, use only the facts to evaluate the queries. (The evaluation of rules will be added in the next project.)

Example Input

Schemes:
  SK(A,B)
Facts:
  SK('a','c').
  SK('b','c').
  SK('b','b').
  SK('b','c').
Rules:

Queries:
  SK(X,'c')?
  SK('b','c')?
  SK(X,X)?
  SK(X,Y)?
  SK('c','c')?

Example Output

SK(X,'c')? Yes(2)
  X='a'
  X='b'
SK('b','c')? Yes(1)
SK(X,X)? Yes(1)
  X='b'
SK(X,Y)? Yes(3)
  X='a', Y='c'
  X='b', Y='b'
  X='b', Y='c'
SK('c','c')? No

The Database Classes

The solution for this project has two main parts, (1) a simple relational database and (2) an interpreter that uses the database to evaluate queries.

A relational database is a collection of relations. A relation has a name, a scheme, and a set of tuples. A scheme is a list of attribute names. A tuple is a list of attribute values. Relations are used as operands in relational operations such as select, project, and rename.

In this project a Datalog program is represented by a database. Each scheme in the program defines a relation in the database. Each fact in the program defines a tuple in a relation.

Write classes that implement a simple relational database. Your design must include at least the following classes: Database, Relation, Scheme, and Tuple. Provide functions in the Relation class for each of the relational operations (select, project, and rename). Each of these functions operates on an existing relation and returns a new Relation (the result of the operation).

The Relation class must use a set data type to hold the tuples in the relation.

Test your database classes before combining them with the query interpreter described in the next section.

Decouple the database classes from the Datalog classes. For example, don't have Tuples that contain Parameters.

Datalog Program Evaluation

Use the relational database classes to evaluate the Datalog program using these major steps:

Evaluate the schemes.
Evaluate the facts.
Evaluate the queries.

We recommend you create a class named Interpreter. An Interpreter holds a Database object and a Datalog Program object, as well as the code for evaluating the Datalog Program using the Database.

Evaluating Schemes

Start with an empty Database. For each scheme in the Datalog program, add an empty Relation to the Database. Use the scheme name as the name of the relation and the attribute list from the scheme as the scheme of the relation.

Evaluating Facts

For each fact in the Datalog program, add a Tuple to a Relation. Use the predicate name from the fact to determine the Relation to which the Tuple should be added. Use the values listed in the fact to provide the values for the Tuple.

Evaluating Queries

For each query in the Datalog program, use a sequence of select, project, and rename operations on the Database to evaluate the query. Evaluate the queries in the order given in the input.

Get the Relation from the Database with the same name as the predicate name in the query.
Use one or more select operations to select the tuples from the Relation that match the query. Iterate over the parameters of the query: If the parameter is a constant, select the tuples from the Relation that have the same value as the constant in the same position as the constant. If the parameter is a variable and the same variable name appears later in the query, select the tuples from the Relation that have the same value in both positions where the variable name appears.
After selecting the matching tuples, use the project operation to keep only the columns from the Relation that correspond to the positions of the variables in the query. Make sure that each variable name appears only once in the resulting relation. If the same name appears more than once, keep the first column where the name appears and remove any later columns where the same name appears. (This makes a difference when there are other columns in between the ones with the same name.)
After projecting, use the rename operation to rename the scheme of the Relation to the names of the variables found in the query.

The operations must be done in the order described above: any selects, followed by a project, followed by a rename.

Query Evaluation Output

For each query, output the query and a space. If the relation resulting from evaluating the query is empty, output 'No'. If the resulting relation is not empty, output 'Yes(n)' where n is the number of tuples in the resulting relation.

If there are variables in the query, output the tuples from the resulting relation.

Output each tuple on a separate line as a comma-space-separated list of pairs. Each pair has the form N='V', where N is the attribute name from the scheme and V is the value from the tuple. Output the name-value pairs in the same order as the variable names appear in the query. Indent the output of each tuple by two spaces.

Output the tuples in sorted order. Sort the tuples alphabetically based on the values in the tuples. Sort first by the value in the first position and if needed up to the value in the nth position.

Assumptions

You may assume the following about the Datalog program:

All schemes, facts, rules, and queries with the same name will have the same number of parameters.
No two attributes in the same scheme will have the same name.
No two schemes in the scheme list will have the same name.
Every relation referenced in a fact, rule, or query will have been defined in the scheme section.

Implementation Requirements

You are required to use a relational database approach in solving the problem in this project. You must create tuples in relations and evaluate queries using select, project, and rename operations.
Your solution must include the following classes: Database, Relation, Scheme, and Tuple.
The Relation class must use a set data type to hold the tuples in the relation.
The relational operations (select, project, and rename) must be implemented as functions in the Relation class.
Queries must be evaluated using relational operations (select, project, and rename).
The relational operators must not produce relations with duplicate tuples or duplicate names in the scheme.

Implementation Suggestions

How does the select operation work?

It is useful to have two select operations.
1. One version of select finds all the tuples that have a constant value in a certain column.
  
  This select function could take two parameters, a position and a value. If the given position in a tuple is equal to the given value, include the tuple in the result.
2. Another version of select finds all the tuples that have the same value in two different columns (it doesn't matter what the value is, as long as both columns have the same value).
  
  This select function could take two parameters that are both positions. If the two given positions in a tuple have equal values, include the tuple in the result.
The select operation does not change the scheme of the relation. The scheme of the relation resulting from the select is the same as the scheme of the original relation.
How does the project operation work?

The project operation changes the number and order of columns in a relation. The resulting relation may have either the same number or fewer columns. Project changes the scheme and all the tuples in the relation.

The parameter to the project function could be a list of the positions of the columns that should be included in the result.
Why does the project need to be able to change the order of the columns?

The project function needs to be able to change the order of the columns in a relation to support evaluating rules in the next project. Changing the order of the columns is not needed for evaluating the queries in this project.
How does the rename operation work?

The rename operation changes the scheme of the relation. The resulting relation has the same tuples as the original.
Should rename change one attribute at a time, or replace the entire list of attributes at once?

Replacing the entire list of attributes is easier and avoids issues with name conflicts.
What should select, project, and rename return?

These functions should create a new relation that holds the scheme and tuples resulting from the operation. They should return this new relation. They should not modify the original relation. The new relation will have the same name as the original, but the scheme and tuples will change depending on the operation.
When a query has two variables with different names, do the values given to the variables need to be different?

No. Two variables with different names can be given the same value.

Consider the query from the example Datalog program:
```
SK(X,Y)?
```
One of the answers for this query is:
```
X='b', Y='b'
```

Computer Science 236