Computer Science 236

Datalog Scanner Lab Session


During this lab session you will write part of the code for your Datalog Scanner; you will need to write the rest after the session to fully complete the Scanner. (Note that you will create the files Token.h, Scanner.h, and main.cpp mentioned in the steps below from scratch. No starter code is provided for this lab.)


Part 1: Tokens


  1. Make a 'TokenType' enum (Token.h)
    What other names (besides COMMA) do you need to add to the enum?
    (note that additional names need to be added in place of the '...')

    enum TokenType {
      COMMA, ...
    };
    

  2. Make a 'Token' class (Token.h)
    What other variables (besides type) need to be stored in a Token?
    (note that additional variables need to be added in place of the '...')

    class Token {
     private:
      TokenType type;
      ...
    };
    

  3. Add '#pragma once' at the top of the Token.h file to avoid multiple includes of the file. You are welcome to use #include guards (rather than #pragma once) if you prefer.

    #pragma once
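
    If you prefer #include guards, the same protection can be sketched as below. The macro name TOKEN_H is an arbitrary choice for illustration; any name that is unique in your project would do.

```cpp
// Token.h protected by #include guards instead of #pragma once.
// TOKEN_H is an arbitrary macro name chosen for this sketch.
#ifndef TOKEN_H
#define TOKEN_H

enum TokenType {
  COMMA  // the remaining names from step 1 go here
};

#endif  // TOKEN_H
```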
    

  4. Add a 'Token' class constructor (Token.h)
    (note that additional variables need to be added in place of the '...')

     public:
      Token(TokenType type, ...) : type(type), ... { }
    

  5. Add a 'toString' function to the 'Token' class (Token.h)
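    (note that you need '#include' for 'string' and 'sstream'; the code in this handout writes 'string', 'stringstream', 'cout', and 'endl' without the 'std::' prefix, so add 'using namespace std;' or write the prefix yourself)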

      string toString() const {
        stringstream out;
        out << "(" << type << "," << "\"" << value << "\"" << "," << line << ")";
        return out.str();
      }
    

  6. Write a 'main' function that creates and prints a 'Token' (main.cpp)
    (note that you need '#include' for 'iostream' and 'Token.h')

    int main() {
      Token t = Token(COMMA, ",", 2);
      cout << t.toString() << endl;
    }
    

    Compile and test. The output should look something like this:

    (0,",",2)
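
    For reference, steps 1 through 5 might fit together as in the sketch below. It assumes the two elided members are a string 'value' and an int 'line', which is what the 'toString' code and the call in 'main' suggest. The enum is left with only COMMA, and the sketch spells out 'std::' so that it compiles without 'using namespace std;'.

```cpp
// Sketch of Token.h (the #pragma once line from step 3 would go at the top).
#include <sstream>
#include <string>

enum TokenType {
  COMMA  // other names still need to be added here
};

class Token {
 private:
  TokenType type;
  std::string value;  // assumed member: the text of the token
  int line;           // assumed member: the line number of the token

 public:
  Token(TokenType type, const std::string& value, int line)
      : type(type), value(value), line(line) { }

  // Until step 7, the enum is printed as its integer value (COMMA is 0).
  std::string toString() const {
    std::stringstream out;
    out << "(" << type << "," << "\"" << value << "\"" << "," << line << ")";
    return out.str();
  }
};
```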
    
  7. Fix 'TokenType' printing with a 'typeName' function (Token.h)
    Notice that in the previous step 'toString' prints a number instead of the string "COMMA", because an enum value is written to a stream as its integer value. The purpose of 'typeName' is to return the correct string for each 'TokenType' value so that 'toString' can print it. For example, if COMMA is passed to 'typeName', it should return the string "COMMA". Fill in the code for the 'typeName' function, then change 'toString' to call 'typeName'.

      string typeName(TokenType type) const {
        // return the correct string for each TokenType value
      }
    

    Compile and test. The output should look something like this:

    (COMMA,",",2)
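
    One possible shape for 'typeName' is a 'switch' over the enum, sketched below. Only COMMA and the token types named in Part 3 (ID, STRING, COMMENT) appear here, so your enum will have more cases. The "UNKNOWN" fallback string is a placeholder invented for this sketch.

```cpp
#include <string>

// COMMA comes from step 1; ID, STRING, and COMMENT are the token types
// named in Part 3. A full enum will contain more names than these.
enum TokenType { COMMA, ID, STRING, COMMENT };

// Written as a free function so the sketch stands alone; in Token.h it
// would be a member function of the Token class.
std::string typeName(TokenType type) {
  switch (type) {
    case COMMA:   return "COMMA";
    case ID:      return "ID";
    case STRING:  return "STRING";
    case COMMENT: return "COMMENT";
  }
  return "UNKNOWN";  // placeholder for any value not handled above
}
```

    With something like this in place, 'toString' would write 'typeName(type)' to the stream instead of 'type'.
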
    
  8. Take a screenshot showing your terminal and the resulting output. (You can also take a screenshot of an IDE showing similar results.)


Part 2: Scanning


  1. Make a 'Scanner' class (Scanner.h)
    What other variables (besides input) need to be stored in a Scanner?
    (note that additional variables need to be added in place of the '...')
    (Use '#pragma once' to avoid multiple includes of Scanner.h)

    class Scanner {
     private:
      string input;
      ...
    };
    

  2. Add a 'Scanner' class constructor (Scanner.h)

     public:
      Scanner(const string& input) : input(input) { }
    

  3. Add a 'scanToken' function in the 'Scanner' class (Scanner.h)
    (note this is a stub function that doesn't really scan tokens yet; Scanner.h needs '#include' for 'Token.h' to use the 'Token' class)

      Token scanToken() {
        TokenType type = COMMA;
        string value = ",";
        int line = 4;
        return Token(type, value, line);
      }
    

  4. Modify the 'main' function to create a 'Scanner' and print the result of calling 'scanToken' (main.cpp)
    (note that you need '#include' for 'Scanner.h')

    int main() {
      Scanner s = Scanner("  ,  ,  ");
      Token t = s.scanToken();
      cout << t.toString() << endl;
    }
    

    Compile and test. The output should look something like this:

    (COMMA,",",4)
    
  5. Add 'white space' skipping to 'scanToken' (Scanner.h)
    Add code to 'scanToken' to remove 'white space' from the front of the input string. The pseudo-code below describes how 'white space' skipping works.

        while the input is not empty and its first character is whitespace
          remove the first character from the input
    
    The C++ standard library has a function 'isspace' that returns true (a nonzero value) for 'white space' characters such as spaces, tabs, and newlines.
    (note the '#include' file for 'isspace' is 'cctype')

    The following code returns the first character of the input. (It throws an exception if the input is empty, so check for that first.)
         input.at(0)
    
    The following code returns a string with the first character removed.
         input.substr(1)
    

    Compile and test.
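
    The white space skipping described in this step can be sketched as below. The helper name 'skipWhitespace' is hypothetical; in your Scanner the loop could just as well sit directly inside 'scanToken' and work on the 'input' member.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: strips leading white space from `input` in place.
void skipWhitespace(std::string& input) {
  // Test for an empty string first, because at(0) throws on one.
  // The cast keeps isspace well-defined for negative char values.
  while (!input.empty() &&
         std::isspace(static_cast<unsigned char>(input.at(0)))) {
    input = input.substr(1);
  }
}
```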

  6. Fix 'scanToken' to check for 'COMMA' in the input (Scanner.h)
    Add code to 'scanToken' so that it only returns a COMMA token when a comma character is read from the input. The following pseudo-code describes the steps of recognizing a COMMA token.

        if the input is not empty and its first character is a comma
          make a COMMA token (with type, value, and line number)
          remove the comma character from the input
    

    Compile and test.
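
    A minimal sketch of steps 5 and 6 together follows. The handout does not say what 'scanToken' should return when the next character is not a comma, so the UNDEFINED type used for that case is a placeholder invented here, as are the simplified 'Token' struct and the fixed 'line' value.

```cpp
#include <cctype>
#include <string>

enum TokenType { COMMA, UNDEFINED };  // UNDEFINED is a hypothetical placeholder

// Simplified stand-in for the Token class from Part 1.
struct Token {
  TokenType type;
  std::string value;
  int line;
};

class Scanner {
 private:
  std::string input;
  int line = 1;  // assumed member; line tracking is added in Part 3

 public:
  Scanner(const std::string& input) : input(input) { }

  Token scanToken() {
    // Step 5: remove white space from the front of the input.
    while (!input.empty() &&
           std::isspace(static_cast<unsigned char>(input.at(0)))) {
      input = input.substr(1);
    }
    // Step 6: return a COMMA token only when a comma is actually read.
    if (!input.empty() && input.at(0) == ',') {
      input = input.substr(1);
      return Token{COMMA, ",", line};
    }
    // Placeholder behavior: consume one character, if there is one.
    std::string value = input.empty() ? "" : input.substr(0, 1);
    if (!input.empty()) {
      input = input.substr(1);
    }
    return Token{UNDEFINED, value, line};
  }
};
```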

  7. Take a screenshot showing your terminal and the resulting output. (You can also take a screenshot of an IDE showing similar results.)

  8. Submit your screenshots and a zip file containing the code you wrote during this session to Learning Suite.


Part 3: Complete Your Datalog Scanner


These steps are to be done as part of Project 1; they are not required as part of the lab session.

  1. Add scanning of other token types to 'scanToken' (Scanner.h)
    Use separate functions for STRING, COMMENT, ID, etc.
    Design these functions using state machines.
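
    As an illustration of the state machine approach, the sketch below recognizes an ID. It assumes an ID is a letter followed by any number of letters or digits; check the project specification for the actual rule. The function name 'matchId' and the state names are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns how many leading characters of `input` form an identifier,
// or 0 if `input` does not start with one.
// Assumed rule: a letter followed by any number of letters or digits.
std::size_t matchId(const std::string& input) {
  enum State { START, IN_ID, DONE };
  State state = START;
  std::size_t length = 0;
  while (state != DONE && length < input.size()) {
    unsigned char c = static_cast<unsigned char>(input[length]);
    switch (state) {
      case START:  // the first character must be a letter
        if (std::isalpha(c)) { state = IN_ID; ++length; }
        else { state = DONE; }
        break;
      case IN_ID:  // later characters may be letters or digits
        if (std::isalnum(c)) { ++length; }
        else { state = DONE; }
        break;
      case DONE:
        break;
    }
  }
  return length;
}
```

    A function like this only measures the match; 'scanToken' would still need to build the token from the first 'length' characters and remove them from the input.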

  2. Add code to keep track of line numbers, so each Token is created with the correct line number.
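
    One place to count lines is the white space skipping loop, since that is where newline characters between tokens are removed. The sketch below assumes a hypothetical helper that receives the current line number by reference; in your Scanner, 'line' would more likely be a member variable.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: strips leading white space from `input` and adds one
// to `line` for every newline character that is removed.
void skipWhitespace(std::string& input, int& line) {
  while (!input.empty() &&
         std::isspace(static_cast<unsigned char>(input.at(0)))) {
    if (input.at(0) == '\n') {
      ++line;
    }
    input = input.substr(1);
  }
}
```

    If a token itself can span more than one line, the newlines inside it would need to be counted as well.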

  3. In the main function, add code that reads the contents of a file into the input string (main.cpp)
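
    Reading a whole file into a string can be sketched with 'ifstream' and 'stringstream' as below. The helper name 'readFile' is hypothetical, and how the file name reaches 'main' is left open here.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the whole contents of the named file, or an
// empty string if the file cannot be opened.
std::string readFile(const std::string& fileName) {
  std::ifstream in(fileName);
  std::stringstream buffer;
  buffer << in.rdbuf();  // copy everything from the file into the buffer
  return buffer.str();
}
```

    The returned string could then be passed to the 'Scanner' constructor in place of the hard-coded input.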