Lexical Analysis

How do you read and process the following code?
What are the natural parts in which to break the input?

    void quote() {
      print(
        "To err is human, but to really foul things up you need a computer."
        + " - Paul Ehrlich"
      );
    }

How do you read and process the following code? (Datalog is a language used in this class.)
What are the natural parts in which to break the input?

    Schemes:
      childOf(X,Y)
      marriedTo(X,Y)
    Facts:
      marriedTo('Zed','Bea').
      marriedTo('Jack','Jill').
      marriedTo('Ned','Jan').
      childOf('Jill','Zed').
      childOf('Ned','Bea').
      childOf('Tim','Jack').
      childOf('Sue','Jack').
    Rules:
      childOf(X,Y) :- childOf(X,Z), marriedTo(Y,Z).
      marriedTo(X,Y) :- marriedTo(Y,X).
    Queries:
      marriedTo('Bea','Zed')?
      childOf('Jill','Bea')?

What's a Scanner (Lexical Analyzer)?
What's a Parser?
What's an Interpreter?

What's the input to a Scanner?
What's the output from a Scanner?
What's the input to a Parser?
What's the output from a Parser?

What's Project 1?
    a Scanner for Datalog
What's Project 2?
    a Parser for Datalog
What are Projects 3, 4, and 5?
    an Interpreter for Datalog

Scanner

What's a Token?
What are the parts of a Token?

Classwork
You may work with a partner.
What does the Scanner output when given the input?

    Facts:
      childOf('Ned','Bea').

Regular Expressions

What notation is commonly used to specify Tokens?
What are the simple operations you can do with a regex?

    a       any single symbol
    ab      concatenation of two regexes
    a|b     union of two regexes
    a*      repetition of a regex
    [a-z]   shorthand for union

DEMO: load datalog.txt as test data at regex101.com, or run egrep -o on datalog.txt

How does the Scanner recognize a Comma?
What's a Regular Expression for a Comma token?

DEMO:
    , in regex box
    egrep -o ',' datalog.txt
    :-|:|,|\.|\?
    Facts|Rules

How does the Scanner recognize an Identifier?
What's an English description for an Identifier token?
    a letter followed by zero or more letters and digits
What's a Regular Expression for an Identifier token?

DEMO:

How does the Scanner recognize a String?
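As a warm-up before the String question, the punctuation and keyword expressions from the demos can also be tried in code. The sketch below uses Python's re.findall, which returns every match much as egrep -o prints them. The IDENTIFIER pattern is an assumed translation of the English description "a letter followed by zero or more letters and digits", and the sample line is one of the Rules from the Datalog example.

```python
import re

# Patterns copied from the demos in these notes.
PUNCTUATION = r":-|:|,|\.|\?"
KEYWORDS = r"Facts|Rules"
# Assumption: one way to write "a letter followed by zero or more
# letters and digits" as a regex.
IDENTIFIER = r"[A-Za-z][A-Za-z0-9]*"

line = "childOf(X,Y) :- childOf(X,Z), marriedTo(Y,Z)."

punct = re.findall(PUNCTUATION, line)
idents = re.findall(IDENTIFIER, line)
print(punct)   # [',', ':-', ',', ',', ',', '.']
print(idents)  # ['childOf', 'X', 'Y', 'childOf', 'X', 'Z', 'marriedTo', 'Y', 'Z']

# Python tries alternatives left to right, so ":-" must come before ":".
# (egrep picks the longest match, so the order does not matter there.)
wrong_order = re.findall(r":|:-", ":-")
print(wrong_order)  # [':']

# A keyword also fits the identifier pattern.
keyword_as_ident = re.findall(IDENTIFIER, "Facts:")
print(keyword_as_ident)  # ['Facts']
```

The last two checks are worth keeping in mind for the questions that follow: the order of alternatives can matter depending on the regex tool, and a keyword such as Facts is also matched by the identifier pattern.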
Classwork
You may work with a partner.
What's an English description for a String Literal token?
What's a Regular Expression for a String Literal token?

DEMO:

State Machines

How do you implement the recognition of a token?
Board: draw machines for ab, a|b, a*

Give a State Machine for an Identifier token.
    letter (letter | digit)*

Classwork
You may work with a partner.
Give a State Machine for a String Literal token.
    quote (not-quote)* quote

How does the Scanner recognize a Keyword?
Do you write an Expression and build a Machine for each keyword?

Coding a State Machine

How do you implement a State Machine in code?
    1. store the state in a variable
    2. encode the state in the position in the code

Give the (state-in-a-variable) code for the String Literal machine.

    state = START;
    input = readNextChar();
    while (state != ACCEPT) {
      if (state == START) {
        if (input == QUOTE) {
          state = STRING;
          input = readNextChar();
        } else {
          // no transition out of START on a non-quote: stop without
          // accepting (otherwise this loop would never terminate)
          break;
        }
      } else if (state == STRING) {
        if (input == QUOTE) {
          state = ACCEPT;
          input = readNextChar();
        } else {
          state = STRING;
          input = readNextChar();
        }
      }
    }

Give the (state-by-position) code for the String Literal machine.

    // begin in START state
    input = readNextChar();
    if (input == QUOTE) {
      // now in STRING state
      input = readNextChar();
      while (input != QUOTE) {
        // stay in STRING state
        input = readNextChar();
      }
      // now in ACCEPT state
      input = readNextChar();
    }
    // (end-of-input handling is omitted in both versions)

Designing and Coding a Scanner

How do you use these ideas to build a Scanner?
    1. English Description
    2. Regex
    3. State Machine
    4. Code

Why is this better than only writing code?
    better understanding of the problem
    easier to write the code
    simpler code
    more readable code
    code follows a regular pattern
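
The four steps can be walked through for the Identifier token. The Python sketch below is one possible rendering, not the course's reference solution: the function name scan_identifier, the state names, and the choice to return a match length are all assumptions made for illustration. It uses the state-in-a-variable style and cross-checks the hand-coded machine against the regex from step 2.

```python
import re
import string

# Step 1 (English): a letter followed by zero or more letters and digits.
# Step 2 (Regex): letter (letter | digit)*
IDENTIFIER_REGEX = re.compile(r"[A-Za-z][A-Za-z0-9]*")

# Step 3 (State Machine):
#   START --letter--> IDENT
#   IDENT --letter or digit--> IDENT      (IDENT is the accepting state)
START, IDENT = "START", "IDENT"
LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def scan_identifier(text, pos=0):
    """Return the length of the identifier starting at text[pos], or 0 if none.

    Hypothetical helper: the name and return convention are illustrative.
    """
    # Step 4 (Code): the current state is stored in a variable.
    state = START
    i = pos
    while i < len(text):
        ch = text[i]
        if state == START and ch in LETTERS:
            state = IDENT
        elif state == IDENT and (ch in LETTERS or ch in DIGITS):
            state = IDENT
        else:
            break  # no transition on this character: stop scanning
        i += 1
    return i - pos if state == IDENT else 0


samples = ["childOf(X,Y)", "Z2", "9lives", ""]
for sample in samples:
    match = IDENTIFIER_REGEX.match(sample)
    regex_length = len(match.group()) if match else 0
    # The hand-coded machine and the regex should agree on every sample.
    print(repr(sample), scan_identifier(sample), regex_length)
```

Because the code mirrors the machine transition by transition, a change to the English description (say, allowing underscores) shows up as one edit in each of the four steps, which is the "regular pattern" the list above refers to.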