Spelling Checker
Note: Projects are to be completed by each student individually (not by groups of students).
The spelling checker reads the words in a document and outputs a list of misspelled words. For each misspelled word, the program outputs the word followed by a list of the line numbers where the word is found in the document.
The program decides which words are misspelled by using a dictionary of words. The program searches the dictionary for each word in the document. If a word from the document cannot be found in the dictionary, the word is added to the list of misspelled words.
Example Inputs
The spelling checker is given a dictionary and document to be checked as inputs.
Dictionary
if
is
in
of
be
bugs
the
then
them
must
Document
If debugging is
the process of removing bugs.
Then programming must be
the process of putting them in.
Example Output
The spelling checker outputs the misspelled words that were found in the document.
debugging: 1
process: 2 4
programming: 3
putting: 4
removing: 2
Testing
Here are some ideas for tests.
- An empty dictionary file.
- An empty document file.
- A document with no misspelled words.
- A misspelled word repeated multiple times on one line.
- A dictionary that contains upper-case letters.
- A document that contains hyphens or other punctuation.
Dictionary File Format
The spelling checker uses a dictionary of words. The dictionary is a text file that contains a list of words, one word per line. When comparing words in the document with words in the dictionary, use a comparison that is not case-sensitive.
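One way to meet both the Set requirement and the case-insensitive comparison is to lower-case every dictionary word once while loading it, so that each later lookup only needs to lower-case the document word. The Java sketch below assumes UTF-8 text. The class name DictionaryLoader, its load methods, and the choices to trim surrounding whitespace and skip blank lines are illustrative assumptions, not part of the assignment.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: reads a one-word-per-line dictionary into a Set.
class DictionaryLoader {

    // Reads words from any Reader (a StringReader works well in tests).
    static Set<String> load(Reader source) throws IOException {
        Set<String> words = new HashSet<>(); // average O(1) contains()
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim(); // assumption: tolerate stray spaces
                if (!word.isEmpty()) {     // assumption: skip blank lines
                    // Store lower-case so lookups can ignore case.
                    words.add(word.toLowerCase(Locale.ROOT));
                }
            }
        }
        return words;
    }

    // Convenience overload for the dictionary file named on the command line.
    static Set<String> load(Path file) throws IOException {
        return load(Files.newBufferedReader(file, StandardCharsets.UTF_8));
    }
}
```

For example, a dictionary whose lines are "If", "the", an empty line, and " Bugs " loads as the set {if, the, bugs}, and an empty dictionary file loads as an empty set.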
Document Files
The document to be checked is just a text file. The spelling checker reads the file to be checked and extracts each word from the file. What's the definition of a word in the document file? Words are defined as sequences of letters ('a-z' and 'A-Z') that are separated by characters that are not letters.
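The word definition above corresponds to the regular expression [A-Za-z]+: each maximal run of ASCII letters is a word, and every other character acts as a separator. A minimal Java sketch, where the class WordExtractor and its words method are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that applies the assignment's definition of a word.
class WordExtractor {

    // One or more ASCII letters; any other character ends the current word.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Returns the words on one line, in order, keeping their original case.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }
}
```

Under this definition, words("well-known, isn't it?") returns [well, known, isn, t, it]. Hyphens and apostrophes split words, so a contraction produces fragments such as "isn" and "t" that are reported as misspelled unless the dictionary contains them. Digits are separators too: "R2D2" yields "R" and "D".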
Each word in the document is checked against the words in the dictionary in order to detect misspelled words. If a word is not found in the dictionary, it is considered to be misspelled. When comparing words in the document with words in the dictionary, use a comparison that is not case-sensitive.
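A sketch of the check itself, assuming the dictionary Set already holds lower-case words (for example, lower-cased while loading). Reading the document through a BufferedReader one line at a time keeps memory use flat on large files, and a TreeMap keeps the misspelled words sorted for the output step. The class SpellChecker and its check method are hypothetical. Recording a line number only once when a word repeats on the same line is an assumption, since the assignment lists that case under Testing without saying how it should be reported.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker class; streams the document one line at a time.
class SpellChecker {

    // A word is a maximal run of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Returns misspelled word -> line numbers, sorted by word.
    // Assumes every entry in 'dictionary' is already lower-case.
    static Map<String, List<Integer>> check(Set<String> dictionary, Reader document)
            throws IOException {
        Map<String, List<Integer>> misspelled = new TreeMap<>();
        try (BufferedReader in = new BufferedReader(document)) {
            String text;
            int lineNumber = 0;
            while ((text = in.readLine()) != null) {
                lineNumber++; // the first line of the document is line 1
                Matcher m = WORD.matcher(text);
                while (m.find()) {
                    // Lower-case the document word to match the dictionary entries.
                    String word = m.group().toLowerCase(Locale.ROOT);
                    if (dictionary.contains(word)) {
                        continue; // found in the dictionary: spelled correctly
                    }
                    List<Integer> lines =
                            misspelled.computeIfAbsent(word, k -> new ArrayList<>());
                    // Assumption: list a line once even if the word repeats on it.
                    // Lines are read in order, so checking the last entry suffices.
                    if (lines.isEmpty() || lines.get(lines.size() - 1) != lineNumber) {
                        lines.add(lineNumber);
                    }
                }
            }
        }
        return misspelled;
    }
}
```

Given the example dictionary and the example document laid out, for instance, as the four lines "If debugging is", "the process of removing bugs.", "Then programming must be", and "the process of putting them in.", check returns {debugging=[1], process=[2, 4], programming=[3], putting=[4], removing=[2]}, which matches the Example Output.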
A document file might contain English prose, or it could just be a list of words. The program does the same thing in either case: it 1) extracts the words in the file and 2) checks whether each one is in the dictionary.
Output Format
The output of the spelling checker is a sorted list of misspelled words, one word per line.
Each misspelled word is printed in lower case and followed by a colon and a space-separated list of the line numbers where the word is found in the document. The line numbers are given in increasing order.
The output is written to a file, not to the standard output.
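The output format can be produced by visiting the misspelled words in sorted order and appending each word's line numbers separated by spaces. This Java sketch assumes the map keys are already lower-case and each list of line numbers is already in increasing order, as it is when the document is read from top to bottom. The class ReportWriter and its format and write methods are hypothetical; Files.write ends every line with the platform line separator.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical writer for the "word: n n n" report format.
class ReportWriter {

    // Builds one output line per misspelled word, for example "process: 2 4".
    // Assumes keys are already lower-case and line lists are in increasing order.
    static List<String> format(Map<String, List<Integer>> misspelled) {
        List<String> out = new ArrayList<>();
        // Copying into a TreeMap guarantees sorted order even for a HashMap input.
        for (Map.Entry<String, List<Integer>> e : new TreeMap<>(misspelled).entrySet()) {
            StringBuilder line = new StringBuilder(e.getKey()).append(':');
            for (int n : e.getValue()) {
                line.append(' ').append(n);
            }
            out.add(line.toString());
        }
        return out;
    }

    // Writes the report to a file rather than to standard output.
    static void write(Map<String, List<Integer>> misspelled, Path outputFile)
            throws IOException {
        Files.write(outputFile, format(misspelled), StandardCharsets.UTF_8);
    }
}
```

With this sketch an empty map produces an empty output file, which is one reasonable behavior for the "document with no misspelled words" test.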
Implementation Requirements
- Store the dictionary in a Set.
- Store the misspelled words in a Map where the key is a word and the value is a List of line numbers.
- Store the line numbers for a given misspelled word in a List.
- Use a comparison that is not case-sensitive when comparing words in the document with words in the dictionary.
- Output the misspelled words in sorted order.
- Output the misspelled words in lower case.
- Your implementation needs to run on large files in a reasonable amount of time.
Command Line
The program is run with the names of the dictionary, document, and output files given on the command line. For example, the program might be run like this:
lab2 dictionary.txt document.txt misspelled.txt
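A sketch of the command-line handling in Java, assuming lab2 is a launcher (for example a script or a packaged jar) that passes its three arguments to main in the order shown. The class name Lab2, the usage message, the readability check on the two input files, and exit status 1 are illustrative assumptions; the load, check, and write steps are left as a comment.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical entry point for: lab2 dictionary.txt document.txt misspelled.txt
class Lab2 {

    // Validates the arguments and returns {dictionary, document, output}.
    static Path[] parseArgs(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                    "usage: lab2 <dictionary> <document> <output>");
        }
        Path dictionary = Paths.get(args[0]);
        Path document = Paths.get(args[1]);
        Path output = Paths.get(args[2]);
        // Fail early if an input file is missing, before any work is done.
        for (Path input : new Path[] { dictionary, document }) {
            if (!Files.isReadable(input)) {
                throw new IllegalArgumentException("cannot read " + input);
            }
        }
        return new Path[] { dictionary, document, output };
    }

    public static void main(String[] args) {
        try {
            Path[] files = parseArgs(args);
            // Not shown: load files[0] into a Set, check files[1] line by line,
            // and write the sorted report to files[2].
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }
    }
}
```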