CSC 321: Data Structures
Fall 2013

HW6: Sets & Maps

This assignment is to be completed independently (with help from the instructor, as needed). No late submissions will be accepted.

A KWIC (Key Word in Context) index is a list of titles or other phrases, organized so that it can be easily searched by keyword. Before computers, KWIC indexes were commonly used in technical manuals and book/article compilations to enable searches by topic or keyword. For example, the following is the first few lines from a KWIC index of book titles.

                        ABSALOM, ABSALOM!
                 Berlin ALEXANDERPLATZ
            Things Fall APART
The Devil to Pay in the BACKLANDS
                  Gypsy BALLADS
                        BERLIN Alexanderplatz
                        BOOK of Job
                 Madame BOVARY
                    The BROTHERS Karamazov
                        .
                        .
                        .

Note that every title will appear once for each keyword in that title, excluding junk words like "the" and "of". The titles in the KWIC index are listed in alphabetical order by keyword, with the keyword displayed in all caps. If the same keyword appears more than once in a title, the title is only listed once (for that keyword) with all occurrences of the keyword capitalized. When identifying keywords, any characters other than letters or digits at the beginning or end of the word should be ignored. For example, in the first title, "Absalom, Absalom!", the keyword "Absalom" appears twice (ignoring the punctuation that appears at the ends of the words). Thus, the title appears only once in the KWIC index, with both occurrences of the keyword capitalized. Punctuation that appears internally in keywords, e.g., "isn't", should be maintained.

For this assignment, you will write a program that reads in a file containing junk words to be ignored (the junkWords.txt file from HW1 suffices) and a file of titles (the Wikipedia top 100 books list is provided). Your program should construct a KWIC index and allow the user to search that index. A showKeyword method should be provided, which takes a keyword as input and displays all of the titles that contain that keyword (in alphabetical order, with the keyword in all caps). It should also provide a showAll method, which has no inputs and displays the entire KWIC index.

To simplify this task, you do no need to align the titles so that keywords begin in the same columns. Also, you may display non-keywords in all lowercase. That is, your showAll method might display the above KWIC index as:

ABSALOM, ABSALOM!
berlin ALEXANDERPLATZ
things fall APART
the devil to pay in the BACKLANDS
gypsy BALLADS
BERLIN alexanderplatz
BOOK of job
madame BOVARY
the BROTHERS karamazov
.
.
.

FOR EXTRA CREDIT, modify your code so that the capitalization of non-keywords is maintained and titles are aligned by keyword.