Integrating Empirical Methods into Computer Science
Empirical Lab Repository

Title: Author Identification via Letter Frequencies
Author: Dave Reed, Creighton University, davereed@creighton.edu
Possible Courses: CS1
Empirical Concepts Introduced: data analysis
Computer Science Concepts Used: file I/O, arrays, counters, loops, character manipulation
Summary:
This assignment involves analyzing patterns that appear in an author's works of literature. Authors have been shown to follow consistent patterns in their writing style (e.g., favoring certain words and letters over others), and researchers have used these patterns to identify the authors of uncredited works. In this assignment, students analyze works of literature with respect to letter frequencies. Each work is read from a file one character at a time, and a count is maintained for each letter. The absolute and relative frequencies of each letter are then displayed in a table for analysis.
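
The counting step described in the summary might be sketched as follows. This is only an illustration in Python, not the assignment's reference solution: the function names count_letters and frequency_table are hypothetical, the choice to ignore non-ASCII letters is an assumption, and the sample sentence stands in for a real work of literature read from a file.

```python
import io
import string


def count_letters(stream):
    """Read a text stream one character at a time and count the letters a-z."""
    counts = [0] * 26  # one counter per letter; index 0 is 'a'
    while True:
        ch = stream.read(1)
        if ch == "":  # an empty string signals end of input
            break
        # Assumption: only ASCII letters are counted; case is ignored.
        if ch.isascii() and ch.isalpha():
            counts[ord(ch.lower()) - ord("a")] += 1
    return counts


def frequency_table(counts):
    """Format absolute counts and relative frequencies (percent) as a table."""
    total = sum(counts)
    lines = ["Letter  Count  Percent"]
    for letter, count in zip(string.ascii_lowercase, counts):
        percent = 100.0 * count / total if total else 0.0
        lines.append(f"{letter:<6}  {count:>5}  {percent:>7.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # A short in-memory sample; a real run would pass an open text file,
    # e.g. open(filename, encoding="utf-8"), in place of the StringIO object.
    sample = io.StringIO("Call me Ishmael.")
    print(frequency_table(count_letters(sample)))
```

A list of 26 counters indexed by letter mirrors the "arrays" and "counters" concepts listed above; a dictionary would work equally well in Python.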
Variations: To further emphasize the role of analysis, students could be asked additional questions. For example, the instructor might provide several works of literature by two different authors, then provide an uncredited work and ask students to identify its author and justify their answer.
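
One way a student might approach the identification variation is to compare the uncredited work's relative letter frequencies against each known author's. The sketch below is an assumption rather than part of the assignment: the helper names are hypothetical, the distance measure (sum of absolute differences) is just one simple choice, and the short strings are toy stand-ins for real texts.

```python
def relative_frequencies(text):
    """Return 26 relative letter frequencies (fractions of all letters)."""
    counts = [0] * 26
    for ch in text:
        # Assumption: only ASCII letters are counted; case is ignored.
        if ch.isascii() and ch.isalpha():
            counts[ord(ch.lower()) - ord("a")] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def profile_distance(p, q):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def closest_author(unknown_text, known_texts):
    """Pick the author whose profile is nearest to the unknown text.

    known_texts maps an author name to a string of that author's writing.
    Returns the chosen name and the distance computed for every author.
    """
    unknown = relative_frequencies(unknown_text)
    distances = {
        author: profile_distance(unknown, relative_frequencies(text))
        for author, text in known_texts.items()
    }
    return min(distances, key=distances.get), distances


if __name__ == "__main__":
    # Toy data only; real use would load full works from files.
    known = {"Author A": "aaaa bbbb", "Author B": "zzzz yyyy"}
    name, distances = closest_author("aabb", known)
    print(name, distances)
```

The distances themselves give students something concrete to cite when justifying their answer, since a small gap between the two candidates suggests the evidence is weak.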