Empirical Lab Repository

Title: Experimentation with Word Frequencies

Author: Dave Reed, Creighton University, davereed@creighton.edu

Possible Courses: CS0, CS1

Empirical Concepts Introduced: Monte Carlo methods, expected values, consistency, accuracy

Computer Science Concepts Used: computer-based simulation

Summary: This assignment uses an interactive Web page (containing JavaScript code) to introduce the concept of Monte Carlo methods. Using the existing Web page, students are led through a series of experiments in which they generate random letter sequences and count the number of English words obtained. From this data, they are asked to estimate the number of English words with certain characteristics. Accuracy, consistency, and the law of large numbers are all implicitly introduced in this lab (although not named as such).

This assignment does not involve any programming, and thus can be used at the beginning of an introductory course as a demonstration of using a program to solve a problem. Most students have a difficult time estimating the number of 3-letter or 4-letter words in the English language. When asked, students propose numbers ranging from just a few hundred to tens of thousands. With the provided Web page, however, they are led through a systematic approach to estimating such counts. Students perform numerous experiments using the provided page, record their results, and estimate the number of words using a provided formula. They also extend the estimation method to predict the number of words that use only a subset of the letters.
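
The provided formula is not reproduced in this summary. As a minimal sketch of one natural Monte Carlo estimate, assuming every sequence is drawn uniformly at random: multiply the fraction of generated sequences that are words by the number of possible sequences (26^n for n-letter sequences, or k^n when only k letters are used). The function name, its parameters, and the sample counts below are illustrative assumptions, not taken from the lab page, and the sketch is written in Python rather than the page's JavaScript.

```python
def estimate_word_count(words_found, sequences_generated, length, alphabet_size=26):
    """Estimate how many words of the given length exist.

    Assumes each sequence was drawn uniformly at random, so the fraction of
    generated sequences that are words approximates the fraction of all
    possible sequences that are words.
    """
    if sequences_generated <= 0:
        raise ValueError("sequences_generated must be positive")
    total_possible = alphabet_size ** length
    return words_found * total_possible / sequences_generated


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the lab.
    print(estimate_word_count(50, 1000, 3))
    # Same counts, but with sequences drawn from a hypothetical 10-letter subset.
    print(estimate_word_count(50, 1000, 3, alphabet_size=10))
```

Under this assumption, estimates from small experiments can vary widely, while estimates from larger experiments tend to cluster more tightly, which is the behavior the lab uses to introduce consistency and accuracy.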

Variations: If desired, an instructor could use the idea for this assignment later in a programming course. Instead of providing the Web page with the JavaScript code, the instructor could assign students the task of writing a program that generates the letter sequences. Such a program is relatively short and simple, involving repetition, string concatenation, and random selection. A further extension might involve comparing the generated sequences against an online dictionary, so that words no longer need to be identified by hand.
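
A student solution to this variation might look roughly like the following sketch, which uses repetition, string concatenation, and random selection to build each sequence, then checks sequences against a word set. It is written in Python rather than JavaScript; the function names are hypothetical, and the five-word set is a stand-in for a real dictionary, which would have to be loaded from a word list.

```python
import random
import string


def random_sequence(length, letters=string.ascii_lowercase, rng=random):
    """Build one sequence by repeated random selection and concatenation."""
    sequence = ""
    for _ in range(length):
        sequence += rng.choice(letters)
    return sequence


def count_words(num_sequences, length, dictionary,
                letters=string.ascii_lowercase, rng=random):
    """Generate num_sequences random sequences and count those in dictionary."""
    hits = 0
    for _ in range(num_sequences):
        if random_sequence(length, letters, rng) in dictionary:
            hits += 1
    return hits


if __name__ == "__main__":
    # Hypothetical stand-in for a real word list.
    tiny_dictionary = {"cat", "dog", "sun", "the", "and"}
    rng = random.Random(42)  # fixed seed so a run is repeatable
    print(random_sequence(3, rng=rng))
    print(count_words(10000, 3, tiny_dictionary, rng=rng))
```

Passing a shorter string as `letters` restricts generation to a subset of the alphabet, which matches the subset experiments in the original lab.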