Evaluating Interface Efficiency

Comparing results from the Keystroke-Level Model and empirical timings

For this assignment, you will apply two methods for evaluating the efficiency of a selected application's user interface. Here, efficiency is defined as the amount of time a practiced user takes to complete a core task of the application.

The first method is Card, Moran, and Newell's Keystroke-Level Model (KLM), which we have already discussed and demonstrated in class. This analytical method yields a theoretical prediction of the time needed to complete a task. As you recall, the method assumes an experienced user who performs the task without slips or mistakes.

The second method involves timing users as they complete the task. Because we want to compare the results of the two methods, make sure that your users are experienced with the application before you time them.

Assignment procedure

Using the application assigned to you, specify a representative task. You will evaluate the efficiency of this task using both the KLM (the analytical method) and actual user timings (the empirical method).

Analysis for the Keystroke-Level Model

Prepare an outline of the abstract steps that an experienced user would take when completing the task.
List the primitive physical operators (i.e., K, P, H, and R) needed to accomplish each abstract step.
Add the M operators according to the KLM rules.
Total the time constants to produce the predicted time needed to complete the task.
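
The totaling step above can be sketched in a few lines of Python. This is only an illustration: the operator times below are commonly cited values for the KLM and may differ from the constants given in class, the function name klm_predict is made up for this example, and the operator sequence describes a hypothetical task rather than your assigned one. R depends on the system, so its duration is passed in as an assumed parameter.

```python
# Commonly cited KLM operator times in seconds (assumed values;
# substitute the constants used in class).
OPERATOR_SECONDS = {
    "K": 0.28,  # keystroke or button press
    "P": 1.10,  # point with the mouse at a target
    "H": 0.40,  # home the hand between keyboard and mouse
    "M": 1.35,  # mental preparation
}


def klm_predict(sequence, response_seconds=0.0):
    """Sum the operator times for a sequence such as 'HMPK'.

    Each R (system response) costs response_seconds, since that
    time depends on the system being evaluated.
    """
    total = 0.0
    for op in sequence:
        if op == "R":
            total += response_seconds
        else:
            total += OPERATOR_SECONDS[op]
    return total


# Hypothetical task: reach for the mouse, decide, point at a field,
# click, return to the keyboard, decide, type four characters.
sequence = "HMPKHMKKKK"
print(f"Predicted time: {klm_predict(sequence):.2f} s")
```

Writing the sequence as a string keeps the operator list from the previous steps visible, so it is easy to check the M placements against the KLM rules before trusting the total.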

Timing users completing the task

Create the instructions for completing the task that you will present to your test users.
For each test user:
1. Allow the user to become familiar with the application by performing similar tasks.
2. Present the task instructions to the user.
3. Time how long it takes the user to complete the task.
Collect timings from at least 12 different users and compute the following:
- Average
- Standard deviation
- 95% confidence interval of the average
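
The three statistics above can be computed as in the following sketch, which uses only the Python standard library. It assumes the sample standard deviation (n - 1 in the denominator) and a two-sided interval based on Student's t distribution, which is the usual choice for small samples; check this against the method covered in class. The function name summarize, the small table of critical values, and the timings are illustrative assumptions, not real data.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of
# freedom (n - 1). Consult a t table for other sample sizes.
T_CRITICAL_95 = {
    11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131,
    16: 2.120, 17: 2.110, 18: 2.101, 19: 2.093,
}


def summarize(timings):
    """Return (mean, sample std dev, (ci_low, ci_high)) for the timings."""
    n = len(timings)
    df = n - 1
    if df not in T_CRITICAL_95:
        raise ValueError(f"no critical value stored for {df} degrees of freedom")
    mean = statistics.mean(timings)
    sd = statistics.stdev(timings)  # sample standard deviation
    half_width = T_CRITICAL_95[df] * sd / math.sqrt(n)
    return mean, sd, (mean - half_width, mean + half_width)


# Made-up completion times in seconds for 12 hypothetical users.
timings = [5.8, 6.4, 7.1, 6.0, 6.9, 5.5, 6.2, 7.4, 6.6, 5.9, 6.3, 6.7]
mean, sd, (low, high) = summarize(timings)
print(f"mean = {mean:.2f} s, sd = {sd:.2f} s, 95% CI = [{low:.2f}, {high:.2f}] s")
```

The interval's half-width is the critical value times the standard error, sd / sqrt(n), so it depends on both the spread of the timings and the number of users.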

Discussion questions

What are some possible reasons why the KLM might produce an inaccurate estimate of actual task completion times?
What are some possible reasons why the average of the collected times might be an inaccurate estimate of how long an experienced user would take under real usage conditions?
How does the predicted time of the KLM compare to the average of the collected timings? Does the predicted time lie within the confidence interval of the average? If not, what might account for the discrepancy?
How might you collect the timings differently so that a narrower confidence interval is produced?