Shuffling the Lineup: How One Man Is Redesigning the Witness ID Process

Posted by Harrison Freeman and Ben Paynter

GOOD 026, Beg Borrow or Steal, Magazine
With each step he takes down the bright hallway inside the Austin Police Department’s headquarters, the tall man in the white dress shirt and blue slacks looks more nervous. Next to him, a shorter, broad-shouldered detective named Derek Israel tries to calm him. Just look at a few pictures in a photo lineup, Israel says. “I’ll try,” the man says, “but don’t expect too much from me.”

Israel walks him to the robbery unit, a makeshift area in a sea of cubicles. Other detectives slurp coffee and raise their voices to be heard over the incessant ring of telephones. Israel seats the victim at a small desk and opens a black Dell laptop. The man’s eyes widen. Guided by onscreen prompts, Israel asks some basic questions about the conditions of the crime—weather, visibility, distance to the attacker, amount of time the assault lasted. “I don’t want to get anyone innocent in trouble,” the victim says. Israel tells him not to worry; with these preliminaries taken care of, he leaves.
 
Another detective, ruddy-faced, sits down in Israel’s vacated chair. “You know how to use a computer and a mouse and all that stuff?” he asks. The victim nods. The detective mouses the cursor to a green button at the bottom of a new screen, then motions for the victim to click Start. The computer starts to talk in a soothing female voice. “Photos will be shown one at a time, and you will be asked if the individual is familiar to you,” it says. This won’t be the classic Law & Order lineup, six people paraded in front of one-way glass. It won’t even be a lineup the way lineups are actually done these days in most law enforcement jurisdictions, a six-pack array of photographs, all roughly alike, one of them the suspect. Instead the victim will see images of potential muggers one at a time, and for each of them the computer will ask the same question: “Does this person look familiar to you?” There are three possible answers: Yes, No, and Not Sure. There’s no going back to see pictures twice, but if the answer is Yes, the lineup continues, giving the witness the chance to positively identify a different photo—to change his mind about the earlier ID.
 
When the first photo flashes onscreen, the victim cocks his head, squints his eyes and spends a full 20 seconds scrutinizing the photo before he clicks No. On the next image, he takes even longer, leaning in close before he clicks Not Sure. The program pauses. “He has a strange face,” the victim tells the detective, struggling to figure out if it is familiar. The detective shrugs noncommittally, and the pictures resume. It’s part detective work and part psych experiment. The man in the blue slacks, along with other victims in Austin, San Diego, Tucson, and Charlotte, is part of a research study, and they are all following the same protocol: They sit in front of laptops and are shown either a sequential lineup or a six-pack array. A detective unfamiliar with the case administers the lineup to avoid unconscious influence, while the computer tracks how long it takes the witness to make a choice and records everything he says. The researchers running the study hope to figure out the best way to elicit accurate recollections from people who experience horrible things—to reduce the number of times they might either wrongly accuse innocents of those crimes or mistakenly allow someone guilty to go free.
 
It’s an experimental protocol designed by Gary Wells, the guru of eyewitness reliability—or rather, unreliability. The director of social sciences at the American Judicature Society’s Center for Forensic Science and Public Policy, Wells has been working on lineups since the 1970s, but in the past 20 years exonerations of hundreds of prisoners based on DNA evidence—after many had been convicted in part based on good-faith eyewitness testimony—have made his task all the more urgent. Wells doesn’t want to merely understand witness identification. He wants to fix it.
* * *

Gary Wells kicks his feet up on a large wooden table in his conference room on the fourth floor of Iowa State University’s psychology building. He’s in his early 60s, favors professorial cardigans, and after more than three decades of research pulls no punches when criticizing police methodology. “The modern detective? His posture is my posture right here,” he says, leaning back with an attitude that could be called complacent. “He is too eager to use an imperfect tool.” In this case, that tool is haphazardly designed photo lineups.

Wells might easily have wound up with a mug shot of his own. Growing up in a small town in central Kansas, he spent most of his time fighting, drinking, and getting suspended from school. He married at 18 and had a kid less than a year later. At one point, his best-paying gig was repairing broken washing machines and dishwashers. Finally, Wells enrolled at Kansas State University with the vague idea of becoming a teacher. He majored in psychology because the subject felt more current than historical. It seemed like a field where there was still room for debate—and it turned out to be one at which he excelled. When Wells was a grad student in social psychology, a defense attorney came to his Ohio State University office with a photograph of a classic, behind-one-way-glass lineup and a description of his client, who’d been fingered. The lawyer said his guy had been misidentified, and wanted to know more about how these types of errors happen.
 
Wells didn’t have an answer, but he became fascinated by the whole idea. How could someone be identified, or misidentified, in a lineup? He kept the picture the lawyer had brought and posted it on his office bulletin board. Over the next decade, his research came to focus on eyewitness recall.
 
Wells became a specialist in lineups. He became known for staging mock crimes and then putting the witnesses in front of photo arrays, tweaking their content and sequence. Wells’ hypothesis was that while no one could control the circumstances under which people formed memories, one could control how they recalled them. The protocols for police questioning and even the composition of a lineup made a difference. In the process, Wells developed a general idea of lineup accuracy: 54 percent of people pick the suspect, 25 percent pick someone else, and the rest pick no one. Not too bad—unless you find yourself wrongly accused.
 
The real problems manifested when Wells offered “culprit absent” lineups, which left out the suspect entirely. As many as 68 percent of people picked a filler—a photo of a person not believed to have anything to do with the crime. In other words, when the cops don’t get the right guy, the witness usually fingers someone anyway, somehow rationalizing into a bad choice from the options available. So he tried something new: Rather than putting all the suspects in a single simultaneous lineup, he’d show them one at a time, sequentially. That way, victims could only compare each suspect directly against their memory itself. The data was mounting—sequential lineups reduce the number of suspect identifications by 8 percent compared with simultaneous presentations, but the sequential approach yields 22 percent fewer false IDs when a suspect is left out of the lineup altogether.
 
The catch: It all happened in a lab, not the precinct. So the law enforcement community didn’t exactly embrace these early findings. “It sort of struck a nerve,” says Jerry Murphy, the director of Homeland Security and Development at the Police Executive Research Forum, a policy group looking at how law enforcement agencies deal with eyewitness reforms. Another thing cops didn’t like, Murphy says: the tedium of showing one photo at a time.
 
Then, in the 1990s, the rise of DNA exoneration forced the issue. Objective, incontrovertible evidence was freeing prisoners who’d been convicted on the basis of eyewitness identification. Attorney General Janet Reno asked Wells to head a task force on new lineup guidelines for states, and he proposed new practices drawn from his research. All lineups should be blind, he said—the cops administering them shouldn’t know who the suspects or fillers are. There should only be one suspect per lineup. Witnesses should be clearly advised that a suspect might not be in the lineup. And statements of confidence should be recorded verbatim at the time of the pick, because witnesses with any uncertainty have been known to talk themselves into their choices as time passes. The task force was receptive to all those ideas, but in the end didn’t come to consensus on what might be the most important variable: sequential versus simultaneous mug shot presentation. Their recommendations were still a big deal; in 2009, Wells even talked about some of them on 60 Minutes.
 

A few police departments gave the new approach a shot. But when the Illinois Legislature started agitating for similar reforms in 2005, the Chicago Police Department pushed back, insisting that it would run its own study first. When the results came out, they were significantly different than Wells’ findings—sequential lineups were worse, they said. Victims using the new method seemed less accurate, picking out the prime suspect 15 percent less frequently than those using the old six-pack style. They also made more obvious mistakes, picking a filler instead of either the suspect or no one 6 percent more often. Wells and his colleagues spotted numerous methodological problems right away, but the damage was done. Police departments could point to the Chicago study and ignore Wells’ recommendations. To him, the Illinois results felt like a personal attack, revenge for the criticism he’d heaped on police practices over the years. “I felt like I was set up,” he says. Eventually he cooled off—and decided to take another crack at proving the superiority of showing photos one by one instead of in a six-pack.
 
In 2006, Wells designed a new study protocol. The tests wouldn’t just be blind but “computer blind”—the computer itself could offer prerecorded instructions to ensure lineups were done uniformly. After officers created a lineup, the photos would also be digitally shuffled so they couldn’t pass along the location of their suspect to anyone running the lineup. That eliminates the chance of lineup administrators giving off any cues—subtle nods, coughs, or the suggestion to pay closer attention to any one photo—that might be used, unconsciously, of course, to tip witnesses off to prime suspects. The computer would even randomly decide whether to run a sequential lineup or a simultaneous one.
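
The "computer blind" steps described here, shuffling the photos and randomly choosing the lineup format, can be sketched in a few lines. This is only an illustration, not the software used in the study: the function name `prepare_lineup`, the photo labels, and the even 50/50 choice between formats are all assumptions.

```python
import random


def prepare_lineup(suspect_photo, filler_photos, rng=None):
    """Shuffle the photos and randomly pick a presentation format.

    Hypothetical sketch: names and the 50/50 split are assumptions,
    not details taken from the study software.
    """
    rng = rng or random.Random()
    # One suspect plus the fillers, so nobody can rely on a known position.
    photos = [suspect_photo] + list(filler_photos)
    rng.shuffle(photos)
    # The computer, not the detective, decides how the photos are shown.
    lineup_format = rng.choice(["sequential", "simultaneous"])
    return {"format": lineup_format, "photos": photos}


lineup = prepare_lineup(
    "suspect",
    ["filler_1", "filler_2", "filler_3", "filler_4", "filler_5"],
    rng=random.Random(2006),  # seeded only so the example is repeatable
)
print(lineup["format"], lineup["photos"])
```

Because the shuffle happens after the officer builds the lineup, the person administering it has no position to hint at, even unconsciously.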
 
Wells asked the Police Foundation and the Innocence Project to help find departments that might be willing to participate, but it was a tough sell. A dozen departments said no before four signed on. In Austin, Commander Julie O’Brien was interested in participating because the exonerations unfolding across the country prompted her to examine the processes in her own police department. Since 1989, more than 230 prisoners have been freed based on new DNA evidence, and more than 77,000 people a year become criminal defendants based on eyewitness accounts. Only a few thousand of those cases are backed up by objective evidence like hair, blood, or semen samples.
 
O’Brien, a former police officer in the Army, already mandated that detectives justify their inclusion of suspects in lineups, making sure no one was brought in on a random hunch. (Sadly, some exonerations now show that hasn’t happened in other places.) She’d also gone further, instituting nearly all of Wells’ Reno-era reform ideas, minus fully revamping the format for showing photos. Testing the sequential method in the field just made sense. “This isn’t some sort of existential debate for us,” she says. “If sequential is better, that’s what we want to do.”
 
* * *

Wells has always been confident that the sequential method would prove superior in the real-world tests. “Memory is memory,” he says. “It shouldn’t matter whether it happens in the real world or in the lab.” This fall, data finally backed him up. Researchers crunched numbers from nearly 500 total lineups. Overall, simultaneous and sequential methods proved equally (if not highly) effective: Witnesses to real crimes picked the prime suspect 26 and 27 percent of the time, respectively. That difference isn’t statistically significant. For Wells, it’s the first indication that there actually might not be any downside to the sequential method: If the suspect is there, witnesses will pick him or her out, no matter which lineup procedure gets used. Even better, while witnesses viewing simultaneous lineups chose fillers 42 percent of the time, witnesses viewing sequential lineups picked fillers only 31 percent of the time. In other words, witnesses shown sequential lineups are 25 percent less likely to rationalize their way into bad choices.
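
The relative drop in filler picks can be checked from the two rates quoted above. The quick calculation below assumes the rounded percentages in the text (42 and 31) rather than the study's raw counts, which are not given here; the helper name `relative_reduction` is made up for illustration.

```python
def relative_reduction(baseline_pct, new_pct):
    """Fraction by which new_pct falls below baseline_pct."""
    return (baseline_pct - new_pct) / baseline_pct


simultaneous_filler_pct = 42  # filler picks, simultaneous lineups
sequential_filler_pct = 31    # filler picks, sequential lineups

reduction = relative_reduction(simultaneous_filler_pct, sequential_filler_pct)
print(f"{reduction:.0%}")  # prints 26%
```

With the rounded inputs the drop comes to about 26 percent, in line with the roughly 25 percent cited.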

 
Unfortunately the sequential approach can also make a detective’s job harder. Back in Austin, for example, Israel, the detective, didn’t have any problem finding a suspect in his mugging. The victim had been walking home in a rough neighborhood when two men jumped out of the darkness and pinned him against a wall; one allegedly pressed a gun to the victim’s chest while the other took his wallet. Despite the fact that the victim and mugger were less than a foot apart for about 30 seconds, the victim got little more than a basic impression of the gunman: a slim black male, early twenties, with gold teeth, a black hoodie and a black scarf over his head.
 
Someone used the victim’s credit card at two nearby Walmarts after the robbery. Israel pulled surveillance tapes showing a guy buying a couple of cases of Budweiser, a cubic zirconia ear stud, and a Nintendo Wii in a series of transactions. He even paid for an extended warranty on the video game system. The signatures on the receipts were all aliases, but most shared the same first name: Don. In paying for the booze, the suspect also provided a birth date: April 14, 1986. “This guy was not a master criminal,” Israel says. He entered those two pieces of intel into his police report database and eventually turned up two mug shots for a guy named Donny Ray Davis. Both seemed to match the surveillance footage.
 
The hard part was putting together a photo lineup when constrained by all of Wells’ methodologies. After finding his suspect, Israel had to assemble an array of similar-looking fillers, putting the burden on the witness to pinpoint the suspect. He downloaded booking photos from the local prison, matching properties like sex, age, race, height, and weight. But there were still problems. For instance, Davis’ mug shots were from two different arrests. In the first, the guy looked fatter, and his skin color was washed out from bad lighting. The second seemed to match better, but the suspect was wearing the same jacket and undershirt he had on in the surveillance video and thus possibly also during the mugging itself—Israel deemed that unfair.
 
Israel settled on the first photo, focusing on the man’s key feature. “I’m gonna need a guy with a muscular neck,” he says. “He’s easy: no Band-Aids, no crazy haircuts.” He added a range of 30 pounds and within two years of age, and the software spit back 1,999 matches—too many to sort through. Israel reduced the weight parameters by 10 pounds, yielding 1,387. That might seem like a lot, but he needed them. In Austin, the computer randomly decides between a sequential or simultaneous lineup and shuffles the pictures before viewing. It’s up to Israel to try to balance the photos’ variations—background colors, suspect attire—at the start. The filler matches looked similar enough to the earlier, heavy-set photo, but they weren’t as close to the more recent image of the suspect. “I don’t want to say I feel 100 percent certain,” Israel says, “but there is probably no way our guy is going to be able to pick anyone out.”
 
Sure enough, the victim is stymied. No matter how long he looks at the pictures, he clicks No on each one. “I was looking for the guy with the biggest overbite,” the victim concedes afterward. Why? When he saw the attacker’s gold teeth on the night he was mugged, they seemed to highlight that feature. While the teeth might be part of a larger bling-loaded mouthpiece that could be removed, the underlying dental defect likely couldn’t be. Thus, he looked for the person who might be concealing one. When Detective Israel reappears, the victim complains that he was unable to view suspects side by side. “You either recognize the guy or you don’t,” Israel counters. Not picking out anyone is better than a filler pick, because in general choosing a police plant over the investigators’ prime suspect rules out the suspect.
 
Eventually, Israel gets his man. He arrests Davis on credit card abuse charges, and recordings of Davis’ phone calls from jail incriminate him. “It seemed like every new person he called, he confessed to the credit card abuse, the robbery, and named his accomplice,” Israel says. Davis now faces a five-to-life sentence for armed robbery.
 
* * *

For Wells, though, there’s still more to learn. Back in his Iowa State lab, he leans over a computer in a stuffy, windowless testing booth to pull up his latest video simulation. It opens from the point of view of a witness catching a plane; you cross a street and walk through a revolving door at the Des Moines airport. After checking a departure board, you follow signs to a ticket counter and get in line. You notice six people in line next to you, including a country-looking guy in a faded gray shirt and orange cap. When no one is looking, Country Guy switches his bag with the guy in front of him. It should be an easy identification—he walks by you so closely you can read the message on his T-shirt.

Afterward, a six-pack lineup appears on the screen. For some test subjects, the images will be clear. For others, Wells pixelates the images, as if they were censored, to interfere with people’s retrieval process. The crazy part is, 60 percent of people who got the blurred-face shots still tried to make an ID. “People would say things like, ‘I think I recognize his chin,’” Wells says, chuckling. (Without the blur, 84 percent tried to make an ID and got it right 87 percent of the time.)
 
The experiment is designed to block ecphory, an instant-recall process that occurs when some outside stimulus, like a photograph, triggers a sudden, sure flash of memory. Already, Wells’ preliminary results from the police study show that when people don’t have that ecphoric response, they still try to remember—and usually talk themselves into incorrect picks. Ecphoric memories seem to coincide with accurate IDs, but the slow-forming judgments—“secondary process”—are born of the kind of internal deliberation that Israel’s witness engaged in when trying to picture the overbite. That can give rise to mistakes. “It would have been nice if our victim had identified the attacker,” Israel says, but he understood the difficulty. “For some people it’s like, bang! Instant recognition. If the person has to sit there and stare at it for a long time, they might not be right.”
 
Wells couldn’t hope for a better description of his work, especially because it’s finally coming not from another psychologist, but a police detective with nearly 20 years of service. “I’m hoping the detectives will talk about their experience,” Wells says. “They can be our star witnesses.”