‘Destroy America’ = Suspicion Fail

News that incautious comments on “tweeter” got British tourists excluded from the United States had Twitter alight yesterday. (Paperwork given to one of the two, on display in this news story, refers to the popular social networking site as a “Tweeter website account,” betraying some ignorance of what Twitter is.)

It’s a good chance to review how suspicion is properly—and, here, improperly—generated.

The Department of Homeland Security has been vague as yet about what actually happened. It may have been some kind of “social media analysis” like this that turned up “suspicious” Tweets leading to the exclusion, though the betting is running toward a suspicious-activity tipline. (What “turned up” the Tweets doesn’t affect my analysis here.) The boastful young Britons Tweeted about going to “destroy America” on the trip—destroy alcoholic beverages in America was almost certainly the import of that line—and dig up the grave of Marilyn Monroe.

Profoundly stilted literalism took this to be threatening language. And a failure of even brief investigation prevented DHS officials from discovering the absurdity of that literalism. It would be impossible to “dig up” Marilyn Monroe’s body, which is in a crypt at Westwood Memorial Park in Los Angeles.

I testified to the Senate Judiciary Committee in 2007 about how one might mine data for terrorists and terrorism planning, in terms that apply equally well to Twitter banter and to any criminality or wrongdoing. For valid suspicion to arise, the information collected must satisfy two criteria:

(1) It is consistent with bad behavior, such as terrorism planning or crime; and (2) it is inconsistent with innocent behavior. In . . . the classic Fourth Amendment case, Terry v. Ohio, . . .  a police officer saw Terry walking past a store multiple times, looking in furtively. This was (1) consistent with criminal planning (“casing” the store for robbery), and (2) inconsistent with innocent behavior — it didn’t look like shopping, curiosity, or unrequited love of a store clerk. The officer’s “hunch” in Terry can be described as a successful use of pattern analysis before the age of databases.

Similarly, using the phrase “destroy America” is consistent with planning to destroy America. (You want to be literal? Let’s be literal!) But it’s also consistent with talking smack, which is innocent behavior. These Tweets fail the second criterion for generating suspicion.
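The two-part test can be written down as a short sketch. The Python below is only an illustration: the `Observation` type, its field names, and the true/false values assigned to each example are hypothetical stand-ins for judgments that an investigator, not a program, has to make.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """A piece of information being weighed (hypothetical type for illustration)."""
    description: str
    consistent_with_wrongdoing: bool  # criterion 1
    consistent_with_innocence: bool   # criterion 2 requires this to be False


def generates_suspicion(obs: Observation) -> bool:
    """Valid suspicion needs both: fits bad behavior AND does not fit innocent behavior."""
    return obs.consistent_with_wrongdoing and not obs.consistent_with_innocence


terry = Observation(
    description="walking past a store multiple times, looking in furtively",
    consistent_with_wrongdoing=True,
    consistent_with_innocence=False,
)
tweet = Observation(
    description='Tweeting about going to "destroy America" on a trip',
    consistent_with_wrongdoing=True,  # if read literally
    consistent_with_innocence=True,   # also fits talking smack
)

for obs in (terry, tweet):
    print(f"{obs.description}: suspicious = {generates_suspicion(obs)}")
```

The point of the sketch is that the first criterion alone is cheap to satisfy; it is the second one that does the filtering.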

Twitter is nothing if not an unreliable source of people’s thinking and intentions. It’s a hotbed of irony, humor, and inside jokes. Witness this Tweet of mine from yesterday, which failed to garner the social media guffaw I sought (which is why I link to it here). Things said on Twitter will almost never be suspicious enough to justify even the briefest interrogation.

Other facts could combine with Twitter commentary to create a suspicious circumstance on extremely rare occasions, but for proper suspicion to arise, the Tweet or Tweets and all other facts must be consistent with criminal planning and inconsistent with lawful behavior. No information so far available suggests that the DHS did anything other than take Tweets literally in the face of plausible explanations by their authors that they were using hyperbole and irony. This is simple investigative incompetence.

If indeed it is a “social media analysis” program that produced this incident, the U.S. government is paying money to cause U.S. government officials to waste their time on making the United States an unattractive place to visit. That’s a cost-trifecta in the face of essentially zero prospect for any security benefit. I slept no more soundly last night knowing that some Brits were denied a chance to paint the town red in L.A.

In case it needs explaining, “paint the town red” is archaic slang. It does not imply an intention or plan to apply pigments to any building or infrastructure in Los Angeles, whether by brush, roller, or spray can.

Sorrell vs. IMS Health: Not a Privacy Case

The Supreme Court’s decision in Sorrell vs. IMS Health is being touted in many quarters as a privacy case, and a concerning one at that. Example: Senator Patrick Leahy (D-VT) released a statement saying “the Supreme Court has overturned a sensible Vermont law that sought to protect the privacy of the doctor-patient relationship.” That’s a stretch.

The Vermont law at issue restricted the sale, disclosure, and use of pharmacy records that revealed the prescribing practices of doctors if that information was to be used in marketing by pharmaceutical manufacturers. Under the law, prescription drug salespeople (“detailers” in industry parlance) could not access information about doctors’ prescribing to use in focusing their efforts. As the Court noted, the statute barred few other uses of this information.

It is a stretch to suggest that this is a privacy law, given the sharply limited scope of its “protections.” Rather, the law was intended to advance the state’s preferences in the area of drug prescribing, which skew toward generic drugs rather than name brands. The Court quoted the Vermont legislature itself, finding that the purpose of the law was to thwart “detailers, in particular those who promote brand-name drugs, convey[ing] messages that ‘are often in conflict with the goals of the state.’” Accordingly, the Court addressed the law as a content- and viewpoint-oriented regulation of speech that could not survive First Amendment scrutiny (something Cato and the Pacific Legal Foundation argued for in their joint brief).

What about patients’ sensitive records? Again, the case was about data reflecting doctors’ prescribing practices, which could include as little as how many times per year they prescribe given drugs. (They probably include more detail than that.) The risk to patients is based on the idea that patients’ prescriptions might be gleaned through sufficient data-mining of doctors’ prescribing records (no doubt with other records appended). That’s a genuine problem, if largely theoretical given the availability and use of data today. Vermont is certainly free to address that problem head on in a law meant to actually protect patients’ privacy (against the state itself, for example). Better still, Vermonters and people across the country could rely on the better sources of rules in this new and challenging area: market pressure (to the extent possible in the health care area) and the (non-prescriptive, more adaptive) common law.

Whatever the way forward, Sorrell vs. IMS Health is not the privacy case some are making it out to be, it’s not the outrage some are making it out to be, and it’s not the last word on data use in our society.

Good News! Online Tracking is Slightly Boring

You have to wade through a lot to reach the good news at the end of Time reporter Joel Stein’s article about “data mining”—or at least data collection and use—in the online world. There’s some fog right there: what he calls “data mining” is actually ordinary one-to-one correlation of bits of information, not mining historical data to generate patterns that are predictive of present-day behavior. (See my data mining paper with Jeff Jonas to learn more.) There is some data mining in and among the online advertising industry’s use of the data consumers emit online, of course.

Next, get over Stein’s introductory language about the “vast amount of data that’s being collected both online and off by companies in stealth.” That’s some kind of stealth if a reporter can write a thorough and informative article in Time magazine about it. Does the moon rise “in stealth” if you haven’t gone outside at night and looked at the sky? Perhaps so.

Now take a hard swallow as you read about Senator John Kerry’s (D-Mass.) plans for government regulation of the information economy.

Kerry is about to introduce a bill that would require companies to make sure all the stuff they know about you is secured from hackers and to let you inspect everything they have on you, correct any mistakes and opt out of being tracked. He is doing this because, he argues, “There’s no code of conduct. There’s no standard. There’s nothing that safeguards privacy and establishes rules of the road.”

Securing data from hackers and letting people correct mistakes in data about them pull in opposite directions. If you’re going to make data about people available to them, you’re going to create opportunities for other people (it won’t even take hacking skills, really) to impersonate them, gather private data, and scramble data sets.

If Senator Kerry’s argument for government regulation is that there aren’t yet “rules of the road” pointing us off that cliff, I’ll take market regulation. Drivers like you and me are constantly and spontaneously writing the rules through our actions and inactions, clicks and non-clicks, purchases and non-purchases.

There are other quibbles. “Your political donations, home value and address have always been public,” says Stein, “but you used to have to actually go to all these different places — courthouses, libraries, property-tax assessors’ offices — and request documents.”

This is correct insofar as it describes the modern decline in practical obscurity. But your political donations were not public records before the passage of the Federal Election Campaign Act in 1974. That’s when the federal government started subordinating this particular dimension of your privacy to others’ collective values.

But these pesky details can be put aside. The nuggets of wisdom in the article predominate!

“Since targeted ads are so much more effective than nontargeted ones,” Stein writes, “websites can charge much more for them. This is why — compared with the old banners and pop-ups — online ads have become smaller and less invasive, and why websites have been able to provide better content and still be free.”

The Internet is a richer, more congenial place because of ads targeted for relevance.

And the conclusion of the article is a dose of smart, well-placed optimism that contrasts with Senator Kerry’s sloppy FUD.

We’re quickly figuring out how to navigate our trail of data — don’t say anything private on a Facebook wall, keep your secrets out of e-mail, use cash for illicit purchases. The vast majority of it, though, is worthless to us and a pretty good exchange for frequent-flier miles, better search results, a fast system to qualify for credit, finding out if our babysitter has a criminal record and ads we find more useful than annoying. Especially because no human being ever reads your files. As I learned by trying to find out all my data, we’re not all that interesting.

Consumers are learning how to navigate the online environment. They are not menaced or harmed by online tracking. Indeed, commercial tracking is congenial and slightly boring. That’s good news that you rarely hear from media or politicians because good news doesn’t generally sell magazines or legislation.

Pre-Crime Software?

It sounds a little bit like the “pre-crime” unit featured in the 2002 film “Minority Report,” but news that Washington, D.C. will implement software to “predict” crime is not quite as worrisome as it might seem at first blush.

Beginning several years ago, the researchers assembled a dataset of more than 60,000 various crimes, including homicides. Using an algorithm they developed, they found a subset of people much more likely to commit homicide when paroled or probated. Instead of finding one murderer in 100, the UPenn researchers could identify eight future murderers out of 100.

Berk’s software examines roughly two dozen variables, from criminal record to geographic location. The type of crime, and more importantly, the age at which that crime was committed, were two of the most predictive variables.

Unlike applying data mining to detection of terrorism planning or preparation, which is exceedingly rare, using tens of thousands of examples of recidivism to discover predictive factors is a good way to focus supervision resources where they are most likely to be effective.

The article describes use of this software for monitoring parolees and probationers. Using data mining to justify anything approaching extra punishment would be a misuse, and many far more difficult issues would arise if it were used on the general population.
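The figures quoted above imply some simple arithmetic that is worth spelling out. This is a minimal sketch in Python, on the assumption (a reading of the quoted passage, not something it states outright) that “one in 100” is the homicide rate across everyone in the supervised population and “eight out of 100” is the rate among those the software flags. The function and variable names are hypothetical.

```python
def flag_summary(flagged: int, true_positives: int, base_rate: float) -> dict:
    """Summarize how well a flagged group concentrates the outcome of interest.

    flagged         -- number of people the model singles out
    true_positives  -- how many of them go on to commit the offense
    base_rate       -- offense rate in the whole supervised population
    """
    precision = true_positives / flagged
    return {
        "precision": precision,                        # share of flagged who offend
        "lift": precision / base_rate,                 # improvement over picking at random
        "false_positives": flagged - true_positives,   # flagged people who do not offend
    }


# Assumed reading of the article's numbers: 1 in 100 overall, 8 in 100 among the flagged.
summary = flag_summary(flagged=100, true_positives=8, base_rate=1 / 100)

print(f"precision: {summary['precision']:.0%}")
print(f"lift over base rate: {summary['lift']:.1f}x")
print(f"flagged but not future murderers: {summary['false_positives']} of 100")
```

On these numbers the software is eight times better than chance at directing attention, yet 92 of every 100 people it flags would not commit homicide. That is a reasonable basis for allocating supervision and a poor one for anything resembling punishment.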

The Wall Street Journal’s Surveillance Fantasies

There are too few periodical venues for good short fiction these days, so I’d normally be enthusiastic about the Wall Street Journal’s decision to print works of fantasy. Unfortunately, they’ve opted to do so on their editorial page, starting with a long farrago of hypotheticals concerning the putative role of the Foreign Intelligence Surveillance Court in hindering the detection and apprehension of failed Times Square bomber Faisal Shahzad. In fairness to the editors, they acknowledge near the end of the piece that much of it is unvarnished speculation, but their flights of creative fancy extend to many claims presented as fact.

Let’s begin with the acknowledged fiction. The Journal editors wonder whether Shahzad might have been under surveillance before his botched Times Square attack, and posit that the NSA might have intercepted communications from “Waziristan Taliban talking about ‘our American brother Faisal,’ which could have been cross-referenced against Karachi flight manifests,” or “maybe Shahzad traded seemingly innocuous emails with Pakistani terrorists, and minimization precluded analysts from detecting a pattern.” Anything is possible. But it’s a leap to make this inference merely because investigators appear to have had fairly specific knowledge about his contacts with terrorists after he had already been identified. They would not have needed “retroactively to reconstruct his activities from other already-gathered foreign wiretaps”: once they had zeroed in on Shahzad, his calling patterns could have been reconstructed from phone company calling records whether or not he or his confederates were being targeted at the time the communications occurred. Indeed, those records could have been obtained by means of a National Security Letter without any oversight from the FISA Court.


The Difficulty With Finding Rare Events in Data

John Brennan, assistant to the president for homeland security and counterterrorism, made the rounds of the Sunday political shows this weekend. He’ll be reviewing the attempted bombing of Northwest flight 253 for the president. 

His appearance on ABC’s This Week program revealed his struggle with the limitations on data mining for counterterrorism purposes. His interviewer, Terry Moran, betrayed even less awareness of the challenge. Their conversation is revealing:

Moran: Who dropped the ball here? Where did the system fail?

Brennan: Well, first of all, there was no single piece of intelligence or “smoking gun,” if you will, that said that Mr. Abdulmutallab was going to carry out this attack against that aircraft. What we had, looking back on it now, were a number of streams of information. We had the information that came from his father, where he was concerned about his son going to Yemen, consorting with extremists, and that he was not going to go back.

We also, though, had other streams of information, coming from intelligence channels that were little snippets. We might have had a partial name, we might have had indication of a Nigerian, but there was nothing that brought it all together.

What we need to do as a government and as a system is to bring that information together so when a father comes in with information and we have intelligence, we can map that up so that we can stop individuals like Abdulmutallab from getting on a plane.

Moran: But that is exactly the conversation we had after 9/11, about connecting these disparate dots. You were one of the architects of the system put in place after that, the National Counterterrorism Center. That’s where the failure occurred, right? The dots weren’t connected.

Brennan: Well, in fact, prior to 9/11, I think there was reluctance on the part of a lot of agencies and departments to share information. There is no evidence whatsoever that any agency or department was reluctant to share.

Moran: Including the NSA? Were the NSA intercepts shared with the National Counterterrorism Center?

Brennan: Absolutely. All the information was shared. Except that there are millions upon millions of bits of data that come in on a regular basis. What we need to do is make sure the system is robust enough that we can bring that information to the surface that really is a threat concern. We need to make the system stronger. That’s what the president is determined to do.

Moran: You see millions upon millions of bits of data that—Facebook has 350 million users who put out 3.5 billion pieces of content a week, and it’s always drawing connections. In the era of Google, why does [the] U.S. intelligence community not have the sophistication and power of Facebook?

Brennan: Well, in fact, we do have the sophistication and power of Facebook, and well beyond that. That’s why we were able to stop Mr. Najibullah Zazi, David Headley, [and] other individuals from carrying out attacks, because we were able to do that on a regular basis. In this one instance, the system didn’t work. There were some human errors. There were some lapses. We need to strengthen it.

In our paper, Effective Counterterrorism and the Limited Role of Predictive Data Mining, Jeff Jonas (distinguished engineer and chief scientist with IBM’s Entity Analytic Solutions Group) and I drew a line between what we called subject-based data analysis and pattern-based analysis.

Subject-based data analysis seeks to trace links from known individuals or things to others. . . . In pattern-based analysis, investigators use statistical probabilities to seek predicates in large data sets. This type of analysis seeks to find new knowledge, not from the investigative and deductive process of following specific leads, but from statistical, inductive processes. Because it is more characterized by prediction than by the traditional notion of suspicion, we refer to it as “predictive data mining.”

The “power” that Facebook has is largely subject-based. People connect themselves to other people and things in Facebook’s data through “friending,” posting of pictures, and other uses of the site. Given a reason to suspect someone, Facebook data could reveal some of his or her friends, compatriots, and communications.
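Subject-based analysis of this kind amounts to following links outward from a known starting point. A minimal sketch in Python, assuming a made-up contact graph: the names, the `links` data, and the `linked_subjects` helper are all hypothetical and do not describe any real system.

```python
from collections import deque


def linked_subjects(links: dict[str, set[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search outward from one known subject.

    Returns each person reachable within max_hops, mapped to the number
    of links separating them from the starting subject.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for contact in links.get(person, set()):
            if contact not in hops:
                hops[contact] = hops[person] + 1
                queue.append(contact)
    del hops[start]  # report only the people linked to the subject
    return hops


# Hypothetical "friend" links, stored in both directions.
links = {
    "suspect": {"alice", "bob"},
    "alice": {"suspect", "carol"},
    "bob": {"suspect"},
    "carol": {"alice", "dave"},
    "dave": {"carol"},
}

print(linked_subjects(links, "suspect", max_hops=2))
```

Note what the search requires: a starting subject. Nothing in it produces a reason to suspect anyone; it only organizes what is connected to someone already suspected.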

That’s a lot compared to what existed in the recent past, but it’s nothing special, and it’s nothing like what Brennan wants from the data collection done by our intelligence services. He appears to want data analysis that can produce suspicion in the absence of good intelligence, without the “smoking gun” he says we lacked here.

Unfortunately, the dearth of patterns indicative of terrorism planning will deny success to that project. There isn’t a system “robust” enough to identify current attacks or attempts in data when we have seen examples of them only a few times before. Pattern-based data mining works when there are thousands and thousands of examples from which to build a model of what certain behavior looks like in data.
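The base-rate arithmetic behind this can be sketched with Bayes’ rule. Every number in the Python below is a hypothetical chosen for illustration (population size, number of plotters, and the accuracy of the imagined model); none comes from real data, and the accuracy figures are far more generous than a model trained on a handful of examples could plausibly achieve.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              false_positive_rate: float) -> float:
    """Probability that a flagged person is a true case (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs, for illustration only.
population = 300_000_000      # people whose data is mined
plotters = 30                 # actual terrorism planners among them
sensitivity = 0.99            # model flags 99% of real plotters
false_positive_rate = 0.01    # model wrongly flags 1% of everyone else

prevalence = plotters / population
ppv = positive_predictive_value(prevalence, sensitivity, false_positive_rate)
false_alarms = (population - plotters) * false_positive_rate

print(f"chance a flagged person is a real plotter: {ppv:.6%}")
print(f"innocent people flagged: {false_alarms:,.0f}")
```

Even with an implausibly accurate model, the few real cases are buried under millions of false alarms, each of which would consume investigative time. With only a few past examples to learn from, there is no way to build or validate a model anywhere near that accurate in the first place.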

If Brennan causes the country to double down on data collection and pattern-based data mining, plan on having more conversations about failures to “connect the dots” in the future.

As George Will said on the same show, “When you have millions of dots, you cannot define as systemic failure—catastrophic failure—anything short of perfection. Our various intelligence agencies suggest 1,600 names a day to be put on the terrorist watch list. He is a known extremist, as the president said. There are millions of them out there. We can’t have perfection here.”

We’ll have far less than perfection—more like wasted intelligence efforts—if we rely on pattern-based or predictive data mining to generate suspicions about terrorism.

Fort Hood: Reaction, Response, and Rejoinder

Commentary on the Fort Hood incident can be categorized three ways: reaction, response, and rejoinder (commentary on the commentary).

Reactions generally consist of pundits pouring their preconceptions over what is known of the facts. These are the least worthy of our time, and rejoinders like this one from Stephen M. Walt of Harvard University in the Fort Hood section of The Politico’s Arena blog dispense with them well:

Of course [Fort Hood] is being politicized; there is no issue that is immune to exploitation by politicians and media commentators. The problem is that there are an infinite number of “lessons” one can draw from a tragic event like this — the strain on our troops from a foolish war, the impact of hateful ideas from the fringe of a great religion (and most religions have them), the individual demons that drove one individual to a violent and senseless act, etc., — and so no limits to the ways it can be used by irresponsible politicians (is that redundant?) and pundits.

My favorite response—by “response,” I mean careful, productive analysis—was written last year as a general admonition about events like this (which at least has terrorist connotations):

Above all else is the imperative to think beyond the passions of those who are hurt, frightened or angry. Policymakers who become caught up in the short-term goals and spectacle of terrorist attacks relinquish the broader historical perspective and phlegmatic approach that is crucial to the reassertion of state power. Their goal must be to think strategically and avoid falling into the trap of reacting narrowly and directly to the violent initiatives taken by these groups.

That’s Audrey Kurth Cronin, Professor of Strategy at the U.S. National War College in her monograph, Ending Terrorism: Lessons for Defeating al-Qaeda.

But I want to turn to a critique leveled against my recent post, “The Search for Answers in Fort Hood,” which discussed how little Fort Hood positions us to prevent similar incidents in the future. (I hope it was response and not reaction, but readers can judge for themselves.)

A thoughtful Cato colleague emailed me suggesting that there may have been enough indication in Nidal Hasan’s behavior—in particular, correspondence with Anwar al-Awlaki—to stop him before his shooting spree.

There may have been. Current reporting has it that his communications with al-Awlaki were picked up and examined, but because they were about a research paper that he was in fact writing, he was deemed not to merit any further investigation.

This can only be called error with the benefit of hindsight. And it tells us nothing about what might prevent a future attack, which was my subject.

If humans were inert objects, investigators could simply tweak the filter that caused this false negative to occur. They could not only investigate the people who contact known terrorists as they did Nidal Hasan, they could know to disregard claimed academic interests. Poof! The next Nidal Hasan would be thwarted at a small cost to actual researchers.

But future attacks are not like past attacks. Tweaking the filter to eliminate this source of false negatives would simply increase false positives without homing in on the next attacker. Terrorists and terrorist wannabes will change their behavior based on known and imagined measures to thwart them. Nobody’s going to be emailing this al-Awlaki guy for a while.


Report to DoD: Data Mining Won’t Catch Terrorism

Via Secrecy News, “JASON”—a unit of defense contractor the MITRE Corporation—has reported to the Department of Defense on the weakness of data mining for predicting or discovering inchoate terrorist attacks.

“[I]t is simply not possible to validate (evaluate) predictive models of rare events that have not occurred, and unvalidated models cannot be relied upon,” says the report.

In December 2006, Jeff Jonas and I published a paper making the case that predictive modeling won’t discover rare events like terrorism. The paper, Effective Counterterrorism and the Limited Role of Predictive Data Mining, was featured prominently in a Senate Judiciary Committee hearing early the next year.

Privacy gives way to appropriate security measures, as the Fourth Amendment suggests, where it approves “reasonable” searches and seizures. Given the incapacity of data mining to catch terrorism and the massive data collection required to “mine” for terrorism, data mining for terrorism is a wrongful invasion of Americans’ privacy—and a waste of time.