26 May 2006


Traffic Analysis

Sounds like a mundane term, right?  Maybe one of those gadgets counting the cars going by to see if a traffic light is necessary on the corner.  But it's also the term used to explain why the NSA is trying to collect all the telephone numbers we call.  Traffic analysis is one of the most valuable tools in the arsenal of an intelligence agency.  If you don't believe me, consider yourself as the analyst:  When you get your telephone bill, don't you try to figure out who called whom?  If you can't account for a call and are suspicious of your spouse, do you check the area code and try to remember where his (or her) old flame moved?  If you are (correctly) suspicious of your teenager, do you try to figure out whether that hour-long call is about homework, elopement, or potential mayhem?  Of course you haven't actually listened to the call itself.  But you are performing traffic analysis just from the number, call duration, and time of day.

The National Security Agency does the same thing, and presumably is doing so by collecting records of all of our telephone calls.  And surprise! they're better at analysis than you are.  Even if the conversations are in a foreign language, a word code, or fully encrypted, they can still note that calls between two or more parties are occurring more frequently.  Is it a prelude to a terrorist attack or a book club meeting? (There is a difference!)  Are they discussing enriching literature or enriched uranium?  I'd like to think that the NSA can usually tell.  Traffic analysis is an extremely valuable tactic.  I'd hate to deny it to the good guys.  (That would be the USA, by the way.)
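To make "calls between two or more parties are occurring more frequently" concrete, here is a minimal sketch in Python. Everything in it is my own illustration: the records are made up, the names calls_per_week and rising_pairs are hypothetical, and "strictly more calls every week" is just one simple assumed test for rising traffic. It says nothing about how the NSA actually does it. Note that it looks only at who called whom and when, never at what was said.

```python
from collections import Counter

# Hypothetical call records: (caller, callee, week_number).
# Made-up data for illustration only; no call content is involved.
records = [
    ("A", "B", 1),
    ("A", "B", 2), ("B", "A", 2),
    ("A", "B", 3), ("B", "A", 3), ("A", "B", 3), ("B", "A", 3),
    ("C", "D", 1), ("C", "D", 2), ("C", "D", 3),
]

def calls_per_week(records):
    """Count calls per (unordered pair of parties, week)."""
    counts = Counter()
    for caller, callee, week in records:
        pair = tuple(sorted((caller, callee)))  # A->B and B->A are the same pair
        counts[(pair, week)] += 1
    return counts

def rising_pairs(records):
    """Return pairs whose weekly call count strictly increases every week."""
    counts = calls_per_week(records)
    pairs = sorted({pair for pair, _ in counts})
    weeks = sorted({week for _, week in counts})
    rising = []
    for pair in pairs:
        series = [counts[(pair, w)] for w in weeks]  # missing weeks count as 0
        if len(series) > 1 and all(b > a for a, b in zip(series, series[1:])):
            rising.append(pair)
    return rising

print(rising_pairs(records))  # [('A', 'B')]
```

The A and B pair goes from one call to two to four and gets flagged; C and D call each other once a week and do not. Whether A and B are a book club or something worse is the part that takes an analyst.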

Is collecting our call records a violation of privacy?  That's a good question to start an argument, and it clearly has.  If it concerns you that the NSA knows you made a call to Mom or to your bookie, then it probably is a violation.  If you have a more relaxed attitude (everyone calls his mom, and bookies always seem to have a lot of telephones), then maybe you're not too concerned.  I personally am not concerned; I recognize that many others are.

But that isn't the point of this blogitem.  When I heard about the NSA program, I became curious about the technical aspects of it.  How can even the NSA handle this massive classified data collection program?  This question led me to make some rough assumptions and do a quick calculation:

  • There are 300 million people in the United States.  250 million of them are telephonically active.
  • Each one makes ten telephone calls per day.
  • For each of these calls, the NSA needs to store not just the calling number, but the called number as well.  Not to mention the start time, and, for completed calls, the duration.

This is a truly massive amount of data!  250 million people * 10 calls/day * (10 bytes per calling number + 10 bytes per called number + 6 bytes for the start time + 3 bytes for the call duration in seconds)

Why, that comes to 72,500,000,000 bytes, or almost enough to fill a small hard drive in a cheap computer!  (Tell me that my numbers are off - that people make twice as many calls.  I'll tell you that 10 bytes per number is too generous by far - it can be compressed to five bytes with no effort at all.)
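For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The figures are the rough assumptions listed above, not measured data, and the function and constant names are mine, invented for illustration.

```python
# Back-of-the-envelope estimate of daily call-record volume.
# All inputs are the rough assumptions from the text, not real measurements.
ACTIVE_CALLERS = 250_000_000  # telephonically active people
START_TIME_BYTES = 6          # bytes for the call start time
DURATION_BYTES = 3            # bytes for the call duration in seconds

def daily_bytes(calls_per_day=10, number_bytes=10):
    """Bytes per day: callers * calls * (calling + called + start + duration)."""
    record_bytes = 2 * number_bytes + START_TIME_BYTES + DURATION_BYTES
    return ACTIVE_CALLERS * calls_per_day * record_bytes

base = daily_bytes()
print(f"{base:,} bytes per day")  # 72,500,000,000 bytes per day

# The objection and the rebuttal together: twice as many calls,
# but each number compressed to five bytes.
alt = daily_bytes(calls_per_day=20, number_bytes=5)
print(f"{alt:,} bytes per day")   # 95,000,000,000 bytes per day
```

Even with twice the calls, the compressed records come to 95 billion bytes a day, which is still bargain-disk territory.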

Could something be wrong here?  Is our candidate DCI* testifying behind closed doors about an enormous classified program whose daily take can't even fill a bargain computer disk?  Does AT&T need a secret room to collect this data?  Could there possibly be more to it?  Are we a nation in an uproar over 73 gigabytes per day of raw data?  Let's ponder that over this holiday weekend.

* Can they possibly be thinking of confirming someone named Hayden as DCI?  Haven't they read Tinker, Tailor, Soldier, Spy?

Richard Factor