Links 1 through 5 of 5 by Justin Mason tagged number-crunching

Fantastic comparative number crunching on the JCDecaux Dublin Bikes scheme, set against the company's schemes in other European cities (Brussels, Lyon, Paris, Seville) and broken down by time of day, busiest stations, rainfall, etc.

Specifically, Hadoop and Pig for log/metrics analytics, with Cassandra going forward; great preso, lots of detail and code examples. Also, impressive number-crunching going on at Twitter.

A 910-node cluster sorting 1 TB of data in 209 seconds, using Hadoop and HDFS. I wish we had a Hadoop cluster to do SpamAssassin mass-checks on ;)

Good review of the current state of the Netflix machine-learning challenge (via BO'S)

Good number-crunching on the Verisign .com dump file, via waxy. 100% of the top 10,000 US family names are already registered. Hey ICANN! The zones should be publicly FTP-able again ;)

