Manic data miner
2010-10-13
The other day at work, prompted by a shoutbox conversation with one of our users, I did a little exploring of the artist catalogue data. The idea was to find band names that consist of a repeated word, such as 'Talk Talk' and 'The The'. Coincidentally, I had a freshly installed database server with just this sort of information on it, and needed a good excuse to stress test it a little. PostgreSQL's regular expression support is brilliant, and it was a trivial exercise to quickly knock up a query that returned promising data. In the process of refining it, I got a chance to play around with the Hadoop cluster. I wrote the whole thing up over on the company blog, if you'd like further details. Fame, fame, fatal fame, it can play hideous tricks on the brain, as the song goes.
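
For the curious, the repeated-word check comes down to a regular expression back-reference. What follows is only a rough Python sketch of that idea, not the actual PostgreSQL query: the pattern, the helper name `is_repeating_name`, and the sample list are all assumptions made up for illustration.

```python
import re

# One word, followed by one or more repeats of that same word.
# \1 is a back-reference to whatever the first group captured.
# (Hypothetical pattern for illustration, not the original query.)
REPEATED_WORD = re.compile(r"^(\w+)(?:\s+\1)+$", re.IGNORECASE)


def is_repeating_name(name: str) -> bool:
    """Return True if the name is a single word repeated, e.g. 'Talk Talk'."""
    return REPEATED_WORD.match(name.strip()) is not None


# Illustrative sample only; the real catalogue lives in a database.
names = ["Talk Talk", "The The", "Duran Duran", "Talking Heads", "The Smiths"]
repeats = [n for n in names if is_repeating_name(n)]
print(repeats)
```

PostgreSQL's regular expressions support back-references too, so the same pattern shape should carry over to a query using the `~*` operator, though the details of the real query are in the write-up on the company blog.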