beatworm.co.uk

Manic data miner

The other day at work, prompted by a shoutbox conversation with one of our users, I did a little bit of exploring some of the artist catalogue data. The idea was to find band names that were repeating words, such as 'Talk Talk' and 'The The'. Coincidentally, I had a freshly installed database server with just this sort of information on it, and needed a good excuse to stress test it a little. PostgreSQL's regular expression support is brilliant, and it was a very trivial exercise to quickly knock up a query that returned promising data. In the process of refining it, I got a chance to play around with the Hadoop cluster. I wrote the whole thing up over on the company blog, if you'd like further details. Fame fame fatal fame, it can play hideous tricks on the brain, as the song goes.
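The original query ran against PostgreSQL and isn't reproduced here. As a rough illustration of the same idea, here is a small Python sketch that uses a regular expression backreference to pick out names made of one repeated word. The function name and the sample list are made up for the example.

```python
import re

# One word, followed by one or more repeats of that same word.
# \1 is a backreference: it must match what the first group captured.
REPEATED_WORD = re.compile(r"^(\w+)(?:\s+\1)+$", re.IGNORECASE)


def repeated_word_names(names):
    """Return the names that consist of a single word repeated."""
    return [name for name in names if REPEATED_WORD.match(name)]


if __name__ == "__main__":
    sample = ["Talk Talk", "The The", "The Smiths", "Duran Duran"]
    for name in repeated_word_names(sample):
        print(name)
```

Anchoring the pattern with ^ and $ matters here: without the anchors, any name that merely contains a doubled word somewhere would also match.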

posted Wednesday, October 13, 2010 at 10:16 by cms in computers, music, programming | Comments Off

Building python extensions on Snow Leopard

I ran into some problems while I was trying to install python bindings for the Growl notification framework on my MacBook Pro. My Mac is running the current release of Snow Leopard (10.6.4) and I'm using a python.org installed binary package of python, under /usr/local/python. Building using distutils and the supplied setup.py failed, seemingly because the compiler was unable to find quite routine include files, such as stdarg.h and float.h.

/Developer/SDKs/MacOSX10.4u.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: No such file or directory

This error message both confused and perturbed me, because stdarg is a fairly fundamental component of a working C library, and I am pretty certain that my compiler isn't that fundamentally broken.

Picking apart the build output from the generated Makefile, I see that it is setting the -isysroot gcc flag to /Developer/SDKs/MacOSX10.4u.sdk/. I presume this is because the python installation is built to use the OS X 10.4 compatibility SDK. This is why it's pulling in /Developer/SDKs/MacOSX10.4u.sdk/usr/include/stdarg.h. That header is a stub, and includes the following stanza:


/* GCC uses its own copy of this header */
#if defined(__GNUC__)
#include_next <stdarg.h>

#include_next is a gcc extension to cpp, and instructs the preprocessor to start searching for the include file again, starting with the next directory on the include path after this one. Standard library headers like stdarg.h and float.h can be quite compiler specific, and as the comment indicates, GCC is expected to have its own copy of this header file, which would be put away somewhere under /usr/lib/gcc.

At this point, a nagging memory of building Cocoa apps with XCode resurfaced, suggesting that the 10.4 SDK isn't compatible with gcc-4.2 (the system default gcc under Snow Leopard). GCC 4.0 is still supplied, though, for building against legacy SDKs. On a whim, I tried exporting CC=/usr/bin/gcc-4.0 and rebuilding, and everything worked as it should.
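To check which SDK a given python was built against without wading through Makefile output, something like the following sketch might help. It reads the interpreter's recorded CFLAGS, looks for -isysroot, and prepares an environment with CC pointed at gcc-4.0 when the 10.4 SDK is in use. The helper names are invented for this example, and it assumes the sysconfig module is available (on an older python, distutils.sysconfig offers a similar get_config_var).

```python
import os
import shlex
import sysconfig


def find_isysroot(flags):
    """Return the SDK path that follows -isysroot in a flag string, or None."""
    tokens = shlex.split(flags or "")
    for i, token in enumerate(tokens):
        if token == "-isysroot" and i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def build_env(sdk_path, legacy_cc="/usr/bin/gcc-4.0"):
    """Copy the environment, setting CC to gcc-4.0 if the 10.4 SDK is in use."""
    env = dict(os.environ)
    if sdk_path and "MacOSX10.4u" in sdk_path:
        env["CC"] = legacy_cc
    return env


if __name__ == "__main__":
    # CFLAGS may be missing on some platforms, hence the "or ''" above.
    sdk = find_isysroot(sysconfig.get_config_var("CFLAGS"))
    print("SDK root:", sdk)
    print("CC:", build_env(sdk).get("CC", "(compiler default)"))
```

The dictionary returned by build_env could be passed as the env argument to subprocess.call when running setup.py, which has the same effect as exporting CC in the shell first.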

From inspection, it seems like the python supplied with Snow Leopard is built to use the 10.6 SDK and gcc-4.2, and may well be a more sensible python to use. Further googling, or rather ducking, turned up this bug report.

posted Friday, August 6, 2010 at 17:50 by cms in computers, programming, python | Comments Off

Top 10 UNIX commands

Ever wondered what your most used shell commands are? Here's a very silly way I knocked up to find out mine.

history|perl -anle'$C{/\d+\s+sudo/?$F[2]:$F[1]}++;END{print map{qq|$_\[$C{$_}]\n|} sort{$C{$b}<=>$C{$a}}keys%C}'|head -10

Some people would have you believe that perl is difficult to read.
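For anyone who would rather not squint at the one-liner, here is roughly the same counting logic spelled out as a Python sketch. It assumes history lines of the usual "number command arguments" shape, and the function name is made up for the example.

```python
from collections import Counter


def top_commands(history_lines, limit=10):
    """Count the command word on each history line, looking past a leading sudo."""
    counts = Counter()
    for line in history_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # nothing after the history number
        # fields[0] is the history number; fields[1] is the command,
        # unless it is sudo, in which case the real command comes next.
        if fields[1] == "sudo" and len(fields) > 2:
            command = fields[2]
        else:
            command = fields[1]
        counts[command] += 1
    return counts.most_common(limit)
```

Because history is a shell builtin, its output has to be piped in, for example by passing sys.stdin to top_commands from a small script.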

posted Wednesday, October 15, 2008 at 12:41 by cms in computers, programming | Comments Off

Django + SQLite , unfinalised statements on close

If you have a Django 1.0 deployment configured to use sqlite3, and are struggling to understand sporadic exceptions that look like they are thrown by closing a database cursor with uncommitted work, it might be a file permissions problem. The symptom is that either manage.py commands on the shell or page requests to the application generate stack traces centered around messages like 'Unable to close due to unfinalised statements'.

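In my case, neither my developer shell account nor the user id of the running apache httpd processes had write permission on the directory containing the sqlite3 database file. SQLite needs to create its journal file alongside the database, so the directory must be writable as well as the file itself. None of this is immediately apparent from the wording used in the error messages.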
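A quick way to rule this out is to check both the file and its directory as the user that will be running the code. The sketch below does this with os.access. The function name is invented for the example, and it only reports on permissions as seen by the current process, so it would need to be run as the httpd user to say anything about the web application.

```python
import os


def sqlite_write_problems(db_path):
    """Return a list of permission problems for a SQLite database path."""
    problems = []
    directory = os.path.dirname(os.path.abspath(db_path))
    # The database file itself must be writable...
    if not os.access(db_path, os.W_OK):
        problems.append("database file is not writable: %s" % db_path)
    # ...and so must its directory, where SQLite puts its journal file.
    if not os.access(directory, os.W_OK):
        problems.append("directory is not writable: %s" % directory)
    return problems


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        for problem in sqlite_write_problems(path):
            print(problem)
```

An empty list means neither check failed for the current user. It does not prove the deployment is healthy, only that this particular cause is unlikely.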

posted Friday, October 3, 2008 at 10:04 by cms in computers, programming | Comments Off

Using categories on Objective C classes in a static library

I have some Objective C classes that I've built for use in a project. They are model and utility classes, and have no direct UI responsibility. To aid in automated testing and debugging, I've built them as a project that creates a static library. The project has a test target that runs suites of automated unit tests, and a library target that builds a C-style static library archive binary. The install configuration of this target copies the library to $HOME/lib/, and the class headers to $HOME/include/$LIBNAME.

This way when I use these classes in another project, I can just #import the headers in the sources, add the static library to the project frameworks list, add the include and lib directories to the compiler and linker search paths in the XCode target inspector, and build as normal. Build times are reduced, base classes are frozen in a stable, well tested implementation, code re-use is easier, everybody wins.

Recently I broke this happy pattern, a little perplexingly, with what I thought was a fairly innocuous piece of refactoring. I noticed that one of my classes was rather a simple set container, and its implementation really little more than a thin wrapper around NSMutableArray, with only a trivial specialisation of behaviour. As it was only used within a parent class structure, with no interface outside the library innards, it seemed a bit of overkill to have it implemented as a sizeable class. The special behaviour really boiled down to maybe two additional methods on top of the normal array interface.

I first refactored it to be a subclass of NSMutableArray, but that actually introduced more complexity. NSMutableArray is implemented as a class cluster, with an abstract API around a private hidden shadow class. In order to subclass it, you are expected to provide your own implementations of a subset of its interface. In my case, this would have made for more code than the class I was trying to replace.

Of course, Objective-C allows you to define categories on any existing class. Categories allow you to formally define and implement additional methods on an existing class definition at compile time. I could re-implement my class as a tiny category on NSMutableArray, removing lots of my code, reducing the size of my library footprint, and perhaps adding some value by exposing NSMutableArray's extensive interface.

Surprisingly, it wasn't plain sailing. Coding up the category, and tweaking the library to use NSMutableArray in place of the now-redundant class was straightforward. Once the updated code passed the original test suite, it was deployed as a library. The first time I built a project using it, it crashed on startup, with an unhandled exception. I cleaned all targets and rebuilt. Same problem. I checked the library headers to confirm that the new data structures were properly defined on include. No problems there, but still a hard crash on initialisation.

The system logs had an entry for the crash: a 'selector not recognized' exception, attached to symbols that were recognisably the new array methods from my category. Running 'nm' against the library file showed the symbols present, and correctly defined as a category on NSMutableArray. I was stumped. After a bit of googling, I came up with the correct solution.

It turns out that in order to link against a static library that contains Objective C categories, you need to pass the linker a special flag, '-ObjC'. Adjusting the build settings of my project to include this flag in the 'Other linker flags' entry of the target inspector fixed it so that the symbols are correctly resolved at runtime. Here is the official word, Technical Q&A QA1490.

posted Wednesday, September 24, 2008 at 12:03 by cms in computers, programming | 3 Comments »

iTunes automation, revisited

Apple released iTunes version 8 this week, which introduced some excellent new features, such as Genius playlists, but broke the fancy perl script that I wrote to rotate my music library on my iPod touch.

While revisiting this, I took the opportunity to re-implement it, aiming to fix a few of its faults, most specifically the terrible performance. I decided to use Python this time around, chiefly because of the existence of appscript, an Apple event bridge with a nice syntax. Python's object and sequence semantics are a slightly better fit with AppleScript's data models, and appscript should be a more optimal solution than Mac::Glue for sending lots of messages iteratively.

I've also improved the actual command recipe: using 'duplicate' rather than 'add' to build the playlist seems more efficient. The overhead of having to periodically build glue modules with the 'gluemac' tool is also removed. Sadly appscript isn't shipped with OS X, but installing it (at least on Leopard) is as simple as 'sudo easy_install appscript'.

The concept behind the tool is the same: use a nominated playlist to synchronise the albums with the iPod, and pick a random set of albums from buckets organised by album rating. Currently it's set to shuffle in 10 'two star' albums, 20 'three star' albums, and 30 'four star' albums, selected from a 'just music' smart playlist that filters the master library, removing all spoken word, podcasts and other miscellany from the pool.

Here's the source. I'm far less experienced at python than I am at perl, so I wouldn't claim it was a particularly idiomatic solution. It does run many times more quickly than the perl / Mac::Glue solution, taking a minute or so, rather than the best part of an hour. I would put all the performance gains down to the appscript Apple event bridge, and to using more efficient Apple event set operations rather than iterating over individual data.
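The selection step can be illustrated without touching iTunes at all. The following is not the script linked above, just a minimal sketch of the bucket idea in plain Python, with no appscript calls. It assumes the albums have already been grouped by rating into a dictionary, and the names QUOTAS and pick_albums are made up for the example.

```python
import random

# Quotas described in the post: albums to pick for each star rating.
QUOTAS = {2: 10, 3: 20, 4: 30}


def pick_albums(albums_by_rating, quotas=QUOTAS, rng=random):
    """Pick a random sample of albums from each rating bucket.

    albums_by_rating maps a star rating to a collection of album names.
    A bucket holding fewer albums than its quota is used in full.
    """
    chosen = []
    for rating, quota in sorted(quotas.items()):
        # Sort first so the pool is a sequence with a stable order.
        pool = sorted(albums_by_rating.get(rating, ()))
        chosen.extend(rng.sample(pool, min(quota, len(pool))))
    return chosen
```

Passing a seeded random.Random instance as rng makes the selection repeatable, which is handy when testing. Left at the default, each run gives a fresh rotation.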

posted Sunday, September 14, 2008 at 11:15 by cms in computers, music, programming, python | Comments Off

Get xterm title string from the UNIX shell

Sometimes you run programs in xterm windows that try and do you a favour, by setting the xterm title property. Potentially useful enough, but aggravatingly some of them don't restore the previous title when they exit. If you're using some scheme of your own to set meaningful window titles, this is annoying.

Here's a shell one-liner that you can use to grab the current title in an xterm. You could use this to write a wrapper script that gracefully launches any such rude application, and restores the rightful title property when it's done.

/usr/X11R6/bin/xprop -id $WINDOWID | perl -nle 'print $1 if /^WM_NAME.+= \"(.*)\"$/'
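A wrapper along those lines might look something like this Python sketch. It applies the same regular expression to the xprop output, runs the given command, and then writes the xterm escape sequence for setting the window title (OSC 2) to put the old title back. It assumes xprop is on the PATH rather than at the fixed location used above, and that WINDOWID is set, as it is inside an xterm. The function names are invented for the example.

```python
import os
import re
import subprocess
import sys

# Same pattern as the perl one-liner, applied line by line.
WM_NAME = re.compile(r'^WM_NAME.+= "(.*)"$', re.MULTILINE)


def parse_title(xprop_output):
    """Extract the WM_NAME value from xprop output, or None if absent."""
    match = WM_NAME.search(xprop_output)
    return match.group(1) if match else None


def set_title_sequence(title):
    """Build the xterm escape sequence that sets the window title."""
    return "\033]2;%s\007" % title


def run_preserving_title(argv):
    """Run a command, then restore whatever title the xterm had before."""
    output = subprocess.check_output(["xprop", "-id", os.environ["WINDOWID"]])
    title = parse_title(output.decode("utf-8", "replace"))
    try:
        return subprocess.call(argv)
    finally:
        if title is not None:
            sys.stdout.write(set_title_sequence(title))
            sys.stdout.flush()
```

Saved as a script that ends by calling sys.exit(run_preserving_title(sys.argv[1:])), this could be put in front of any offending program.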

posted Saturday, June 7, 2008 at 14:21 by cms in computers, programming | Comments Off

Simple accumulator in Quartz Composer

Another kind of iteration you often want to do when constructing programs is to count things. Quartz Composer provides the counter patch, which increments a running total when one of its inputs switches from false to true. Similarly, it decrements the total whenever the signal to its other input changes from false to true.

By generating a regular true/false alternating value, and connecting this up to the increment line, you could generate a regular count. This composition demonstrates one way to do this. Using the Patch Time patch, a count of time in seconds is passed through a modulo 2 operator to generate a regular sequence of alternate 1s and 0s. This is connected up to the increment line of the counter, which then counts upward in integers.
[Image: Quartz Composer counter generating stripe width]

The counter value is used to govern the stripe width of a vertical stripe pattern. As the patch runs, the stripe width increases every other second. This is a very simple display, but the bit generator and accumulator demonstrated are useful in a variety of ways. You can download a copy of this patch here.
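For readers who think better in text than in boxes and string, the behaviour of the two patches can be mimicked in a few lines of Python. This is only a model of the logic described above, not anything Quartz Composer runs. It samples time in whole seconds, and the function names are made up for the example.

```python
def square_wave(seconds):
    """Whole seconds passed through modulo 2: 0, 1, 0, 1, ..."""
    return [t % 2 for t in range(seconds)]


def count_rising_edges(signal):
    """Return the running total after each sample of the signal.

    The total goes up by one each time the signal switches from
    false to true, like the increment input of the counter patch.
    """
    totals = []
    total = 0
    previous = False
    for value in signal:
        current = bool(value)
        if current and not previous:
            total += 1
        previous = current
        totals.append(total)
    return totals


if __name__ == "__main__":
    for second, width in enumerate(count_rising_edges(square_wave(6))):
        print(second, width)
```

The running total only changes on every second sample, which matches the stripe width stepping up every other second.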

posted Sunday, February 10, 2008 at 13:25 by cms in computers, programming | 1 Comment »

Basic looping with Quartz Composer

Quartz Composer is a visual programming tool from Apple that ships as part of the Developer tools with Mac OS X 10.4 or later. It presents a visual object-oriented programming metaphor around Quartz and Core Graphics that allows you to compose graphical effects simply by connecting the inputs and outputs of different objects together, graphically.

You can use QC to build pipelines that respond to a variety of inputs, local or via peripheral interfaces to construct visualisers for a variety of source signals, such as MIDI, audio from the built in mic, video signals from an iSight camera, or even networked events from computers on your internet or LAN. It also can be used to procedurally generate graphics, which you can use to build fancy displays or screen savers. Some of the system screen savers that ship with OS X, like the 'word of the day' or the 'rss visualiser', are actually simple Quartz Composer scripts.

It's an impressive tool, and ships with documentation and some examples of what you can do. You can achieve nice effects quite quickly, but there is still a learning curve to climb. As an example, a common thing you might want to do when constructing simple animating displays, is loop over a set of possible outcomes. Iterators are a common piece of the vocabulary of programming languages, but it took me a little while to figure out how to achieve this with the 'box and string' interface of this tool.

Here is a simplistic solution I came up with. Read the rest of this entry »

posted Friday, February 8, 2008 at 13:44 by cms in computers, programming | Comments Off

Tracing Unit Tests with the XCode 3 Debugger

XCode has a nifty integrated debugger which is really a pretty wrapper around gdb. It lets you point and click, and drill down on things within the gui with ease, but still preserves access to the underlying raw gdb console and output. You can create breakpoints and watches, both literal and dynamic, step through your application as it runs, all the usual stuff.

I'm not the world's greatest user of debuggers. I'm more likely to trace through things until they make sense using some combination of logging, print statements, paper and pencil, or my absolute favourite, just explaining your mystery problem out loud to a nearby third party, embarrassing yourself by spotting the obvious bug mid-flow. That last one sometimes even works with the dog. Sometimes though, you're stumped, and you want to set some watchpoints, step through your program as it executes, or just generally prod things mid-run, and poke around under digital rocks.

Something I've been trying to practice recently is Test Driven Development. XCode 3 ships with support for the OCUnit testing framework built in. You can add a Testing target to your XCode project, and build up test case classes that use this framework, and the build tools know how to run these through the test harness. And so you progress, write a test for a feature, run the test harness, write code to pass the test harness, repeat. It's a great way of not only catching certain classes of bug before they happen, but perhaps more interestingly imposing a more minimal design focus on your application as you build it; you're automatically casting yourself more in the mind of a consumer of your application services, something I find really helps avoid over-design.

At some point though you are likely to run into some kind of hard to understand failure case within a unit test, and find yourself reaching for the debugger. And then finding that the debugger doesn't work. This is because the runtime of your unit testing target is actually the separate test harness framework, and not your application target. The test harness is a regular application that's dynamically loading your test classes and running them. In order to be able to use the IDE to debug your unit tests, you just need to do a little extra configuration within your XCode project, as follows.

Read the rest of this entry »

posted Thursday, February 7, 2008 at 15:56 by cms in computers, programming | 13 Comments »