My blog has moved!

You will be automatically redirected to the new address. If that does not occur, visit
http://www.ianhopkinson.org.uk
and update your bookmarks. All posts have been transferred from this blog; use site search to find them.

Saturday, May 29, 2010

That's nice, dear

This blog post is about programming, for people who don't program - at least that's the effect I'm aiming for. The title is in recognition of my tolerant wife, The Inelegant Gardener, who has learnt the appropriate response to my enthusiastic displays of the results of my programming: "That's nice, dear"!

I started programming a long time ago, around 1980, at the school computer club when I was 10. Since then I've been taught odd bits of programming by scientists, and done quite a lot of programming as part of my scientific job. I've started to get more interested in proper software engineering in the last few years. This is a roundabout way of saying I am an enthusiastic amateur.

People associate programming with the mathematically minded, but this isn't necessarily the case: the codebreakers at Bletchley Park, who were amongst the first users of electronic computers, had a range of skills, and amongst them were linguists and crossword wizards. I was talking to a Fellow in linguistics who'd helped write his college's library software; as he pointed out, a very logical view of language is a great benefit for a programmer. Programming is about giving an idiot very exact instructions: if the instructions concern maths then you need to know maths, otherwise you don't.

The core of programming is still what I learned years ago: data (numbers or letters) is stored in "variables" that have names. There are conditional statements: "If [something is true] Then [do this] or else [do the other]". There are looping statements: "Do this 100 times". And there are functions: "add 2 to this number, square it, add the number you first thought of and tell me the answer" or "how many times does the letter a occur in this sentence".
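For anyone curious what those four ideas look like when typed in, here is a small Python sketch. Python is just a convenient choice, and the function names `think_of_a_number` and `count_letter` are made up for illustration; the two functions follow the two examples in the paragraph above.

```python
# A variable: a named box holding a piece of data
number = 5


def think_of_a_number(n):
    """Add 2 to n, square it, then add the number you first thought of."""
    result = (n + 2) ** 2
    return result + n


def count_letter(sentence, letter="a"):
    """Count how many times `letter` occurs in `sentence`, ignoring case."""
    count = 0
    for character in sentence.lower():  # loop: look at each character in turn
        if character == letter:         # conditional: only count matches
            count += 1
    return count


print(think_of_a_number(number))  # prints 54, since (5 + 2) ** 2 + 5 = 54
print(count_letter("how many times does the letter a occur in this sentence"))  # prints 2

# "Do this 100 times", with an If/Then/Else inside the loop
evens, odds = 0, 0
for i in range(100):
    if i % 2 == 0:   # If [i is even]
        evens += 1   # Then [do this]
    else:
        odds += 1    # or else [do the other]
print(evens, odds)  # prints 50 50
```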

These simple statements are being buried under an increasing depth of additional ideas. Since the 1980s the big thing in programming has been "object-orientation". In object-orientated programming you package up data of a particular sort with functions that relate to that data. So if you had data modelling an octopus you would include functions such as "wave-tentacles" and "change colour"; such functions would be useless for data describing a horse. The real benefit is that larger software systems become easier to comprehend, because a sea of functions and data is grouped together into logical islands. Beyond this there are design patterns: recurring arrangements of objects, which I haven't entirely got the hang of.
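A minimal Python sketch of the octopus idea. Python doesn't allow hyphens in names, so "wave-tentacles" becomes `wave_tentacles`; the class names, the default colour and the horse's `gallop` function are hypothetical, chosen only to show each kind of data carrying its own functions.

```python
class Octopus:
    """Octopus data packaged together with the functions that make sense for it."""

    def __init__(self, name, colour="brown"):
        self.name = name
        self.colour = colour
        self.tentacles = 8

    def wave_tentacles(self):
        return f"{self.name} waves {self.tentacles} tentacles"

    def change_colour(self, new_colour):
        # Octopuses can change colour, so this function belongs with octopus data
        self.colour = new_colour


class Horse:
    """Horse data gets its own functions; waving tentacles makes no sense here."""

    def __init__(self, name):
        self.name = name
        self.legs = 4

    def gallop(self):  # hypothetical example of a horse-only function
        return f"{self.name} gallops on {self.legs} legs"


octopus = Octopus("Olive")
octopus.change_colour("white")
print(octopus.wave_tentacles())  # Olive waves 8 tentacles
print(octopus.colour)            # white
print(Horse("Dobbin").gallop())  # Dobbin gallops on 4 legs
```

Asking a `Horse` to `wave_tentacles()` simply fails, which is the point: each island of data only offers the functions that belong to it.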

In addition to the changes in language, there are changes in the tools used to program. Syntax highlighting is nice: it amounts to colouring the verbs, nouns and proper names of a program in different colours, which makes it easier to spot mistakes. Auto-completion is another handy tool: in a well-designed language there are only a limited number of possible next statements at any point, and auto-completion presents them to you as you type. Sites like Stack Overflow are great for asking programming questions, and there is no end of function libraries available on the web to help you out.

I have a number of little software projects on the go. You can think of them in much the same way as woodworking projects, sudoku or crosswords: they keep me out of the way, muttering quietly to myself and exercising my brain. It doesn't matter that what I'm doing isn't groundbreaking and new.

Programming does lead to some odd habits. When I started programming it was useful to know the binary and hexadecimal number systems; as a consequence I believe that numbers such as 1024 and 128 are nice and round. I've come to appreciate a wide range of bracket styles, [] () {}, since they are all used for different things, and the semicolon is one of the most important pieces of punctuation in my life. If I program for too long in a stretch I start to forget how to speak to people.
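A quick Python illustration of why 128 and 1024 feel round: they are powers of two, so in binary they are a 1 followed by zeros, and in hexadecimal they come out as 0x80 and 0x400, whereas 1000 looks messy in both. The `is_power_of_two` helper is just an illustrative name; it uses the standard trick that a power of two has exactly one bit set.

```python
def is_power_of_two(n):
    """True if n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


for n in (128, 1024, 1000):
    # bin() and hex() show the same number in base 2 and base 16
    print(n, bin(n), hex(n), is_power_of_two(n))

# 128 0b10000000 0x80 True
# 1024 0b10000000000 0x400 True
# 1000 0b1111101000 0x3e8 False
```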

And just to show off the results of my latest fiddlings: maps of the UK election results. I got interested in doing this just after the General Election. The Guardian has published a lovely spreadsheet of election results, including data on every single candidate. You see lots of maps of data of this sort, and I wanted to know how it was done. (Technical details are in the footnotes after the maps.)

First of all, the gender of MPs by constituency: constituencies represented by ladies are marked pink, those represented by men are marked blue:


The black constituency in northern England is Thirsk and Malton, which held its election on 27th May, following the death of one of the candidates during the general election campaign.

The population of each constituency is also interesting. Here I have coloured the constituencies with 9 different shades of green: the palest shade corresponds to a voting population of between 20,000 and 30,000, the darkest to between 100,000 and 110,000:

The Western Isles (now known by its Gaelic name, Na h-Eileanan an Iar) has the smallest population at about 22,000, and the Isle of Wight has the largest, with just under 110,000 potential voters. I used ColorBrewer to find a nice set of colours.
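The colouring amounts to cutting the 20,000 to 110,000 range into nine bands of 10,000 voters each and picking one shade per band. Here is a Python sketch of that lookup. It assumes ColorBrewer's nine-class sequential "Greens" scheme; the post doesn't say exactly which ColorBrewer palette was used, and `shade_for_population` is a hypothetical name.

```python
# ColorBrewer's 9-class sequential "Greens" scheme, palest first (assumed palette)
GREENS_9 = ["#f7fcf5", "#e5f5e0", "#c7e9c0", "#a1d99b", "#74c476",
            "#41ab5d", "#238b45", "#006d2c", "#00441b"]

LOWEST = 20_000      # lower edge of the palest band
BAND_WIDTH = 10_000  # each shade covers 10,000 voters


def shade_for_population(voters):
    """Return the fill colour for a constituency with the given number of voters."""
    index = (voters - LOWEST) // BAND_WIDTH  # which 10,000-wide band we are in
    if not 0 <= index < len(GREENS_9):
        raise ValueError(f"{voters} voters is outside the 20,000 to 110,000 range")
    return GREENS_9[index]


print(shade_for_population(22_000))   # palest band: #f7fcf5
print(shade_for_population(109_000))  # darkest band: #00441b
```

Integer division by the band width gives the band number directly. A constituency with 110,000 voters or more would fall outside the nine bands, so the sketch raises an error rather than silently picking a colour.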

Finally here's a map of which party came second in each constituency in the 2010 General Election:

Red for Labour, blue for the Conservatives, orange for the Liberal Democrats, yellow for the Scottish Nationalists, pale green for Plaid Cymru, dark green for Sinn Fein and blue for the Ulster Conservatives and Unionists; a few independents and minor Northern Ireland parties are all coloured white.
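Working out who came second is a matter of grouping the candidates by constituency, sorting each group by votes and taking the second entry. The Python sketch below assumes the spreadsheet has been reduced to (constituency, party, votes) rows. The rows are made up for illustration and are not real results, and the party-to-colour dictionary is a partial, illustrative one that falls back to white, as the map does for independents and minor parties.

```python
from collections import defaultdict

# Made-up rows standing in for the spreadsheet: (constituency, party, votes).
rows = [
    ("Constituency A", "Labour", 20_000),
    ("Constituency A", "Conservative", 15_000),
    ("Constituency A", "Liberal Democrat", 17_500),
    ("Constituency B", "Conservative", 25_000),
    ("Constituency B", "Liberal Democrat", 9_000),
    ("Constituency B", "Labour", 11_000),
]

# Partial, illustrative palette; any party not listed is coloured white.
PARTY_COLOURS = {"Labour": "red", "Conservative": "blue", "Liberal Democrat": "orange"}


def second_places(rows):
    """Map each constituency to the party with the second-highest vote."""
    by_seat = defaultdict(list)
    for constituency, party, votes in rows:
        by_seat[constituency].append((votes, party))
    runners_up = {}
    for constituency, candidates in by_seat.items():
        ranked = sorted(candidates, reverse=True)  # most votes first
        if len(ranked) >= 2:  # an uncontested seat has no runner-up
            runners_up[constituency] = ranked[1][1]
    return runners_up


for seat, party in second_places(rows).items():
    print(seat, party, PARTY_COLOURS.get(party, "white"))
```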

Footnotes

So the task is to get the spreadsheet data onto a map. To get started I did a bit of memory trawling and googling: a couple of people have written about colouring in maps. One of them uses shapefile format map data and the R programming language, whilst the other uses SVG format map data and Python (another programming language). It turns out the shapefile format data for constituencies is a little difficult to get: you have to fill in forms! However, enterprising people on Wikipedia have made SVG format constituency maps available. SVG stands for Scalable Vector Graphics; it's an XML format, which means it's plain text and there are standard means to extract data from it and manipulate it. The only real problem is that the constituency names in the spreadsheet don't exactly match the names inside the SVG map, so I had to resort to some horrible constituency-by-constituency coding for a load of them.

To do this I used the C# programming language, largely because Visual Studio Express C# is a very nice, free development environment which I've used before. To view the SVG maps inside my application I used the WebKit .NET library to provide a web browser control (it wraps up the rendering engine used in the Safari and Google Chrome browsers); the native C# web browser control is based on Internet Explorer, which doesn't render SVG.

Output to bitmaps is a bit clumsy. Inkscape (a free SVG editor) wasn't keen on displaying the original constituency map, so I resorted to viewing the map in Google Chrome and taking a screenshot (a terrible bodge).
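The map colouring was done in C#; what follows is only a Python sketch of the same idea, using the standard library's `xml.etree.ElementTree`, not a translation of the actual program. It assumes, hypothetically, that each constituency in the map is an SVG `<path>` whose `id` is the constituency name with underscores for spaces; the real Wikipedia file may be organised differently. Names are normalised before matching, and a hand-written override table stands in for the constituency-by-constituency fixes. Its single entry is illustrative, based on the Western Isles renaming, and is not taken from the real files.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep SVG as the default namespace on output

# Illustrative fix-up for a name that differs between spreadsheet and map.
NAME_OVERRIDES = {"western isles": "na h-eileanan an iar"}


def normalise(name):
    """Reduce a constituency name or SVG id to a comparable key."""
    key = name.lower().replace("_", " ").replace("&", "and")
    key = re.sub(r"[^a-z0-9 -]", "", key)  # drop punctuation such as commas
    key = re.sub(r"\s+", " ", key).strip()
    return NAME_OVERRIDES.get(key, key)


def colour_map(svg_text, colours):
    """Fill each <path> whose id matches a constituency in `colours`.

    `colours` maps spreadsheet names to CSS colours. Returns the new SVG text
    and the spreadsheet names that could not be found in the map.
    """
    root = ET.fromstring(svg_text)
    wanted = {normalise(name): colour for name, colour in colours.items()}
    found = set()
    for path in root.iter(f"{{{SVG_NS}}}path"):
        key = normalise(path.get("id", ""))
        if key in wanted:
            path.set("style", f"fill:{wanted[key]}")  # replaces any inline style
            found.add(key)
    missing = [name for name in colours if normalise(name) not in found]
    return ET.tostring(root, encoding="unicode"), missing


svg = ('<svg xmlns="http://www.w3.org/2000/svg">'
       '<path id="Na_h-Eileanan_an_Iar" d="M0 0h10v10z"/>'
       '<path id="Isle_of_Wight" d="M20 0h10v10z"/></svg>')
coloured, missing = colour_map(
    svg, {"Western Isles": "pink", "Isle of Wight": "blue", "Nowhere": "black"})
print(missing)  # ['Nowhere']
```

Returning the unmatched names makes it easy to see which constituencies still need an entry in the override table. Because the sketch overwrites a matched path's inline `style`, a real map might need its stroke settings carried over as well.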

5 comments:

Lesleyalmost said...

As someone who works in IT though not techie any more, I really liked your post. I do love and have always loved logic but I am your polar opposite, my blogging habit is nothing to do with logic at all.

Two things of note to mention about your blog; you made me feel very old when you talked about computer club in 1980 when you were 10, and the use of semicolons. The humble semicolon is probably my favourite punctuation mark as it allows my rambling, meandering sentences to have a little structure......

Rebecca Sutherland said...

Umm... you're very clever and although I read it all I don't really understand the programming bit much. I tend to over-think stuff like that then get into a terrible muddle. We all have different brains and varying levels of skills. For example, I know I understand pictures well, so the function of reading this data from your illustrations is easier for me than if it were a table or graph.

I'm glad that people do enjoy doing this sort of thing. Without them I wouldn't be writing this and I certainly wouldn't have read your article.

Perhaps I could mention though, I think your questions are as interesting as the way you visualise your answers.

SomeBeans said...

@Lesleyalmost in a way I'm trying to be a bit less logical in most of my blog posts - looks like I haven't quite got there! I quite like ellipses in informal writing, which are used in some programming languages...

@RebeccaSutherland I spend quite a lot of time as a scientist trying to visualise things in the right way to find what I'm looking for. Drawing conclusions from looking at raw numbers is difficult for most people I know.

The questions I asked of the electoral data are the things I was curious about and are topical. I was wondering with the gender thing whether geography was important - it seems to me that the constituencies represented by women are more likely to be adjacent. There's probably a more rigorous test for this...

Constituency size and second placings are relevant to discussions on electoral reform.

nemski said...

As a data geek and IT professional, your post was great. I thought it was spot on. Your paragraph on variables, loops and functions was simple -- a huge compliment.

SomeBeans said...

@nemski thank you for your kind compliment!