Thursday, June 24, 2010

Some notes on SQL: 3 - changing a table

This is the third in a series of blog posts on SQL, the first covered creating a database, the second selecting information from a database. This post covers commands to modify the structure and contents of an existing database. No claim of authority is made for these posts, they are mainly intended as my notes on the topic.

UPDATE and DELETE allow the rows in a table to be either updated or deleted according to a select-like WHERE clause. This is UPDATE, acting on multiple columns:

UPDATE your_table
SET    first_column = 'newvalue',
       second_column = 'another_value'
WHERE  some_column = 'a test';


And this is the DELETE command:

DELETE FROM your_table
WHERE  some_column = 'a test';

In combination with the ALTER keyword, the following operations can be performed:
The CHANGE keyword allows the name and data type of an existing column to be changed.

ALTER TABLE project_table 
CHANGE COLUMN a_silly_column_name a_better_column_name VARCHAR(100), 
CHANGE COLUMN another_poorly_named_column a_better_name VARCHAR(30);

It's necessary to be cautious here because data loss can occur depending on the source and destination types, for example going from VARCHAR(100) to VARCHAR(30) could potentially lose 70 characters.
The MODIFY keyword allows the data type or position of an existing column to be changed.

ALTER TABLE my_table 
MODIFY COLUMN target_column VARCHAR(120), 
MODIFY COLUMN another_column AFTER target_column;

The ADD keyword allows new columns to be added to a table:

ALTER TABLE my_table 
ADD COLUMN new_column INT NOT NULL AUTO_INCREMENT FIRST;

The ADD and MODIFY keywords take position identifiers: FIRST, LAST and BEFORE, AFTER - which require a second column identifier as indicated in the MODIFY example.

In addition RENAME TO allows the table to be renamed:

ALTER TABLE poor_name RENAME TO good_name;

And DROP deletes a column:

ALTER TABLE my_table DROP COLUMN unwanted_column; 

Obviously you should use DROP COLUMN cautiously!

Keywords: ALTER, UPDATE, CHANGE, MODIFY, ADD, DELETE, AFTER

Book review: Botany of Desire

This post can be read in full here: http://www.ianhopkinson.org.uk/2010/06/book-review-botany-of-desire/

Yet another in my erratic series of book reviews cum notes. This time I'm reading "The Botany of Desire:A Plant's-eye View of the World" by Michael Pollan.

The introduction lays out the land of the book, sections on apples, tulips, marijuana and potatoes and the central thesis: that it's a useful idea to consider that not only do we domesticate plants but that in a sense plants naturalise us. As stated in the introduction this thesis felt a bit hardline, grating a little for my taste but once into the reading this feeling receded since the illustrative stories are enticing and nicely written.

First up, are is the story of apples in American and the folk hero, Johnny Appleseed, who travelled the mid-West, setting up ad hoc orchards from seeds, a little way in front of the settler-wave, and sold them trees as they moved into the area.

The point about apples is that they don't grow true from seed, take a fine apple and plant its seed and what you get is a lucky dip. This is a recurring theme, plants amenable to domestication appear quite often to be those amenable to quickly producing a wide variety. To grow "true" from an apple you need to graft from the parent onto a root stock. It's always struck me as something of a miracle that grafting works and that people managed to discover it.

Apples were significant to the early settlers since they offered sweetness (sugar would not have been very available), a sense of order when planted in neat orchards and cider. It seems cider played a big part in the popularity of Johnny Appleseed during his life, since the apples grown from seed were most often best suited to cider-making rather than eating. After he died the temperance movement gained strength in the US, and this aspect of apple cultivation was pushed into the background.

Despite the focus on Johnny Appleseed (and comparisons to Dionysus) the thing that will remain with me from this section is the descriptions of the wild apple forests around Alma-ata in Kazakhstan. You can get a flavour of the place from the fabulous images here, in an article in Orion Magazine and here, on the BBC website. These wild trees are important because they represent massive genetic diversity. The drawback of grafted plants is that they are genetically identical to their parents, so over time they become more and more susceptible to pests and diseases which evolve freely to take advantage of their stasis.

After the apples come the tulips, and Tulip Mania amongst the unlikeliest of enthusiasts: the Dutch. Tulips are a relatively recent addition to the pantheon of flowers, unlike the rose and the lily which appear in the Bible, tulips appear to have been introduced to Europe from Turkey in around 1550.

Interesting thought from this section: flowers became beautiful before there were ever humans to appreciate them - in a sense flowers are the result of the aesthetic decisions of bees (and other pollinating insects).

Tulip Mania was a speculative bubble in the Netherlands slightly before the middle of the 17th century wherein the prices paid for tulip bulbs skyrocketed, a single bulb fetching the equivalent of a acres of land or a fine townhouse, only to crash thereafter.

The flower in the picture to the left is Semper Augustus, emblematic of the most valued of the tulips during tulip mania. The interesting thing is that the most prized of these flowers - those that had "broken"  - were actually suffering the effects of a virus from which their line would eventually weaken and die. "Broken" refers to the variegated appearance with a dark colour, appearing in streaks on a lighter background. The modern Rembrandt tulips are similar in colouring but, according to Pollan, less impressive than the best of the virus "broken".

A common theme through all these stories is the large variability of the species from which the domesticated cultivars are drawn and the vulnerability of the much more uniform varieties once domesticated.

The third section is devoted to marijuana, clearly a plant for which the author has some fondness. Marijuana has long been cultivated for two reasons: one for fibre as hemp, and one for drugs. Since the early 80's and the American "War on drugs" marijuana production has been pushed underground, or rather indoors. Pollan recounts the story of the recent cultivation of marijuana by Dutch and American growers. The plant has undergone fairly rapid change in the last few years with the crossing of the large, traditional cannabis sativa and the more compact, frost resistant cannabis indicas. A substantial amount of work and horticultural ingenuity has gone into this process, leading to plants that can produce high yields of the active material in small, indoor spaces. The prize being the $13,000 that a hundred plants grown on a 6 foot square table can yield in a couple of months.

For Pollan there is an element of horticultural challenge in this process, he clearly grows a wide range of plants  in his own gardens (from each of the sections of this book) valuing the challenge and the diversity. The garden at SomeBeans Towers is similar: more a plantswoman's garden than a designer's garden.

He digresses at length on purpose of intoxication and whether drug taking really does open the doors of perception, or just lead to inane blithering, falling eventually for the former. There's an interesting section on the neuroscience of cannabis.

The book finishes with a chapter on potatoes, in particular on a genetically modified potato called NewLeaf which was developed by Monsanto to express the pesticide from the Bacillus thuringiensis bacteria (Bt). Organic certification schemes allow the limited 'manual' application of the Bt pesticide. In this chapter he visits various potato growers, spanning the ultra-technological to the organic. He highlights the dilemma that he finds GM potatoes more palatable than the non-organic equivalent when presented with the choice, in large part because the level of inputs, in particular fungicides and insecticides, to conventionally grown potatoes is very high.  His visit to an organic highlights something from the organic movement in which I'm in favour: which is a willingness to explore different methods of cultivation (and a wider range of cultivars), where I part company is where they say "There must be no X" where X is a somewhat arbitrarily drawn list, enforced with religious fervour.

The section also covers the history of the cultivation of the potato, from the wide variety in the mountain gardens of its native Peru, to its introduction into Europe as a favoured staple crop. Prior to the introduction of the potato bread was the staple food in Europe; wheat is somewhat fussy in its growing conditions particularly in Northern Europe and getting bread from wheat is quite an involved process. Potatoes, on the other hand, are less fussy on growing conditions and exceedingly simple to prepare for eating (stick in fire and wait, or if feeling extravagant: boil in water).

Overall I enjoyed this book, each section seemed to divide into two unlabelled parts one largely factual and one rather more philosophical - I preferred the more factual sections but appreciated the philosophical too.

Thursday, June 17, 2010

Some notes on SQL: 2 - Basic SELECT

Part 1 of this sequence of blog posts provided a preamble and showed how to create databases. This post introduces the basic SELECT command, which shows you what lies within your database and as it's name implies allows you to select only parts of the data contained within.

The basic form of SELECT is:

SELECT * FROM my_contacts
WHERE first_name = 'Anne';

* indicates that all fields should be returned from the table 'my_contacts', where the first_name field = 'Anne'. We don't have to take all the fields from a table:

SELECT first_name, last_name, email FROM my_contacts
WHERE first_name = 'Anne';

As well as the equivalence operator =, we can also use comparison operators <> (not equal), <, >, <=, >= these work not only with numerical values, but also with text values. WHERE clauses can also be combined with AND and OR operators.

SELECT drink_name FROM drink_info
WHERE calories >= 30
AND calories <= 60;


SELECT drink_name FROM drink_info
WHERE calories BETWEEN 30 AND 60;

The second select using the BETWEEN keyword is equivalent to the first.
In addition there are wildcards, % meaning 'any number of characters' and _ meaning 'one character' which are accessed via the LIKE keyword:

SELECT first_name FROM my_contacts
WHERE first_name LIKE '%im';

This first search will return 'Tim', 'Slim', and 'Ephraim'.


SELECT first_name FROM my_contacts
WHERE first_name LIKE '_im';

This second search will only return 'Tim'. NULL is special, nothing equals NULL but you can check if something is NULL:

SELECT first_name FROM my_contacts WHERE flag IS NULL;

Comparisons can be made to a list with the IN keyword:

SELECT drink_name FROM drink_info
WHERE rating IN ( 'good', 'excellent', 'average' );


Finally, the NOT operator can be used to find the inverse of the selection made, the NOT keyword goes directly after IN but otherwise goes after WHERE:

SELECT drink_name FROM drink_info
WHERE rating NOT IN ( 'good', 'excellent', 'average' );


SELECT first_name FROM my_contacts
WHERE NOT first_name LIKE '_im';

Keywords: AND, OR, BETWEEN, IS NULL, NOT, LIKE, IN

Wednesday, June 16, 2010

Some notes on SQL: 1 - creation

These are some notes on SQL a language for creating and querying databases, I'm learning it because a couple of programming projects I have in mind for work and home will need it. The source for these notes is  Head First SQL book. I've used a previous book in this series and I quite like the presentational style. I'm using MySQL via it's command line client to do the exercises, because it's about as straightforward as you can get. The code is formatted for display using this SQL to HTML formatter. Notepad++ recognises SQL as a language and will do syntax highlighting, so I type my commands into it and copy them into the MySQL command line client.

SQL is quite an old language and the convention is to write keywords in block capitals (reminds me of FORTRAN!). Command sequences are terminated by a semi-colon.

To start, this sequence creates a database, sets it as active and then adds a table containing a range of fields of different types, the command DESC shows the layout of a table:


CREATE DATABASE my_database;
USE my_database
CREATE TABLE contacts
(
contact_id INT NOT NULL,
first_name VARCHAR(20),
last_name VARCHAR(20),
birthday DATE,
life_story BLOB,
weight DEC(3, 2) NOT NULL DEFAULT 80.00,
state_code CHAR(2),
appointment DATETIME
);

Desc contacts;



NOT NULL keywords are used if a field must be specified on INSERT. Once created data can be added to the table using the INSERT command:

INSERT INTO contacts (contact_id,first_name,last_name,birthday,life_story,weight,
appointment)
VALUES
(1,'Ian','Hopkinson','1970-24-04','A very long text string',80.0,
'10:30am 2010-21-06'
);


INSERT can be used with no specified fields (in which case values for all supplied fields must be provided for all fields), or with a subset of fields. In order to add the ' character we can either use '' or \' (where '' is two single quote characters, rather than a double quote).

To delete a table:
DROP TABLE contacts;
This command should be used with care since it deletes the table whether or not it contains data. The next post should be on the SELECT command. 


Of course you can find SQL cheatsheets elsewhere.

Keywords: CREATE, DATABASE, TABLE, USE, DESC, DROP, INSERT INTO, VALUES, INT, VARCHAR, BLOB, CHAR, DATETIME, DATE, DEC

A sceptical look at the economy


This blog post was written partly because I'd got fed up with hearing about how all cuts were evil, without hearing an alternative plan. It has ended up more a collection of interesting data sources, and some mild ranting.

First to define some terms: The gross domestic product (GDP) is a measure of the total economic output of the country. It's handy because we can use it to compare any other number we come up with to see how big they are. £1billion may sound like a lot, but the GDP is £1.5trillion, so £1billion is a less than 0.1% of GDP.

The debt is the total amount of money that the state owes; the deficit is the annual gap between what the state takes in taxes and what it spends. A debt is sustainable in the long term but running an annual deficit above a certain size, for a period, is not sustainable. The deficit can be divided into two components: a cyclical component which goes up and down with current economic conditions and a structural component which is on top of this. The structural component is the bad bit. There is some dispute over the validity of this division since economic cycles are not easy to define.

As Mr Micawber says in David Copperfield: "Annual income twenty pounds, annual expenditure nineteen nineteen six, result happiness. Annual income twenty pounds, annual expenditure twenty pounds ought and six, result misery."

I'm a big fan of the Guardian's government spending chart (see the image at the top of page), it's a rather pretty way of seeing where government spending goes. For the year 2008/9 the total spending is £621bn, in this year the Treasury gets an enormous, anomalous amount in financial stabilisation (18%) - this will not recur in future years. Beyond this there's the Department of Health, spending about 18%, Department of Work and Pensions (22%) with the largest fraction of this going on the state pension, Department for Children, Schools and Families (10%). The key point here is that a very large chunk of the money spent is spent on things that people very vocally want (schools, health care, state pensions).

The figures for where tax comes from are perhaps a little surprising, extracting the data from table 1 in this report by the Institute for Fiscal Studies. The major chunks are shown in the piechart below:



The "other" category is made up of various minor indirect taxes (tobacco, alcohol etc) and capital taxes (3%). The surprising thing to me was the relatively low level of corporation tax. There seems to be evidence of tax avoidance by corporations amounting to something in the region of £10bn, but this would only be roughly 1.6% of the tax take.

By the way, as a physicist, I look down on piecharts!

The total tax take of around 36% of GDP puts the UK roughly in the middle of the OECD table of taxes, with countries like New Zealand and Germany taking very similar levels of tax, France, Italy and Finland taking rather more (at around 44%) and countries like Ireland, Switzerland and the United States taking rather less at ~30%. The full OECD data is here, and wikipedia has a sortable list for all the countries of the world. An interesting exercise is to consider this list, and think in which countries you might want to live.

Of income tax the top 1% of earners pay nearly 25% of all income tax, you can see the full breakdown in this article on the BBC website. Put another way, the 40% tax band covering perhaps 15% of tax payers provides nearly 40% of income (here). This does exclude National Insurance contributions though. I was surprised by these figures, I'd assumed that the relatively small number of higher tax rate payers would result in a much lower total take.

The net result of these incomings and outgoings is that we had a national debt of 68.1% of GDP at the end of 2009, and a total deficit of 11.4% of GDP. (According to the Office of National Statistics). According to the newly formed Office of Budget Responsibility  (table 4.5) the national debt (which they call net debt) is 53.5% this year and the deficit (net borrowing) is 11.1% of which 8.8% is structural. 

Numbers without context are meaningless: a priori I have no idea how these numbers for debt and deficit compare to the past. However, help is at hand: this report shows how they have varied over the past 100 or so years. I've copied the key figure for national debt below:


At the end of the second world war the UK had a national debt of around 250% of GDP, much larger than our current debt (and even our predicted debt over the next few years). Interestingly we see in the same report (figure 1) that the deficit is rarely negative (i.e income greater than expenditure), hovering around 2% (i.e. still a deficit) debt is still paid off via growth in the economy and inflation.

I suppose the purpose of all of this preamble is a discussion of cuts, or if you prefer tax increases. Prior to the election everyone seemed to agree on the size of the gap to be filled but none of the parties managed to fill more than 25% of the gap, as evidenced in this report by the Institute of Fiscal Studies, the Guardian's data blog had a nice breakdown of the measures proposed by the three main parties. The major political point of departure was when cuts should start (not if cuts would start), and my view prior to the election was that whoever won broadly similar levels of cuts would be made although there was some evidence that the balance between taxation and cutting would be different depending on party but since none of them revealed (or had) much of their plan it's rather difficult to say.  It is a very minority view that no cuts are required, although I see the unions are trying that one out today along with threats if there are any cuts.

YouGov,  for the Sunday Times, helpfully asked the public:
"The government has asked for public advice on where it should cut public spending. Which, if any, of the following areas do you think should be targeted for cuts? Please tick up to three". 
And the public demonstrated that if you ask a stupid question, you get a stupid answer (or, being generous to the public, if you ask a question without providing contextual information you get a stupid answer). I considered trying to find a fancy way of presenting this information, but in a nutshell: by far the most popular area for cuts (61%) is in international aid whose total budget is 0.8% of total spending (i.e. pretty much the smallest bit of the budget you can find).

To be fair to the public, many of them will be working in the private sector and will have variously experienced pay cuts (or at least freezes), reduced working, redundancies, budget reductions and frozen recruitment and they may well be feeling it's someone else's turn.

You can experiment with cuts yourself with this handy tool from the FT, have a play and think about how you'd stand up and justify the cuts you've made. To paraphrase Polly Toynbee: "Don't be young, old, vulnerable, one of our brave boys, sick etc".

The alternative to cuts are tax increases, but nobody seems keen to talk about them. Prior to the election there was a report stating that the deficit was equivalent to about 6p on the basic rate of income tax. Proposals to raise tax were normally described as a "Tax on jobs" or "Death tax", which is unhelpful to say the least. Another popular idea is to tax the bankers, one option here is the Tobin or Robin Hood Tax which puts a small tax (typically fractions of a percent) on every financial transaction, because there are very many of these transactions potentially the amount raised could be large this would seem to require international coordination and it isn't clear where the money raised would be spent (climate change, international aid, fund for future bank collapses have all been suggested). The banking sector contributes approximately £70bn to GDP, or 6.8%. The structural deficit isn't about any money spent rescuing banks though, it's about an ongoing gap between spending and taxation.

My personal view is that we should be talking about taxation, and where the balance between cuts and increased taxation should lie (currently it looks like 80/20 cuts to taxes). There should be some discussion of where tax rises are best levied : "on someone else" isn't really a proper answer. Income tax seems like the best place to me, probably at basic rate with uplift of the lower threshold to protect some of the least well off, but possibly lowering the threshold to the upper tax bracket. In the longer term making the public sector more flexible to economic hardship would be nice, this time there seems to have been much more flexibility in how companies have approached recession - not necessarily painless, but better than losing your job. One element of this could be variable pay in the public sector (or bonuses as we colloquially call it) this provides two things to an employer: the ability to vary pay when income to the company is poor and some decoupling of current salaries from pension entitlements (since bonuses are typically not counted towards pension payments).

So to end on a happy note: I propose bonuses for the public sector!

* Update: hat-tip to AlexConner who pointed out that it is Mr Micawber not Uriah Heep who is responsible for the quote from David Copperfield

Thursday, June 10, 2010

Compare and contrast

I thought I might try describing my job as an academic in a physics department, and comparing that to my current work as an industrial scientist.

Some scene setting: in the UK undergraduates are students who study taught degree courses lasting 3-4 years, typically they start at age 18 or 19. Postgraduates are studying for PhD's, research courses lasting 3-4 years (after which research councils start getting nasty). After PhD. level there are postdoctoral workers who typically do contract research lasting 2-3 years per contract - they may do multiple contracts at an institution but it's a rather unstable existence. Permanent academic staff are lecturers, senior lecturers, readers and professors in increasing order of seniority/pay.

As a lecturer-level academic, the shape of the year revolves around teaching, if not the effort involved. Undergraduate students start their year in September, with breaks over Christmas and Easter followed by exams in May/June. The teaching year amounts to about 30 weeks. Should you be lecturing the students, you will spend time preparing and giving lectures; how long this takes depends on your conscientiousness, the number of times you have lectured the course and the number of other things you have to do. In addition you will probably give tutorials, small groups of students working through questions set by other lecturers, practical classes and manage final year undergraduate projects and literature surveys. Compared to a school teacher or further education college lecturer your "contact" time with students will be relatively low - maybe 10 hours a week.

Final year projects are of particular interest to you as a researcher since there's always vigorous competition amongst academics to attract the best undergraduates to do PhD.'s as postgraduates. A final year project done by a good student can be an excellent way to try an idea out. To be fair to students though, their performance in a final project and talking about that final year project can be the strongest part of a CV - since it demonstrates the ability to work individually in an unknown area.

In between undergraduate teaching there's grant application writing, doing research of your own, writing papers, and then, come the end of term, the possibility of conferences.

In the end it was the apparently endless futility of writing grant applications which did for me as an academic. My success rate was zero, furthermore I had this terrible feeling that even after successfully winning a grant I would struggle to recruit postdocs or PhD students to do the work and there was little chance that having started a fruitful line of research there would be a good chance of continuing it with further successful grants.

I was recruited to my current company by a recruitment agency, who found my webpage still hanging around at Cambridge University a couple of years after I had left. I didn't actually end up doing the job they nominally recruited me for but what I do is relevant to my research background and can be rather interesting.

I turned up to my new workplace on the Friday before I started and was shown my desk - in a shared office. I did wonder at that point whether I had done the right thing, back in academia I had an office roughly the size of a squash court and could go days without seeing anyone. As it turns out sharing an office isn't too bad, you get to find out what's going on, but it can be a pain when your neighbour decides to have a long, detailed meeting next to you.

Another novel aspect to working in industry is that someone seems interested in my career within the company. In getting on for 15 years as an academic I can remember rarely ever talking about my career with anyone who might have influence on its direction whilst in a company it's at least an annual occasion. It's true that the company's enthusiasm for management-speak can be excessive (and changeable) as new human resources fads come and go.

I get to go to lots of meetings.

Relevant to current discussions on the public sector we have regular restructuring, and in the past year or so: pay freezes, arbitrary cuts in travel budget mid-year, a change to pensions for new recruits, and redundancies - the latest round equivalent to losing about 15% of the people on the site I work at. It's fair to say that we are not necessarily models of efficiency internally: I heard on the news that it takes 5 signatures for someone in the NHS to buy a new bed costing about £1000 - sounds about par for the course.

One noticeable difference is that largely I feel much more wanted, inasmuch that if I'm put on a project then the project leader will be keen to get some sort of intellectual exertion on my part and will even appear quite pleased when this is achieved. Even better, people for whom I do "a bit on the side" are even more grateful. This is a big difference from being an academic, where the odd student (undergraduate or postgraduate) may appreciate your efforts but largely nobody shows much sign of caring about your research.

Looking back on my time as an academic: I think I would have benefited from some sort of master plan and career direction. I'd quite liked to have carried on as a postdoc, i.e. actually doing research work rather than trying to manage other people doing research. However, this isn't a career option and is a rather unstable existence.