tag:blogger.com,1999:blog-20617596900506196082023-06-20T21:33:59.873-07:00ironic cogMy blog about softwarepjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.comBlogger38125tag:blogger.com,1999:blog-2061759690050619608.post-43003201319787871182015-09-06T09:59:00.000-07:002015-09-06T09:59:27.029-07:00Upgrading Fedora Linux 20 (Heisenbug) to 21 & 22<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1440077096763_4929">
Although I've recently switched to Ubuntu for my work, I'm still using Fedora (dual-boot with Windows 7) on my home workstation. I've been running Fedora 20 ("Heisenbug", horrible name) since mid-2014, so my installation was pretty aged - and with the release of Fedora 22 in May, F20 <a href="https://fedoraproject.org/wiki/Fedora_Release_Life_Cycle">has been end-of-life'd since June</a> this year. So it was definitely time for an update.<br />
<br />
When I moved previously to F20 I did a fresh install (although I was able to keep my <span style="font-family: "Courier New",Courier,monospace;">/home</span> from the previous F16 installation). However this time I decided to go with an upgrade using <a href="https://apps.fedoraproject.org/packages/fedup">FedUp</a> (FEDora UPgrader), especially attractive since it promised to also update the installed packages - meaning that I wouldn't need to start over again with installing and configuring all the applications and tools that I use.</div>
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1440077096763_4910">
<br /></div>
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1440077096763_4910">
The upgrade procedure is covered pretty comprehensively in the FedUp documentation at <a href="https://fedoraproject.org/wiki/FedUp">https://fedoraproject.org/wiki/FedUp</a>. As I was two releases adrift I decided to perform the upgrade twice (i.e. going from F20->F21->F22) as recommended. This seemed the safest route, especially given two of the significant changes introduced with each of these releases, specifically:<br />
<ul>
<li><b>Fedora 21 </b>introduced the idea of "products", which can be thought of as sub-releases specifically targeted for particular user needs. When upgrading from F20, you need to specify which product sub-release you're moving to: the available options are "workstation" (which looks like the best choice for desktop use), "server" or "cloud".</li>
<li><b>Fedora 22</b> switched from the old <span style="font-family: "Courier New",Courier,monospace;">yum</span> package manager to a newer version called <span style="font-family: "Courier New",Courier,monospace;">dnf</span> ("dandified yum") - see <a href="https://fedoraproject.org/wiki/Changes/ReplaceYumWithDNF">https://fedoraproject.org/wiki/Changes/ReplaceYumWithDNF</a> (a few common command equivalents are shown just below this list)</li>
</ul>
</div>
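<div dir="ltr">
For day-to-day package management most of the familiar <span style="font-family: "Courier New",Courier,monospace;">yum</span> commands carry straight over to <span style="font-family: "Courier New",Courier,monospace;">dnf</span>. A few of the equivalents I find myself using most often (my own summary - check the change page linked above for the definitive details):<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">sudo yum install PACKAGE -> sudo dnf install PACKAGE</span><br />
<span style="font-family: "Courier New",Courier,monospace;">sudo yum update -> sudo dnf update</span><br />
<span style="font-family: "Courier New",Courier,monospace;">sudo yum search TERM -> sudo dnf search TERM</span></div>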
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1440077096763_5132">
(As an aside, another change is that from F21 the releases will no longer have names - so F21 is just "Twenty One", and F22 is "Twenty Two".)<br />
<br />
The upgrade process for each revision was then: </div>
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1440077096763_4952">
<br /></div>
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1440077096763_4953">
1. Install the <span style="font-family: "Courier New",Courier,monospace;">fedup</span> package:</div>
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1440077096763_5017">
<br />
<span style="font-family: "Courier New",Courier,monospace;">sudo yum install fedup</span></div>
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1440077096763_5175">
<br /></div>
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1440077096763_5018">
2. Perform all updates & backup your system<span id="yiv6622315900yui_3_16_0_1_1440077096763_5299"> (my preference is still for <a href="http://www.clonezilla.org/clonezilla-live.php">CloneZilla Live</a>)</span><br />
<br />
<span id="yiv6622315900yui_3_16_0_1_1440077096763_5299">3. Perform a network-based upgrade (the recommended method over e.g. upgrading from an ISO image):</span><br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">sudo yum update fedup fedora-release</span></div>
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1440077096763_5298">
<div id="yiv6622315900yui_3_16_0_1_1441021385497_2700">
<span style="font-family: "Courier New",Courier,monospace;">sudo fedup --network 21 --product=workstation</span></div>
</div>
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1440077096763_5374">
<br />
(As noted earlier, f<span id="yiv6622315900yui_3_16_0_1_1440077096763_5299">or F20-to-F21, you need to specify which product line you are moving to - hence the <span style="font-family: "Courier New",Courier,monospace;">--product</span> option above. </span>For F21-to-F22 the second line can be reduced to simply <span style="font-family: "Courier New",Courier,monospace;">sudo fedup --network 22</span>)</div>
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1440077096763_5403">
<br />
4. Reboot and select "System Upgrade (fedup)" from the GRUB menu, which will boot into an upgrader environment and then churn through all the packages. (Note that this can take some time: for F20-to-F21 this has a textual display and for me the screen would go black periodically during this process, however moving the mouse seemed to restore the display. For F21-to-F22 there is a nice graphical progress bar; using <span style="font-family: "Courier New",Courier,monospace;">F1</span> toggled between this and the textual display.)<br />
<br />
5. If the previous step completes
successfully then the system will reboot automatically and there should be an option for the new Fedora version in the boot menu. Select this to boot into the new version and check that it looks as you expect.</div>
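<div dir="ltr">
<br />
(A quick sanity check at this point - my own habit rather than part of the official procedure - is to confirm which release you've actually landed on:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">cat /etc/fedora-release</span>)</div>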
<br />
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1440077096763_5607">
<div id="yiv6622315900yui_3_16_0_1_1441028363385_2532">
At this point the documentation outlines some post-upgrade clean-up actions that need to be performed manually:</div>
<div id="yiv6622315900yui_3_16_0_1_1441028363385_2533">
<br /></div>
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1441028363385_2665">
1. Rebuild the RPM database:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">sudo rpm --rebuilddb</span><br />
<span style="font-family: "Courier New",Courier,monospace;">sudo yum distro-sync --setopt=deltarpm=0</span></div>
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1441028363385_2667">
<br />
2. Install and run <span style="font-family: "Courier New",Courier,monospace;">rpmconf</span> to identify configuration files where the packaged version has changed, and interactively decide what to do with each:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">sudo yum install rpmconf</span></div>
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1441028363385_2668">
<span style="font-family: "Courier New",Courier,monospace;">sudo rpmconf -a</span></div>
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1441028363385_2735">
<br />
(After upgrading to F22 replace the first line with<br />
<span style="font-family: "Courier New",Courier,monospace;">sudo dnf install rpmconf</span><br />
However for me when running the subsequent step, <span style="font-family: "Courier New",Courier,monospace;">rpmconf</span> crashed out with an error about a missing file, so I wasn't able to complete this step.)</div>
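<div dir="ltr">
<br />
(As a manual fallback - my own suggestion, not something from the FedUp docs - the saved and pending configuration files that <span style="font-family: "Courier New",Courier,monospace;">rpmconf</span> would normally flag can be listed directly and then reviewed by hand:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">sudo find /etc \( -name '*.rpmnew' -o -name '*.rpmsave' \)</span>)</div>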
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1441028363385_2735">
<br /></div>
<div dir="ltr" id="yiv6622315900yui_3_16_0_1_1441028363385_2735">
Overall the process was pretty painless, and after a week or so I haven't noticed any issues (other than my continued inability to get a working version of Google Chrome - but that problem predates this upgrade), with everything appearing to work at least as well as before.<br />
<br />
Also while Fedora 22 looks very much like Fedora 20, there have been some visual tweaks which I feel make it a bit nicer to look at than before (however this is perhaps a personal thing - see Chris Duckett's <a href="http://www.zdnet.com/article/a-month-with-fedora-22-leaves-me-hungry-for-23/">A month with Fedora 22 leaves me hungry for 23</a> for a counter-view), and for now I'm enjoying my "refreshed" Fedora experience.</div>
</div>
pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-85648284631830827962015-08-26T13:33:00.001-07:002015-08-26T13:33:27.974-07:00Checking the RAID level with MegaCLI on Ubuntu 14.04I recently <a href="http://ironiccog.blogspot.co.uk/2015/08/experiences-installing-ubuntu-1404-lts.html">installed Ubuntu 14.04 on a new PC</a> (Dell Precision Tower 7910) and wanted to check the RAID configuration that the system had been set up to use. Unfortunately none of the BIOS or system configuration menus seemed to give me access to this information.<br />
<br />
Luckily a colleague pointed me towards MegaCLI, a utility for obtaining information about (and troubleshooting) LSI RAID controllers. Instructions on how to install and use MegaCLI are given in this <i>Nerdy Notes</i> blog post: <a href="http://blog.nold.ca/2012/10/installing-megacli.html" id="yui_3_16_0_1_1439639883730_6709" target="_blank">http://blog.nold.ca/2012/10/installing-megacli.html</a>. However as this dates from 2012, doesn't cover use on Ubuntu, and doesn't include information on how to interpret the outputs from the utility, I've written up what I did below.<br />
<br />
<b>0. Check you're using a MegaRAID controller</b><br />
<br />
Directly from <i>Nerdy Notes</i>: do<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">lspci -nn | grep RAID</span><br />
<br />
and check that the output mentions <span style="font-family: "Courier New",Courier,monospace;">MegaRAID</span>, e.g.<br />
<span style="font-family: "Courier New",Courier,monospace;"><br /></span>
<span style="font-family: "Courier New",Courier,monospace;">02:00.0 <span style="color: red;"><b>RAID</b></span> bus controller [0104]: LSI Logic / Symbios Logic Mega<span style="color: red;"><b>RAID</b></span> SAS-3 3008 [Fury] [1000:005f] (rev 02)</span><br />
<br />
<b>1. Obtain and install MegaCLI</b><br />
<br />
The first problem is actually obtaining a copy of MegaCLI. According to the <i>Nerdy Notes</i> post, it should be available for download by searching on the LSI website, however when I tried this the download link was non-functional. Subsequently I've discovered that it can be located by searching on the Avago site (go to <a href="http://www.avagotech.com/">http://www.avagotech.com/</a> and enter "megacli" into the search box).<br />
<br />
Opening the downloaded zip file reveals a set of subdirectories with versions of the utility for different OSes, with the <span style="font-family: "Courier New",Courier,monospace;">Linux</span> directory only holding a "noarch" RPM file. This should be straightforward to install using <span style="font-family: Courier New, Courier, monospace;">yum</span> on Redhat-based systems (such as Fedora), but for Ubuntu it's necessary to extract the MegaCLI executables using the <span style="font-family: "Courier New",Courier,monospace;">rpm2cpio</span> utility:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">rpm2cpio MegaCli-8.07.14-1.noarch.rpm | cpio -dimv</span><br />
<br />
(The <span style="font-family: "Courier New",Courier,monospace;">rpm2cpio</span> utility can be installed via the <span style="font-family: "Courier New",Courier,monospace;">rpm2cpio</span> package using Synaptic or <span style="font-family: "Courier New",Courier,monospace;">apt-get</span>.)<br />
<br />
This should pull out the <span style="font-family: "Courier New",Courier,monospace;">MegaCli64</span> executable. Note that it doesn't need to be installed in any special location - you can run it directly from wherever you extracted it - however <b>you need to run it with superuser privileges, otherwise the output is blank</b>.<br />
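<br />
For example, assuming the RPM unpacks into the usual <span style="font-family: "Courier New",Courier,monospace;">opt/MegaRAID/MegaCli/</span> layout below the directory where you ran <span style="font-family: "Courier New",Courier,monospace;">rpm2cpio</span> (adjust the path to wherever <span style="font-family: "Courier New",Courier,monospace;">MegaCli64</span> actually ended up for you):<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">sudo ./opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aAll</span><br />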
<br />
<b>2. Run MegaCli64 to probe the RAID information</b><br />
<br />
To get information on the adapter and see which RAID levels the system supports, do:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">sudo MegaCli64 -AdpAllInfo -aAll</span><br />
<br />
and <span style="font-family: "Courier New",Courier,monospace;">grep</span> the output for "RAID"; this should give you output of the following form:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">RAID Level Supported : RAID0, RAID1, RAID5, RAID00, RAID10, RAID50, PRL 11, PRL 11 with spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span, PRL11-RLQ0 DDF layout with span</span><br />
<br />
Then to get information on which RAID level is actually being used:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">sudo MegaCli64 -LDInfo -Lall -aAll</span><br />
<br />
and again <span style="font-family: "Courier New",Courier,monospace;">grep</span> for "RAID": <br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3</span><br />
<br />
This doesn't explicitly name a "standard" RAID level, but I found this ServerFault post which gives details on how to interpret the output:<a href="http://serverfault.com/questions/385796/how-to-interpret-this-output-from-megacli" id="yui_3_16_0_1_1439639883730_6710" target="_blank"> http://serverfault.com/questions/385796/how-to-interpret-this-output-from-megacli</a><br />
<br />
In this case "Primary-5, Secondary-0, RAID Level Qualifier-3" turns out to be equivalent to RAID 5 <br />
(<a href="https://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5" id="yui_3_16_0_1_1439639883730_6711" target="_blank">https://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5</a>), which is suitable for my needs - so I was able to rest a bit easier.<br />
<br />
<i>I'd like to acknowledge my colleague Ian Donaldson for tipping me off about MegaCLI.</i>pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-61779364597095828812015-08-24T12:42:00.003-07:002015-08-24T12:42:56.915-07:00Enabling correct monitor drivers on Ubuntu 14.04 LTSHaving <a href="http://ironiccog.blogspot.co.uk/2015/08/experiences-installing-ubuntu-1404-lts.html">recently installed Ubuntu 14.04 LTS on a new PC</a>, I had some troubles configuring the display to operate correctly with a Dell 24" monitor: specifically, the monitor's native resolution of 1920x1200 (a 16:10 aspect ratio) didn't seem to be supported by the default install from the Live CD. In fact it was more than a little unstable (to say the least: for example, attempting to change the display resolution rendered the display unusable several times, necessitating a complete reinstall to recover).<br />
<br />
This appeared to be a missing driver issue, but it was difficult to find out what drivers were needed. Searching on the web for the specific problem was rather fruitless, and the many suggestions to look at <span style="font-family: "Courier New",Courier,monospace;">xrandr</span> (for example <a href="https://wiki.ubuntu.com/X/Config/Resolution">https://wiki.ubuntu.com/X/Config/Resolution</a>) didn't work for me either. In the end I stumbled across a post which suggested using the command:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">ubuntu-drivers devices</span><br />
<br />
from a terminal, which shows all devices which need drivers, and which packages apply to them. For example for my system the output looks like:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">== /sys/devices/pci0000:00/0000:00:02.0/0000:03:00.0 ==</span><br />
<span style="font-family: "Courier New",Courier,monospace;">vendor : NVIDIA Corporation</span><br />
<span style="font-family: "Courier New",Courier,monospace;">modalias : pci:v000010DEd000013BAsv000010DEsd00001097bc03sc00i00</span><br />
<span style="font-family: "Courier New",Courier,monospace;">driver : nvidia-346-updates - distro non-free</span><br />
<span style="font-family: "Courier New",Courier,monospace;">driver : nvidia-346 - third-party free</span><br />
<span style="font-family: "Courier New",Courier,monospace;">driver : nvidia-340 - third-party free</span><br />
<span style="font-family: "Courier New",Courier,monospace;">driver : nvidia-352 - third-party free</span><br />
<span style="font-family: "Courier New",Courier,monospace;">driver : xserver-xorg-video-nouveau - distro free builtin</span><br />
<span style="font-family: "Courier New",Courier,monospace;">driver : nvidia-349 - third-party non-free</span><br />
<span style="font-family: "Courier New",Courier,monospace;">driver : nvidia-355 - third-party free recommended</span><br />
<span style="font-family: "Courier New",Courier,monospace;">driver : nvidia-340-updates - distro non-free</span><br />
<br />
When I ran this it marked 'nvidia-349' as the recommended driver (nb the output above is from a subsequent run of <span style="font-family: "Courier New",Courier,monospace;">ubuntu-drivers</span> and doesn't show the mark). This was easily installed using:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">apt-get install nvidia-349</span><br />
<br />
and appeared to solve my issues. Unfortunately I haven't been able to find the post again that originally suggested this, however if you are experiencing similar resolution issues then it might be worth a try before grappling with <span style="font-family: "Courier New",Courier,monospace;">xrandr</span>.pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-32549261198466149732015-08-23T12:05:00.000-07:002015-08-23T12:09:05.421-07:00Experiences installing Ubuntu 14.04 LTS on UEFI system with Boot RepairI recently set up a new desktop PC (Dell Precision Tower 7910) for my work computer. For this new machine I've decided to move away from Fedora Linux (which I've used for the past four years) and go with the latest <a href="http://www.ubuntu.com/">Ubuntu</a> LTS release (<a href="http://releases.ubuntu.com/14.04/">14.04 "Trusty Tahr"</a>). (While I've enjoyed using Fedora, the relatively high turnover of releases has been a nuisance - so the thought that I'd only need to do one install over the lifespan of this machine was an attractive one.)<br />
<br />
Although this was a new machine with Windows pre-installed, I decided to trash this and install Ubuntu over it as a single boot setup (my preference is to run Windows inside a virtual machine on the Linux box, rather than dual booting). The system was also already set up to use UEFI (Unified Extensible Firmware Interface, a replacement for traditional BIOS firmware), however according to the official Ubuntu documentation for UEFI (<a href="https://help.ubuntu.com/community/UEFI">https://help.ubuntu.com/community/UEFI</a>) for a single boot system either UEFI or "legacy" boot modes can be used.<br />
<br />
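(As an aside - my own shorthand, using the same indicator mentioned in the addendum at the end of this post - you can check which mode a running system, including the live session, has actually booted in by testing for the <span style="font-family: "Courier New",Courier,monospace;">/sys/firmware/efi</span> directory, which only exists under UEFI:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">[ -d /sys/firmware/efi ] && echo "UEFI" || echo "legacy BIOS"</span>)<br />
<br />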
So if you don't really care which mode is used then the section on <a href="https://help.ubuntu.com/community/UEFI#Installing_Ubuntu_for_Single_Boot_with_a_Random_Boot_Mode">Installing Ubuntu for Single Boot with a Random Boot Mode</a> suggests that the easiest approach is basically to just try installing from the Live media and see what happens. This was the approach that I took, installing from the Live CD and setting up my own partitioning scheme with <span style="font-family: "Courier New",Courier,monospace;">/home</span> separated from the rest of the system.<br />
<br />
Initially attempting to reboot post-installation failed with <span style="font-family: "Courier New",Courier,monospace;">no bootable devices</span>. However it's possible to repair this quite easily and complete the installation using the <a href="https://help.ubuntu.com/community/Boot-Repair">Boot Repair</a> utility. The basic recipe for obtaining and running it is summarised in the post at <a href="http://askubuntu.com/a/604623">http://askubuntu.com/a/604623</a> - but unfortunately the version from the repo given (<span style="font-family: "Courier New",Courier,monospace;">yannubuntu/boot-repair</span>) didn't seem to work for Ubuntu 14.04 - instead it's necessary to get it from a different location (<span style="font-family: "Courier New",Courier,monospace;">kranich/cubuntu</span>; see <a href="http://ubuntuforums.org/showthread.php?t=2277484">http://ubuntuforums.org/showthread.php?t=2277484</a>).<br />
<br />
Based on the above posts, I restarted from the Live CD and selected "Try Ubuntu" to boot from the CD. Once the system was ready I opened a terminal window and did:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">sudo add-apt-repository ppa:kranich/cubuntu</span><br />
<span style="font-family: "Courier New",Courier,monospace;">sudo apt-get update</span><br />
<span style="font-family: "Courier New",Courier,monospace;">sudo apt-get install -y boot-repair</span><br />
<span style="font-family: "Courier New",Courier,monospace;">sudo boot-repair</span><br />
<br />
which brings up the <i>Boot Repair</i> GUI. I selected the "Recommended Repair" option, and once this had completed I restarted the system again without the Live CD. This time Ubuntu booted okay directly from the hard drive, and the installation was complete.<br />
<br />
<b>Addendum:</b> it looks like the system ended up being installed in UEFI mode, which <a href="https://help.ubuntu.com/community/UEFI#Identifying_if_the_computer_boots_the_HDD_in_UEFI_mode">according to the Ubuntu docs</a> is indicated by the existence of the <span style="font-family: "Courier New",Courier,monospace;">/sys/firmware/efi/</span> directory on the hard drive.pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-13888834342935007112015-08-15T05:44:00.003-07:002015-08-15T05:46:41.756-07:00Keeping a GitHub fork up-to-date with the original repo<a href="https://guides.github.com/activities/forking/">Forking</a> is a standard way to make changes to a third party repository on GitHub. Typically a developer makes a fork (essentially a clone on the GitHub server side) of someone else's repo in order to make their own changes to the project. These changes might be for private use, or they could subsequently be submitted back to the original repo for evaluation and possible inclusion via the <a href="http://oss-watch.ac.uk/resources/pullrequest">pull request</a> mechanism.<br />
<br />
However: once the fork is created, any further commits or other changes
made to the original repo will not automatically be reflected in the
fork. This post describes how to keep the fork up-to-date with the original repository (which is normally referred to as the <b>upstream repo</b>) by pulling in the changes manually via
a clone of the fork, using the process described below.<br />
<br />
The update procedure is:<br />
<br />
<b>0. Make a clone of your fork</b> onto your local machine (or use one you've already made), and move into the cloned directory.<br />
<br />
<b>1. Add a new remote repository to the clone</b> which points to the upstream repo. The general syntax to do this is:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">git remote add upstream https://github.com/ORIGINAL_OWNER/ORIGINAL_REPO.git</span><br />
<br />
A <i>remote</i> is simply a version of the project which is hosted on the Internet or elsewhere on the network, and a repo can have multiple remotes (use <span style="font-family: "Courier New",Courier,monospace;">git remote -v</span> to list all the ones that are defined; you should see that <span style="font-family: "Courier New",Courier,monospace;">origin</span> is also a remote). Conventionally the remote used for this syncing is given the name <span style="font-family: "Courier New",Courier,monospace;">upstream</span>, however it could be called anything you like.<br />
<br />
For example:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">git remote add upstream https://github.com/fls-bioinformatics-core/genomics</span> <br />
<br />
This step only needs to be done once for any given clone.<br />
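<br />
Once the remote has been added, <span style="font-family: "Courier New",Courier,monospace;">git remote -v</span> should list both of them - something like the following, where YOUR_USERNAME is a placeholder for your own GitHub account (a representative example rather than my actual output):<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">origin https://github.com/YOUR_USERNAME/genomics (fetch)<br />origin https://github.com/YOUR_USERNAME/genomics (push)<br />upstream https://github.com/fls-bioinformatics-core/genomics (fetch)<br />upstream https://github.com/fls-bioinformatics-core/genomics (push)</span><br />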
<br />
<b>2. Update the local clone</b> by fetching and merging in any changes from the upstream repo, using the procedure:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">git fetch upstream<br />git checkout master<br />git merge upstream/master</span><br />
<br />
If you have made your changes in your own dedicated branches and avoided making commits to the fork's <span style="font-family: "Courier New",Courier,monospace;">master</span> branch then the upstream changes should merge cleanly without any conflicts. (It's considered best practice that changes
made to the fork should always be done in a purpose-made branch<span style="font-family: "Courier New",Courier,monospace;">;</span> working directly in <span style="font-family: "Courier New",Courier,monospace;">master</span> makes it harder to make clean pull requests and can cause conflicts when trying to merge the upstream changes into your fork.)<br />
<br />
<b>3. Push the updated <span style="font-family: "Courier New",Courier,monospace;">master</span> branch back to your fork</b> on GitHub using:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">git push origin master</span><br />
<br />
Once again, if you have avoided committing local changes to <span style="font-family: "Courier New",Courier,monospace;">master</span> then the push should be drama-free.<br />
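<br />
Since steps 2 and 3 get repeated regularly, I find it handy to keep them together as a single short sequence (a minimal sketch, assuming the remote is called <span style="font-family: "Courier New",Courier,monospace;">upstream</span> and your own work is kept off <span style="font-family: "Courier New",Courier,monospace;">master</span>):<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">git fetch upstream<br />git checkout master<br />git merge upstream/master<br />git push origin master</span><br />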
<br />
Finally, you will need to repeat steps 2 and 3 in order to stay up to date whenever new changes are made to the upstream repo.pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-39205794181023204862013-12-09T08:42:00.003-08:002013-12-09T11:36:10.109-08:00Book Review: "Instant Flask Web Development" by Ron DuPlain"Instant Flask Web Development" by Ron DuPlain (Packt Publishing <a href="http://www.packtpub.com/flask-web-development/book">http://www.packtpub.com/flask-web-development/book</a>) is intended to be an
introduction to Flask, a lightweight web application framework written
in Python and based on the Werkzeug WSGI toolkit and Jinja2 template
engine. <br />
<br />
The book takes a tutorial style approach, building
up an example appointment-management web application using Flask and
introducing various features of the framework on the way. As the example
application becomes more complicated, additional Python packages are
covered which are not part of the Flask framework (for example SQLAlchemy for managing interactions with a
database backend, and WTForms for handling form generation and
validation) along with various Flask extensions that can be used for
more complicated tasks (for example managing user logins and sessions). The final section of the book gives an overview of how to deploy the
application in a production environment, using gunicorn (a Python WSGI server) and nginx.<br />
<br />
Given
its length (just short of 70 pages) the book is quite ambitious in the
amount of ground that it attempts to cover, and it's quite impressive
how much the author has managed to pack in whilst maintaining a light
touch with the material. So while inevitably there is a limit to the level of detail that
can be fitted in, there are some excellent and concise overviews of
many of the topics that could act as useful starting points for more
experienced developers (for me the section on SQLAlchemy is a particular
highlight). Overall the pacing of the book is also quite sprightly and
conveys a sense of how quickly and easily Flask could be used to build a
web application from scratch.<br />
<br />
The flipside of the
book's brevity is that it cannot possibly contain everything that a
developer needs to know (although this is mitigated to some extent by
extensive references to online documentation and resources). In this
regard it is really more a showcase for Flask, and is best viewed as a
good starting point for someone wishing to quickly get up to speed with the framework's potential. I'd also question how suitable this is for
newcomers to either Python, or to web programming in general - I felt that some of the concepts and example code (for example the sudden appearance of a feature implemented using Ajax) might be a bit of a stretch for a novice. Also there are some occasional frustrating glitches in the text and example code which meant
it took a bit of additional reading and debugging in places to get the
example application working in practice.<br />
<br />
In summary
then: I'd recommend this book as a good starting point for developers
who already have some familiarity with web application development, and
who are interested in a quick introduction to the key components of Flask and how they're used - with the
caveat that you will most likely have to refer to other resources to get
the most out of it. <br />
<br />
<span style="font-style: italic;">Disclosure: a free e-copy of this book
was received from the publisher for review purposes; this review has
also been submitted to Amazon. The opinions expressed here are entirely my own.</span>pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-69069786576774493042013-04-19T12:27:00.000-07:002013-04-23T11:25:31.129-07:00Software Carpentry Bootcamp ManchesterI've spent the last two days as a helper at a <a href="http://software-carpentry.org/bootcamps/2013-04-manchester.html">Software Carpentry Bootcamp held at the University of Manchester</a>, and it's been really interesting and fun. <a href="http://software-carpentry.org/">Software Carpentry</a> is a volunteer organisation and runs the bootcamps with the aim of helping postgraduate students and scientists become more productive
by teaching them basic computing skills like program design, version
control, testing, and task automation. Many of the materials are freely available online via the <a href="https://github.com/swcarpentry/boot-camps/blob/2013-04-manchester/README.md">bootcamp's Github page</a>: along with transcripts of some of the tutorials there are some excellent supporting materials including hints and tips on <a href="https://github.com/swcarpentry/boot-camps/blob/2013-04-manchester/HintsAndTips.md">common Bash and editor commands</a> (there's even more on the main Software Carpentry website).<br />
<br />
The bootcamp format consisted of short tutorials alternating with hands-on practical
exercises, and as a helper the main task was to support the instructors by offering assistance to participants if they found themselves stuck for some reason in the exercises. I'll admit I felt some trepidation beforehand about being a helper, as being put on the spot to debug something is very different to doing it from the relaxed privacy of my desk. However it turned out to be both a very enjoyable and a very educational experience; even though I consider myself to be quite a proficient and experienced shell and Python programmer, I learned some new things from helping the participants both with understanding some of the concepts and with getting their
examples to work.<br />
<br />
There were certainly lots of fresh insights and I learned some new things from the taught sessions too, including:<br />
<ul>
<li><b>Bash/shell scripting:</b> using <span style="font-family: "Courier New",Courier,monospace;">$(...)</span> instead of "backtick" notation to execute a command or pipeline within a shell script (see the short example after this list);</li>
<li><b>Version control: </b>learning that <a href="https://bitbucket.org/">Bitbucket</a> now offers free private repositories (and a reminder that <span style="font-family: "Courier New",Courier,monospace;">git push</span> doesn't automatically push tags to the origin, for that you also need to explicitly use <span style="font-family: "Courier New",Courier,monospace;">git push --tags</span>);</li>
<li><b>Python: </b>a reminder that <a href="http://stackoverflow.com/questions/509211/the-python-slice-notation">slice notation</a> <span style="font-family: "Courier New",Courier,monospace;">[i:j]</span> is inclusive of the first index i but exclusive of the second index j, and independently that string methods often don't play well with Unicode;</li>
<li><b>Testing:</b> a reminder that writing and running tests doesn't have to impose a big overhead - good test functions can be implemented just with assert statements, and by observing a simple naming convention (i.e. put tests in a <span style="font-family: "Courier New",Courier,monospace;">test_<module>.py</span> file, and name test functions <span style="font-family: "Courier New",Courier,monospace;">test_<name></span>), Python <a href="https://nose.readthedocs.org/en/latest/">nose</a> can run them automatically without any additional infrastructure.</li>
<li><b>Make:</b> good to finally have an introduction to the basic mechanics of Makefiles (including targets, dependencies, automatic variables, wildcards and macros), after all these years!</li>
</ul>
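To make the first two bullet points concrete, here's a small illustration of both (my own examples rather than anything taken from the bootcamp materials):<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"># $(...) does the same job as backticks but nests cleanly and is easier to read<br />latest_log=$(ls -t *.log | head -n 1)<br />echo "Most recent log file: $latest_log"<br /><br /># pushing commits does not push tags - they need an explicit push of their own<br />git push origin master<br />git push --tags</span><br />
<br />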
As a helper I really enjoyed the bootcamp, and from the very positive comments made by the participants both during and at the end it sounded like everyone got something valuable from the two days - largely due to the efforts of instructors Mike Jackson, David Jones and Aleksandra Pawlik, who worked extremely hard to deliver excellent tutorials and thoroughly deserved the applause they received at the end. (Kudos should also go to Casey Bergman and Carole Goble for acting as local organisers and bringing the bootcamp to the university in the first place.) Ultimately the workshop isn't about turning researchers into
software engineers but rather getting them started with practices and tools that will support their
research efforts, in the same way that good laboratory practices support
experimental research. (This isn't an abstract issue, there can be very
real consequences as demonstrated by cases of <a href="http://en.wikipedia.org/wiki/Geoffrey_Chang">Geoffrey Chang</a>, <a href="http://crookedtimber.org/2004/08/25/mckitrick-mucks-it-up/">McKitrick and Michaels</a>, and the <a href="http://www.around.com/ariane.html">Ariane 5 rocket failure</a> - the latter resulting in a very real "crash".)<br />
<br />
If any of this sounds interesting to you then the Software Carpentry <a href="http://software-carpentry.org/bootcamps/">bootcamp calendar</a> shows future events planned in both Europe and the US, so it's worth a look to see if there's one coming up near your location. Otherwise you could consider hosting or running your own bootcamp. Either way I'd very much recommend taking part to any researchers who want to make a positive impact on their work with software.pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-55791036859274287932012-08-19T08:54:00.000-07:002012-10-24T12:45:41.290-07:00Inline images in HTML tags with PythonI recently discovered a neat trick for embedding images within HTML documents, really useful if you've got an application where you would like the HTML files to be portable (in the sense of being moved from one location to another) and not have to rely on also moving a bunch of related image files.<br />
<br />
Essentially the inlining is achieved by base64 encoding the data from the image file into an ASCII string, which can then be copied into the src attribute of an <img> tag with the following general syntax:<br />
<pre style="background-color: beige; overflow-y: auto; overflow: scroll; width: 450px;"><span style="font-size: 85%;">
<img src="data:image/<i><b>image_type</b></i>;base64,<i><b>base64_encoded_string</b></i>" />
</span>
</pre>
For example to embed a PNG image:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">
<img src="data:image/png;base64,iVBORw...." /></div>
<br />
i.e. <i>image_type</i> is <span style="font-family: "Courier New",Courier,monospace;">png</span> and <span style="font-family: "Courier New",Courier,monospace;">iVBORw...</span> is the base64 encoded string (truncated here for readability).<br />
<br />
If you're familiar with Python then it's straightforward to encode any file using the base64 module, e.g. (for a PNG image):<br />
<pre style="background-color: beige; overflow-y: auto; overflow: scroll; width: 450px;"><span style="font-size: 85%;">
>>> import base64
>>> pngdata = base64.b64encode(open("cog.png",'rb').read())
>>> print "<img src='data:image/png;base64,%s' />" % pngdata
</span>
</pre>
And here's an example inlined PNG generated using this method:<br />
<br />
<div style="text-align: center;">
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAAAAZiS0dEAP8A/wD/oL2nkwAAAAlwSFlzAAALEwAACxMBAJqcGAAABqdJREFUWMPVl3tM1WUYx9/DRcEJC/KCGqSAIIhyv6iIRy6hklqKFy7ihQOK5iW1pamoxUUQjqLC4aKmW7Ny67Iu09VM19amtlZtTsucKSPNSzkjSTQ9fZ7jOfTjAHkp/+jd3u3s/T3v83yf73N7j1L/txX8/KsqdekeFZVd6Rc3p0ovOzq7MvCDs2bLt/98DZ3ymhoyaX13lLtjSD0ZY1DGQ9fVqNwdFYkLG27Jjs/dUYuozjtxhRo+vVSAuAdN3tBj2LQSpVx9Hs1wyNQi5Tv+FYeIzM0hI+dt3zly7rY9ERnlkSPmVOXFG6pr9QvqziS/sMssm9/nOGsYMXdbATIRyJi4sz8yc3NswMRCR3HioVd4RrkKm7EpcnRezemkRTvNeNo6Zn7tFX7fsRm233y7C5iryP6RzJ2EfNMFAKXG5Gx5SOvQhnE1LL04AABnuzJ4vw3gS+iJggml84x5cOPQroZPK+kzat72vXhzuxNP74wtqL/ObmI3sn/pTI6zP8mVt0Onlw4gaZWzd1LXdomVGvjMyw5QlgjqaIy/LtRrFWKomVh/SObnx8wyxkRlVfiyB5KcoXGzt07H2C5CcFF7T34LCHSGozsl6LmNzuyOAMRrhIKh/Aeokzi2apVwfjx6lnFi4KT1rpHZRuVgR+mQyRuUT9JKR+iOwOB+qQ4NE7fQ+bPkBN9HSE54xhW0L7Xe8YudyGKTvddWDz6ivAbFzq4SlpzEm9icLcvJ9Hq+7eVeJRTPRI+XfmEDZbjRjbMyrRO2LeAGp61zlSprB4AQuICsHJpbtBeg/Bj54At9IueFYeOYfNNllFvA2Tb3bqP8K5hM7zt6iWPgxMIeAGzQOsSdm5RoDX3CjQS3CwEGwmeWBUHVBU3Mf4P2Z4m5NJe+XH5PDCdiDGBf0hu2sysxfGTsgjoLcMBdi8yqyFtsOioh9YX2E5p+cRk7oVIV7Za0UGKsMJSNgbY6H03C+U9Y46p6pehgRygVJb+TdIXEvJfZbFbjV72jBqet7SlGMXZJ7iXk1VzAeBzeK0KxysaC9AmcMOBk+7YtCWQwHla0VKM29ggX4J2ijIaivEkAAKTCK2Gps9yxLUAqAUNFLIK1W3IXZxq+vWs5i+DsqiakpqMtZtV2HwHZ3hjapm2v0NyMsVF4oCi5XDEOvU3kQxCAOlSRZWbAioTGysIpEs0LeQ88/loThh+RqSO0wZYOSVLJDgPlr3Y1fwmhIAwLjUWWUjRUH2Y+9Ohs6olHwoLMDauhK8Q7ZFDqqm4AOGTXoCQZk5H9RwBXABBiBVBsBfCZ77jVrvcB0GAFcFUA+E1Y0x0AR7oEQPLIHkDnK0HwlEaoBWDJnCuoypcQkAeNUBrQVQhoUh7Qe9QSgnzT94SgX+iMTb3182tPakJwhnBXksgBhPbeZS6qnNKDMuM3J/+dsWZKbLVkLEqGo/CiJQlnGYvdInId5Y5t+Y5frZbUHRdHcinHm9YE3iOMwGKClLM2CY9pk7CtDA3VUoYZMkBswiTSF0OnFnkqv3QH2NhqScT5tdfpeitoXO4py/ep8MxKRVhcaECZyDdZvb9MT0mQyoJm433LUBYXpHFII/pJ28NjZ28tGLdsr1A8gFw4KCDwqBW2PkdZCXsDYfoYaputI7iZqlpyoMmsaDgxeo0+a2JGSNi7B2V0bMVMtCqU37Sb6edRlBBvqJFXkjeA6ji7pm3D1gfLHeg9iWxOP/0yJ3KlP33lE7tWfBsn3sB7jw6zoEfYXAe8MdoPIyulJ/EqSXnodTSdbrA1EiDrYWEfRt8lV+qJtYH+7p1VfMDSgvn2vg2cVhcheZPKcGkHwDaOUexPSz6Bh02JdkzInAfgurCZZU97RufriHvbXffQOSqn4pC8oJ4gV7IB/I2d562cnZcqkrBI9rv4pLYH4J+2Rvkkv6QDRAxlFgRSk3Yu2F5CKDmNd/V4nQegCexUaM3irJxvx6XG7d+J6NpNSAKJfwKPEad2FdDZkoTEG08BIQnX1QPU+ixvFWCdhU5ijo5d6Ool01b1SVUPvKBahlAwpdX4Lx6lUo5RHcbvgywZUlyOpT98Z31wtDCMGsXrzry1PUIpO3moNifde5afw3hS7EM/y9W9R6pbpEEHE4E0qGqy3kR2B1Cq6cS7dEwn7RW5LIsM/UGoJ7GH9de/qNN2zUcC4p+21jlwUmHP4ClF6qm4HJVT9qm07U1Uyg28vkG975C265e60jaWXbnX7bH8VxSlKYt3S4h8YCVGtjzN3zrxmP6cPu71F6Ga+Xb8q/hVAAAAJXRFWHRkYXRlOmNyZWF0ZQAyMDEyLTA4LTE5VDE2OjMyOjM5KzAxOjAwBAPjlQAAACV0RVh0ZGF0ZTptb2RpZnkAMjAxMi0wMi0xMlQxNDoyMDowMyswMDowMGX07p4AAAAASUVORK5CYII=" />
</div>
<br />
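The same encoding can also be done straight from the shell without any Python at all - a quick sketch using the <span style="font-family: "Courier New",Courier,monospace;">base64</span> utility from GNU coreutils (the <span style="font-family: "Courier New",Courier,monospace;">-w 0</span> option turns off line wrapping; other base64 implementations may need a different flag):<br />
<pre style="background-color: beige; overflow-y: auto; overflow: scroll; width: 450px;"><span style="font-size: 85%;">
echo "<img src='data:image/png;base64,$(base64 -w 0 cog.png)' />" > inline_image.html
</span>
</pre>
<br />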
In fact this inlining is an example of the general data URI scheme for embedding data directly in HTML documents, and can also be used for example in CSS to set an inline background image - the Wikipedia entry on the <a href="http://en.wikipedia.org/wiki/Data_URI_scheme">"Data URI scheme"</a> is a good place to start for more detailed information.<br />
<br />
There are some caveats, particularly if you're interested in cross-browser compatibility: older versions of Internet Explorer (version 7 and older) don't support data URIs at all, while version 8 only supports encoded strings up to 32KB. More generally the encoded strings can be around 1/3 larger than the original images and are implicitly downloaded each time the document is refreshed; these are definitely considerations if you're concerned about bandwidth. However if these aren't issues for your application then this can be a handy trick to have in your toolbox.<br />
<br />
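(If you want to see the overhead for a particular image, comparing the raw and encoded sizes is straightforward - again assuming GNU coreutils:)<br />
<pre style="background-color: beige; overflow-y: auto; overflow: scroll; width: 450px;"><span style="font-size: 85%;">
stat -c %s cog.png             # size of the original file in bytes
base64 -w 0 cog.png | wc -c    # length of the base64-encoded string in bytes
</span>
</pre>
<br />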
Update 24/10/2012: another caveat I've discovered since is that at least some command line HTML-to-PDF converters (for example <a href="http://code.google.com/p/wkhtmltopdf/">wkhtmltopdf</a>) aren't able to convert the encoded images, so it's worth bearing this in mind if you plan to use them. (On the other hand the PDF conversion in Firefox - via "Print to file" - works fine but can't be run from the command line AFAIK.)pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com1tag:blogger.com,1999:blog-2061759690050619608.post-34667489528351893452012-04-22T05:30:00.000-07:002012-04-22T08:00:00.843-07:00Installing Fedora 16 (Verne) dual-boot with Windows 7Back in February I finally got around to installing Fedora 16 (Verne) as a dual-boot option on my Windows 7 home machine. I was helped considerably by an impressively detailed tutorial on <a href="http://www.linuxbsdos.com/2011/06/27/how-to-dual-boot-fedora-15-and-windows-7/">How to dual-boot Fedora 15 and Windows 7</a> posted on the <a href="http://www.linuxbsdos.com/">LinuxBSDos.com</a> blog, which was invaluable for the tricky steps of partitioning and boot loader set up.<br />
<br />
In this post I give an overview of what I did for my own dual-boot installation, with some hints and tips that I picked up in the process.<br />
<br />
<b>Preparation: back up the existing system</b><br />
<br />
My first step was to make an image of the existing system on an external hard drive so that I could recover everything if the installation should fail for any reason. To do this I used <a href="http://www.clonezilla.org/clonezilla-live.php">Clonezilla Live</a>, which can be burned onto a CD or other bootable media (I used the 1.2.12-10 AMD64 iso). I found the Clonezilla website quite difficult to navigate but there are very detailed usage instructions at <a href="http://www.clonezilla.org/clonezilla-live-doc.php">http://www.clonezilla.org/clonezilla-live-doc.php</a>, and while Clonezilla Live's menu-driven structure might feel a bit retro to those used to a slicker UI, the program itself turns out to be very easy to use.<br />
<br />
Note that the size of the resulting image file (and hence the amount of space required on the external drive) depends on how much of the disk actually contains data, rather than the size of the disk or partition being imaged. So although the Windows partitions had been allocated the whole of my 1 TB hard drive, in this case only about 70 GB was in use and the corresponding image file was even smaller at around 45 GB.<br />
<br />
<b>Repartitioning: create space for the Fedora installation</b><br />
<br />
Having created the image the next step was to free up some space by shrinking the size allocated to the Windows partitions (the LinuxBSDos dual-booting tutorial assumes there's already some free space available for the Fedora install). To do this I used <a href="http://gparted.sourceforge.net/livecd.php">GParted Live</a> (version 0.11.0-10), a free partition editor that runs from bootable media and works with 64-bit systems. Launching the GParted application from the bootable CD is covered at <a href="http://gparted.sourceforge.net/display-doc.php?name=gparted-live-manual">http://gparted.sourceforge.net/display-doc.php?name=gparted-live-manual</a> and there is comprehensive documentation on how to use it (with screenshots) at <a href="http://gparted.sourceforge.net/display-doc.php?name=help-manual">gparted.sourceforge.net/display-doc.php?name=help-manual</a>. However I think the user interface is pretty intuitive on its own. For my installation I reduced the Windows allocation down to 50% of the disk space, leaving around 500 GB of unallocated space.<br />
<br />
<b>Installing Fedora 16</b><br />
<br />
I was now in a position to install Fedora into the free space. I downloaded and burned the Fedora 16 installable live media (aka Live Desktop CD) from <a href="http://fedoraproject.org/en/get-fedora">http://fedoraproject.org/en/get-fedora</a> (make sure you get the appropriate 32- or 64-bit version for your system), which also allows you to try out Fedora without installing (see my previous post on <a href="http://ironiccog.blogspot.co.uk/2011/07/fedora-15-and-gnome-3-user-basics.html">Fedora 15 and Gnome 3 user basics</a> if you're unfamiliar with the Gnome 3 desktop). <br />
<br />
Detailed information about the installation procedure can be found in the extensive <a href="http://docs.fedoraproject.org/en-US/Fedora/16/html/Installation_Guide/index.html">Fedora 16 Installation Guide</a> but in summary this is what I did:<br />
<ul>
<li>Booted the system from the Fedora Live Desktop CD and started the installer via <b><i>Activies/Applications/Install to Hard Drive</i></b>.</li>
</ul>
<ul>
<li>Clicked through the installer screens and set up keyboard layout, computer name, network setup, and timezone etc (generally if you're installing a vanilla home desktop system then the default options
should be okay) until I reached the <b><i>Type of installation</i></b> screen. Here I needed to tell the installer where to put Fedora:<br />
<ul>
<li>Choose the <b><i>Use Free Space</i></b> option to use the unallocated disk space, and</li>
<li>Make sure the <i><b>Review and modify partitioning layout</b></i> option is checked. </li>
</ul>
before clicking through to the next screen.</li>
</ul>
<ul>
<li>The next step was to modify the assigned logical volumes to my preferred layout (here the detailed instructions in the LinuxBSDos blog post were invaluable; I strongly recommend consulting them if you're unfamiliar with logical volumes and partitions). For my installation I changed the Fedora defaults to set up the following new partitions:<br />
<ul>
<li><b>lv_root</b> (/): 10 GB</li>
<li><b>lv_home</b> (<span style="font-family: "Courier New",Courier,monospace;">/home</span>): 75 GB</li>
<li><span style="font-family: "Courier New",Courier,monospace;">/boot</span>: 500 MB</li>
<li><b>lv_swap</b>: 16 GB</li>
</ul>
Generally it's recommended to only use the space you need for the new installation and leave unused capacity unallocated; keeping <span style="font-family: "Courier New",Courier,monospace;">/home</span> separate from the root partition means that I can share user directories between multiple Linux installations in future; and the recommendation is that swap should be allocated around twice the size of the system RAM. However more elaborate partitioning schemes can be created, with various pros and cons - there's useful information on this in the <a href="http://docs.fedoraproject.org/en-US/Fedora/16/html/Installation_Guide/s2-diskpartrecommend-x86.html">Recommended Partitioning Scheme</a> section of the Fedora documentation.</li>
</ul>
<ul>
<li>Having set up the partition layout, I then needed to change the location of the Fedora boot loader (GRUB) from the default of <b><i>Master Boot Record</i></b> to <b><i>First sector of boot partition</i></b>. The reason for this is that any changes that are made to the Master Boot Record (MBR) (such as installing GRUB) are liable to be overwritten by future Windows updates, so it's better to manage the boot options from within Windows after Fedora has been installed:<br />
<ul>
<li>Click <i><b>Change Device</b></i> and select <i><b>First sector of boot partition</b></i> (I'd recommend verifying that the correct location is shown before clicking <b><i>Next</i></b> to complete the installation, and also making a note of the boot loader location for reference later).</li>
</ul>
</li>
</ul>
At this point I was finished and the installation could complete, which was a relatively quick process (in my case it took around 5 minutes).<br />
<br />
<b>Configuring the Windows boot manager</b><br />
<br />
Once the installation process had completed the computer rebooted directly into Windows, so to be able to access Fedora I needed to update the Windows boot manager: for this I used <a href="http://neosmart.net/EasyBCD/">EasyBCD</a> 2.1.2 (EasyBCD is a Windows bootloader editor freely available from <a href="http://neosmart.net/">NeoSmart Technologies</a>) as recommended in the LinuxBSDos.com tutorial. Once installed in Windows I was able to add a new entry for Fedora 16:<br />
<ul>
<li>Click on the <i><b>Add New Entry</b></i> tab</li>
<li>Select the <b><i>Linux/BSD</i></b> tab and select the partition that the GRUB boot loader was installed to (which I made a note of earlier)</li>
<li>Select <i><b>Edit Boot Menu</b></i> to view the change</li>
</ul>
Restarting the system then gave me the option of booting either Windows or Fedora 16.<br />
<br />
(Note that using this method, when Fedora 16 is selected you get the Linux GRUB boot loader rather than just booting directly into Linux; however it should be possible to configure GRUB to boot Fedora immediately.)<br />
<br />
<b>Post-installation Fedora set-up</b><br />
<br />
Once Fedora 16 was booted for the first time, there were a few standard post-installation steps to complete: agreeing to the user license, setting up the date and time - usually I select the option to synchronise over the network - and creating a non-root user account. I also ran the <i><b>Software Update</b></i> application to pull in any updates (this also fixed the settings for my monitor, which initially weren't properly configured). It was then possible to start building up the software inventory and configuring the system to my personal preferences.<br />
<br />
The final thing I did was to make donations to the Clonezilla, GParted and EasyBCD projects, as I would have been unable to set up my dual-boot system without them. It's also worth taking a moment to acknowledge the efforts of the Fedora community and the wider Linux community for making a complete operating system available and easy to install at no cost to the end user.<br />
<br />
For my part I hope that this overview will be useful to someone else - and good luck with your own dual-booting installations.<br />
<ul>
</ul>pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-53896282017335793872011-11-30T11:47:00.000-08:002011-11-30T13:11:41.721-08:00Firefox add-ons for the occasional web developerI'm not a hardcore web developer but I do some occasional web-based work, and one of the issues I have is that - because web applications exist in an environment which spans both browser and server (and which often seems to hide the workings of its components) - it can be quite difficult to see under the hood when there are problems.<br /><br />Fortunately there are a number of add-ons for Firefox (my browser of choice) that can help. These are the ones that I like to use:<br /><ul><li><span style="font-weight: bold;">Firebug</span> <a href="http://getfirebug.com/">http://getfirebug.com/</a> is possibly one of the most essential add-ons for web development. I first came across it as a Javascript logger and debugger, but it's far more than that: describing itself as a complete web development tool, its functionality extends to HTML and CSS in addition to script profiling and network analysis capabilities. As an occasional user I've found the Javascript debugging functions invaluable, and the ability to edit CSS in-place and see the results immediately has also been really helpful in debugging style sheets - in fact its biggest downside from my perspective is that it's not immediately obvious how to use many of its functions.</li></ul><ul><li><span style="font-weight: bold;">Live HTTP Headers</span> <a href="http://livehttpheaders.mozdev.org/">http://livehttpheaders.mozdev.org/</a> is a great tool for exposing the interactions between your browser and a server. I found this invaluable when I was debugging some website functionality that I was developing earlier this year, as it enabled me to follow a redirect that I'm sure I couldn't have seen otherwise.</li></ul><ul><li><span style="font-weight: bold;">QuickJava </span><a href="http://quickjavaplugin.blogspot.com/">http://quickjavaplugin.blogspot.com/</a> is a utility that allows support for Java, Javascript, Flash, Silverlight, images and others to be toggled on and off within your browser, enabling you to check how a page behaves when viewed by someone who doesn't have these enabled.<br /></li></ul><ul><li>I really like the <span style="font-weight: bold;">HTML Validator</span> <a href="http://users.skynet.be/mgueury/mozilla/">http://users.skynet.be/mgueury/mozilla/</a> for ensuring that my HTML markup is actually W3C compliant; the main issue with this is that it's only available for Windows platforms. 
Provided you have the "Add-on Bar" visible in Firefox (toggle via "Options" in the Firefox main menu, or do ctrl+/), this displays a little icon at the bottom of the screen indicating the goodness or otherwise of your markup.</li></ul><p>There are a few other useful add-ons for working with design elements like colours and images:<br /></p><ul><li><span style="font-weight: bold;">Colorzilla</span> <a href="http://www.colorzilla.com/">http://www.colorzilla.com/</a> is a tool that allows you (among other things) to pick colours from the current webpage and get the corresponding hex codes or RGB values.</li></ul> <ul><li><span style="font-weight: bold;">Measureit</span> <a href="http://frayd.us/">http://frayd.us/</a> creates a ruler that lets you measure the size of page elements in pixels - particularly helpful when sizing images for web display.</li></ul> <ul><li>In the past I've found the in-browser screen capture utility <span style="font-weight: bold;">Fireshot</span> <a href="http://screenshot-program.com/fireshot/">http://screenshot-program.com/fireshot/</a> quite handy for taking screenshots of an entire webpage including the off-screen portions. I have to admit I haven't used it for a while though. There's a paid "pro" version which offers a lot of additional functionality.<br /></li></ul>Although I've given URLs, the easiest way to install any of these is via the <span style="font-style: italic;">"Get Add-ons"</span> tab accessed via the <span style="font-style: italic;">"Add-ons"</span> option in Firefox's main menu (I'm using Firefox 8.0 at the time of writing). Once installed the individual add-ons appear in various places, for example by default Firebug's icon can be found at the top-right hand corner. If an add-on's icon doesn't appear automatically (as seems to happen for Measureit) then you might have to add it manually: go to <span style="font-style: italic;">"Options"</span>/<span style="font-style: italic;">"Toolbar layout"</span>, locate the item and drag it to the toolbar.<br /><br />I wouldn't try to argue that this is definitive list, but for an occasional user like as myself these tools work well and (with the exception of Firebug) are easy to remember how to use even after several months away from them. However if these don't meet your needs then I'd recommend checking out the <a href="https://addons.mozilla.org/en-US/firefox/extensions/web-development/">"Web Development" category of Mozilla's add-ons site</a> for many more options.pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-84163425892881717102011-11-13T06:55:00.000-08:002011-11-13T13:03:47.175-08:00Creative Commons overviewA while ago I came across an interesting overview of the Creative Commons licence for digital content by Jude Umeh in the BCS "IT Now" newsletter ("Flexible Copyright", also available via the BCS website at <a href="http://www.bcs.org/content/conBlogPost/1828">http://www.bcs.org/content/conBlogPost/1828</a> as "Creative Commons: Addressing the perils of re-using digital content"), which I felt gave a very clear and concise introduction to the problem that Creative Commons (CC) is trying to solve, how it works in practice, and some of the limitations.<br /><br />Essentially, anyone who creates online content - whether a piece of writing (such as this blog), an image (such as a photo in my Flickr stream), or any other kind of media - automatically has "all rights reserved" copyright on that content. 
This default position means that the only way someone else can (legally) re-use that content is by explicitly seeking and obtaining the copyright owner's permission (i.e. a licence) to do so. As you might imagine this can present a significant barrier to re-using online content.<br /><br />The aim of the Creative Commons is to enable content creators to easily pre-emptively grant permissions for others to re-use their work, by providing a set of free licences which bridge the gap between the "all rights reserved" position (where the copyright owner retains all rights on their work) and "public domain" (where the copyright owner gives up those rights, and allows anyone to re-use their work in any way and for any purpose).<br /><br />These licences are intended to be easily understood and provide a graduated scale of permissiveness. According to the article the six most common are:<br /><br /><ul><li><span style="font-weight: bold;">BY ("By Attribution"):</span> this is the most permissive, as it grants permission to reuse the original work for any purpose - including making "derived works" - with no restrictions other than that it must attributed to the original author.</li></ul><ul><li><span style="font-weight: bold;">BY-SA ("By Attribution-Share Alike"):</span> the same as BY, with the additional restriction that any derived work must also be licensed as BY-SA.</li></ul><ul><li><span style="font-weight: bold;">BY-ND ("By Attribution-No Derivatives"):</span> the original work can be freely used and shared with attribution, but derivative works are not allowed.</li></ul><ul><li><span style="font-weight: bold;">BY-NC ("By Attribution-Non-Commerical"):</span> as with BY, the original work can be used, shared and used in derived works, provided attribution is made to the original author; however the original work cannot be used for commercial purposes.</li></ul><ul><li><span style="font-weight: bold;">BY-NC-SA ("By Attribution-Non-Commercial-Share Alike"):</span> similar to BY-SA, so any derived work must use the same BY-NC-SA licence, and like BY-NC, in that commercial use of the original work is not permitted.</li></ul><ul><li><span style="font-weight: bold;">BY-NC-ND ("By Attribution-Non-Commercial-No Derivatives"):</span> the most restrictive licence (short of "all rights reserved"), as this only allows re-use of the original work for non-commercial purposes, and doesn't permit derivative works to be made. Umeh states that BY-NC-ND is "often regarded as a 'free advertising' licence".<br /></li></ul><br />As Umeh points out, "CC is not a silver bullet", and his article cites examples of some of its limitations and potential pitfalls. 
Elsewhere I've also come across some criticisms of using the non-commercial CC licences in certain contexts: for example, the scientist Peter Murray Rust has blogged about what he sees as the negative impact of CC-NC licensing in science and teaching (see "Suboptimal/missing Open Licences by Wiley and Royal Society" <a href="http://blogs.ch.cam.ac.uk/pmr/2011/10/27/suboptimalmissing-open-licences-by-wiley-and-royal-society/">http://blogs.ch.cam.ac.uk/pmr/2011/10/27/suboptimalmissing-open-licences-by-wiley-and-royal-society/</a> and "Why you and I should avoid NC licences" <a href="http://blogs.ch.cam.ac.uk/pmr/2010/12/17/why-i-and-you-should-avoid-nc-licences/">http://blogs.ch.cam.ac.uk/pmr/2010/12/17/why-i-and-you-should-avoid-nc-licences/</a>).<br /><br />However it's arguable that these are special cases, and that more generally CC-based licensing has a significant and positive impact on enabling the legal re-use of online material that would otherwise not be possible: indeed, even the posts cited above only criticise its NC aspects, and otherwise see the CC as greatly beneficial. Certainly it's worth investigating if you're interested in allowing others to reuse digital content that you've produced (there's even a page on the CC website to help choose the appropriate CC licence based on answers to plain English questions: <a href="http://creativecommons.org/choose/">http://creativecommons.org/choose/</a>).<br /><br />As I'm not an expert on CC (or indeed on copyright law or content licensing), I'd recommend Umeh's article as the next step for a more comprehensive and expert overview; and beyond that of course more information can be found at the Creative Commons website <a href="http://www.creativecommons.org/">http://www.creativecommons.org/</a> (with the UK-specific version due to become available at <a href="http://www.creativecommons.org.uk/">http://www.creativecommons.org.uk</a> later this month).pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-48736963311606106432011-10-24T12:02:00.000-07:002011-10-24T13:14:23.814-07:00Day Camp 4 Developers: Project ManagementJust over three weeks ago I attended the third online <a href="http://daycamp4developers.com/">Day Camp 4 Developers</a> event, which this time focused on the subject of project management. The DC4D events are aimed at filling the "soft skills" gap that software developers can suffer from, and rather than being a "how-to" on project management (arguably there are already plenty of other places you can learn the basics) the six speakers covered a range of topics around the subject - some of which I wouldn't initially have thought of in this context (for example, dealing with difficult people). However as one of the speakers noted, fundamentally project management is as much about people as it is about process, and all of them delivered some interesting insights.<br /><br />The first talk (which unfortunately I missed the start of through my own disorganisation) by <a href="http://brianhprince.com/">Brian Prince</a> about <span style="font-weight: bold;">"Hands-on Agile Practices"</span> covered the practical implementation of Agile process in a lot of detail. I've never worked with Agile myself, but I have read a bit about it in the past and Brian's presentation reminded me of a few Agile concepts that sound like they could be usefully adopted elsewhere. For example: using "yesterday's weather" (i.e. 
statistically the weather tomorrow is likely to be the same as today's) as a way to plan ahead by considering recent performance; and the guidelines for keeping stand-up meetings concise could also be applied to any sort of status meeting (each person covers the three points "what I did yesterday", "what I'm doing today and when it will be done", and "what issues I have"). The idea of focusing on "not enough time" rather than "too much to do" also appealed to me.<br /><br />The next presentation <span style="font-weight: bold;">"Dealing with Difficult People"</span> by <a href="http://naramore.net/blog/">Elizabeth Naramore</a> turned out to be an unexpected delight. Starting by asking what makes people you know "difficult" to interact with, she identified four broad types of behaviour:<br /><ul><li><span style="font-style: italic; font-weight: bold;">"Get it done"</span>-types are focused on getting information and acting on it quickly, so their style is terse and to-the-point,</li><li><span style="font-style: italic; font-weight: bold;">"Get it right"</span>-types are focused on detail, so their style is precise, slow and deliberate,</li><li><span style="font-style: italic; font-weight: bold;">"Get along"</span>-types are focused on making sure others are happy, so their style is touchy-feely and often sugar-coated, and<br /></li><li><span style="font-style: italic; font-weight: bold;">"Get a pat on the back"</span>-types are focused on getting their efforts recognised by others, so their style is more "person-oriented".</li></ul>The labels are instantly memorable and straight away I'm sure we can all think of people that we know who fit these categories (as well as ourselves of course). Elizabeth was at pains to point out that most people are a mixture of two or more, and that none of them are bad (except when someone is operating at an extreme all of the time). The important point is that they affect how people communicate, so if you can recognise and adapt to other people's styles, and learn to listen to them, then you'll stand a better chance of reducing your difficult interactions.<br /><br /><a href="http://akrabat.com/">Rob Allen</a> was next up with <span style="font-weight: bold;">"Getting A Website Out Of The Door"</span> (subtitled <span style="font-weight: bold; font-style: italic;">"Managing a website project"</span>), and covered the process used at <a href="http://www.bigroominternet.co.uk/about/meet-the-team">Big Room Internet</a> for spec'ing, developing and delivering website projects for external clients. Rob included a lot of detail on each part of the process, what can go wrong, and how they aim to manage and reduce the risks of that happening. One specific aspect that I found interesting was the change control procedure, which is used for all change requests from their clients regardless of the size of the change, essentially:<br /><ul><li>Write down the request</li><li>Understand the impact</li><li>Decide whether to do it</li><li>Do the work!</li></ul>I think that the second point here is key: you need to understand what the impact will be, and how much work it's really going to be (I'm sure we've all agreed at one time or another to make "trivial" changes to code which have turned out in practice to be far more work than anyone first imagined). 
A more general point that Rob made was the importance of clear communication, particularly in emails (which should have a subject line, summary, action and deadline).<br /><br />Rob was followed by <a href="http://caseysoftware.com/home">Keith Casey</a> talking about how <span style="font-weight: bold;">"Project Management is More Than Todo Lists"</span>. One of the interesting aspects of Keith's talk was that he brought an open source perspective to the subject. In open source projects the contributors are unpaid and so understanding how their motivations differ from doing paid work is important for the projects to be successful: as Keith said early on in his talk, in this case "it's about people".<br /><br />He argued that people managing open source projects should pay attention to the uppermost levels of <a href="http://en.wikipedia.org/wiki/Maslow%27s_hierarchy_of_needs">Maslow's Hierarchy of Needs </a>(where the individual focuses on "esteem" and "self-actualisation"), but there was also a lot of practical advice: for example, having regular and predictable releases; ensuring that bugs and feature requests are prioritised regularly, and that developments should be driven by input and involvement from the community. I particularly liked the practical suggestion that frequently asked questions can be used to identify areas of friction that need to be simplified or clarified. He also recommended Karl Fogel's book <a href="http://www.amazon.co.uk/Producing-Open-Source-Software-Successful/dp/0596007590/">"Producing Open Source Software"</a>, which looks like it would be a good read.<br /><br /><a href="http://www.thursdaybram.com/">Thursday Bram</a>'s presentation <span style="font-weight: bold;">"Project Management for Freelancers"</span> was another change of direction (and certainly the subtitle "How Freelancers Can Use Project Management to Make Clients Happier than They've Ever Been Before" didn't lack ambition). She suggested that for freelancers, project management is at least in part about helping clients to recognise quality work - after all they're not experts in coding (that's why they hired you), so inevitably they have an issue with knowing "what does quality look like?". (If you've ever paid for a service such as car servicing or plumbing then I'm sure you can relate to this.) So arguably one function of project management is to provide a way to communicate the quality of your work. The key message that I took away from Thursday's talk was that "what makes people happy is 'seeing progress' on their projects". Again I felt this was an idea that I could use in my (non-freelancer) work environment.<br /><br />The last session of the day was <a href="http://paul-m-jones.com/">Paul M. Jones</a> talking about <span style="font-weight: bold;">"Estimating and Expectations"</span>. Essentially we (i.e. 
everyone) are terrible at making estimates, as illustrated by his "laws of scheduling and estimates":<br /><ul><li><span style="font-weight: bold;">Jones' Law:</span> "if you plan for the worst, then all surprises are good surprises"</li><li><span style="font-weight: bold;">Hofstadter's Law:</span> "it always takes longer than you expect - even when you take Hofstadter's Law into account"</li><li><span style="font-weight: bold;">Brooks' Law:</span> "adding more people to a late software project will make it even later"<br /></li></ul>However there are various strategies and methods we can use to try and make our estimates better: for example, using historical data and doing some design work up front can both provide valuable knowledge for improved estimates. In this context Paul also had my favourite quote of the day: <span style="font-style: italic;">"It's not enough to be smart; you actually have to know things"</span> (something that I think a lot of genuinely clever people can often forget, especially when they move into a domain that's new to them).<br /><br />It felt like Paul packed an immense amount of material into this talk, covering a wide range of different areas and offering a lot of practical advice drawn from various sources (Steve McConnell's <a href="http://www.amazon.co.uk/Code-Complete-Practical-Handbook-Construction/dp/0735619670/">Code Complete</a>, Fred Brooks' <a href="http://www.amazon.co.uk/Mythical-Month-Essays-Software-Engineering/dp/0201835959/">The Mythical Man Month</a> and Tom DeMarco and Timothy Lister's <a href="http://www.amazon.co.uk/Peopleware-Productive-Projects-Teams-2nd/dp/0932633439/">Peopleware: Productive Projects & Teams</a> were all mentioned) both for estimation techniques and for expectation management - where ultimately communication and trust are key (a message that seemed to be repeated throughout the day).<br /><br />In spite of a few minor technical issues (the organisers had opted to use a new service called <a href="http://www.fuzemeeting.com/">Fuzemeeting</a>, which I guess was still ironing out some wrinkles), overall everything ran smoothly, and at the end I felt I'd got some useful ideas that I feel I can apply in my own working life - in the end surely that's the whole idea. It was definitely worth a few hours of my weekend, and I'm looking forward to being able to see some of the talks again when the videocasts become available. In the meantime if any of this sounds interesting to you then I'd recommend checking out the <a href="http://daycamp4developers.com/">DC4D website</a> and watching out for the next event!pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-53238603402837914082011-10-16T09:29:00.000-07:002011-10-23T04:15:44.245-07:00Using Fedora 15 & Gnome 3: an updateFollowing up on my previous posting about the <a href="http://ironiccog.blogspot.com/2011/07/fedora-15-and-gnome-3-user-basics.html">Fedora 15/Gnome 3 user experience</a>, I've now been using it as a day-to-day working environment for the last 4 1/2 months and thought it was time to post a brief update.<br /><br />Overall the experience has been pretty good (although I gather a lot of other commentators on the web wouldn't agree). For me the least satisfying aspect is still the automated workspaces/virtual desktops, closely followed by the default left-click behaviour of icons in the favourites sidebar. 
Both of these continue to catch me out from time to time, but I'd class their deficiencies as merely irritating rather than unusable.<br /><br />Another aspect that I complained about in my previous post was the limited set of customisations that seemed to be available. However I've since discovered the <span style="font-weight: bold;">gnome-tweak-tool</span>, which provides access to a much wider range of customisations than is offered via the "Preferences" options. (This and many other useful features are covered in <a href="http://docs.fedoraproject.org/en-US/Fedora/15/html/Release_Notes/sect-Release_Notes-Changes_for_Desktop_Users.html">Fedora's release notes for desktop users</a>, which I should probably have read right at the start.)<br /><br />It's likely that you'll need to explicitly install it as it doesn't appear to be there by default, i.e.:<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">% yum install gnome-tweak-tool</span></span><br /><br />(Nb this requires superuser privileges). To launch, start from the command line (or go to the "Applications" desktop view and <a href="http://www.flickr.com/photos/oblong_dog/6109603999/in/set-72157627125161258/">use the search box to look for "tweak"</a>). The tool itself looks like this:<br /><br /><div style="text-align: center;"><a href="http://www.flickr.com/photos/oblong_dog/6109604083/" title="Fedora 15: gnome-tweak-tool: "Fonts" tab by Oblong Dog, on Flickr"><img src="http://farm7.static.flickr.com/6191/6109604083_2d6afe9181.jpg" alt="Fedora 15: gnome-tweak-tool: "Fonts" tab" height="250" width="400" /></a><span style="font-size:85%;"><br /><span style="font-style: italic;"><br />Figure 1: gnome-tweak-tool displaying the "Fonts" tab</span></span></div><br />There are various categories ("Fonts", "Interface" etc) with a set of options for each, and at first glance there don't seem to be that many options available. However if the one you want doesn't appear to be there then it's worth typing in some search terms to see if something comes up (for example, this is how I found the option for displaying the full date next to the time at the top of the screen).<br /><br />Another useful utility is <span style="font-weight: bold;">gnome-session-properties</span> (again, seems easiest to launch from the command line), which really doesn't have many options but does allow you to customise which applications start up on login:<br /><br /><div style="text-align: center;"><a href="http://www.flickr.com/photos/oblong_dog/6109603699/" title="Fedora 15: gnome-session-properties by Oblong Dog, on Flickr"><img src="http://farm7.static.flickr.com/6067/6109603699_c8a739fc1e.jpg" alt="Fedora 15: gnome-session-properties" height="250" width="400" /></a><span style="font-size:85%;"><br /><span style="font-style: italic;"><br />Figure 2: gnome-session-properties dialog</span></span></div><br />As you can see by the fact that I'm still using the default desktop wallpaper, I'm not big on customisations (in fact my needs are basic: web browser, email client, terminal window, Emacs and some development tools are usually sufficient), however these additional tools have helped make me feel a little more at home, and generally I'm pretty happy with the setup now.<br /><br />Finally I thought I'd give the GnomeShell Cheatsheet page at <a href="http://live.gnome.org/GnomeShell/CheatSheet">http://live.gnome.org/GnomeShell/CheatSheet</a> a quick mention. 
It covers similar ground to my previous post but from a more expert perspective and with some useful extra detail.pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-32790629231979741492011-07-27T13:33:00.000-07:002011-07-27T14:16:03.216-07:00Book review: “Python Testing Cookbook” by Greg L. Turnquist<span style="font-style: italic;">Disclosure: a free e-copy of this book was received from the publisher for review purposes. The opinions expressed here are entirely my own; a copy of this review has also been posted at Amazon.<br /></span><br />Greg L. Turnquist’s “Python Testing Cookbook” explores automated testing at all levels, with the intention of providing the reader with the knowledge needed to implement testing using Python tools to improve software quality. To this end the book presents over 70 “recipes” in its nine chapters (ranging from the basics of unit testing, through test suites, user acceptance and web application testing, continuous integration, and methods for smoke- and load-testing), covering both tools for testing Python, and Python tools for testing. It also delivers advice about how to get the most from automated testing, which is as much an art as a science.<br /><br />The first three chapters introduce the fundamentals: writing, organising and running unit tests, comprehensively covering <a href="http://docs.python.org/library/unittest.html">unittest</a> (Python’s built-in unit testing library), <a href="http://readthedocs.org/docs/nose/en/latest/">nose</a> (a versatile tool for discovering, running and reporting tests) and doctest (which turns Python docstrings into testable code – a sample of this chapter can be downloaded from <a href="http://www.packtpub.com/python-testing-cookbook/book">http://www.packtpub.com/python-testing-cookbook/book</a>). Having established a solid foundation, subsequent chapters look at increasingly broader levels of automated testing using the appropriate relevant Python tools: for example, the “<a href="http://lettuce.it/">lettuce</a>” and “<a href="http://www.should-dsl.info/">should_DSL</a>” libraries for “<a href="http://behaviour-driven.org/">behaviour driven development</a>” (an extension of “test driven development” which aims to produce human-readable test cases and reports), and the “<a href="https://github.com/heynemann/pyccuracy/wiki/">Pyccuracy</a>” and “<a href="http://code.google.com/p/robotframework/">Robot</a>” frameworks for end-user acceptance testing of web applications. Later chapters cover higher level concepts and tools, such as using nose to hook Python tests into “continuous integration” servers (both <a href="http://jenkins-ci.org/">Jenkins</a> and <a href="http://www.jetbrains.com/teamcity/">TeamCity</a> are covered in detail), and assessing test coverage using the “<a href="http://nedbatchelder.com/code/coverage/">coverage</a>” tool (both as a metric, and to identify areas that need more tests). A detailed chapter on smoke- and load-testing includes practical advice on developing multiple test suites for different scenarios, and methods for stress-testing (for example, by capturing and replaying real world data) to discover weaknesses in a system before going to production. 
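(As an aside, and to give a flavour of the docstring-driven style covered back in the opening chapters, a minimal doctest example might look something like the sketch below. This is my own illustration rather than one of the book's recipes - the <span style="font-family:courier new;">word_count</span> function is purely hypothetical - and the embedded examples can be run with <span style="font-family:courier new;">python -m doctest</span> or collected by nose's doctest plugin.)<br /><br /><pre style="width: 450px; overflow: scroll; overflow-y:auto; background-color:#F5F5DC;"><span style="font-size:85%;">def word_count(text):<br />    """Count whitespace-separated words in a string.<br /><br />    >>> word_count("the quick brown fox")<br />    4<br />    >>> word_count("")<br />    0<br />    """<br />    return len(text.split())<br /><br />if __name__ == "__main__":<br />    import doctest<br />    doctest.testmod()  # run the examples embedded in the docstrings above<br /></span></pre><br />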
The final chapter distils the author’s experience into general advice on making testing a successful part of your code development methodology, both for new and legacy projects.<br /><br />There’s a lot of good stuff in this book: the initial chapters on unittest and nose are particularly strong, and I can imagine returning to these in future as a reference. There is also a lot of excellent and hard-won practical advice from the author’s own experience – not only in these early chapters but throughout the book – which is consistently valuable (in this regard the final chapter is a real highlight and could easily stand alone – I will definitely be re-reading it soon). Elsewhere the various tools and topics are presented clearly with plenty of useful detail, and in some cases have demystified things that I’d always assumed were quite esoteric and difficult to do (nose in particular was a revelation to me, but also setting up continuous integration servers and measuring test coverage).<br /><br />There are a few disappointments: the section on mock objects left me feeling baffled as to how to actually implement them in practice – a shame as it was something that I’d looked forward to learning. I’d also have liked something about approaches for handling difficult testing scenarios such as software which interacts with the file system or with large files – a few hints here would have been invaluable for me. There are typos in some commands and code in a few recipes (e.g. for nose), which meant I had to look up the correct syntax elsewhere – perhaps not so bad, but annoying (especially in a cookbook) – and since the recipes themselves aren’t numbered, this sometimes made it difficult to navigate between them.<br /><br />However these are fairly minor quibbles, and in conclusion I was impressed with both the breadth of material covered by the book and the level of detail for many topics. Moreover I enjoyed reading it and was often left feeling excited at the prospect of being able to apply the ideas to my own projects, which I think was one of the author’s aims (and no mean feat for a technical book). I think that the combination of the detail together with the author’s practical advice makes this book both an excellent introduction to testing with Python, and a valuable resource to refer back to subsequently.<br /><br />(Addendum: Greg Turnquist's blog about the book can be found at <a href="http://pythontestingcookbook.posterous.com/">http://pythontestingcookbook.posterous.com/</a> and features some interesting supplementary material.)pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-89241119308552244282011-07-10T11:08:00.000-07:002011-10-16T09:34:49.297-07:00Fedora 15 and Gnome 3: user basicsI've been using Fedora 15 for about a month now and thought it was time to write up some of my experiences with the new Gnome 3 desktop, since certain aspects are quite a bit different from the previous version. I know other people have posted details about the Fedora 15 desktop (for example Xavier Claessens' <a href="http://blogs.gnome.org/xclaesse/2011/04/25/one-week-with-gnome3/">One Week with Gnome 3</a>) but when I first installed it there didn't seem to be much from a "user basics" perspective. 
So this is my take, hope it's useful.<br /><br /><span style="font-weight: bold;font-size:130%;" >Getting started</span><br /><br />When you first start up Gnome 3 the desktop looks pretty empty - in fact there are no desktop icons in this new version (even when you put them in your "Desktop" subdirectory):<br /><br /><div style="text-align: center;"><a href="http://www.flickr.com/photos/oblong_dog/5905874151/" title="Fedora 15: Gnome 3 desktop by Oblong Dog, on Flickr"><img src="http://farm7.static.flickr.com/6023/5905874151_a5b6498926.jpg" alt="Fedora 15: Gnome 3 desktop" height="250" width="400" /></a><span style="font-size:85%;"><br /><span style="font-style: italic;">Figure 1: "Empty" Gnome 3 desktop on startup. No desktop icons allowed!</span></span><br /></div><br />To get started, move the mouse to the word "Activities" at the top left-hand corner of the screen (the so-called "hot corner") - immediately changing the desktop to the "Windows" view:<br /><br /><div style="text-align: center;"><a href="http://www.flickr.com/photos/oblong_dog/5906433032/" title="Fedora 15: "Activities" hot corner: "Windows" view by Oblong Dog, on Flickr"><img src="http://farm6.static.flickr.com/5276/5906433032_68e8e7f33e.jpg" alt="Fedora 15: "Activities" hot corner: "Windows" view" height="250" width="400" /></a><br /><span style="font-size:85%;"><span style="font-style: italic;">Figure 2: "exploded view" of the Gnome 3 desktop, accessed either by moving the mouse over the "Activities" hot corner (top-left of the screen), or by hitting the "Gnome" (i.e. Windows) key on the keyboard. The Favourites sidebar sits on the left edge of the screen, and the edge of the Workspaces sidebar peeks out on the right.</span></span><br /></div><br />In this view (figure 2) you can see the <span style="font-weight: bold;">Favourites sidebar</span> on the left side, and just the very edge of the <span style="font-weight: bold;">Workspaces sidebar</span> on the right (more about those below). I call this the <span style="font-weight: bold;">exploded view</span> of the current workspace, since (as in this example) it features miniatures of any windows in the workspace. The exploded view can also be toggled by pressing the <del>"Gnome key"</del> "Super key" (i.e. 
the Windows key).<br /><br />From this view you can toggle to the "Applications" view, by clicking on "Applications" near the top left (figure 3):<br /><br /><div style="text-align: center;"><a href="http://www.flickr.com/photos/oblong_dog/5905874335/" title="Fedora 15: "Activities" hot corner: "Applications" view by Oblong Dog, on Flickr"><img src="http://farm7.static.flickr.com/6010/5905874335_d86f31bf11.jpg" alt="Fedora 15: "Activities" hot corner: "Applications" view" height="250" width="400" /></a><span style="font-size:85%;"><br /><span style="font-style: italic;">Figure 3: "Applications" view of the Gnome 3 desktop</span></span><br /></div><br />This shows all the applications installed on the system, with a search box and category groupings on the right to help you find the one you want.<br /><ul><li>Drag icons from this view to the "Favourites" bar to make them more easily accessible in future.<br /></li><li>The "Add/remove software" application is a graphical front end to yum for installing and managing additional packages that weren't included by default.</li></ul><br /><span style="font-weight: bold;font-size:130%;" >The Favourites sidebar: launching and navigating applications</span><br /><br />The Favourites sidebar is the strip down the left-hand side which holds various application icons. These icons do "double-duty": if you've dragged an icon there from elsewhere (essentially "favouriting" it), then it initially acts as a launcher for that application; also the icons for any running applications (favourited or otherwise) will appear here.<br /><br />If an application already has one or more instances running then clicking on its icon takes you to the "nearest" running instance; clicking-and-holding (or right clicking) gives you more options (e.g. to start a new instance, or move to any of the currently open instances) - behaviour that is probably already familiar to users of Windows 7 and Mac OS X (figure 4):<br /><br /><div style="text-align: center;"><a href="http://www.flickr.com/photos/oblong_dog/5909977188/" title="Fedora 15: "Favourites" sidebar by Oblong Dog, on Flickr"><img src="http://farm7.static.flickr.com/6056/5909977188_f71945c771.jpg" alt="Fedora 15: "Favourites" sidebar" height="250" width="400" /></a><span style="font-size:85%;"><br /><span style="font-style: italic;">Figure 4: The "Favourites" sidebar with dialogue (i.e. the black bubble) opened for Firefox after right-clicking on its icon. 
This gives options to move to a running Firefox window, or to start a new Firefox instance.</span></span><br /></div><br />Another way to navigate between running applications is to use Alt+Tab to cycle between them (figure 5):<br /><br /><div style="text-align: center;"><a href="http://www.flickr.com/photos/oblong_dog/5916621360/" title="Fedora 15: Alt-Tab cycle through applications by Oblong Dog, on Flickr"><img src="http://farm7.static.flickr.com/6013/5916621360_002b909a19.jpg" alt="Fedora 15: Alt-Tab cycle through applications" height="250" width="400" /></a><span style="font-size:85%;"><br /><span style="font-style: italic;">Figure 5: Alt+Tab moves between running applications...</span></span><br /></div><br />Repeated Alt+Tabb'ing moves between the applications; if there's more than one instance running then these are also shown when you Alt+Tab to it, and you can use the arrow keys to navigate to the specific one you want (figure 6):<br /><br /><div style="text-align: center;"><a href="http://www.flickr.com/photos/oblong_dog/5916621484/" title="Fedora 15: Alt+Tab cycle through applications (multiple windows) by Oblong Dog, on Flickr"><img src="http://farm7.static.flickr.com/6023/5916621484_274cebf8d7.jpg" alt="Fedora 15: Alt+Tab cycle through applications (multiple windows)" height="250" width="400" /></a><br /><span style="font-style: italic;font-size:85%;" >Figure 6: ... and arrow keys allow you to select a specific instance if there are multiple instances of a particular application.</span><br /></div><br /><span style="font-weight: bold;font-size:130%;" >The Workspaces sidebar: navigating multiple desktops</span><br /><br />Workspaces provide a way to manage applications, by giving the user multiple virtual desktops. These should already be familiar to seasoned Gnome users, but they operate somewhat differently in Gnome 3: there are no longer a fixed number of workspaces, instead they are created and destroyed automatically by the system as required.<br /><br />You can move between workspaces in at least two different ways. Firstly, you can access the workspaces sidebar on the right-hand side of the screen, by moving the mouse over it in the "exploded view" of the desktop and causing it to "pop out" (figure 7):<br /><br /><div style="text-align: center;"><a href="http://www.flickr.com/photos/oblong_dog/5905894999/" title="Fedora 15: Workspaces "popped out" by Oblong Dog, on Flickr"><img src="http://farm6.static.flickr.com/5080/5905894999_a793067a29.jpg" alt="Fedora 15: Workspaces "popped out"" height="250" width="400" /></a><br /><span style="font-size:85%;"><span style="font-style: italic;">Figure 7: the Workspaces sidebar "popped out" on the right of the screen in the exploded view of the Gnome 3 desktop.</span></span><br /></div><br />The sidebar shows miniatures of each workspace, with the current workspace highlighted with a white outline. Clicking on one of the images takes you to that workspace; you can also drag application windows between the different workspaces.<br /><br />Note the sidebar also shows an extra "empty" workspace at the bottom: if an application is opened or moved into this workspace then a new empty workspace is automatically created underneath. Furthermore, there's only ever one empty workspace - so if a workspace "empties" (e.g. because you've closed all the applications it contains) then Gnome automatically removes it. 
This can be quite disconcerting, and is probably the feature that causes me the most confusion in practice as it often upsets my sense of where I am in the workspace order.<br /><br />The other way to navigate between workspaces is Alt+Ctrl+Up/Down Arrows, which I find myself using quite a lot (although I frequently overshoot into the empty workspace by accident) (figure 8):<br /><br /><div style="text-align: center;"><a href="http://www.flickr.com/photos/oblong_dog/5905874657/" title="Fedora 15: alt+ctrl+arrows to navigate workspaces by Oblong Dog, on Flickr"><img src="http://farm6.static.flickr.com/5235/5905874657_17d13f1cc0.jpg" alt="Fedora 15: alt+ctrl+arrows to navigate workspaces" height="250" width="400" /></a><br /><span style="font-size:85%;"><span style="font-style: italic;">Figure 8: moving between multiple desktops using Alt+Ctrl+Up/Down keys</span></span><br /></div><br /><span style="font-weight: bold;font-size:130%;" >Other observations</span><br /><ul><li><span style="font-weight: bold;">Resizing windows: </span>windows can be maximised by double-clicking on their title bar (double-click again to restore to the original size). There's no "minimise" button on the window frame, so you now have to right-click and then select the "Minimise" menu option. Also, note that dragging a window to the top of the screen automatically causes it to maximise (again similar to Windows 7, and not always what you intend). Manual resizing is also possible as always, by dragging the window edges - but this can be fiddly, as the area where an edge can be "caught" for dragging seems to be quite small.<br /></li></ul><ul><li><span style="font-weight: bold;">System notifications</span>: these now pop up rather discreetly at the bottom of the screen, but interacting with them can be frustrating at times - often they disappear before you have a chance to read them, and sometimes (counterintuitively) disappear when clicked.<br /></li></ul><ul><li><span style="font-weight: bold;">Customisation:</span> while some preference-type options are available via the "username" menu (top right-hand corner of the screen) under "System Settings", overall the customisation options feel quite limited (for example, no screen-savers). However as a number of interfaces to system tools currently only seem to be accessible by launching from a command line, it's not clear if this is a conscious design decision or whether more customisation options will be exposed in future versions.</li></ul><ul><li><span style="font-weight: bold;">Fallback mode:</span> this is a half-way house between Gnome 2 and Gnome 3, and is started by default on systems which can't support the full Gnome 3 experience (which appears to include virtual machines). However as it's much more like the old Gnome, if you really don't like the new version then you could try using fallback mode instead.</li></ul><br /><span style="font-size:130%;"><span style="font-weight: bold;">Conclusion</span></span><br /><br />Having been using Fedora 15 and Gnome 3 day-to-day for a few weeks now, I'm largely used to its quirks and finding it overall a perfectly serviceable working environment - for me the new workspaces model and the rather random system notification mechanism have proved to be the most challenging differences from previous versions. 
So while it may not suit everyone's tastes it's definitely worth trying (and hopefully the more egregious foibles will be ironed out in future versions).pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com2tag:blogger.com,1999:blog-2061759690050619608.post-57560573727690423442011-05-30T10:02:00.000-07:002011-10-23T03:10:41.100-07:00Mac OS X: new user tipsOver the past couple of weeks I've been using an Apple iMac, and as a Windows/Linux user I've found navigating the desktop has been something of a learning experience for me.<br /><br />As different as they are, in many ways the standard Windows and Linux desktops are idiomatically quite similar these days, and both support the standard PC three-button mouse. By contrast the Mac OS X desktop environment (and its use of the infamous one-button mouse) has a number of differences which can turn even basic operations (for example, cut-and-paste) initially into something of a challenge.<br /><br />However some basic knowledge should go a long way in helping. First, there are the three essential keys you need to know about:<br /><ul><li><a style="font-weight: bold;" href="http://en.wikipedia.org/wiki/Command_key">Command key</a>: [⌘]</li><li><a style="font-weight: bold;" href="http://en.wikipedia.org/wiki/Option_key">Option key</a>: [⌥]</li><li><a style="font-weight: bold;" href="http://en.wikipedia.org/wiki/Control_key">Control key</a>: [ctrl]</li></ul>(The links give more background but aren't essential to the following. You can think of the option key as being the same as the "Alt" key on Windows/Linux.)<br /><br />Then:<br /><ul><li><span style="font-weight: bold;">Emulating the right-hand mouse button:</span> [ctrl] + mouse click (essential for desktop and web applications that use this to activate context menus and so on)</li></ul>Basic text editing operations:<br /><ul><li><span style="font-weight: bold;">Cut: </span>[⌘] + [x]</li><li><span style="font-weight: bold;">Copy:</span> [⌘] + [c]</li><li><span style="font-weight: bold;">Paste:</span> [⌘] + [v]</li></ul>Basic keys for navigating within text documents:<br /><ul><li><span style="font-weight: bold;">Home:</span> [↖]</li><li><span style="font-weight: bold;">End:</span> [↘]</li><li><span style="font-weight: bold;">Page up:</span> [<span style="font-size:180%;">⇞</span>]<br /></li><li><span style="font-weight: bold;">Page down:</span> [<span style="font-size:180%;">⇟</span>]<br /></li></ul>Useful shortcuts for navigating the desktop:<br /><ul><li><span style="font-weight: bold;">Cycle between open windows:</span> [⌥] + [tab]</li><li><span style="font-weight: bold;">Zoom out (pulls back to show all open windows):</span> [F9]</li><li><span style="font-weight: bold;">Show desktop (hides all open windows):</span> [F11]<br /></li></ul>And finally (and essential if you're programming and find your Apple keyboard is missing a hash key):<br /><ul><li><span style="font-weight: bold;">Hash symbol ("#"):</span> [⌥] + [3]<br /></li></ul>These all work on OS X 10.4.11 ("Tiger"), which is admittedly no longer a very recent release, but hopefully they're also applicable to later Mac OSes. 
I can't say that I've fallen in love with Apple as a result, but they have enabled me to operate at an acceptably functional level (until I can get my Linux workstation up and running!).pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com2tag:blogger.com,1999:blog-2061759690050619608.post-20897010986969373632011-04-09T06:04:00.000-07:002011-04-11T12:37:11.584-07:00Managing Python packages: virtualenv, pip and yolkI've recently been playing with the Python <a href="http://pypi.python.org/pypi/virtualenv/">virtualenv</a> package - along with <a href="http://pypi.python.org/pypi/pip">pip</a> and <a href="http://pypi.python.org/pypi/yolk">yolk</a> - as a way of managing third-party packages. This post is my brief introduction to the basics of these three tools.<br /><br /><span style="font-weight:bold;">virtualenv</span> lets you create isolated self-contained "virtual environments" which are separate from the system Python. You can then install and manage the specific Python packages that you need for a particular application - safe from potential problems due to version incompatibilities, and without needing superuser privileges - using the <span style="font-weight:bold;">pip</span> package installer. <span style="font-weight:bold;">yolk</span> provides an extra utility to keep track of what's installed.<br /><br /><span style="font-weight: bold;">1. virtualenv: building virtual Python environments</span><br /><br />virtualenv can either be installed via your system's package manager (for example, synaptic on Ubuntu), or by using the <a href="http://peak.telecommunity.com/DevCenter/EasyInstall">easy_install</a> tool, i.e.:<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">$ easy_install virtualenv</span></span><br /><br />(If you don't have the SetupTools package which provides easy_install then you can download the "bootstrap" install script from <a href="http://peak.telecommunity.com/dist/ez_setup.py">http://peak.telecommunity.com/dist/ez_setup.py</a>. Save as <span style="font-size:85%;"><span style="font-family:courier new;">ez_setup.py</span></span> and run using <span style="font-size:85%;"><span style="font-family:courier new;">/path/to/python ez_setup.py</span></span>.)<br /><br />Once virtualenv is installed you can create a new virtual environment (called in this example, "myenv") as follows:<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">$ virtualenv --no-site-packages myenv</span></span><br /><br />This makes a new directory myenv in the current directory (which will contain bin, include and lib subdirectories) based on the system version of Python. The <span style="font-size:85%;"><span style="font-family:courier new;">--no-site-packages</span></span> option tells virtualenv not to include any third-party packages which might have been installed into the system Python (see the <a href="http://www.virtualenv.org/en/latest/">virtualenv documentation</a> for details of other options).<br /><br />To start using the new environment, run the environment's "activate" command e.g.:<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">$ source myenv/bin/activate</span></span><br /><br />The shell command prompt will change from e.g. 
<span style="font-family:courier new;">$</span> to <span style="font-family:courier new;">(myenv)</span><span style="font-family:courier new;">$</span>, indicating that the "myenv" environment (and any packages installed in it) will be used instead of the system Python for applications run in this shell. (Note that the Python application code doesn't need to be inside the virtual environment directory; in fact this directory is just using for the packages associated with the virtual environment.)<br /><br />Finally, when you've finished working with the virtual environment you can leave it by running the <span style="font-family:courier new;"></span><span style="font-size:85%;"><span style="font-family:courier new;">deactivate</span></span> command (also in the bin directory).<br /><br />(On Windows you may have to specify the full path to the "Scripts" directory of your Python installation when invoking the easy_install and virtualenv commands above, e.g. <span style="font-size:85%;"><span style="font-family:courier new;">C:\Python27\Scripts\virtualenv</span></span>. Also, note that when a virtual environment is created it won't contain a "bin" directory - instead it's activated by invoking the <span style="font-size:85%;"><span style="font-family:courier new;">Scripts\activate</span></span> batch file in the virtual environment directory. Invoking the <span style="font-size:85%;"><span style="font-family:courier new;">deactivate</span></span> command exits the environment as before.)<br /><br /><span style="font-weight: bold;">2. pip: installing Python packages</span><br /><br />Once you're created a virtual environment you can start to add packages (which is really the point of doing this in the first place). virtualenv automatically includes both easy_install and an alternative package installer called <a href="http://pypi.python.org/pypi/pip">pip</a> (at least, for virtualenv 1.4.1 and up; earlier versions only have easy_install, so you'll need to run <span style="font-size:85%;"><span style="font-family:courier new;">easy_install pip</span></span> within the virtual environment in order to get it).<br /><br />Most packages that are easy_installable can also be installed using pip, and it's designed to work well with virtualenv. However I think its main advantage is that it offers some useful functionality that's missing from easy_install - most significantly, the ability to uninstall previously installed packages. 
(Other useful features include the ability to explicitly control and export versions of third-party package dependencies via "requirements files" - see the <a href="http://www.pip-installer.org/en/latest/index.html">pip documentation</a> for more details.)<br /><br />Basic pip usage looks like this:<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">(myenv)$ pip install python-dateutil <span style="font-style: italic;"># install latest version of a package</span></span><br /><br /></span><span style="font-size:85%;"><span style="font-family:courier new;">(myenv)$ pip uninstall python-dateutil <span style="font-style: italic;"># remove package</span></span></span><br /><span style="font-size:85%;"><span style="font-family:courier new;"><br />(myenv)$ pip install python-dateutil==1.5 <span style="font-style: italic;"># install specific version</span></span></span><br /><br />(As an aside, the <a href="http://labix.org/python-dateutil">python-dateutil</a> package is illustrative of one of the advantages of using pip over easy_install: after installing the latest version of python-dateutil, I discovered that it's only compatible with Python 3 - an earlier 1.* version is required to work with Python 2. pip let me uninstall the newer version and reinstall the older one.)<br /><br /><span style="font-weight: bold;">3. yolk: checking Python packages installed on your system</span><br /><br />The final utility I'd recommend is <a href="http://pypi.python.org/pypi/yolk">yolk</a>, which provides a way of querying which packages (and versions) have been installed in the current environment. It also has options to query <a href="http://pypi.python.org/pypi">PyPI (the Python Package Index)</a>. Installing it is easy:<br /><br /><span style=";font-family:courier new;font-size:85%;" >(myenv)$ pip install yolk</span><br /><br />Running it with the -l option (for "list") then shows us what packages are available:<br /><pre style="width: 450px; overflow: scroll; overflow-y:auto; background-color:#F5F5DC;"><span style="font-size:85%;">(myenv)$ yolk -l<br />Python - 2.6.4 - active development (/usr/lib/python2.6/lib-dynload)<br />pip - 1.0 - active<br />python-dateutil - 1.5 - active<br />setuptools - 0.6c9 - active<br />wsgiref - 0.1.2 - active development (/usr/lib/python2.6)<br />yolk - 0.4.1 - active<br /></span></pre>(See the <a href="http://pypi.python.org/pypi/yolk">yolk documentation</a> to learn more about its other features.)<br /><br /><span style="font-weight: bold;">Summary</span><br /><br />Obviously the above is just an introduction to the basics of virtualenv, pip and yolk for managing and working with third-party packages - but hopefully it's enough to get started. If you're interested in using virtualenv in practice then Doug Hellman's article about working with <a href="http://www.doughellmann.com/articles/pythonmagazine/completely-different/2008-05-virtualenvwrapper/index.html">multiple virtual environments</a> (and his <a href="http://www.doughellmann.com/projects/virtualenvwrapper/">virtualenvwrapper</a> project, which provides tools to help) is recommended as a starting point for further reading.pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-64937858204389915692011-04-04T09:16:00.000-07:002011-04-05T05:55:36.570-07:00Richard Stallman: "A Free Digital Society?"About a month ago I was fortunate to attend an IET-hosted lecture by Richard Stallman, entitled "A Free Digital Society?". 
Probably most famous as the originator of the <a href="http://www.gnu.org/">GNU project</a> (out of which came <a href="http://www.gnu.org/gnu/why-gnu-linux.html">GNU/Linux</a>) and initiator of the <a href="http://www.gnu.org/philosophy/free-software-intro.html">free software movement</a>, Stallman has for many years been an active and vocal advocate for free software, and has campaigned against excessive extension of copyright laws.<br /><br />He began the talk with the observation that there is an implicit assumption in the recent movement towards "digital inclusion", that using computers and the internet is inherently good and beneficial. However, as the question mark in the title of his talk indicated, this assumption merits closer attention, as (in his opinion) there are various issues and threats associated with these technologies. These include:<br /><ul><li><span style="font-weight: bold;">Surveillance:</span> technology now makes it possible for ISPs, websites and other organisations to monitor and analyse what individuals do online (e.g. the sites that they visit, things they buy, search terms they use etc) to an extent of which (in Stallman's words) "Stalin could only dream".</li></ul><ul><li><span style="font-weight: bold;">Censorship:</span> for example, governments or corporations blocking access to particular websites (think Google in China), or even forcing them to close.</li></ul><ul><li><span style="font-weight: bold;">Restrictions on users imposed by data formats:</span> both proprietary (e.g. Silverlight) and patented data formats (e.g. <a href="http://en.wikipedia.org/wiki/MP3">MP3</a>) restrict what the end user is able to do with the data they encode.</li></ul><ul><li><span style="font-weight: bold;">Non-free software:</span> here "free" is in the sense of "freedom", rather than price. Non-free software is essentially software that isn't under the control of you, the user - in the case of proprietary software, it's controlled by the owner (for example Microsoft, Apple, Amazon) who is able to insert features (e.g. to track user behaviour) that serve their interests rather than those of the user. By contrast, free software - which by the way <a href="http://www.gnu.org/philosophy/selling.html">you can still charge money for</a> - gives the user four basic freedoms: 0. to run the software for any purpose; 1. to study how the software works, and make changes to it; 2. to redistribute the software as-is; 3. to redistribute the software with your changes (see the <a href="http://www.gnu.org/philosophy/free-sw.html">free software definition</a>). In this way malicious features can be detected and removed, and control is returned to the user.<br /></li></ul><ul><li><span style="font-weight: bold;">"Software as a service" (SaaS)</span>: in Stallman's definition, "software as a service" is anything where the computation is done by programs that you can't control - this is like non-free software above, because someone else has control and can change how your computing is done at any time without your permission. He made a distinction between things like e-commerce, online storage (e.g. Dropbox), publishing (e.g. Twitter) and search (which are about "data" or "communication", and so are not SaaS), and e.g. Google Docs (which does do computation for you, and so is SaaS). 
(See Stallman's article <a href="http://www.gnu.org/philosophy/who-does-that-server-really-serve.html">Who does that server really serve?</a>)<br /></li></ul><ul><li><span style="font-weight: bold;">Misuse of an individual's data:</span> essentially doing something with your data without your permission, or even your knowledge - for example, passing on personal data to the authorities, unilaterally modifying your data, or even (for example in the case of Facebook) <a href="http://www.wired.com/epicenter/2008/01/facebook-ads-ma/">using it for commercial purposes</a>.</li></ul><ul><li><span style="font-weight: bold;">"The War on Sharing":</span> according to Stallman, sharing is "using the internet for what it's best at", and the war on sharing - whether <a href="http://en.wikipedia.org/wiki/Digital_rights_management">digital rights management</a> (DRM) technology or threatening internet users with disconnection (as under the UK's <a href="http://en.wikipedia.org/wiki/Digital_Economy_Act_2010">Digital Economy Act</a>) - is an attempt by commercial interests to unfairly restrict what users are allowed to do (see Stallman's article <a href="http://stallman.org/articles/end-war-on-sharing.html">Ending the War on Sharing</a>).</li></ul><ul><li><span style="font-weight: bold;">Users don't have a positive right to do things on the internet:</span> essentially, all the activities that users perform on the internet - communications, payment etc - are dependent on organisations who have no obligation to continue providing those services to you.</li></ul>This is a pretty long list of issues (hopefully I've accurately captured the essence of each), and while many of them can be mitigated by moving to free software, others (for example, monitoring by ISPs) require other solutions - and Stallman admitted that he's quite pessimistic about the future. Aside from that, it was a fascinating and entertaining talk (including the auctioning of a <a href="http://shop.fsf.org/category/stuffed-gnu/">GNU gnu soft toy</a> to raise funds for the Free Software Foundation) and the subsequent audience Q&A session provided many opportunities for elaboration and clarification on many of the issues.<br /><br />I'm still mulling over many of the issues raised. On the one hand there is a fundamental question about what moral rights you believe individuals should have, both generally and with specific regard to the digital world; and on the other there is the question of what you should do if you feel those rights are not being upheld. Stallman's position is clear and uncompromising: for example, not owning a mobile phone and not using a key card to enter his office (to avoid the possibility of being tracked), and using a netbook that allows him to run 100% free software (down to the BIOS level). 
It's certainly given me plenty to think about, and I'm looking forward to reading his book of collected essays <a href="http://shop.fsf.org/product/free-software-free-society-2/">"Free Software, Free Society"</a> - which might be a good place to start if you're also interested in learning more.pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-54801138124441661562011-04-03T11:51:00.000-07:002011-04-08T08:31:28.667-07:00Book review: "Python 2.6 Text Processing: Beginner’s Guide" by Jeff McNeilJeff McNeil’s “Python 2.6 Text Processing: Beginner’s Guide” is a practical introduction to a wide range of methods for reading, processing and writing textual data from a variety of structured and unstructured data formats. Aimed primarily at novice Python programmers who have some elementary knowledge of the language basics but without prior experience in text processing, the book offers hands-on examples for each of the techniques it discusses – ranging from Python’s built-in libraries for handling strings, regular expressions, and formats such as JSON, XML and HTML, through to more advanced topics such as parsing custom grammars, and efficiently searching large text archives. In addition it contains a great deal of general supporting material on working with Python, including installing packages and third-party libraries, and working with Python 3.<br /><br />The first three chapters lay the foundations, covering a number of Python basics including a crash course in file and URL I/O, and the essentials of Python’s built-in string handling functions. Useful background topics – such as installing packages with easy_install, and using virtualenv – are also introduced here. (A sample of the first chapter can be freely downloaded from the book’s website at <a href="https://www.packtpub.com/python-2-6-text-processing-beginners-guide/book">https://www.packtpub.com/python-2-6-text-processing-beginners-guide/book</a>). The next three cover: using the standard library to work with simple structured data formats (delimited “CSV” data, “ini”-style configuration files, and JSON-formatted data); working with Python regular expressions (a stand out chapter for me); and handling structured markup (specifically, XML and HTML). Subsequent chapters on using the Mako templating package (the default system for the Pylons web framework) to generate emails and web pages, and on writing more advanced data formats (PDF, Excel and OpenDocument), are separated by an excellent overview of understanding and working with Unicode, encodings and application internationalization (“i18n”).<br /><br />The remaining two chapters cover more advanced topics, with some good background theory supplementing the practical examples: using the PyParsing package to create parsers for custom grammars (with a brief nod to the basics of natural language processing using the Natural Language Toolkit, NLTK); and the Nucular package for indexing large quantities of textual data (not necessarily just plain text) to enable highly efficient searching. 
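(To give a concrete, if trivial, flavour of the standard-library ground the earlier chapters cover - regular expressions and JSON handling in particular - here is a toy fragment of my own devising, not an example from the book: it pulls ISO-style dates out of a piece of free text and re-emits them as JSON.)<br /><br /><pre style="width: 450px; overflow: scroll; overflow-y:auto; background-color:#F5F5DC;"><span style="font-size:85%;">import json<br />import re<br /><br />text = "Released 2011-04-03; reviewed 2011-04-08."<br /><br /># Find ISO-style dates (YYYY-MM-DD) in the free text<br />dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)<br /><br />print(json.dumps({"dates": dates}))<br /># {"dates": ["2011-04-03", "2011-04-08"]}<br /></span></pre><br />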
Finally, an appendix offers a grab-bag of general Python resources, references to some more advanced text processing tools (such as Apache’s Lucene/Solr), and an excellent overview of the differences between Python 2 and 3 (including a hands-on example of migrating code from 2 to 3).<br /><br />The book covers a lot of ground and moves fairly quickly; however it adopts a largely successful hands-on approach, engaging the reader with working examples at each stage to illustrate the key points, and this certainly helped me keep up. I was also impressed by the clear and concise quality of code in the examples, and the very natural way that general Python concepts and principles – generators, duck typing, packaging and so on – were introduced as asides. (One very minor criticism is that the layout of the example code could have been improved, as the indentation levels weren’t always immediately obvious to me.) Aside from a surprisingly unsatisfying chapter on structured markup (reluctantly, I would recommend looking elsewhere for an introduction to XML processing with Python) and a few niggling typos, there’s a lot of excellent material in this book, and the author has a knack for presenting some tricky concepts in a deceptively easy-to-understand manner. I think that the chapter on regular expressions is possibly one of the best introductions to the subject that I’ve ever seen; other chapters on encodings and internationalization, advanced parsing, and indexing and searching were also highlights for me (as was the section on Python 3 in the appendix).<br /><br />Overall I really enjoyed working through the book and felt I learned a lot. I think it’s fair to say that given the rather ambitious range of techniques presented, in many cases (particularly for the more advanced or specialised topics) the chapters are inevitably more introductory than definitive in nature: the reader is given enough information to grasp the background concepts and get started, with pointers to external resources to learn more. In conclusion, I think this is a great introduction to a wide range of text processing techniques in Python, both for novice Pythonistas (who will undoubtedly also benefit from the more general Python tips and tricks presented in the book) and more experienced programmers who are looking for a place to start learning about text processing.<br /><br /><span style="font-style: italic;">Disclosure: a free e-copy of this book was received from the publisher for review purposes; this review has also been submitted to Amazon.</span>pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-52427022102579986282011-03-18T11:37:00.000-07:002011-03-18T16:07:14.906-07:00Day Camp 4 Developers: TelecommutingAbout two weeks ago I took part in the second online <a href="http://daycamp4developers.com/">Day Camp 4 Developers</a>, on the topic of telecommuting. The idea behind the Day Camp events is to provide software developers with practical knowledge and advice in the area of "soft" skills, to complement their expertise with "hard" skills (i.e. actual coding). 
In this case five speakers gave consistently excellent web presentations (slides and audio) with different perspectives on remote working, while an IRC chatroom gave all participants a forum to discuss the issues behind the scenes.<br /><br /><a href="http://lornajane.net/">Lorna Jane Mitchell</a> started off by asking "<span style="font-weight: bold;">Could You Telecommute?</span>". As a teleworker herself, Lorna Jane looked at the environmental, organisational and personal factors that influence the happiness and productivity of the remote worker: for example, ensuring you have a good home working space, and setting clear boundaries between work and personal life (both for yourself and for others). In particular, you have to be aware of the tendency for other people to think that working from home is easy, and that your time is infinitely flexible. She also noted that there are some big differences between being part of a distributed team and being a telecommuting member of a co-located team (where you risk feeling isolated), and further differences between employees and freelancers. Particularly for lone telecommuters, it's important to build professional and social support networks that might otherwise be taken for granted in more conventional work settings.<br /><br />Next, self-described "entreprenerd" <a href="http://www.jansch.nl/">Ivo Jansch</a> talked about "<span style="font-weight: bold;">The Business Case For Telecommuting</span>". Ivo's company <a href="http://www.egeniq.com/">Egeniq</a> is built around a distributed team (essentially using remote working as an organisational model) - so in addition to benefiting individual workers, he suggested ways that telecommuting could positively impact the company's bottom line, for example enabling access to a bigger talent pool and increasing its geographical reach (if providing consultancy services). He acknowledged, however, that this distributed model won't suit every company or industry, and success requires (amongst other things) a results-driven culture where individuals are trusted to self-manage and have a sense of shared responsibility. Ultimately, good communication between team members is paramount.<br /><br />After the lunch break, Jack G. Ford gave a manager's perspective on setting up a telecommuting programme in his presentation "<span style="font-weight: bold;">Can I Work From Home Tomorrow?</span>". Jack introduced himself as an ex-coder who is now the manager for 17 developers in a more conventional environment than Ivo's, but in spite of that his key points seemed remarkably similar: beyond asking whether the company infrastructure can support remote working, the main issues are trust (both with the manager and with the team) and good communication between the manager and the individual. Jack emphasised that, from a manager's point of view, when you telecommute "I can't see you" - so the telecommuter must stay connected, keep the manager informed, and must not only act professionally but be seen to do so. Although it might seem obvious, this was a fascinating insight into telecommuting from the other side of the management chain.<br /><br /><a href="http://www.khankennels.com/blog/">Ligaya Turmelle</a>'s presentation on "<span style="font-weight: bold;">Managing the Work/Life Balance</span>" emphasised the challenges of balancing work and home life, with her lists of "the good, the bad and the ugly" of remote working from a teleworker perspective. 
Ligaya focused especially on balancing family commitments with work commitments, and among some interesting observations (for example, no longer doing the daily commute means you lose some "me time"), I was most struck by the admission that if you love your work then it can sometimes mean that you <span style="font-style: italic;">want</span> to go on working, and are in danger of not respecting your own ground rules. While noting that situations can differ both for individuals and companies, her advice was: clarify everyone's expectations (e.g. policies for "on-call" hours, weekends, and holidays); set up ground rules and limits (and be disciplined in adhering to them); and try to be flexible and imaginative in how you approach your work.<br /><br />The final presentation was <a href="http://about.avdi.org/">Avdi Grimm</a> talking about "<span style="font-weight: bold;">The Well-Equipped Remote Worker</span>". Avdi is a freelance software developer who is also a "dispersed teams facilitator" and runs the <a href="http://wideteams.com/">Wide Teams blog</a>. As might be expected from the title, some of the focus was on the hardware and software tools that can help with remote working, but there was just as much information on practices that can support distributed teams. Once again, promoting communication is key, and using tools and practices that help team members create good working relationships (for example, utilising social media like Twitter and Facebook, and holding regular face-to-face meetings) can really contribute to this.<br /><br />Looking back over all the talks, a few common themes emerged for me:<br /><ul><li>Communicate well (both with managers and with other team members) to build trust, keep people informed and avoid misunderstandings;</li><li>Clarify expectations on all sides, and establish well-defined boundaries between work and personal life. Set ground rules to ensure that those boundaries are respected by others (your boss, your family and friends), and have the discipline to respect them yourself;</li><li>Build and maintain your social and professional support networks for when problems arise;</li><li>Provide yourself with a good working environment and (software and hardware) tools.<br /></li></ul>I was also able to relate some points to my own experiences: when I worked briefly as a remote member of a co-located team, I did feel a real sense of isolation; another time as a home teleworker I got the impression from some people that they assumed (not maliciously) that I only did a few hours' work a day; and previous experience as part of a large organisation makes me feel that there was some truth in Ivo's comment that "co-location is over-rated", in that it doesn't automatically lead to great communication between individuals or groups.<br /><br />Overall it was an excellent event and a good use of 8 hours of my Saturday - although the time difference (coincidentally another telecommuting issue) meant that it didn't finish until 10pm UK time, I surprised myself by staying with it to the end. Hats off to Cal and Kathy Evans for organising the day and to the speakers for their excellent presentations. 
Here's to the next Day Camp 4 Developers!pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-27474584434429502532011-02-27T11:55:00.000-08:002011-02-28T12:46:22.786-08:00MadLab: pancake café and the Omniversity of ManchesterYesterday I dropped into the <a href="http://madlab.org.uk/">Manchester Digital Laboratory</a> (aka MadLab) in Edge Street for the <a href="http://madlab.org.uk/content/madlab-cafe-pancake-day/">MadLab Café Pancake Day</a>, and enjoyed a couple of hours chatting to various friendly people while eating an extremely tasty pancake and drinking cups of tea (one of my favourite pastimes), and at one point even discussing <a href="http://en.wikipedia.org/wiki/Outkast">Outkast</a>'s back catalogue.<br /><br />MadLab describes itself as "a community space for people who want to do and make interesting stuff - a place for geeks, artists, designers, illustrators, hackers, tinkerers, innovators and idle dreamers; an autonomous R&D laboratory and a release valve for Manchester's creative communities." I'm not sure precisely where I'd put myself in that list - I've only been there a couple of times before, for the <a href="http://groups.google.com/group/python-north-west">Python Northwest</a> user group meetings - but the folks I met seemed to be a representative cross section of the target community.<br /><br />There's a packed and eclectic schedule of (mostly free) events hosted there, which is well worth checking out (see <a href="http://madlab.org.uk/events/">http://madlab.org.uk/events/</a>), but their most recent development is the <a href="http://omniversity.madlab.org.uk/">Omniversity of Manchester</a> - a programme of professional-level training courses that so far have covered experimental film making and physical computing with Arduino, with plans to extend to topics as diverse as web design, Ruby on Rails, writing workshops and urban gardening. These courses won't be free, but the fees will go towards keeping MadLab sustainable and supporting the other free events.<br /><br />If you're interested in learning more then you can <a href="http://omniversity.madlab.org.uk/2011/02/omniversity-newsflash/">watch a video</a>, and register the subjects you'd like to see covered by taking a moment to fill in their survey:<br /><ul><li><a href="http://s.madlab.org.uk/curriculum">The Omniversity Crowd Sourced Curriculum Survey</a></li></ul>Personally I think it's a really exciting idea - I'm generally a fan of courses, and many of the proposed workshops are things that I'd love to learn more about, so it would be great to see the Omniversity take off and help MadLab expand and flourish as a focal point for Manchester's digital community - the more people who find out about it and get involved the better. 
And in the meantime I'll be looking forward to the next (undoubtedly tasty) MadLab café event.pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-80876892974950578522011-02-25T06:00:00.000-08:002011-02-25T09:15:09.479-08:00Book review: "Simply SQL" by Rudy LimebackRudy Limeback's "Simply SQL" (Sitepoint) is an overview of SQL targeted at web application developers, and intended to fill a gap between the basic "SQL 101"-type tutorials (seemingly compulsory in just about every introductory article or book about web programming) and more advanced texts covering topics which at first glance don't seem so relevant to the straightforward day-to-day requirements of many web applications.<br /><br />The chapters are grouped into two main sections. The first deals with the details of the SQL language and comprises the bulk of the book. It starts with a short introduction to the SQL commands most commonly needed by web developers to create and modify data within the database (all the usual suspects - CREATE, ALTER, INSERT, UPDATE, DELETE and so on - are quickly dealt with here). The rest of this section focuses on the SELECT command (the one used to retrieve information), with each chapter covering one specific clause - FROM, WHERE, GROUP BY and so on - in quite extensive detail, and illustrated with examples from sample applications.<br /><br />The second section of the book has three chapters covering some basic database design concepts, specifically SQL data types, relational integrity, and the use of "special structures" (such as tables that refer to themselves) for particular situations. The appendices then outline the basics of using some specific SQL implementations, along with details of the sample applications and scripts used in the main part of the book.<br /><br />The heavy emphasis on the SELECT statement might seem odd, but it makes a lot of sense in the context of web applications where data is typically read from the database far more than it's written. The detailed examples are also excellent - at times invaluable - for clarifying things like the nuances of the different types of JOINs, the subtleties of the GROUP BY and HAVING clauses (useful for aggregating data from subsets of rows in conjunction with summing and averaging functions), and the issues involved in working with time data. I certainly learnt a few things - the GROUP BY clause was completely new to me, as were the distinctions between the FLOAT and DECIMAL data types (DECIMALs are exact - within certain limits - while FLOATs are approximate). I found the brief sections on views, derived tables and subqueries extremely enlightening, as was the discussion of foreign keys in the chapter on relational integrity, and the clear writing style throughout made the book a pleasure to read.<br /><br />It's important to note that "Simply SQL" is based on the SQL standard, rather than the syntax of specific implementations (although in places it does indicate where there are notable deviations from the standard, particularly for MySQL) - also it doesn't cover any of the programming APIs, so it's not really a reference text (admittedly it doesn't claim to be). However, with its clear and detailed explanations it looks like it would be a useful companion to more traditional reference texts or cookbooks and will definitely reward re-reading - at any rate, I'm sure I'll be squeezing plenty more juice out of it in the future. 
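<br /><br />(To make the GROUP BY and HAVING point concrete, here's a minimal sketch of my own - using Python's built-in <span style="font-family:courier new;">sqlite3</span> module and an invented <span style="font-family:courier new;">orders</span> table, so treat it as a flavour of the idea rather than one of the book's examples:<br /><br /><span style="font-family:courier new;">import sqlite3</span><br /><span style="font-family:courier new;">db = sqlite3.connect(":memory:")</span><br /><span style="font-family:courier new;">db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")</span><br /><span style="font-family:courier new;">db.executemany("INSERT INTO orders VALUES (?, ?)", [("alice", 10), ("alice", 25), ("bob", 5)])</span><br /><span style="font-family:courier new;">print(db.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer HAVING SUM(amount) > 20").fetchall())</span><br /><span style="font-family:courier new;"># only the group whose total exceeds 20 survives: ('alice', 35.0)</span><br /><br />GROUP BY collapses the rows into one group per customer, and HAVING then filters the aggregated groups in much the same way that WHERE filters individual rows.)<br /><br />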
So overall highly recommended.pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com2tag:blogger.com,1999:blog-2061759690050619608.post-75162106212682938182011-02-11T08:09:00.000-08:002011-02-16T02:40:11.966-08:00Don Knuth: BCS/IET Turing LectureEarlier this week was the annual Manchester <a href="http://www.cs.manchester.ac.uk/aboutus/events/Turing/">BCS/IET Turing Lecture</a>, and this year's guest speaker was <a href="http://www-cs-faculty.stanford.edu/%7Eknuth/">Don Knuth</a>. Possibly he's best known (at least to me) as the author of the seminal <a href="http://en.wikipedia.org/wiki/The_Art_of_Computer_Programming">"The Art of Computer Programming"</a> (a multi-volume book which he began in 1962, and continues to work on to this day - Volume 4A is the most recently published, with another 5 sections still to come), and the typesetting system <a href="http://en.wikipedia.org/wiki/TeX">TeX</a> (pronounced "tek", and used for typesetting countless Ph.D. theses - including mine). However, Knuth's contributions to computer science throughout his long career (he's now in his seventies) are staggering - as are his "extra-curricular" activities, which include writing novels and playing the pipe organ.<br /><br />So it was quite an opportunity to be able to listen to this giant of computing first hand - even more so since rather than a straightforward lecture, this was actually a Q&A, with Knuth taking questions from the audience. After opening with a concise explanation of the significance of the number 885205232 (which I won't spoil by revealing here, since it's a puzzle in his book "Selected Papers on Fun & Games", other than noting that it involves Alan Turing's manual for programming the <a href="http://en.wikipedia.org/wiki/Ferranti_Mark_1">Ferranti Mk. I computer</a>), Knuth fielded questions on various topics including: elegance in programming languages, the public's fear of computers, <a href="http://en.wikipedia.org/wiki/Busy_Beaver_Number">"busy beaver" numbers</a>, the best way to teach programming to elementary schoolchildren, and whether an aptitude for programming is an art or a "genetic defect".<br /><br />Throughout, his answers were thoughtful, often surprising (for example, making a case for <a href="http://en.wikipedia.org/wiki/C_%28programming_language%29#Pointers">pointers in C</a> as an elegant language feature), consistently interesting, and delivered with characteristic humour (Knuth was once published in <a href="http://en.wikipedia.org/wiki/Mad_%28magazine%29">MAD magazine</a>, and is famous for the quote "Beware of bugs in the above code; I have only proven it correct, not tried it", <a href="http://www.brainyquote.com/quotes/authors/d/donald_knuth.html">amongst others</a>). In response to a question about "what are we 'enabling the information society' to do" (a reference to the BCS's current mission statement), Knuth initially replied "to have jobs", before more seriously reflecting that "there's a long way to go improving what we already have at the moment."<br /><br />Although Knuth's world of computing feels like it's a long way from the one I inhabit, it was a great privilege to see and hear such a legendary figure - in spite of his age he seems as lively as ever, both physically and intellectually, and still enjoying it - and his career is truly inspiring: when asked what he'd do differently if he had his time again, his reply was that he wouldn't change anything. 
"In my case," he said, "Murphy's Law hasn't worked - so many things that could have gone wrong didn't."pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-60807562693946841762011-01-31T08:27:00.001-08:002011-01-31T13:28:56.744-08:00The What, Where and How of Open DataLast week I attended a seminar at the <a href="http://www.ccsr.ac.uk/">Cathie Marsh Centre for Census and Survey Research</a>, given by <a href="http://rufuspollock.org/about/">Rufus Pollock</a> of the <a href="http://okfn.org/">Open Knowledge Foundation</a> (OKFN) on the topic of "open data".<br /><br />Rufus started by showing two example applications built using open data. <a href="http://yourtopia.net/">Yourtopia</a> makes use of data from the World Bank that measures individual nations' progress towards the Millennium Development Goals. Visitors to the site balance the relative importance of different factors (for example, "health", "economy" and "education"), and their preferences are matched with the data in order to suggest which country meets them most closely. <a href="http://wheredoesmymoneygo.org/">Where Does My Money Go?</a> offers various breakdowns of UK government spending and presents these in a way that allows the site visitor to see (for example) how much of the tax they pay is used for things such as defence, environment, culture and so on.<br /><br />Both sites are eye-catching and fun (and can provide some surprising insights), while at the same time serving more serious purposes. In the context of the seminar, Rufus noted that building the two sites also highlighted some key issues when working with these kinds of datasets:<br /><ul><li><span style="font-weight: bold;">Completeness:</span> i.e. the data are not always complete</li><li><span style="font-weight: bold;">Correctness:</span> i.e. the data are not always correct</li><li><span style="font-weight: bold;">Ease-of-use:</span> it can take a lot of effort to put the data into a format where it can actually be used (for example, an estimated 90% of the time spent developing Where Does My Money Go? went on preparing the data, as opposed to 10% on actually building the site)<br /></li></ul>These issues can largely be mitigated by "open data", which has two key characteristics:<br /><ul><li><span style="font-weight: bold;">Legal openness: </span>the data must be provided with a licence that allows anyone to use, reuse and redistribute the data, for any purpose. ("Reuse" in this context can include combining it with other datasets and redistributing that.) An explicit open licence is required (such as those offered at <a href="http://www.opendatacommons.org/">Open Data Commons</a>) because the default legal position for any data - even that posted "openly" on the web - doesn't entitle someone else to reuse or redistribute it. </li><li><span style="font-weight: bold;">Technical openness:</span> the data should be easy to access and work with - it should be possible to obtain the data in bulk, in a machine-readable, open format. 
These are pre-requisites for the data to be useful in a practical sense: for example, it's not sufficient to provide the data via a website that only returns subsets of that data via a form submission.</li></ul>(See the official definition at <a href="http://www.opendefinition.org/">http://www.opendefinition.org/</a>.)<br /><br />The data itself can be about almost anything: geographical (for example, mapping postcodes to a latitude and longitude), statistical, electoral, legal, financial - the OKFN's <a href="http://www.ckan.net/">CKAN</a> (Comprehensive Knowledge Archive Network) site has many examples. The key point is that the data should not be personal - that is, it shouldn't enable individuals to be identified, either directly or indirectly.<br /><br />The motivation for making data open goes back to the initial issues of completeness, correctness and ease-of-use - it can take a lot of time to assemble a dataset (for example, the Government already collects a lot of data), but once the effort has been made the added cost of releasing it is small, and sharing it reduces the cost of merging, filling gaps and correcting errors. To make an analogy with open source software, it's essentially <a href="http://en.wikipedia.org/wiki/Linus%27_Law">Linus' Law</a> for data: "given enough eyeballs, all bugs are shallow". Rufus also talked about a corollary to this, the "many minds" principle: the best use of the data you produce will probably be thought of by someone else (and vice versa).<br /><br />One argument against openness is that it precludes the possibility of commercial exploitation in order to offset the costs of compiling the data - a topical point given the current economic climate. Rufus's counter-argument is that there are many other ways to fund the creation of data aside from making it proprietary, by considering the data as a platform (rather than as a product) and building on that platform to sell extensions or complementary services (such as consultancy - again there are parallels with open source software). (Some of the audience also expressed concerns that, in principle at least, open data might be used irresponsibly - but arguably, if the data is available to all then others can challenge any irresponsible interpretation.)<br /><br />The final point that Rufus's talk addressed was how to actually build the open data ecosystem. To some degree it's up to the people who hold the data, but his suggestions were:<br /><ul><li>Start small and simple (which I took to mean, start with small sets of data rather than doing everything all at once).<br /></li><li>If you're using someone else's dataset then you can make an enquiry via the OKFN website to find out what the licensing situation is.<br /></li><li>If you have your own datasets then put them under an open data licence and register them at CKAN so that others can find them.<br /></li><li>"Componentize" your data to make it easier to reuse (which I took to mean, divide the datasets up into sensible subsets).<br /></li><li>Make the case to whoever holds the data you want (government, business etc.) to release it openly.</li></ul>For me as a "lay person", this was a fascinating introduction to the world of open data. Not unreasonably, the seminar didn't go into details of actually working with such data (I think many of the seminar audience members were researchers already familiar with the available tools). 
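<br /><br />As an aside, the "technical openness" requirement above is largely about removing that kind of friction: once a dataset is published in bulk in a machine-readable format such as plain CSV, getting started really does take only a few lines of Python. A trivial sketch of my own (nothing to do with the seminar), assuming a hypothetical bulk download called <span style="font-family:courier new;">spending.csv</span> with "department" and "amount" columns:<br /><br /><span style="font-family:courier new;">import csv</span><br /><span style="font-family:courier new;">rows = list(csv.DictReader(open("spending.csv")))  # hypothetical bulk download</span><br /><span style="font-family:courier new;">total = sum(float(row["amount"]) for row in rows)  # "amount" assumed to hold numbers</span><br /><span style="font-family:courier new;">print(total)</span><br /><br />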
However, afterwards Rufus made the point that writing a paragraph of text after looking at the data is just as valid a use of it as the slick visualisations provided by Where Does My Money Go? and other sites. Ultimately, it's having open access to the data in the first place that counts.pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com0tag:blogger.com,1999:blog-2061759690050619608.post-30428188504152750152011-01-23T07:24:00.000-08:002011-01-23T10:55:02.327-08:00Python North-West: The Python ChallengeLast week I went to my first-ever <a href="http://groups.google.com/group/python-north-west">Python North-West</a> meeting, at the <a href="http://madlab.org.uk/">Manchester Digital Laboratory</a> (aka MadLab). The webpage describes it as a "user group for Pythoneers and Pythonistas of all levels and ages, open to everyone coding 'the way Guido indented it'", and meetings alternate between talks and coding dojos (group coding sessions where people get to share code and ideas with the aim of improving their knowledge and skills - see <a href="http://codingdojo.org/cgi-bin/wiki.pl?CodingDojo">http://codingdojo.org/cgi-bin/wiki.pl?CodingDojo</a> for more information).<br /><br />This particular meeting was a coding dojo, and so as a group we worked through The Python Challenge (<a href="http://www.pythonchallenge.com/">http://www.pythonchallenge.com/</a>), which is a series of puzzles that can be solved using Python programming combined with some imagination and lateral thinking. While most people had come with their own laptops, the format that developed was for one person to "drive" the laptop connected to the overhead projector, typing in code and taking suggestions from the others.<br /><br />Although I'd already looked at the first two challenges earlier in the day to get an idea of what was involved, the group setting provided a great opportunity to see how other people worked, and to learn about bits of Python that I was unfamiliar with - one example for me was being introduced to <a href="http://docs.python.org/tutorial/datastructures.html#list-comprehensions">list comprehensions</a>, which are concise ways to generate lists, e.g.:<br /><br /><span style="font-family:courier new;">>>> vec = [2, 4, 6]</span><br /><span style="font-family:courier new;">>>> [[x,x**2] for x in vec]</span><br /><span style="font-family:courier new;">[[2, 4], [4, 16], [6, 36]]</span><br /><br />(although there were several other examples which I won't write about here so as not to spoil the challenges for others). Also, as many of the challenges began with having to figure out what the programming problem actually was, it meant that collectively we didn't get stuck for too long on any particular puzzle - I know that at least a couple would have had me completely stumped if I'd been on my own. For me personally it was also an opportunity to play with <a href="http://wiki.python.org/moin/IDLE">IDLE</a> - Python's <a href="http://en.wikipedia.org/wiki/Integrated_development_environment">IDE</a> - under Windows (not an environment that I've used much in the past, but quite handy for this kind of exploratory programming).<br /><br />Overall it was great to get out and interact with other Python developers in an enthusiastic and friendly atmosphere, while at the same time broadening my knowledge of the language - and now that I've had a taste I'll definitely be back for future meetings.pjbhttp://www.blogger.com/profile/02877142465318426440noreply@blogger.com2