Sunday, August 19, 2012

Inline images in HTML tags with Python

I recently discovered a neat trick for embedding images within HTML documents, really useful if you've got an application where you would like the HTML files to be portable (in the sense of being moved from one location to another) and not have to rely on also moving a bunch of related image files.

Essentially the inlining is achieved by base64 encoding the data from the image file into an ASCII string, which can then be copied into the src attribute of an <img> tag with the following general syntax:

<img src="data:image/image_type;base64,base64_encoded_string" />

For example to embed a PNG image:

<img src="data:image/png;base64,iVBORw...." />

i.e. image_type is png and iVBORw... is the base64 encoded string (truncated here for readability).

If you're familiar with Python then it's straightforward to encode any file using the base64 module, e.g. (for a PNG image):

>>> import base64
>>> pngdata = base64.b64encode(open("cog.png",'rb').read())
>>> print "<img src='data:image/png;base64,%s' />" % pngdata

And here's an example inlined PNG generated using this method:


In fact this inlining is an example of the general data URI scheme for embedding data directly in HTML documents, and can also be used for example in CSS to set an inline background image - the Wikipedia entry on the "Data URI scheme" is a good place to start for more detailed information.

There are some caveats, particularly if you're interested in cross-browser compatibility: older versions of Internet Explorer (version 7 and older) don't support data URIs at all, while version 8 only supports encoded strings up to 32KB. More generally the encoded strings can be around 1/3 larger than the original images and are implicitly downloaded each time the document is refreshed; these are definitely considerations if you're concerned about bandwidth. However if these aren't issues for your application then this can be a handy trick to have in your toolbox.

Update 24/10/2012: another caveat I've discovered since is that at least some command line HTML-to-PDF converters (for example wkhtmltopdf) aren't able to convert the encoded images, so it's worth bearing this in mind if you plan to use them. (On the other hand the PDF conversion in Firefox - via "Print to file" - works fine but can't be run from the command line AFAIK.)