Martin's Blog

XML stream processing with Haskell

Memory-efficient processing of large XML documents requires the use of a streaming parser. This post gives an introduction to XML stream processing with the Haskell programming language, in particular to the streaming API of the xml-conduit package. It shows examples for reading, writing and transforming XML data in a conduit pipeline.

Writing binary by hand

This post is based on a talk I gave at iPres 2022 (slides). It explains how to read a file format specification (TIFF, in this example) and use it to build a minimal binary file by hand.

Rotating Matroska video

Videos created by mobile phones are often rotated by means of embedded metadata tags. Playback software that respects these tags applies the correct rotation when rendering a video while software that doesn’t only displays the tilted, un-rotated (original) video. Let’s see if we can rotate a Matroska (mkv) video using metadata! (spoiler: not really, but maybe in the future)

Useful Bash commands

A collection of random Bash commands/idioms/patterns I keep forgetting.

PDF/A validation and inconsistent glyph width information

Inconsistent glyph width information is a common cause of PDF/A validation errors, but the details are not easy to understand. This text provides the necessary background knowledge and dissects an example file.

Representing binary data

Each piece of digital data that is stored, processed or transmitted by a computer consists of nothing more than a sequence of bits. This text explains how such bit sequences are commonly displayed in a human-readable form known as binary or hexadecimal notation.
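As a quick illustration of the two notations, here is a minimal Python sketch (not taken from the post itself). The helper names `to_binary` and `to_hex` and the sample input are made up for this example; each byte is simply shown as eight binary digits or two hexadecimal digits.

```python
def to_binary(data: bytes) -> str:
    """Render each byte as eight binary digits, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in data)


def to_hex(data: bytes) -> str:
    """Render each byte as two hexadecimal digits, separated by spaces."""
    return " ".join(f"{byte:02x}" for byte in data)


# Hypothetical sample input: the two ASCII bytes of "Hi".
sample = b"Hi"
print(to_binary(sample))  # 01001000 01101001
print(to_hex(sample))     # 48 69
```

Both lines describe the same two bytes; the hexadecimal form is just a more compact way of writing the same bits, since one hex digit stands for exactly four bits.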

PDF Days Europe 2017

Last week I attended the “PDF Days Europe 2017” conference in Berlin. While a whole conference dedicated to one file format may sound funny, it was in fact quite interesting. Here’s my personal summary of the most important topic: the soon-to-be-published PDF 2.0 ISO standard.

Minimizing the DROID signature file

This article investigates whether DROID runs faster with a signature file that has been restricted to a limited number of file formats. In other words, we will see whether it is worth “minimizing” a signature file.

Joining a Linux server to a Windows domain

This is a concise set of instructions for joining a Linux server to a Windows domain.

Where to put application icons on Linux

When you write a GUI application and want to install it properly on a Linux system, you will ask yourself where to store the application’s icon so that it is shown in the application menu of the desktop environment. To my surprise, I found this a bit confusing. In particular, I was interested in Debian and the Gnome 3 desktop because this is what I happen to use myself. Here are the sources that seemed most relevant to me.

Delete image thumbnails with PowerShell

Windows Explorer stores image thumbnails in hidden files called Thumbs.db. Suppose you want to clean a large directory tree to keep just the actual image files. Here’s how.

TLS certificates in Linux

Installing TLS CA root certificates in Linux is actually quite easy. Well, at least if you know where to put the certificate files … Unfortunately, different distributions keep their certificate stores in different places. Here is a short overview of installing root certificates in Debian and Red Hat Enterprise Linux/CentOS. Other distributions based on Debian or RHEL probably handle this similarly to one of the two approaches described here.

Useful ExifTool commands

When editing metadata of single image files I usually use my graphical metadata editor Verso. But when it comes to working with lots of files en masse, like shifting the date of all images in a directory by two hours, nothing beats the command line. ExifTool is great for this. Here is a random collection of handy commands.

Computing file hash values on Linux and Windows

Just some quick notes on computing hash values (aka checksums) for one or more files on Linux and Windows.
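For readers who want a platform-independent alternative to the operating systems' own tools, here is a minimal Python sketch. It is not the method from the post, which covers Linux and Windows commands. The function name `file_sha256` and the chunk size are assumptions for this example; `hashlib` is part of the Python standard library.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hash of a file as a hexadecimal string.

    The file is read in chunks so that large files do not have to fit
    into memory. The chunk size is an arbitrary choice for this sketch.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel calls read() until it returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `file_sha256(Path("some-file"))` for each file of interest yields values that should match what a SHA-256 command line tool reports for the same files.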

Set the Windows web proxy with PowerShell

Usually the system-wide web proxy settings on Windows are configured graphically, via the Internet Options panel of Internet Explorer or the Control Panel. However, sometimes it would be nice to do this with PowerShell.

Using PowerShell behind a proxy

Suppose you try to load some resource from the web in PowerShell. If you get an “access denied” or “no network connection” error message, there may be a web proxy blocking your way, demanding authentication. What needs to be done? You have to configure PowerShell’s proxy settings.

Refresh a proxy cache with Curl

If you try to download a file with Curl but keep getting wrong data, there may be a web proxy in your way that holds an outdated version of the file in its cache.

Bash Job Control

You may think you can do only one thing at a time in Bash. This notion seems plausible – you have exactly one command prompt, so how should you run different commands at the same time? However, this notion is wrong. Bash offers a set of built-in multitasking features called Job Control.

Automatic updates in Debian

Regularly installing software updates is one of the most basic measures to keep a computer system safe. However, searching for and installing those updates is a tedious job that lends itself to some degree of automation. This article will show you how to configure automatic updates in Debian.

Mapping Caps Lock to Escape in Debian

When was the last time you used the Caps Lock key on your keyboard? Unless you like being rude on the internet or write all SQL statements in uppercase, it’s probably been a while. That’s a pity because the key itself is located very conveniently right next to the home row. (This is where your hands are centered when touch typing: ‘asdfjkl;’ on US keyboards.) Only its function is rather useless.

Mounting Windows network shares in Linux

To mount a Windows network share in a Linux system you will usually use the CIFS protocol. On Debian and RHEL/CentOS, the necessary tools are provided in the cifs-utils package.

Determining Windows uptime

The uptime of a Windows machine can be found on the “Performance” tab of the Windows Task Manager. To get the system startup time you have to resort to the command line.

Installing Debian behind a Windows proxy

The Debian “netinst” or network installer is a great way to download only the packages you need when installing a Debian system. However, as you might have guessed from the name, an internet connection is required during the installation. This might pose a problem if you find yourself installing a Debian system in a (corporate) Windows environment where internet access is restricted by a web proxy that uses Microsoft’s Active Directory for user authentication.

Using Debian behind a Windows proxy

Suppose you are running a Debian system in a (corporate) Windows environment where internet access is restricted by a web proxy. In many cases the proxy, like the rest of the environment, will be configured to use Microsoft’s Active Directory for user authentication. If you don’t want to integrate your Debian system completely into the Active Directory structure but only need internet access, you have to tell the proxy your Windows/Active Directory user credentials. The following article will show you how this can be done.

Image scaling with GTK+ 3 and Perl

Recently, I was writing a program in Perl that uses a GTK+ 3 GUI to display an image file. When I did something similar with Perl and GTK+ 2 a few years ago, there was a great image viewer widget available that provided all kinds of nice things like scaling and zooming out of the box. Sadly, I found nothing like this for GTK+ 3, so I had to resort to the basic GTK+ widgets to display image files.