Don’t run away screaming in terror. Linux audio is easier to understand than the, er, boot system, as these two pages will demonstrate.
Hasn’t PulseAudio been with us since the dawn of time?
Yes, it’s been around for over a decade. PulseAudio has been quietly doing its job for years (we’re on the cusp of version 6.0 in February 2015) amid a cacophony of debate surrounding its complexity and effectiveness. And that job is getting sound out of your computer.
Why have you chosen to cover it now?
There are huge chunks of Linux that we think need demystifying, especially when the technology is vital and current and still being actively developed. Sound is one of computing’s fundamental senses, along with video. We’re also certain that lots of users don’t appreciate what PulseAudio does and why it’s so important.
Didn’t the original Sound Blaster sound card do a decent enough job for sound?
Ah, the venerable Sound Blaster expansion card – bringing audio to PCs since 1989. Things were simpler then – you’d typically run only one thing at a time, and that one thing would talk to your Sound Blaster directly. If you had a competing Gravis Ultrasound card, for example, you’d have to make sure that whatever you were running supported it. And you also had to worry about IRQ numbers and DMA channels. Sound is more complex now, but at least we don’t have to worry about IRQs.
But why has sound become so complicated?
We think the best way of explaining this is to use a visual metaphor. Both video and audio suffer from many of the same problems. With video, the solution to running lots of different applications at once is a desktop and window manager. These surround the things you want to run and allow them all to share the same screen. This is what PulseAudio does for audio. It allows lots of different applications and processes to share your audio hardware, allowing the user to change their position – or their relative level – in the final output mix. The metaphor stretches a long way: audio sample rate is analogous to frame rate – 44,100 samples per second for compact discs. The number of bits used to store a sample is equivalent to the bit depth of your display. CDs use 16 bits, or 65,536 different levels. Most video hardware outputs 24-bit depth – 16.7 million colours. Anti-aliasing, re-sampling, error rates and compression all have analogous causes and solutions in video and audio processing.
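Those figures are easy to check for yourself in a shell. What follows is only a sketch: the function names are ours, and the data-rate line assumes uncompressed stereo audio, as found on a CD.

```shell
#!/bin/sh
# Number of distinct values a sample (or pixel) of a given bit depth can hold: 2^bits.
levels() {
    echo $(( 1 << $1 ))
}

# Uncompressed data rate in bytes per second: rate * (bits / 8) * channels.
bytes_per_second() {
    echo $(( $1 * $2 / 8 * $3 ))
}

levels 16                      # CD audio sample: prints 65536
levels 24                      # typical display depth: prints 16777216
bytes_per_second 44100 16 2    # stereo CD audio: prints 176400
```

So every second of CD audio is 44,100 snapshots per channel, each picked from 65,536 possible levels, in much the same way that every pixel on a 24-bit display is picked from about 16.7 million colours.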
But what has all this got to do with PulseAudio?
In the years since the Sound Blaster (and yes, we know Linux came along a few years later), Linux has experimented with all kinds of different audio frameworks – OSS, ALSA, ESD, JACK, GStreamer, Xine and many more. To a greater or lesser extent, they were all trying to simplify audio by hiding the complexity of what they were doing from the user. PulseAudio encapsulates some of these frameworks and ideas, and tries to augment them with a modern network transparent framework that’s responsive and powerful while remaining simple enough for anyone to use. Mostly, it succeeds and it’s only when you need some specific configuration that you notice its complexity, or its inability to play nicely with other audio frameworks. That’s when it helps to understand a little of what it’s trying to do.
What is PulseAudio trying to do that’s so special?
If PulseAudio had a mission statement, it would be something simple like, “Initiate a sound and hear it.” And that’s what makes understanding what it does relatively difficult. You hear the results, but you don’t see how they’ve been produced. To be able to perform this one simple task, PulseAudio needs to bridge the many layers of Linux audio, some of which we just mentioned, and it starts at the very bottom – talking to the hardware.
Isn’t that where you’d normally need a driver?
Exactly, yes. PulseAudio talks to audio drivers directly. But these drivers aren’t specific to PulseAudio. Instead, it uses the part of the ALSA framework that talks to your audio hardware – the bit embedded within the kernel. ALSA, like PulseAudio, bridges several different layers, which may be why there’s so much confusion. Above the hardware/driver layer, ALSA is replaced by PulseAudio, because only one framework can access the hardware at any one time. That means your audio hardware only needs to support ALSA, as used within the Linux kernel, and it supports PulseAudio too. This can lead to confusion as you can’t run ALSA alongside PulseAudio, at least not without some manual intervention. So while both use the ALSA driver element, only one framework can talk to that driver. If you do want to use ALSA on its own, you have to either run it through PulseAudio or kill off PulseAudio’s access to your hardware.
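One gentle way of killing off that access is pasuspender, a helper that ships with PulseAudio and suspends the server's hold on the hardware for as long as a given command runs. Here's a minimal sketch; the wrapper name alsa_only is ours, and the device and file names in the usage note are assumptions about your system.

```shell
#!/bin/sh
# Run any command with PulseAudio's sinks and sources suspended, so the
# command can open the ALSA device directly. PulseAudio takes the hardware
# back when the command exits.
alsa_only() {
    pasuspender -- "$@"
}
```

You might then try something like alsa_only aplay -D hw:0,0 test.wav, where hw:0,0 is typically the first device on the first card and test.wav is whatever file you have to hand.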
Alongside drivers that talk to your hardware, PulseAudio is also capable of talking to your network. This is a little-used feature, but it means that you should be able to use any of your machines running PulseAudio as audio inputs and outputs for another machine on the network. You could have Spotify installed on one machine, for example, and play its output via another machine. It’s also the perfect example of how well engineered PulseAudio is – the engine is running independently of the input and output hardware.
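As a rough sketch of how that might look in practice, the machine with the speakers loads PulseAudio's module-native-protocol-tcp, and the machine with the player sets the PULSE_SERVER environment variable to point at it. The function names are ours, and the subnet and hostname in the usage note are made up; your firewall and authentication setup may need attention too.

```shell
#!/bin/sh
# On the machine with the speakers: accept PulseAudio clients over TCP.
# $1 is the address range allowed to connect, for example your home subnet.
share_sound_server() {
    pactl load-module module-native-protocol-tcp "auth-ip-acl=$1"
}

# On the machine running the player: start one program with its audio
# directed at a remote PulseAudio server.
# $1 is the server's hostname, the rest is the command to run.
play_on() {
    host=$1
    shift
    PULSE_SERVER="$host" "$@"
}
```

On the receiving machine you might run share_sound_server 192.168.1.0/24, and on the sending machine play_on speakerbox.local paplay song.wav. Both the subnet and the name speakerbox.local are hypothetical.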
How does this all come together on your desktop?
On top of the driver layer there’s PulseAudio’s core. This is where the real work is done, connecting the capabilities of your hardware drivers to the software you’re running on the desktop. Using our audio and video metaphor, this is where all the images are combined with the output capabilities and sent on their way. To ensure that PulseAudio can be as compatible as possible with older systems, a library layer sits atop the core, allowing applications that know nothing about PulseAudio to still play their audio through a compatibility layer. This is how native ALSA applications behave, for example, as the ALSA library is replaced by one that talks to PulseAudio instead. But the main desktops are also capable of talking directly to the core and managing which parts of their own sub-system are sending audio. This is what Ubuntu’s Unity is doing, for example, and that’s why you have a good degree of control over what applications are currently playing back and how loud they are. Gnome is the same.
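The per-application control that the desktops offer is also available from a terminal through pactl, which calls each playing stream a "sink input". This is a sketch rather than a recipe: the function names are ours, and the stream and sink numbers you pass in will differ on every system.

```shell
#!/bin/sh
# Show every stream that is currently playing, one per line, with its index.
list_streams() {
    pactl list short sink-inputs
}

# Set the volume of a single stream.
# $1 is the stream index from list_streams, $2 is a volume such as 50%.
stream_volume() {
    pactl set-sink-input-volume "$1" "$2"
}

# Move a playing stream to a different output device.
# $1 is the stream index, $2 is the index or name of the target sink.
move_stream() {
    pactl move-sink-input "$1" "$2"
}
```

Run list_streams while something is playing, note the number at the start of the line, and then try stream_volume with that number and 50% to halve its level without touching anything else.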
This is great, but it still doesn’t give me access to all that power you’ve been talking about.
The ultimate in PulseAudio control comes from the command line, and specifically, a command called pactl. For example, typing pactl load-module module-raop-discover will load a dynamic network discovery module that will detect any AirPlay devices on your network (as long as RAOP support is compiled into your version of PulseAudio, which it should be). Pavucontrol, the PulseAudio volume control application, should now list any AirPlay devices, such as Kano or XBMC, and allow you to select these as outputs, enabling you to play music from one machine that outputs on another. There are many more modules, and many more ways of combining them.
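If you'd rather stay in the terminal than open Pavucontrol, the same steps can be sketched with pactl alone. The function names are ours, and the name of any AirPlay sink depends entirely on the device PulseAudio finds, so copy it from the list rather than guessing.

```shell
#!/bin/sh
# Load the AirPlay (RAOP) discovery module. pactl prints the module's index.
discover_airplay() {
    pactl load-module module-raop-discover
}

# List every output PulseAudio knows about, including discovered devices.
list_outputs() {
    pactl list short sinks
}

# Make one output the default for new streams.
# $1 is a sink name or index copied from list_outputs.
use_output() {
    pactl set-default-sink "$1"
}
```

A typical session would be discover_airplay, then list_outputs a few seconds later, then use_output with whichever sink name appeared. If the first step fails, your build of PulseAudio may lack the module.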
This sounds straightforward. Should I be worried if PulseAudio no longer scares me?
Not at all. When it works it’s brilliant and you don’t have to worry about it. But hopefully, you’ve got some appreciation of its complexity and how it’s performing all this magic, as well as knowing it can be used for some advanced trickery too. If you want to know more about the advanced stuff and grab version 6 as soon as it’s been released, PulseAudio lives at