Bachify: A curated Spotify playlist of all* Bach pieces

We live in an amazing era. Anybody with a sufficient Internet connection, even a wireless connection, can listen to nearly any piece of music they want to.

I vaguely remember a TV commercial from the mid ‘90s, advertising a telecom company by way of advertising the “Information Superhighway”. A weary traveler in a run-down bar in the desert asks the bartender something like “What kind of music you got here?” The bartender answers “Everything.” The traveler asks for a particular recording of a particular piece of classical music, and the bartender immediately puts it on the cyber-jukebox.

It sounded ludicrous at the time. It’s reality now.

The particular reason I mention this is, I’ve just spent about 8 months listening to the complete works of J. S. Bach, and picking out good recordings of all of them. This wasn’t my intention at the start. At the start I just wanted to discover more good Bach pieces, and put them into a nice deduplicated playlist as I did so. But once you listen to enough Bach, something clicks, and any Bach piece has the potential to be really good. So as the list started filling in and I still wanted to hear even more Bach, I decided to finish the job.

I say “potential” because I found that the recording really matters. For every beautiful, inspired performance, there are ten performances that range from dull to tragically bad.

There are a few cases of compositions by Bach where I think nobody has made the right recording yet. But for most of the 1,075-ish pieces in this playlist, I was choosing from multiple great options.

Spotify is the thing that made it possible. Spotify is also the thing that made it hard, because their organization of classical music is so bad. When I was halfway done, I even had the misfortune of upgrading to a version of Spotify that dumbed down the search feature, so searching became much harder until, months later, I found an undocumented trick on the Spotify help forums that fixes it. (More on this below.)

Spotify needs music librarians. I’m not one, but I’ll play one on the Internet. After all, I’ve just listened to all* Bach’s compositions, and I want to share with you what I found.

The playlists

Let’s get to the point now. Here’s the collection. It’s broken into six playlists to provide a bit of relief to your scrolling fingers. The fact that every* Bach piece is catalogued with a BWV number, and the BWV numbers are approximately sorted by what style of work they are, is quite helpful here. I’ll link to the playlists while describing them in broad strokes:

With these playlists, you could listen to different Bach pieces non-stop for nearly a week.

If you’d prefer a little more focus on quality over quantity — although the quality of things Bach composed is quite high overall — I also made a more-curated list of highlights.

How should you listen?

This is of course really subjective.

My recommendation is to pick a mostly arbitrary point in one of the playlists, but this point should be at the start of a BWV number, or the start of a collection such as the Goldberg Variations. Then listen for a while. When you get to the end of a piece and you want to hear something different, pick a different arbitrary point. Let’s call this “manual shuffle”.

These playlists won’t work well to listen straight through — if you did that, you’d hear all the cantatas first, then all the motets and masses… and you’d get to the instrumental pieces many days later.

In particular, if you listen straight through the BWV 250-524 playlist, you’re gonna have a bad time. You’ll hear a large number of sacred chorales in alphabetical order by their lyrics. It’s like reading a hymnal from start to finish.

Putting all these playlists into a folder and putting the whole folder on shuffle wouldn’t be a terrible thing to do. I wouldn’t entirely recommend it, because you’ll often hear something from the middle of a work out of context that way, but it is at least a convenient way to explore. If you hear a fragment of a work and it interests you, go back and listen to the whole work in order.

The highlights playlist is easier to put on shuffle and slightly better optimized for that purpose. But I still recommend the “manual shuffle” technique most of all for discovering new Bach pieces.

Abridging the recitatives

Why am I suggesting you listen to an “abridged” version of BWV 1-249 in a complete Bach playlist? In that version, I skipped some of the particularly uninteresting recitative movements. Every cantata is in there, but some of them get to the point faster in the abridged list.

It might sound sacrilegious to suggest that Bach wrote some uninteresting movements of cantatas, but he did. I’ve even seen conductors writing program notes who said the same. It’s my opinion that you shouldn’t force yourself to listen to these just because you’re listening to cantatas. Cantatas are an amazing listening experience, an experience you shouldn’t have to delay while listening to someone speak-singing in German on no particular melody just so they can get through all the text.

I think you’d include these recitatives if you actually were performing the music in a church service, as the cantatas originally were, but if you’re just appreciating the music, it’ll work better when some of the recitatives aren’t there.

There are interesting recitatives too, and there are recitatives that are musically important to get from one movement to the next. Those are there in the playlist, even in the abridged version. But you may disagree with my aesthetic choices, and that’s why I included the unabridged 1-249 playlist as well.

Searching for Bach

I’ll restate that trying to find classical music on Spotify is a pretty bad experience overall. There’s just no consistency to how things are labeled. Classical music doesn’t fit the hierarchy that every major music player has settled on, which tries to describe every track by its “artist”, “album”, and “song”, and the workarounds that other systems use to mitigate the problem don’t exist on Spotify.

In classical music, “albums” are unpredictable and arbitrary. They’re not grouped by work, they’re grouped by the whims of the label and by how much music you can fit on a CD. The “artist” is sometimes the composer, but more often the performer, especially when the performance is particularly good (look for the cello suites under “Yo-Yo Ma”, not “J. S. Bach”).

So it would almost sound like there’s no way to search for Bach on Spotify unless you already know what you’re going to find. This is the case for many composers. But in Bach’s case, the BWV catalogue can save the day.

Classical music labels have known for a long time that BWV numbers are the best way to distinguish which Bach piece is which, so nearly every Bach recording on Spotify includes the BWV number in its track title. To find all the recordings of the Canzona in D Minor, BWV 588, you can just type "BWV 588" in quotes into the search box. Hooray!

No, just kidding. It used to work that way, but now you actually find this:

[Screenshot: search results for many things that aren't BWV 588.]

I really struggle to think of explanations for why the search works this way.

  • It’s apparently assuming I made a typo. Don’t you hate when you type “588” when you meant “881”? Good thing we have computers to correct us.

  • It’s correcting me because 588 is a rarely-listened piece, right? It would rather give me more famous pieces like… an organ pedal exercise?

  • It’s trying to emphasize quality releases like… “Digital Bach”?

If you scroll far enough down in the “Most Relevant” section, past all the not-at-all-relevant things, you’ll find recordings of BWV 588. Particularly the mediocre ones like “Spooky Bach Halloween” by Joe Fox and the Haunted Players.

Finally, at the bottom, you may find the “All Tracks” section — results that Spotify thinks are so irrelevant that they can’t even be on the same list as those titans of relevance we just scrolled past. This is where you might find some good recordings.

Hope you like this organizational scheme, because the results are not sortable. You can’t group them by album or sort them by popularity anymore.

But there’s a trick. A brilliant user named StevenR posted an incantation on the Spotify help forums that gets back the old search behavior. You type this into the search box:

spotify:search:"BWV 588"

This undocumented feature does revolutionary things like giving results that are all the correct piece, with good results near the top, and the columns are adjustable and sortable.

[Screenshot: much better search results.]
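If you're working through a run of BWV numbers, typing the incantation by hand gets tedious. Here's a minimal Python sketch that builds the search strings to paste into the search box. The function name bwv_search is made up for illustration, and it only formats text; it doesn't talk to Spotify at all.

```python
def bwv_search(number):
    """Build the old-style search string for one BWV number."""
    # The quotes keep "BWV 588" together as a phrase, as in the trick above.
    return 'spotify:search:"BWV {}"'.format(number)


# Print the search strings for a few consecutive organ works.
for n in range(588, 591):
    print(bwv_search(n))
```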

If they ever take this away, the next version of this list might be on Rdio.

Judging albums by their cover

It seems that Bach, more than other composers, has motivated some performers to plod through an uninspired performance just to get a recording out there. The worst offenders are the compilation albums that shovel together generic recordings for some utilitarian purpose, with names like “Bach for Studying”, “Bach To Train Your Brain”, the aforementioned “Spooky Bach Halloween”, and worst of all “Smart Babies”.

Even the ones that sound well-intentioned, such as “Fifty Essential Bach Pieces”, are not going to get you the right recordings. That may sound like a hypocritical thing for me to say when I’m presenting an enormous Bach compilation. But the issue here is that the compilation albums didn’t license the best performances, just the ones that were most convenient to re-release. I can put Glenn Gould on the playlist. They can’t.

I found along the way that one really can judge a Bach recording by its cover. Here’s the cover of a recording from John Eliot Gardiner’s “Bach Cantata Pilgrimage”, a quality series of recordings of all the sacred cantatas that makes them sound fresh and new instead of fossilized:

[Image: Bach Cantata Pilgrimage album cover]

If it looks like a National Geographic cover, that’s because they got a National Geographic photographer to make these inspiring humanist cover photos for the whole series of albums.

Meanwhile, this album cover is a warning that you’re going to have uninspired recordings shoveled at you:

[Image: a bad Bach album, 'Relax with Mozart Chopin & Bach']

Ugh. Finally, here’s the cover of an album that’s seriously called “Classical music for read a books”, performed by “Sweet reading music”:

[Image: a worse Bach album, 'Classical music for read a books']

Given the level of effort there, you might be able to guess what that album contains: recordings of MIDIs.

Good series of recordings

I found the cover images on Spotify to be important not just as an estimator of quality, but also because the easiest way to tell when a track is part of a series of related recordings is from the cover art.

After listening to enough recordings, here are the series I ended up choosing most of the time when they were an option:

  • Anything with Yo-Yo Ma on cello
  • Glenn Gould’s 1980 piano recordings
  • John Eliot Gardiner / Monteverdi Choir / English Baroque Soloists, in their “Bach Cantata Pilgrimage”
  • Masaaki Suzuki and Bach Collegium Japan, also performing cantatas
  • Kevin Bowyer on organ with Det Fynske Kammerkor
  • Simon Preston on organ, on recordings that aren’t too ravaged by time

Besides the fact that I often preferred these performers, I looked for variety when possible. Because Bach’s music is adaptable to so many instruments, I sometimes preferred performances that increased the variety of instruments, such as organ preludes featuring a trumpet, or Chris Thile playing fugues on the mandolin.

I could have used Helmuth Rilling’s recordings of all the chorales (BWV 250-438). Helmuth Rilling’s ensemble is competent but unexciting, though, so I included non-Rilling recordings at many opportunities, even ones whose quality might be controversial.

Similarly, even though Gardiner and Suzuki produced excellent recordings of all the sacred cantatas, I frequently found recordings by someone else that would compare, so in those cases I tended to favor non-Gardiner, non-Suzuki recordings to increase the variety.


I’ve been saying this is a playlist of all-with-an-asterisk Bach pieces. What’s with the asterisk?

The problem is just that “all Bach pieces” is a bit hard to define. My intent is to have one recording of every authentic Bach piece from BWV 1 to 1128.

Some Bach pieces have been lost to history. Those tend not to have BWV numbers, at least, except in the BWV Anhang (the appendix). But among the ones that do have BWV numbers:

  • BWV 216 only survives in a fragmentary state, and no recordings of it are available on Spotify.

  • BWV 224 is extremely fragmentary — only 30 measures survive — and as far as I can tell, those 30 measures have not been recorded anywhere.

  • A number of chorales in BWV 250-438 correspond to organ arrangements in BWV 599-644, and sometimes their best recordings put them together in the same track.

  • Occasional recordings include pieces that aren’t in the BWV list. You’ll find a canon labeled “BWV deest” between BWV 1077 and BWV 1078 in the list, for example, because the performers think it goes there.

  • Many pieces with BWV numbers have since been deemed to be spurious — that is, not by Bach at all.

BWV 216 and 224 are missing from the list out of necessity. I skipped the BWV numbers that are considered spurious by broad consensus — okay, fine, considered spurious by and Wikipedia. If we know which other composer actually wrote a piece, I definitely left it out. In cases of doubt, I left pieces in the list.

I tried to avoid duplicating tracks, so some pieces in BWV 250-438 actually appear somewhere in BWV 599-644 in the order of the list.

So that’s what the asterisk is about. When I last checked, the list contained 1075 unique, non-spurious Bach pieces (and probably some spurious ones that we just don’t know are spurious yet). And that’s as close to “all” of them as you can get on Spotify.

To understand the list better, you can look at the Google spreadsheet I used for keeping track of the pieces I was adding, which shows the complete list of pieces and performers. The gray rows are the numbers left out because they’re spurious, and the purple rows are duplicates of a recording that appears somewhere else in the list.

What did I find when I listened to all of Bach?

I’ll be posting some follow-ups about the recordings that were particularly interesting, and the ones that are simply too good to not listen to. For now, follow the lists and enjoy the music.

I’ll end with a quote from an unlikely music critic, a great author who would have been overjoyed if he had lived to see the Internet become the world’s greatest music library:

“Beethoven tells you what it’s like to be Beethoven and Mozart tells you what it’s like to be human. Bach tells you what it’s like to be the universe.”

— Douglas Adams

Python 3.4 from scratch, in an isolated environment

Do you want to experiment with the new features of Python 3.4 without installing it system-wide? There are multiple reasons you might want to do this:

  • Your operating system doesn’t have a supported package for it yet, and installing an experimental one might cause problems later
  • You’re concerned about accidentally running into Python version conflicts when you have multiple installed versions of Python
  • You don’t have root access, or you’d prefer not to use it
  • You do everything in a virtualenv anyway, so what’s the point of having another wrong place for packages to go?

Assuming you’re on a reasonable Unix system (particularly Linux or Mac OS), you can accomplish this by building Python from the source code.

The new features of Python 3.4 make it very easy to install and start using it in an isolated way that never touches your /usr directory. Unlike Python 3.3 and earlier, you’ll be able to quickly get started using venv and pip, and you won’t be stuck in the purgatory of missing packages that venv would leave you in on 3.3.

I’ve tested these steps on Mac OS 10.9 and Ubuntu 13.10 Saucy Salamander.

Step 1: C dependencies

This is a step where it is helpful to be root, if there are dependencies that you need but don’t have yet. However, it’s not strictly essential; you could install these dependencies under a custom prefix. To keep it simple, though, I’m going to assume you can install the dependencies system-wide.

Given that this is the CPython interpreter, you’re going to need a C compiler (particularly gcc) to build it. On Mac OS, installing Xcode from the App Store gives everything that you need. On Ubuntu, you can get it with

sudo apt-get install build-essential

Python links to many external libraries to implement parts of its standard library, such as sqlite3 and readline. I’ve found that if you’re missing these libraries, Python will still compile, but of course you won’t be able to use the modules that depend on them. The ipython experience in particular will be terrible without those libraries.

On my Mac, I found I had nearly everything that I needed already installed. This might be because I’ve already installed Python with Homebrew before, though. I do highly recommend Homebrew as a way of setting up development on a Mac. I was missing ossaudiodev support, which I don’t plan to use anyway.

On Ubuntu, this command will install all of the library dependencies:

sudo apt-get install libc6-dev libreadline-dev libz-dev libncursesw5-dev \
     libssl-dev libgdbm-dev libsqlite3-dev libbz2-dev liblzma-dev tk-dev

Step 2: Extract and compile Python

You’re done with the hard part. The rest of the steps are things that should be as smooth as butter in Python 3.4.

Get the current source download. Right now, that’ll give you a file called Python-3.4.0.tar.xz. Save it into a directory that you’d be happy running Python from, and change to that directory at the command line.

Then run:

tar xvf Python-3.4.0.tar.xz
cd Python-3.4.0
./configure && make

Sit back and relax for a few minutes.

At the end, if you were missing any optional libraries, Python will warn you about them. If anything shows up that you would sorely miss, go install the appropriate library, and ./configure && make again.
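If the build warnings scroll by too fast, you can also ask the freshly built interpreter directly. This is a rough sketch, not part of the official build process: the list of module names is my own guess at the ones that depend on the optional libraries above, and check_optional_modules is a made-up helper name. Save it to a file and run it with ./python.

```python
import importlib

# Standard-library modules that need optional C libraries at build time.
# This list is illustrative, not exhaustive.
OPTIONAL_MODULES = (
    "sqlite3", "readline", "ssl", "zlib", "bz2",
    "lzma", "curses", "dbm.gnu", "tkinter",
)


def check_optional_modules(names=OPTIONAL_MODULES):
    """Return a dict mapping each module name to True if it imports."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in sorted(check_optional_modules().items()):
        print("{:10} {}".format(name, "ok" if ok else "MISSING"))
```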

Step 3: Make your Python environment

We didn’t run the traditional last step, sudo make install, because we don’t need to! You’ve got everything you need to build a local Python environment right here, using Python 3’s new venv.

mkdir -p ~/.virtualenvs
./python -m venv ~/.virtualenvs/py34

To activate this environment, now or in the future, run:

source ~/.virtualenvs/py34/bin/activate

You could put it in a different directory. I chose this location because it’s compatible with Doug Hellmann’s virtualenvwrapper. If you have virtualenvwrapper installed, even from a previous version of Python, you can just run workon py34 instead. It’s totally fine with the fact that this environment was built with venv, not virtualenv.

You are now using Python 3.4. Type python and play around a bit.

But, of course, it’s not really your Python environment until you’ve got packages installed. Fortunately, pip is already set up for this new environment!

Just to be sure, type which pip. It should show you a path in your py34/bin directory. If it says something like /usr/bin/pip, either you’ve forgotten to activate your environment or something has gone terribly wrong.
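You can make the same sanity check from inside Python. A small sketch, assuming the environment was created with venv as above (in_venv is just a name I picked):

```python
import sys


def in_venv():
    """True when running inside an environment created by venv."""
    # In a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the Python it was created from.
    return sys.prefix != sys.base_prefix


print(sys.version_info[:2], sys.prefix, in_venv())
```

If the last value printed is False, the environment isn't active.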

Now you can use this copy of pip to install your favorite packages:

pip install requests

Let’s install IPython while we’re at it:

pip install ipython

IPython will install itself as ipython3, a compromise for the benefit of less-fortunate users who don’t have virtualenvs set up. You could symlink py34/bin/ipython to py34/bin/ipython3, because you won’t be needing ipython2 in this environment.

Deactivating Python 3.4

When the time comes that you need to work on old code instead of living gloriously in the future, all you need to do is:

deactivate

The rest of your system is exactly as you left it until you activate the environment again.

The not-so-isolated, really easy, bonus version

If you do have Python 3.4 installed system-wide — maybe you’re on Ubuntu 14.04 already — then you can skip most of these steps and go straight to making the venv:

mkdir -p ~/.virtualenvs
python3.4 -m venv ~/.virtualenvs/py34
source ~/.virtualenvs/py34/bin/activate
pip install ipython

U+DEADBEEF: Why you shouldn’t trust arbitrary text encodings

A few months ago I posted a Gist about ways to totally break Python’s Unicode representation, by exploiting bug #19279 in its UTF-7 decoder.

You might not have heard much about UTF-7. It doesn’t have very much at all to do with UTF-8, the well-designed (by Ken Thompson) representation of Unicode that’s taking over the world. UTF-7 is a poorly-designed, obsolete proposal for encapsulating Unicode inside ASCII itself.

The Python bug, which is fixed now, was that the decoder didn’t reset its state correctly when it encountered an erroneous UTF-7 sequence. You could make these errors pile up until they added up to an impossible character, at codepoint U+DEADBEEF.

(While Unicode codepoints are often represented as 32-bit integers, not all 32-bit integers are Unicode codepoints. The highest possible Unicode codepoint is U+10FFFF.)
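To make that limit concrete, here's a short sketch of the range check that a correct decoder has to respect. The helper is_valid_codepoint is a name I made up for illustration; the only real API used is the built-in chr() and sys.maxunicode.

```python
import sys

MAX_CODEPOINT = 0x10FFFF  # the highest codepoint Unicode allows


def is_valid_codepoint(n):
    """True if n is within the Unicode codepoint range."""
    return 0 <= n <= MAX_CODEPOINT


print(is_valid_codepoint(0x1F4A9))     # an ordinary emoji codepoint
print(is_valid_codepoint(0xDEADBEEF))  # far outside the range

# A healthy Python refuses to build the impossible character at all.
try:
    chr(0xDEADBEEF)
except (ValueError, OverflowError) as err:
    print("chr() refuses:", err)
```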

Once you have this impossible character in a string, you can pass it to standard library functions to cause all sorts of data corruption, including making a SQLite database unreadable.

There’s one easy lesson to take from this: keep Python up to date. If you’re on Python 2.7.6 or 3.3 or later, you won’t encounter this particular bug. But then, this isn’t the last UTF-7 bug in Python.

Here’s the more important lesson, though, which applies in any programming language:

Don’t let someone else’s data tell your code what to do

When people initially uncover these encoding bugs, it is often because they are scraping the Web, or e-mails, or some other format where you can tell the recipient what encoding your text is in. There’s lots of code out there that will read that metadata and use the appropriate decoder in Python.

This is kind of dangerous. You’re letting data you didn’t create determine what your code does. If a Web page says it’s in UTF-7, it’ll use the UTF-7 decoder, which you’ve probably never used before. If it says it’s in gzip_codec and contains a gzip bomb, boom.

Some of these attacks are theoretical, but there are a few real web pages out there that — perhaps accidentally — claim to be in UTF-7 and trigger bug #19279. Really, it shouldn’t matter what the Web page you’re scraping claims: your code has no business firing up the UTF-7 decoder, which it usually doesn’t do, just because some Web page told it to.
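One way to follow this advice is to decode only with encodings you've chosen in advance. The sketch below is a hedged illustration, not a complete defense: safe_decode and ALLOWED_ENCODINGS are hypothetical names, the set of allowed encodings is an arbitrary example, and falling back to UTF-8 with replacement characters is just one possible policy.

```python
import codecs

# Encodings this hypothetical scraper is willing to decode. The choice is
# illustrative; use the ones your own code has actually been tested with.
ALLOWED_ENCODINGS = {"utf-8", "ascii", "iso8859-1", "cp1252"}


def safe_decode(data, declared_encoding, fallback="utf-8"):
    """Decode bytes, honoring the declared encoding only if it is allowed."""
    try:
        # codecs.lookup normalizes aliases such as "UTF8" or "latin-1".
        name = codecs.lookup(declared_encoding).name
    except LookupError:
        name = None
    if name not in ALLOWED_ENCODINGS:
        # Unknown or unwanted codec (UTF-7, gzip_codec, ...): ignore the claim.
        name = fallback
    return data.decode(name, errors="replace")
```

With this in place, a page that claims to be UTF-7 is simply decoded as UTF-8, and the UTF-7 decoder never runs.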

UTF-7 is a vulnerability

Old posts and a new blog

It’s kind of been a while since I had a personal blog that I updated. I’m starting again on Octopress.

Because my previous attempt at a blog was also a static site generated from Markdown, just not one as well-planned-out as Octopress, it was actually straightforward to port over my old posts. That’s why, below here, you’ll see some posts from 2011.

Emoji are more common than hyphens. Is your software ready?

If you write software that handles text, you should expect to have your fundamental assumptions about how text works shaken up about once per decade.

It’s happening again around now. Suddenly, a large volume of text contains characters that not all software is ready to support. Three years ago, these characters didn’t exist, but they’ve been adopted so quickly that, on Twitter, they’re now more common than hyphens.

Read more on the Luminoso blog

ftfy (fixes text for you) version 3.0

About a year ago, Luminoso blogged about how to ungarble garbled Unicode in a post called Fixing common Unicode mistakes with Python — after they’ve been made. Shortly after that, we released the code in a Python package called ftfy.

You have almost certainly seen the kind of problem ftfy fixes. Here’s a shoutout from a developer who found that her database was full of place names such as “BucureÅŸti, Romania” because of someone else’s bug. That’s easy enough to fix:

>>> from ftfy import fix_text

>>> print(fix_text(u'BucureÅŸti, Romania'))
Bucureşti, Romania

>>> fix_text(u'Sokal’, L’vivs’ka Oblast’, Ukraine')
"Sokal', L'vivs'ka Oblast', Ukraine"

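To see where text like this comes from, here's a sketch of the underlying mistake using only the standard library. It assumes the garbling happened because UTF-8 bytes were read as Windows-1252, which is one common cause; ftfy itself does much more, including working out which mix-up occurred.

```python
original = "Bucure\u015fti, Romania"  # the correct name, with s-cedilla

# Someone encodes the text as UTF-8, and someone else reads those bytes
# as if they were Windows-1252. Each non-ASCII letter becomes two
# unrelated characters.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)

# When you know exactly which mix-up occurred, reversing it is mechanical.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)
```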
Read more on the Luminoso blog

Super Mao Bros.

Last night I fired up the nostalgia machine and decided to play an emulated version of Super Mario Bros. 1. It turns out that the ROM was corrupted, which would usually make it unplayable…

…except SMB1’s code is so resilient that that doesn’t happen. The usual failure mode for a corrupted version of SMB1 is that you start in a fascinating level called World 0-1, which reuses code from other levels to create a surprisingly playable level in a flooded castle. Finishing the level resets the game, though.

So I couldn’t play much of SMB1. But playing World 0 got me in the mood to experiment with other glitches by deliberately screwing with a game’s memory (in the way that a Game Genie used to do), leading to a new game I call Super Mao Bros.

BAPHL 4 post-mortem

BAPHL 4 was, of course, the largest puzzle hunt I’ve been involved in creating, and the first where I’ve ended up in charge of making the event happen. It seems to have gone very well.

The hunt would not have been nearly as good without our hordes of testsolvers, especially the three whole teams that Charles rounded up. They caught lots of mistakes, and lots of puzzles that were unreasonably hard. We spent much of the last week before the hunt cutting puzzles down to size wherever we could, and yet the three teams took 8 to 11 hours to solve the Normal Mode (admittedly with distractions), prompting us to cut even more.

There were a few cases where we did this by replacing a Hard Mode puzzle with a thoroughly-testsolved Normal Mode puzzle, which means that for much of the hunt, Normal Mode teams were solving the same puzzles as Hard Mode. This would presumably be why only one of the five Normal Mode teams finished. I apologize for that part. Normal Mode is a nice goal of BAPHL, but it’s hard to do. It means you have to calibrate the difficulty of each puzzle twice and testsolve two versions of it, and already you can never have enough testsolvers.

On the plus side, the difficulty ended up being a reasonable challenge for the 24 Hard Mode teams, with the winners finishing in 3 hours, and with at least 14 of them finishing within the total 7 hours.

The puzzles, in the form that we distributed to people, can be found on the BAPHL 4 page.

Here are my behind-the-scenes comments on the individual puzzles. (These have substantial spoilers, if you were planning to try the puzzles.)

The Speln Ncorrector

Today I was experimenting with using a machine translation toolkit on problems besides translation. My first experiment was to try to make a system that would take in a word it hasn’t seen before and guess how it was pronounced.

I think I accidentally ran something backwards, because what I got was a system that takes in the legitimate pronunciation of a word, and usually outputs a creative new spelling that it hasn’t seen before.

I have created a spelling incorrector.

Or, as it would say when I put the CMUDICT pronunciations of these words in:

Tude I wuz expermentn wth usin a mhin translaun tulkt on problems besids translaun. M furst experment wuz 2 tri 2 mak a sstm thet wood tak in a word t haznt syn byfor and gs hu t wuz pranonst.

I think I axedently ran sumthn bakwards, bkuz wut I gt wuz a sstm thet takz in the logitamt prenanseaun v a word, and ugly otputs a creatv nu speln thet t haznt syn byfor.

I hav creatd a speln ncerector.

I am not kidding. Those are the actual spellings my backwards-wired system gives for dictionary pronunciations of those words, including spelling “usually” as “ugly”, “accidentally” as “axedently”, and “to” as “2”.

The CONFUZZLE wordlist

There are lots of text files that list the words you can use in word games like Scrabble. (Some of them are more accurate than others.) But as far as I know, this is the only one that you can print on 17 double-sided sheets of letter-size paper.

The words in the list come from the excellent, public-domain ENABLE2K word list, compiled by Mendel Cooper. His website is sadly defunct, but the wordlist has become a standard for word games and puzzles.

Why would you want a wordlist on paper instead of a computer file? The same reason you’d want to play word games on a board instead of on a computer. Sometimes, you’d rather face the person you’re playing against than have your face in a laptop.

How the list works

The list only contains words, not their definitions. Still, given that there are over 170,000 words in the list, it takes some effort — beyond just using a 6-point font — to fit them on 34 pages.

This economy of printed space is accomplished through a form of human-readable compression. When many consecutive words in alphabetical order start with the same letters, as they tend to do, they are grouped together like this:

ALLOTYP- e es ic ically ies y

The grouping was chosen by an algorithm, not by a lexicographer, so sometimes words are grouped in odd ways:

NONSOL- ar id ids ution utions
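Since the original code isn't at hand, here's a hypothetical reconstruction of the idea in Python. It is not the algorithm that produced the list: it's a simple greedy pass that keeps adding consecutive words to a group as long as they still share a stem of at least min_stem letters, and the names shared_stem, group_words, and format_group, as well as the default of 6, are all my own assumptions.

```python
from os.path import commonprefix


def shared_stem(words):
    """Longest shared prefix that leaves every word a non-empty suffix."""
    stem = commonprefix(words)
    if stem in words:
        # A word equal to the stem would get an empty suffix; back off.
        stem = stem[:-1]
    return stem


def group_words(words, min_stem=6):
    """Greedily group consecutive sorted words that share a long stem."""
    groups, current = [], []
    for word in words:
        if current and len(shared_stem(current + [word])) >= min_stem:
            current.append(word)
        else:
            if current:
                groups.append(current)
            current = [word]
    if current:
        groups.append(current)
    return groups


def format_group(group):
    """Render a group as 'STEM- suffix suffix ...'."""
    if len(group) == 1:
        return group[0].upper()
    stem = shared_stem(group)
    return "{}- {}".format(stem.upper(), " ".join(w[len(stem):] for w in group))


words = ["allotype", "allotypes", "allotypic", "allotypically",
         "allotypies", "allotypy", "nonsolar", "nonsolid", "nonsolids",
         "nonsolution", "nonsolutions"]
for group in group_words(words):
    print(format_group(group))
```

On these eleven words the sketch reproduces the two example lines above, but a greedy pass like this won't necessarily make the same choices as the real list everywhere.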

I’ve also marked the differences between ENABLE and the more well-known standard, the Official Tournament and Club Word List 2 (OWL2). These markings are not authoritative, because I don’t have an electronic file of OWL2. I compiled the differences mainly from this National Scrabble Association page.

Locating the code that chose the groupings will, unfortunately, take a bit of hard-drive archaeology. I made the list in 2003, updated it for OWL2 in 2006, and updated it manually in 2011 to incorporate the National Scrabble Association’s errata to OWL2.