Music Tagging

The way that music files are tagged is all wrong. There must be a better solution.

Like (I suppose) most people reading this I have a large (and growing) collection of music files that I’ve (legally, of course) ripped from my CDs. The ripping software that I use writes various information about the track into data “tags” that are stored in the file. It does this by querying an online database to find out the track name, artist, album, track number and (least usefully, in my opinion) genre of the track. There are many pieces of software for manipulating these tags and I’ve recently been working on a series of modules that manipulate this data in a format-independent manner (so it doesn’t matter if you have an Ogg Vorbis file or an MP3 file).

But I’m becoming more and more convinced that this approach needs some work. I store all of my music files in a directory structure on a (large) hard disk that I bought expressly for that purpose. I have a directory for each artist and within that a subdirectory for each album. The album subdirectory contains the actual music files and in the artist directory I also have a number of playlist files (.m3u) which represent the albums. An m3u file is pretty unintelligent. It simply contains the names of the files that make up the album in the correct order. Most people I’ve spoken to have a similar set-up with minor variations.

This setup can create a number of problems. Most of them stem from the fact that the same track can appear on a number of different albums, and under my current system I need to store each track once for each album that it appears on. That wastes space.

Of course, I don’t actually need to store a track multiple times. If a track appears on more than one album, I could just store it once and reference that version of the file in the m3u files for each of the other albums that contain that track. In fact I could lose the different directories for different albums and just have a big directory containing all the tracks by each artist and just use m3u files to reconstruct each album.

The problem with that comes down to the data tags that I mentioned earlier. When I rip a track from one album, it is tagged with the name of that album and its track number on that album. When I try to link that same track to a different album there’s no way that I can include the new album information in the data tags. So when I’m playing the new album, my music player will display the wrong information for tracks that were previously ripped from other albums. You might not see that as a huge problem, but it niggles me.

The core of the problem is that the data has been modelled incorrectly. It makes no sense to try and store all of this data in a file representing the track. You actually need to push some of the data into the file that represents the album. So you need to store the album name and the list of tracks in the m3u file for that album and remove the album name and the track number from the track file.
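As a rough sketch of what that split might look like (in Python, purely for illustration), the track keeps only the data that belongs to the recording, and the album keeps its own name and the ordered list of tracks. Everything named here is my own invention and not part of any existing standard: the `Track` and `Album` classes, the `#ALBUM:` playlist line and the example file paths are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Track:
    """Data that belongs to the recording itself (no album, no track number)."""
    path: str
    title: str
    artist: str


@dataclass
class Album:
    """Data that belongs to the album: its name and its ordered track list."""
    name: str
    tracks: list[Track] = field(default_factory=list)

    def now_playing(self, track_number: int) -> str:
        # The track number is simply the position in this album's list,
        # so the same Track can sit at different positions on different albums.
        track = self.tracks[track_number - 1]
        return f"{self.name} #{track_number}: {track.artist} - {track.title}"

    def to_playlist(self) -> str:
        # Hypothetical playlist layout: an "#ALBUM:" line (an assumed
        # convention, not a real m3u directive) followed by one path per line.
        lines = [f"#ALBUM:{self.name}"]
        lines.extend(track.path for track in self.tracks)
        return "\n".join(lines) + "\n"


# One copy of the shared track on disk, referenced by two albums.
shared = Track("Example Artist/shared_song.ogg", "Shared Song", "Example Artist")
other = Track("Example Artist/other_song.ogg", "Other Song", "Example Artist")

first_album = Album("First Album", [shared, other])
greatest_hits = Album("Greatest Hits", [other, shared])

print(first_album.now_playing(1))
print(greatest_hits.now_playing(2))
print(greatest_hits.to_playlist(), end="")
```

The point of the design is that the album name and track number are looked up from whichever album is being played, so the player would show the right information for a shared track without the file being stored twice.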

When you think about it, that’s a much better way of doing it. In the general case, a track isn’t associated with a particular album so storing that data in the track file really doesn’t make much sense. It’s as tho’ the format was designed by someone who didn’t understand data modelling[1]. I’m going to think about this a bit more over the next few days and see how easy it would be to implement it. Of course, the current standard is implemented in huge amounts of existing software, so getting a new standard implemented anywhere might be a bit of a bugger. We can try tho.

[1] Something I’ve seen a few examples of recently – but that’s a rant for another day.

7 comments

  1. There’s a few things being done on this front. The first is being done with next generation filesystems, where you can query them like databases. BeOS was the first OS that I know of that did this, so you could create a folder that was really a search for all the MP3s where the ID3 matched “The Grey Album” or whatever. Then it doesn’t matter where the mp3s are because you can get at them by their metadata. WinFS will supposedly operate this way in Longhorn and the guys who designed BeFS are now working on Apple’s next generation file system.

    The second thing is http://www.musicbrainz.org/, which associates music metadata with “audio fingerprints.” These are hashes of songs that allow you to identify a specific song regardless of format as long as it’s reasonably encoded. By associating metadata with audio fingerprints a system could be queried on any metadata that matched that song. That would more or less normalize the database.

  2. The problem with moving metadata out of the files and up to the album is that the moment you copy a file across the network, or onto a CD, it’s now lost its context and metadata. While tracks do exist that appear on multiple albums, I think this is probably still more of an exception to the rule; for the most part, an instance of a song belongs to one album, and so storing that data in the file isn’t a big problem.

    I find it interesting that you have a playlist for each album. Primarily using iTunes, I’ve grown accustomed to the mp3 player being able to dynamically show albums without needing explicit playlists.

    George: Regarding BeFS, live filesystem queries on mp3s were really great, but the data usually came from ID3 tags anyway. It would be possible to utilize BeFS to do what you want, but it would require new software as the current system still has the issue of the album and track information being stored in the mp3.

  3. After being very harsh about this post on the spool (id 4034, if you care to pick over Sunday afternoon grumpiness), candace talked me into accepting some points, and I argued her out of some. Some bullet points, then:

    * The database must be at library level, not album level.
    * iTunes (and most of the Linux iTunes clones) maintain a database.
    * This database is invariably sourced from, and synced with, the id3 tags.
    * Finding albums from such a database is simple; simpler, even, than using the filesystem and m3u.

    I’m still happy to, say, shove JPGs into id3 tags to preserve the integrity of copying over the network. However, I’m also using a small database (which takes about 10m to populate with about 8000 mp3s) to let me find which mp3s need editing. (For example, the 4000+ tracks which just had id3v1(.1) tags in my collection are now tagged with v2.3, based on a SELECT from that database).

    I’m also toying with the idea of a DBD::iTunes, but I haven’t quite finished picking over the bones of DBD::Google or DBD::Mock enough to actually get anywhere yet.

  4. I’m surprised that no one has mentioned the fact that the same song on two albums is not actually the same track. Even if the tracks come from the same original recording, different albums might have different effects applied to the track (e.g. equalizations, volume normalization, etc.). Now granted that this is a very minor point and for all intents and purposes those tracks are really the same thing. But some people might want to be able to keep track of the tracks as different entities.

    Especially if we’re talking about differences in volume normalization, it would be very annoying to be listening to an album mastered at one volume level, and all of a sudden have the volume jump because your MP3 player pulled in the same track from a different album that had been mastered at a different volume level.

  5. I am joining the party late, but being a fairly recent arrival to the UK, I’m not clear on one point.

    UK does NOT support fair use for audio in any shape or form, IIRC. So, ripping to hard disk, even if it’s your own CDs, is supposedly illegal here? Is this not the case?

    The reason I ask is that I embarked on a similar project and a friend made a comment that as of November last year, a law passed makes it illegal.

  6. Well, it’s good to see someone else feeling my pain.

    First, tinman, in the US, when I buy a CD, I can copy it all I want, but it has to be for my personal use.

    Now for the bigger point. My hard drive is 80 meg, and I have about 650 CDs (I’m a former wedding DJ) that I’m ripping to it. I wish I could have just one copy of Louie Louie instead of 6. Or Come Together on the Blue CD and Abbey Road. Otherwise, I will run out of room for sure!

    Thanks for the mastering issue…didn’t really think of that. If anyone comes up with a solution, let me know. You would think Media Player would address this, but maybe not enough people complain.

    Cheers.
