By Michael J. Hammel
While the younger set lives blissfully unimpressed by anything that preceded downloaded music, many Linux users nostalgically hang on to compact disc collections. Tools such as Gnome's Sound Juicer, Gtk's Grip, and KDE's KAudioCreator copy CDs to hard drives easily, and the modern versions of these tools let users add information about the CD to the files and directories in which they are stored. But if you don't have this information or it is outdated, mislabeled, or just plain wrong, where do you get it and where do you put it? Simply put, it all comes down to tagging.
Recently I tagged my entire 400+ CD collection using Picard, the official tag editor for the MusicBrainz project. Despite its lack of meaningful documentation, Picard is both stable and easy to use. It allowed me to tag all my music quickly, relabeled the directory structures in a more appropriate and consistent manner, and even added album art. The result is a music collection fully compliant with Amarok, MythMusic, KPlaylist, and a host of other multimedia tools for Linux.
In this article, I examine the standard tag format for audio files known as ID3, I discuss working with tags in Picard and MusicBrainz, and I walk you through these tools so you can update your audio collections quickly and painlessly right from the desktop.
Picard and MusicBrainz work on ID3 tags. ID3 [1] is a format that adds metadata - additional text information - to digital audio files. ID3v1, the first version of ID3, was created to address the lack of metadata support in the original MP3 specification [2]. It appends a fixed 128-byte block to the end of the audio file that holds information such as the title, artist, and album name. Because this version of the standard does not support internationalization, and because the information is often stored as text in the user's native language, many players end up displaying the extra information incorrectly. Additionally, the block is so small (the title, artist, and album fields hold 30 characters each) that long song and album names are truncated.
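To make the ID3v1 layout concrete, the following Python sketch reads the tag from the raw bytes of a file. ID3v1 occupies the last 128 bytes of the file and starts with the marker TAG, followed by fixed-width fields. The function name parse_id3v1 is hypothetical, and decoding the text as Latin-1 is an assumption, because ID3v1 does not record which encoding was used (which is exactly why players sometimes show garbage).

```python
import struct

# Fixed ID3v1 layout after the 3-byte "TAG" marker:
# title(30) artist(30) album(30) year(4) comment(30) genre(1) = 125 bytes
_ID3V1_FIELDS = struct.Struct("<30s30s30s4s30sB")


def _text(raw: bytes) -> str:
    """Cut at the first NUL pad byte; decoding as Latin-1 is an assumption."""
    return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()


def parse_id3v1(data: bytes):
    """Return the ID3v1 fields found in the last 128 bytes, or None."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    title, artist, album, year, comment, genre = _ID3V1_FIELDS.unpack(data[-125:])
    return {
        "title": _text(title),
        "artist": _text(artist),
        "album": _text(album),
        "year": _text(year),
        "comment": _text(comment),
        "genre": genre,  # numeric index into a predefined genre list
    }
```

To try it on a real file, pass in the file contents, for example parse_id3v1(open(path, "rb").read()). Any title longer than 30 bytes simply cannot be stored in full in this format.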
ID3v2 was created to address the shortcomings of ID3v1, although in reality, the two versions are not related. Whereas ID3v1 was a de facto standard with limited capabilities for appending data to files, ID3v2 is a fully accepted standard offering as much as 256MB of metadata.
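The 256MB figure follows from the ID3v2 header, which sits at the start of the file rather than the end. The header is 10 bytes long: the marker ID3, two version bytes, a flags byte, and a 4-byte size in which only the lower 7 bits of each byte are used. Four times 7 bits gives 28 bits, or just under 256MB. The sketch below decodes that header; the function names are hypothetical.

```python
def decode_synchsafe(size_bytes: bytes) -> int:
    """Decode a 4-byte ID3v2 size in which each byte carries 7 bits."""
    if len(size_bytes) != 4:
        raise ValueError("expected exactly 4 bytes")
    value = 0
    for byte in size_bytes:
        if byte & 0x80:
            raise ValueError("high bit must be clear in a synchsafe byte")
        value = (value << 7) | byte
    return value


def parse_id3v2_header(data: bytes):
    """Return version, flags, and tag size from the first 10 bytes, or None."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    return {
        "version": f"2.{major}.{revision}",  # e.g. 2.3.0 or 2.4.0
        "flags": flags,
        "size": decode_synchsafe(data[6:10]),  # excludes the 10-byte header
    }


# Largest size the header can express: 2**28 - 1 bytes.
MAX_TAG_SIZE = decode_synchsafe(b"\x7f\x7f\x7f\x7f")
```

Checking the version field this way is also a quick way to see whether a file carries ID3v2.3 or ID3v2.4 tags.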
Files with ID3v2 tags will help you get an iTunes-like experience out of your music players. Some tools let you add ID3v1 tags, ID3v2 tags, or both. Where you have the choice, choose ID3v2. You might also be offered variations on ID3v2. In this case, choose the highest version available, such as ID3v2.3 or ID3v2.4.
Multiple Tag Formats
Picard provides an option to remove another tag format, known as APE, from your audio files. If your files contain APE tags, enabling this option will remove them and help prevent problems that can arise from having two types of tags applied to a single file. Additionally, you can embed cover art into tags or have it saved as separate files in the album folders. Picard can apply tags to a wide variety of audio formats, including MP3, OGG, and FLAC. Picard writes ID3v2.4 tags by default, but you can configure it to write ID3v2.3. This might be necessary to work around a problem when using tagged files with iTunes, but Linux players probably won't care either way.
Of the Linux music players I tested for ID3 compatibility - Banshee, Amarok, Rhythmbox, Audacious, XMMS, VLC, MPlayer, and Xine - only Amarok appeared to have problems with the ID3 tags on my audio files. As it scanned the directory of music, Amarok printed the following message for nearly every track it found:
TagLib: ID3v2.4 no longer supports the frame type TDAT. It will be discarded from the tag.
Apparently Amarok has moved to the latest release of ID3v2 and is recognizing tag information that is no longer supported in that new release. Fortunately, it just ignores the outdated data.
Banshee, Amarok, and Rhythmbox are the iTunes-style players that show cover art, as well as additional information. Each of these must scan the music directories to create a database and utilize the ID3 information - and none of these applications shares its database with any of the others.
Audacious and XMMS are simple players. Audacious shows more ID3 information and can show cover art. As far as I can tell, XMMS does neither.
VLC, MPlayer, and Xine are all media players that are more typically used for video playback. VLC will display ID3 information, but it won't display cover art unless you grab it. (I couldn't get it to use the local cover art already downloaded.) If you start MPlayer on a command line, you'll see ID3 information, but it doesn't display cover art. Xine just plays the file and doesn't display ID3 information.
Note that my tests were far from exhaustive. Most of these players will let you edit the tag information directly, but doing this manually for a large collection would take a while.
MusicBrainz [3] is a website that provides a large database of album metadata. Access to this data is offered directly through the website or through applications that can read XML [4] data. The database is user maintained, so any user can provide updates.
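As a rough illustration of what access through an application looks like, the sketch below only builds a search URL and does not send any request. The /ws/2 path, the query field names, and the fmt and limit parameters describe the current MusicBrainz web service and are assumptions that go beyond this article, which refers to the XML web service [4]. The function name release_search_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: root of the current MusicBrainz web service (not from the article).
WS_ROOT = "https://musicbrainz.org/ws/2"


def release_search_url(artist: str, album: str, limit: int = 5) -> str:
    """Build a release search URL for an artist/album pair.

    Names containing double quotes are not escaped in this sketch.
    """
    query = f'artist:"{artist}" AND release:"{album}"'
    params = {"query": query, "fmt": "json", "limit": limit}
    return f"{WS_ROOT}/release/?" + urlencode(params)


url = release_search_url("Boston", "Boston")
```

Picard performs this kind of lookup for you; the point here is only that the same database is reachable with an ordinary HTTP request. If you do send requests yourself, check the service's usage rules first.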
Picard is a Python-based, cross-platform application that queries the MusicBrainz website for album metadata and simplifies the process of tagging your collection. The application uses an acoustic fingerprint to identify the audio files in an album and find the closest matches in the database. When you first use Picard, be sure to configure the default release country and enable use of folksonomies for genres in the options dialog (Options | Options). Folksonomies are community-based information that might improve the application of genres to audio files.
Picard opens with a folder browser on the left, a middle column for temporary identification, and a right column that shows the matched track and album data (Figure 1). First, find a folder of music files in the browser and drag it into the Unmatched Files entry in the middle column. I dragged a folder containing folders for each of Boston's three albums, and Picard immediately began to identify the audio files and posted matching albums in the right column.
Any files Picard can't match with data from MusicBrainz remain in the Unmatched Files collection. To match these, Picard couples with your web browser. Clicking on the entry under Unmatched Files fills in known information in the Original Metadata fields at the bottom of the window (Figure 2). Then click on the Lookup button, which opens a browser window in which you can enter the artist name and any additional information. Next, click on the Search button, choose the appropriate entry, and hit the green tagger button (Figure 3). This adds the album to the window on the right; then you can drag any other tracks you have in Unmatched Files onto the matching track name under the album.
If you use Lookup, usually a single CD will have multiple near-perfect matches because CDs are often released in multiple formats in different parts of the world. Thus, although the US version might have 10 tracks, the UK version of the same CD could have 11 tracks, or the order of tracks might be different. MusicBrainz does an excellent job of providing sets of almost exact matches from which to choose.
Alternatively, if the track isn't matched but Picard shows the album in the list on the right anyway, just drag the track from Unmatched Files to its matching album entry in that list. Tracks matched to the wrong album can be dragged from that album to the correct album (if shown) in the list on the right.
Installing Picard
Picard is packaged for most popular Linux distributions; however, you might need to install extra packages to get acoustic fingerprinting. For example, on Fedora, you need to install both the picard and picard-freeworld packages.
In the example with the Boston albums, you'll notice that album tracks are displayed with icons to the left of their track names in the right column (Figure 4). If the icon is a set of musical notes, that entry has not been matched with any files from the Unmatched Files collection. If the matching track is still listed under Unmatched Files (meaning Picard wasn't able to match it with the album entry), just drag it over the matching entry in the right column.
Matched entries (i.e., entries that were under Unmatched Files but which Picard matched to tracks and albums in the right column) display various colored block icons. The colors include green, yellow, orange, and red. Green means your audio file matches exactly the entry Picard chose for it on the basis of its acoustic fingerprint. Yellow is a near match. Orange and red indicate less accurate matches. Any entry with a reddish/pinkish tint is simply an entry that is not an exact match and might need your attention.
In some cases, Picard found a matching track but under the wrong album, which is what happened with Party and Foreplay/Long Time (Figure 5). Picard found them in a compilation album before it found them in the band's debut album. To fix this, you can drag the entry from the wrong album over the entry for the correct album. When an album has no more matches (i.e., all the entries have the musical notes icon), right-click on the album name to open a menu. To free up some space in the list and make it easier to drag album entries around, choose Remove from the menu to remove the album entry.
If a track is a bad match, click on that entry and then use the New Metadata Lookup button to do a manual search. This will open a browser window to MusicBrainz for that track. To find the correct track metadata, view the alternate PUIDs (unique IDs representing entries in the MusicBrainz database). In many cases, a bad match only reflects multiple PUIDs that all end up having the same information (Figure 6).
After the incorrect matches are fixed and entries under Unmatched Files are applied to the correct albums, the result looks like Figure 7. The matches are not exact, and I could go through them to fix the length and other metadata manually, but the matches are usually very close. Often the only mistake in the metadata will be the length of the audio track, which might be off by a second or two.
Once the changes are complete, select all tracks and albums in the right column and click on the Save button to update the files on disk. This process can take a while, especially if your source and destination directories are on NFS mounts. As it finishes processing each file, Picard will mark the entry in the right column with a green check. The settings used in this example (set via the Options menu) will move the files, not copy them, to another directory. This makes it easier to determine which files and folders are still to be processed.
When all files have been tagged, select all entries in the right column once again and use the Remove button to clear the right column.
To tag an entire collection, you could just drag the folder containing all the music files into Unmatched Files. If you try this, it could take quite some time for Picard to complete the matching process.
Unfortunately, Picard and MusicBrainz won't match every file. If your collection is missing ID3 tags, dropping your entire collection at once onto Unmatched Files will leave a lot of unmatched files and a potentially huge number of albums in the right column. If this is the first time you are tagging your collection, do it a few folders at a time, and process fringe music (e.g., soundtracks) in even smaller groups. If your collection already has some tagging, then dragging the top folder into Unmatched Files might be more successful.
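One way to plan the work in small groups is to list the album folders up front and split them into batches. The following Python sketch does only that; it does not drive Picard. The helper names and the default batch size are hypothetical, and the extension list is just the three formats mentioned in this article.

```python
from pathlib import Path

# Assumption: only the formats named in the article; extend as needed.
AUDIO_EXTS = {".mp3", ".ogg", ".flac"}


def find_album_dirs(root):
    """Return every directory under root that directly contains audio files."""
    dirs = {
        path.parent
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in AUDIO_EXTS
    }
    return sorted(dirs)


def batches(items, size=5):
    """Yield consecutive slices of at most size items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def print_plan(root, size=5):
    """Print numbered batches of folders to drag into Unmatched Files."""
    for number, group in enumerate(batches(find_album_dirs(root), size), 1):
        print(f"Batch {number}:")
        for directory in group:
            print(f"  {directory}")
```

Calling print_plan on the top folder of your collection gives a checklist you can work through a batch at a time, using a smaller size for soundtracks and other hard-to-match albums.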
In either case, it might be wise to make a complete copy of your collection before starting the tagging process. The copy will require a large amount of disk space, but you'll be happy you made it if you run into errors in how the new files are named or tagged. By working on copies, you can always start over until you have Picard configured to produce the best results.
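If you prefer to script the copy, a minimal Python sketch might check the free space first and then duplicate the whole tree. The function names are hypothetical, and the sketch assumes that the destination's parent directory already exists and that the destination itself does not.

```python
import shutil
from pathlib import Path


def tree_size(root) -> int:
    """Total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def backup_collection(src, dst) -> int:
    """Copy src to dst if enough space is free; return the bytes needed."""
    src, dst = Path(src), Path(dst)
    needed = tree_size(src)
    # Free space is measured on the filesystem holding dst's parent directory.
    free = shutil.disk_usage(dst.parent).free
    if needed > free:
        raise OSError(f"need {needed} bytes but only {free} are free")
    shutil.copytree(src, dst)  # raises FileExistsError if dst already exists
    return needed
```

Point Picard at the copy rather than the original, and delete the copy only after you are satisfied with the naming and tagging results.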
Picard works great on files that already have some tag information and works fairly well on those without any tags. It performed best with well-known artists and albums but had more trouble identifying movie soundtracks and couldn't recognize some of my classical albums at all. Picard also crashed when I tried to import my entire collection, so you might want to consider tagging a small set of album directories at a time.
Picard and MusicBrainz were fast in performing lookups over a cable-based network connection. I noticed a difference in basic functionality between the time I first tried Picard (January 2009) and the time I wrote this article. The old interface required that I use the Lookup button to open a web page on MusicBrainz. To choose the best match manually, I clicked on a green tagger button on the website, which then updated Picard.
The MusicBrainz green tagger buttons and Picard's Lookup button are still there, but Picard is now better at automatically searching, and the Lookup and tagger buttons are now only used in manual searches. This is probably an improvement with Picard 0.11, the version I used when writing this article.
Overall, I found Picard to be easy to use - especially given the way drag and drop allows for quickly matching files to MusicBrainz tag data.
INFO
[1] ID3: http://www.id3.org/
[2] ID3 on Wikipedia: http://en.wikipedia.org/wiki/ID3
[3] MusicBrainz: http://musicbrainz.org/
[4] XML WebService: http://musicbrainz.org/doc/XML_Web_Service
THE AUTHOR
Michael J. Hammel is a software engineer living with his wife, Brinda, and two Golden Retrievers in Colorado Springs, Colorado, USA. When he isn't working on grid systems or other geekery, he likes to play tennis, run his dogs around the park, drink tea with his wife, and wonder how his daughter is enjoying her first year of college. He has written more than 100 articles for numerous online and print magazines and is the author of three books on GIMP, the GNU Image Manipulation Program.