Dmitry Leskov
 

Converting Local freedb From UTF-8 To ANSI For Use With Exact Audio Copy

I recently purchased a new home PC. Having lots of disk space at my disposal prompted me to rip my CD collection using a lossless codec so as to (a) have a backup and (b) put the CDs away in boxes, saving the money and living room space that a CD cabinet would take.

After some online (re)search, I decided to go with FLAC as the encoding format and EAC (Exact Audio Copy) as the ripping tool. See these two pages for instructions on configuring EAC to use FLAC.

[Screenshot: EAC wrongly displaying non-ASCII Latin characters as Cyrillic]

EAC’s ability to retrieve artist, album, and track names from freedb had already saved me a lot of time by the time I inserted a non-English CD into the drive. Some letters in the screenshot definitely do not belong to the French alphabet, do they?

The problem is that freedb has no uniform encoding: it accepts submissions from many tools running on different platforms, so non-ASCII characters are stored in the native encoding of the platform from which the disc information was submitted. At the same time, EAC does not support multiple encodings and expects CD data to be in the Windows ANSI code page. As my ANSI code page is Windows-1251 (Cyrillic), any non-ASCII Latin characters were displayed as Cyrillic.
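To illustrate why the same freedb entry reads differently depending on the ANSI code page, here is a minimal sketch that decodes one stored byte under both code pages. It relies on two properties of these code pages: Windows-1252 maps bytes 0xA0 to 0xFF directly to U+00A0 to U+00FF, and Windows-1251 maps bytes 0xC0 to 0xFF to the Cyrillic letters U+0410 to U+044F. The function names are hypothetical, and each covers only the range named in its comment.

```c
#include <assert.h>
#include <stdio.h>

/* Windows-1252: bytes 0xA0-0xFF map straight to U+00A0-U+00FF. */
static unsigned decode_cp1252_upper(unsigned char b) {
    assert(b >= 0xA0); /* this sketch covers only the upper range */
    return b;
}

/* Windows-1251: bytes 0xC0-0xFF map to the Cyrillic letters U+0410-U+044F. */
static unsigned decode_cp1251_letter(unsigned char b) {
    assert(b >= 0xC0); /* this sketch covers only the letter range */
    return 0x0410u + (unsigned)(b - 0xC0);
}

/* Prints the code point a single stored byte stands for in each code page. */
static void show_byte(unsigned char b) {
    printf("0x%02X: Windows-1252 U+%04X, Windows-1251 U+%04X\n",
           (unsigned)b, decode_cp1252_upper(b), decode_cp1251_letter(b));
}
```

For example, `show_byte(0xE9)` prints `0xE9: Windows-1252 U+00E9, Windows-1251 U+0439`: the byte a Windows-1252 tool stored for "é" is shown as "й" on a Windows-1251 system.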

As you have surely guessed, I solved the problem by temporarily switching the ANSI code page to Windows-1252, ripping the few French, Italian, and other such CDs, and then restoring the code page before getting to the Russian CDs.

[Screenshot: Question marks all over]

All of a sudden, the information for one Russian CD was displayed as question marks. Another dose of Internet searching revealed that the recommended encoding for freedb is actually UTF-8. There is even a program that converts a local copy of freedb to UTF-8. But while EAC can work with a local copy of freedb, it has no clue about UTF-8, so I needed to perform exactly the opposite transformation for strings containing characters from my ANSI code page.

Fortunately, freedb is just a set of plain text files, so I ended up writing a short filter program in C and a script to run it against all database files. The conversion algorithm is pretty straightforward: try to convert each input line from UTF-8 to Unicode (UTF-16) and then to the current Windows ANSI code page; if both conversions succeed without substituting any characters, output the result, otherwise output the original line. (For ASCII lines, the conversion always succeeds but produces an identical string, so it does not matter which one is output.)
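The rule can also be expressed without the Win32 API. Below is a minimal portable sketch, assuming the target code page is Windows-1251 and using a deliberately partial mapping table (ASCII, the basic Cyrillic letters, Ё/ё and the curly quotes). The names `utf8_next`, `to_cp1251` and `convert_line` are hypothetical; the actual filter relies on `MultiByteToWideChar` and `WideCharToMultiByte` and the full code page instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decodes one UTF-8 sequence at *p and advances *p past it. Returns the
   code point, or -1 for malformed input (bad lead or continuation bytes,
   overlong forms, surrogates, values above U+10FFFF). */
static long utf8_next(const unsigned char **p) {
    const unsigned char *s = *p;
    long cp;
    int n, i;
    if (s[0] < 0x80) { *p = s + 1; return s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; n = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; n = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; n = 3; }
    else return -1;
    for (i = 1; i <= n; i++) {
        if ((s[i] & 0xC0) != 0x80) return -1; /* also stops at '\0' */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
        (n == 3 && cp < 0x10000) || (cp >= 0xD800 && cp <= 0xDFFF) ||
        cp > 0x10FFFF)
        return -1;
    *p = s + n + 1;
    return cp;
}

/* Partial Windows-1251 encoder (illustration only): returns the byte for
   cp, or -1 if cp is outside the small table below. */
static int to_cp1251(long cp) {
    if (cp < 0x80) return (int)cp;
    if (cp >= 0x0410 && cp <= 0x044F) return (int)(0xC0 + (cp - 0x0410));
    switch (cp) {
    case 0x0401: return 0xA8; /* Ё */
    case 0x0451: return 0xB8; /* ё */
    case 0x2018: return 0x91;
    case 0x2019: return 0x92;
    case 0x201C: return 0x93;
    case 0x201D: return 0x94;
    default:     return -1;
    }
}

/* The filter rule: if the whole line is valid UTF-8 and every character
   exists in the target code page, write the converted line; otherwise copy
   the original. Returns 1 if converted, 0 if copied unchanged. */
static int convert_line(const char *in, char *out, size_t outSize) {
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;
    assert(outSize > strlen(in)); /* output never exceeds the input length */
    while (*p) {
        long cp = utf8_next(&p);
        int b = cp < 0 ? -1 : to_cp1251(cp);
        if (b < 0) {
            strcpy(out, in);
            return 0;
        }
        out[n++] = (char)b;
    }
    out[n] = '\0';
    return 1;
}
```

With this sketch, the UTF-8 line `D0 94 D0 B0` ("Да") becomes the Windows-1251 bytes `C4 E0`, while a line that is already in Windows-1251, or one containing a character such as "é" that Windows-1251 lacks, is passed through unchanged.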

Download the source code and freedb scripts

Here is the filter source. NOTE: I assumed that no line in a freedb file is longer than 1023 bytes. If you use this program for any other purpose, make sure longer lines are handled.

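#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (void) {
	char in[1024], out[1024];
	WCHAR wBuf[1024];
	/* fgets() replaces gets(), which cannot limit the input length
	   and was removed from the C language in C11. */
	while (fgets(in, sizeof in, stdin)) {
		BOOL dcFlag = FALSE; // Set when a character has no ANSI equivalent
		int wLen, aLen;
		size_t len = strlen(in);
		if (len > 0 && in[len - 1] == '\n')
			in[len - 1] = '\0'; // puts() below adds the newline back
		// Step 1: UTF-8 -> UTF-16; fails on bytes that are not valid UTF-8
		wLen =
			MultiByteToWideChar(
			CP_UTF8,
			MB_ERR_INVALID_CHARS,
			in,
			-1,	// null-terminated input, so wLen includes the terminator
			wBuf,
			1024);
		if (!wLen) {
			DWORD err = GetLastError();
			if (err == ERROR_NO_UNICODE_TRANSLATION) { // Not UTF-8
				puts(in);
				continue;
			}
			else exit(err);
		}
		// Step 2: UTF-16 -> current ANSI code page, no best-fit substitutions
		aLen =
			WideCharToMultiByte(
			CP_ACP,
			WC_NO_BEST_FIT_CHARS,
			wBuf,
			wLen,
			out,
			(int)sizeof out,
			"?",
			&dcFlag);
		if (!aLen) exit(GetLastError());
		else if (dcFlag) // There were non-ANSI characters
			puts(in);
		else
			puts(out);
	}
	return 0;
}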

P.S. During testing, I noticed that freedb entries for otherwise English albums often use the Unicode right single quotation mark (’, UTF-8 bytes E2 80 99) instead of the ASCII apostrophe in album and track titles, so you may wish to use my solution even if you only have English CDs in your collection.
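Why the filter helps here: read as ANSI, the three UTF-8 bytes of ’ show up as three separate characters (for example "â€™" under Windows-1252), whereas after conversion the mark becomes a single byte. The minimal sketch below decodes the sequence by hand and looks up the curly quotes, which Windows-1251 and Windows-1252 happen to place at the same byte values. The function names are hypothetical and handle only the cases named in their comments.

```c
#include <assert.h>

/* Decodes one three-byte UTF-8 sequence: a 1110xxxx lead byte followed
   by two 10xxxxxx continuation bytes. */
static unsigned long decode_utf8_3(const unsigned char s[3]) {
    assert((s[0] & 0xF0) == 0xE0);
    assert((s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80);
    return ((s[0] & 0x0FUL) << 12) | ((s[1] & 0x3FUL) << 6) | (s[2] & 0x3FUL);
}

/* Byte values of the curly quotation marks, identical in Windows-1251 and
   Windows-1252. Returns -1 for any other code point. */
static int ansi_quote_byte(unsigned long cp) {
    switch (cp) {
    case 0x2018: return 0x91; /* left single quotation mark */
    case 0x2019: return 0x92; /* right single quotation mark */
    case 0x201C: return 0x93; /* left double quotation mark */
    case 0x201D: return 0x94; /* right double quotation mark */
    default:     return -1;
    }
}
```

Applied to the bytes E2 80 99, `decode_utf8_3` yields 0x2019, and `ansi_quote_byte(0x2019)` yields 0x92, a character EAC can display under either code page.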

[Screenshot: Not quite decipherable]

P.P.S. The above did not quite work for one of my Russian CDs. However, I noticed that its album and track names contained too many characters, some of which were not question marks, so I guessed that they had somehow undergone the Windows-1251 to UTF-8 conversion twice. I identified the respective database file and ran it through the above program again.

It worked:

[Screenshot: Correctly displayed Russian track titles]
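The double conversion guessed at in the P.P.S. is easy to reproduce. The sketch below applies a hypothetical "Windows-1251 to UTF-8" pass twice to the single letter "Д" (Windows-1251 byte C4). It assumes a deliberately partial Windows-1251 table (ASCII, the Cyrillic letters, Ё/ё and the curly quotes), and all function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Partial Windows-1251 decode table (illustration only): ASCII, the
   Cyrillic letters at 0xC0-0xFF, Ё/ё and the curly quotes.
   Returns -1 for any other byte. */
static long cp1251_to_cp(unsigned char b) {
    if (b < 0x80) return b;
    if (b >= 0xC0) return 0x0410L + (b - 0xC0);
    switch (b) {
    case 0x91: return 0x2018;
    case 0x92: return 0x2019;
    case 0x93: return 0x201C;
    case 0x94: return 0x201D;
    case 0xA8: return 0x0401; /* Ё */
    case 0xB8: return 0x0451; /* ё */
    default:   return -1;
    }
}

/* Writes the UTF-8 encoding of a code point below U+10000 to out and
   returns the number of bytes written. */
static size_t utf8_put(long cp, unsigned char *out) {
    assert(cp >= 0 && cp < 0x10000);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* One "Windows-1251 to UTF-8" pass over len bytes; out must hold 3 * len
   bytes. Returns the output length, or 0 if a byte is outside the table
   (or the input is empty). */
static size_t cp1251_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out) {
    size_t i, n = 0;
    for (i = 0; i < len; i++) {
        long cp = cp1251_to_cp(in[i]);
        if (cp < 0)
            return 0;
        n += utf8_put(cp, out + n);
    }
    return n;
}
```

The first pass turns C4 into D0 94, the correct UTF-8 for "Д". The second pass reads D0 94 as the Windows-1251 characters "Р" and "”" and produces the five bytes D0 A0 E2 80 9D, which explains the surplus of characters. Each run of the filter undoes one such pass, which is why the affected file needed a second run.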


Talkback

  1. Alex Y.
    15-Dec-2010
    11:37 pm

    Hi Dmitry. Thank you very much for your hack! It works for me as well. Alex

  2. Денис
    19-Aug-2011
    8:47 pm

    Dmitry, good afternoon!
    I didn't quite understand: should the script be run in the EAC folder? Or once everything has already been converted and the titles are garbled? Sorry, I'm not a programmer at all, and on Windows to boot :(

  3. Dmitry Leskov
    31-Aug-2011
    8:54 pm

    Денис, you need to pass the script the path to your local copy of freedb, for example:

    freedb-convert.bat C:\freedb

    Only then run EAC.
