VSS Encoding Issues

Determining the correct encoding

VSS does not store information about which encoding it uses to store filenames, comments, label names, etc. Rather, each client that connects to the repository uses the default encoding for that computer.

As you can imagine, this can make it very difficult to migrate this data, especially if different clients used different codepages. Subversion uses UTF-8 for all internal data, but we must know how to convert to UTF-8 during the migration.
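
To see why the source codepage matters, consider a single byte decoded under two different codepages. The sketch below is not part of vss2svn; it assumes an iconv command that accepts the names CP1252 and CP1253, and the helper name decode_byte is purely illustrative.

```shell
# Decode one byte (given as three octal digits) under a source codepage
# and print the result as UTF-8.
decode_byte() {
    printf "\\$1" | iconv -f "$2" -t UTF-8
}

# The same stored byte, 0xE1 (octal 341), read under two codepages:
decode_byte 341 CP1252; echo    # Western European: á
decode_byte 341 CP1253; echo    # Modern Greek: α
```

If the wrong source codepage is chosen, the conversion still succeeds but produces the wrong characters, which is why the encoding must be known before migrating.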

By default, vss2svn uses the windows-1252 codepage as the source encoding during the migration. In order to use a different codepage, you must manually supply it using the --encoding switch to the migration program.
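
An invocation might look like the following sketch. Only the --encoding switch is described above; the script name vss2svn.pl, the --vssdir switch, and the database path are assumptions made for illustration, so check the usage output of your own vss2svn version. The helper vss2svn_command is hypothetical and only prints the command line rather than running it.

```shell
# Print (without running) a vss2svn command line for a database and encoding.
# Assumptions: the script is named vss2svn.pl and takes the database via --vssdir.
vss2svn_command() {
    vss_dir=$1
    # Fall back to the documented default when no encoding is given.
    encoding=${2:-windows-1252}
    printf '%s\n' "vss2svn.pl --vssdir $vss_dir --encoding $encoding"
}

# Example: a repository whose clients used the Greek codepage.
vss2svn_command /path/to/vss windows-1253
```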

Unsupported encodings

By default, vss2svn supports only the encodings supported by the XML::Parser Perl module. It is possible to create encodings for "unsupported" codepages, but this must be done manually.

Here is a small recipe to generate your custom encoding. As an example we will create the windows-1253 encoding:

  1. Make sure you have the XML::Encoding Perl module installed. One way to install it is to run the command:
    perl -MCPAN -e 'install XML::Encoding'
    
    This installs the make_encmap and compile_encoding scripts into Perl's bin directory. On Windows these are batch files.
  2. Download the conversion table for the codepage you want to use from ftp://ftp.unicode.org/Public/MAPPINGS. (We will use VENDORS/MICSFT/WINDOWS/CP1253.TXT)
  3. Generate an XML file from the mapping using the command:
    make_encmap windows-1253 CP1253.TXT > windows-1253.xml
    
  4. Edit the file with your favorite text editor and add expat='yes' to the first line.
    It should look like this:
    <encmap name='windows-1253' expat='yes'>
    
  5. From that XML file, create a Perl encoding file with the following command:
    compile_encoding windows-1253.xml
    
    It will generate a windows-1253.enc file.
  6. Put this file into Perl's encodings directory: {path to PERL}/site/lib/XML/Parser/Encodings
  7. Try to run your conversion again.
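
The manual edit in step 4 of the recipe above can also be scripted. This is a minimal sketch, assuming the first line of the generated file is an <encmap ...> tag as shown in that step; the function name add_expat_attr is hypothetical.

```shell
# Add expat='yes' to the <encmap ...> tag on the first line of an encoding
# map file, unless the attribute is already present.
add_expat_attr() {
    map_file=$1
    # Nothing to do if the first line already carries an expat attribute.
    if head -n 1 "$map_file" | grep -q "expat="; then
        return 0
    fi
    tmp_file=$(mktemp) || return 1
    # Rewrite only line 1, keeping the tag's existing attributes intact.
    sed "1s/<encmap\\([^>]*\\)>/<encmap\\1 expat='yes'>/" "$map_file" > "$tmp_file" &&
        mv "$tmp_file" "$map_file"
}
```

It would be called between steps 3 and 5, for example as add_expat_attr windows-1253.xml, before running compile_encoding on the same file.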

See also

Encodings included with Vss2Svn

In addition to the encodings included with XML::Parser by default (see above), Vss2Svn now also includes the following encodings:

  • Windows-1253: Modern Greek
  • Windows-1251, KOI8-R, CP866: Russian

For a full list, see source:trunk/script/encodings
