This question about Upgrading from TWiki to Foswiki: Answered

CharsetConverterContrib: some characters were deleted

CharsetConverterContrib did a good job after an upgrade from TWiki 6.0.1 to Foswiki 2.1.6, except for some characters.

These characters weren't converted correctly and appeared to have been deleted (although moving the cursor with the left/right arrow keys suggested that a character was still there...).
| JavaScript Escape character | example | result | remark |
| \xEF | geïnstalleerd | choice-yes | |
| \xEB | kopiëren | choice-yes | |
| \xB2 | ²superscript two | choice-yes | |
| \xB3 | ³superscript three | choice-yes | |
| \xB7 | ·middot | choice-yes | |
| \xE9 | één | choice-yes | |
| \xE0 | à la bonheur | choice-yes | |
| \x85 | next line | help | |
| \x80 | | choice-no | $ and £ OK! |
| \xBB | » | choice-yes | |
| \x91 | | not applicable | |
| \x92 | | choice-no | |
| \x93 | | choice-no | |
| \x94 | | choice-no | |
-- Main.StijnBousard - 04 Apr 2018

The Charset Converter and the bulk copy utilities can only convert based on what they are told the source character set is. Foswiki and TWiki have always defaulted to ISO-8859-1, but they did nothing to enforce that the characters in a document are actually from that set. Most users probably use Windows to access Foswiki, and Windows by default uses CP-1252, the "Windows" code page / character set. CP-1252 is a "superset" of ISO-8859-1 and fills in some of the gaps of the ISO character set.

See: Wikipedia:ISO/IEC_8859-1, which does not define the 0x7F-0x9F range. Wikipedia:Windows-1252 shows the additional characters.
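To make the difference concrete, here is a minimal Python sketch (not part of Foswiki or CharsetConverterContrib) that decodes the problem bytes from the table in the question under both character sets. The names `SUSPECT_BYTES` and `describe` are made up for illustration. Note that Python's `iso-8859-1` codec maps the 0x80-0x9F bytes to invisible C1 control characters instead of rejecting them, which would be consistent with characters that seem deleted but still take up a cursor position.

```python
# Bytes from the question's table that fall in the 0x80-0x9F range.
SUSPECT_BYTES = [0x80, 0x85, 0x91, 0x92, 0x93, 0x94]


def describe(byte_value):
    """Return the byte decoded as (ISO-8859-1, CP-1252)."""
    raw = bytes([byte_value])
    return raw.decode("iso-8859-1"), raw.decode("cp1252")


if __name__ == "__main__":
    for b in SUSPECT_BYTES:
        latin1, cp1252 = describe(b)
        # Under ISO-8859-1 the result is an unprintable control character;
        # under CP-1252 it is a printable symbol such as a euro sign or curly quote.
        print(f"\\x{b:02X}: ISO-8859-1 -> U+{ord(latin1):04X}, "
              f"CP-1252 -> {cp1252!r} (U+{ord(cp1252):04X})")
```

Bytes at 0xA0 and above, such as \xEF or \xE9, decode to the same character in both sets, which matches the rows in the table that converted without problems.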

I'm guessing that you used the default ISO-8859-1 character set when you ran the converter. It should have flagged warnings when it encountered the unknown characters, but due to the amount of output these can be easy to miss. If you have a backup of the old installation, you could run the converter again against the original data and specify the options that override the charset.
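Before rerunning the converter, it may help to check whether the backup really contains bytes from the CP-1252-only range. The following is a rough Python sketch under stated assumptions: it assumes topics are stored as `.txt` files below some data directory, and `count_c1_bytes` and `scan_topics` are hypothetical helper names, not tools shipped with Foswiki. It is only meaningful on the original, unconverted data, because in UTF-8 files bytes in this range occur as normal parts of multi-byte sequences.

```python
import re
from pathlib import Path

# Bytes that ISO-8859-1 leaves to control codes but CP-1252 uses for
# printable characters (euro sign, curly quotes, ellipsis, ...).
C1_RANGE = re.compile(rb"[\x80-\x9f]")


def count_c1_bytes(data):
    """Count bytes in the 0x80-0x9F range in raw single-byte-encoded data."""
    return len(C1_RANGE.findall(data))


def scan_topics(root):
    """Map each .txt file under root (assumed layout) to its count of suspect bytes."""
    hits = {}
    for path in Path(root).rglob("*.txt"):
        n = count_c1_bytes(path.read_bytes())
        if n:
            hits[str(path)] = n
    return hits
```

If `scan_topics` on the backup's data directory reports any files, that would suggest the source charset was effectively CP-1252 and not pure ISO-8859-1.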

Never run the converter on data that's already been converted! Running the converter a second time will corrupt UTF-8 characters. Unfortunately we don't have any tool that can easily fix a topic containing a mixture of UTF-8 and non-UTF-8 characters. If you cannot get back to the original pre-conversion data, then the only solution I'm aware of is to manually edit the topics to replace the incorrect text.
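Why a second run is destructive can be shown with a small Python sketch. This is an illustration of the general double-encoding effect, not the converter's actual code; `convert_once` and `is_valid_utf8` are hypothetical names. The validity check is only a heuristic guard for whole files: it does not repair anything, and it cannot sort out a topic that mixes both encodings.

```python
def convert_once(raw, source_charset="iso-8859-1"):
    """Re-encode bytes from the assumed source charset to UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


def is_valid_utf8(raw):
    """Heuristic: True if the bytes decode cleanly as UTF-8 (pure ASCII also passes)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    original = "\u00e9\u00e9n".encode("iso-8859-1")  # "één" as stored before the upgrade
    once = convert_once(original)    # correct UTF-8
    twice = convert_once(once)       # UTF-8 bytes misread as ISO-8859-1 and encoded again

    print(once.decode("utf-8"))      # één
    print(twice.decode("utf-8"))     # Ã©Ã©n
    print(is_valid_utf8(original), is_valid_utf8(once))  # False True
```

A check like `is_valid_utf8` could be used to skip files that already look converted, with the caveat that the doubly converted bytes are themselves valid UTF-8, so it cannot detect damage after the fact.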

-- Main.GeorgeClark - 04 Apr 2018 - 14:56

 

QuestionForm

Subject Upgrading from TWiki to Foswiki
Extension CharsetConverterContrib
Version Foswiki 2.1.6
Status Answered
Related Topics Utf8MigrationConsiderations
Topic revision: r3 - 25 Jul 2019, StijnBousard
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere.