Item10748: foswiki Response::body() corrupting international chars if the underlying store is utf-8 and Site-CharSet is set to utf-8

Priority: Urgent
Current State: Needs Developer
Released In: n/a
Target Release: n/a
Applies To: Extension
Component: MongoDBPlugin
Reported By: PaulHarvey
Waiting For: OlivierRaginel, PaulHarvey
Last Change By: GeorgeClark
For example, any en dash (HTML entity: &ndash;), '–', is converted to a question mark when MongoDBPlugin is enabled.

Currently we have $Foswiki::cfg{Site}{CharSet} = 'utf-8';

Steps to reproduce:
  • Configure with MongoDBPlugin enabled. The search/query algorithms do not need to be set.
  • Edit a topic in WYSIWYG
  • Switch to raw
  • Place the &ndash; entity in the topic text
  • Switch back to WYSIWYG
  • Save & continue
  • Observe ? instead of –
  • Disable MongoDBPlugin in configure
  • place the – character into the topic again, and save
  • Observe that – is no longer corrupted

-- PaulHarvey - 16 May 2011
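
For illustration only, and not a diagnosis of this task: one well-known way an en dash turns into '?' in Perl is encoding a character string into a charset that cannot represent it. The sketch below assumes nothing about Foswiki internals; `encode_for_charset` is a hypothetical helper, and only the core `Encode` module is used.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Foswiki): encode a character string
# into the named charset using Encode's default substitution behaviour.
sub encode_for_charset {
    my ( $chars, $charset ) = @_;
    return Encode::encode( $charset, $chars );
}

my $ndash = "\x{2013}";    # EN DASH as a decoded character

# iso-8859-1 has no en dash, so Encode substitutes a replacement character.
my $latin1 = encode_for_charset( $ndash, 'iso-8859-1' );

# UTF-8 can represent it, so the character survives as a byte sequence.
my $utf8 = encode_for_charset( $ndash, 'utf-8' );

printf "iso-8859-1: %s\n", $latin1;
printf "utf-8 bytes: %s\n",
  join( ' ', map { sprintf '%02X', ord } split //, $utf8 );
```

If something similar happens on the output path while the site charset is utf-8, the symptom would match the report, but that is an assumption to be checked against the actual code.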

The only reason I've got WYSIWYG in the steps to reproduce is to avoid any ambiguity about which dash to paste in. The-UTF-8-version-of-– just seems long-winded :-)

WysiwygPlugin happily converts entities into the "native" charset (by design). Given that UTF-8 can represent them natively, they are converted instead of being left as entities.

-- PaulHarvey - 16 May 2011

Okay, a new and improved procedure:
  • Disable MongoDBPlugin
  • Edit this topic, and save: TestNdash.txt - observe that the topic remains unchanged
  • Enable MongoDBPlugin
  • Edit the topic, and save again
  • Observe that the ndash is replaced with a '?'

-- PaulHarvey - 17 May 2011

The attached file uses UTF-8 encoding

-- PaulHarvey - 17 May 2011

This appears to be a Foswiki core issue. I've committed a change that I suspect resolves it. Please provide feedback.

-- SvenDowideit - 17 May 2011

It now works for plain old CGI. But FastCGIEngineContrib is crashing: some regexes fail with an error mentioning "malformed utf-8"; other times it's "wide character in output".

-- PaulHarvey - 17 May 2011

Same feedback as I gave on Paul's other task: I would think binmode on the output filehandle is more appropriate than encode, but I don't have time to test it :-(

-- OlivierRaginel - 17 May 2011
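
The two approaches mentioned above (explicit encode versus binmode on the output handle) can be compared with a minimal sketch. It is not taken from the Foswiki code: both helper names are hypothetical, and an in-memory filehandle stands in for the real output handle.

```perl
use strict;
use warnings;
use Encode ();

# Option 1 (hypothetical helper): encode explicitly; the caller then
# prints the resulting bytes to a handle that has no encoding layer.
sub to_bytes_explicit {
    my ($chars) = @_;
    return Encode::encode( 'UTF-8', $chars );
}

# Option 2 (hypothetical helper): set an encoding layer with binmode and
# let the handle encode on output. An in-memory handle is used here so
# the bytes that would be written can be inspected.
sub to_bytes_via_binmode {
    my ($chars) = @_;
    my $buffer = '';
    open( my $fh, '>', \$buffer ) or die "open failed: $!";
    binmode( $fh, ':encoding(UTF-8)' ) or die "binmode failed: $!";
    print {$fh} $chars;
    close($fh) or die "close failed: $!";
    return $buffer;
}

my $text     = "a \x{2013} b";    # decoded string containing an en dash
my $explicit = to_bytes_explicit($text);
my $layered  = to_bytes_via_binmode($text);

printf "same bytes: %s\n", ( $explicit eq $layered ? 'yes' : 'no' );
```

Either route should produce the same bytes for a properly decoded string. The practical difference is where the conversion lives: with binmode every print on that handle is covered, while with encode each code path has to remember to do it exactly once.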

We need to consider the viewfile case.

-- PaulHarvey - 18 May 2011

If every open uses the :utf8 layer, then viewfile should also work. But I'll try to work on this and test it ASAP.

-- OlivierRaginel - 18 May 2011
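
A minimal sketch of what opening with a decoding layer changes, assuming a file that holds a UTF-8 encoded en dash. `read_file` is a hypothetical helper, not a Foswiki function, and a temporary file stands in for a topic or attachment.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper: slurp a file through the given PerlIO layer,
# e.g. ':raw' for bytes or ':encoding(UTF-8)' for decoded characters.
sub read_file {
    my ( $path, $layer ) = @_;
    open( my $fh, "<$layer", $path ) or die "cannot open $path: $!";
    local $/;    # slurp mode
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Write the three UTF-8 bytes of an en dash to a temporary file.
my $tmp = File::Temp->new();
binmode( $tmp, ':raw' );
print {$tmp} "\xE2\x80\x93";
close($tmp) or die "close failed: $!";

my $bytes = read_file( $tmp->filename, ':raw' );
my $chars = read_file( $tmp->filename, ':encoding(UTF-8)' );

printf "raw length: %d, decoded length: %d\n", length($bytes), length($chars);
```

Read raw, the file yields three bytes; read through the decoding layer, it yields one character. For viewfile, which may serve binary attachments, a decoding layer on read would not be appropriate for non-text files, so that case probably needs separate handling.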

See also Item10635.

-- PaulHarvey - 19 May 2011

This is so dodgy, it makes me want to weep. I personally do not favour attempting to fix this without major investment in unit tests and conversion to unicode.

-- CrawfordCurrie - 21 May 2011

I think we all agree there. The question is: what should we target? My guess is that we should move everything to UTF-8, i.e. do all open calls in utf8 and binmode STDOUT too.

But yes, this needs serious testing, and I'll try to start by looking into how Catalyst and others are dealing with it.

-- OlivierRaginel - 21 May 2011

Codebase: trunk
Checkins: distro:e9b6cceb8ebd
Attachments:
  • OTU2291.txt, 3 K, 18 May 2011 - 04:09, PaulHarvey
  • TestNdash.txt, 423 bytes, 17 May 2011 - 00:34, PaulHarvey
Topic revision: r12 - 03 Dec 2016, GeorgeClark