Item9130: 'use bytes' causes a pregnant pause

Priority: Urgent
Current State: Closed
Released In: 1.1.0
Target Release: minor
Applies To: Engine
Component:
Branches:
Reported By: CrawfordCurrie
Waiting For:
Last Change By: KennethLavrsen
Some background: for a while, pages on trunk have been fed to the browser in a fairly whippy way, but after all the content seemed to be transmitted, there was a pregnant pause. I noticed this first on a server which was set to use utf-8, but have since confirmed on iso-8859-1 as well.

The problem is that the computation of the Content-length header is wrong; it produces a result that is 4 bytes too long when viewing Main/WebHome. When you curl the page, you get the result:

curl: (18) transfer closed with 4 bytes remaining to read

In Foswiki/Response.pm, where the content-length is computed, it says

use bytes

It appears this is wrong, because when I comment this line out, the computation is correct and the pregnant pause disappears.

There are other occurrences of use bytes in the code:
/home/foswiki/trunk/core/lib/Foswiki/Prefs/Stack.pm:28:use bytes;
/home/foswiki/trunk/core/lib/Foswiki/Users/TopicUserMapping.pm:211:    use bytes;
/home/foswiki/trunk/core/lib/Foswiki/Render.pm:1689:            use bytes;
/home/foswiki/trunk/core/lib/Foswiki/Users.pm:397:    use bytes;

The ones in TopicUserMapping.pm, Render.pm and Users.pm are related to cUID extraction and make some sense. The one in Stack.pm looks dangerous, as it applies to the entire file, but may be required for the bitstring computations. Marking for Gilmar's feedback to confirm this. If it's needed, it should be localised to the operations where it's needed, and not apply to the whole file.
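To illustrate what localising the pragma could look like: use bytes is lexically scoped, so placing it inside a sub or block limits it to that scope. The following is only a sketch with made-up helper names (byte_length and char_length are hypothetical and are not taken from Stack.pm or any other Foswiki module).

```perl
use strict;
use warnings;

# Hypothetical helper: byte semantics apply only inside this sub body.
sub byte_length {
    my ($s) = @_;
    use bytes;    # lexically scoped: ends with the enclosing block
    return length($s);
}

# Hypothetical helper: ordinary character semantics, because the pragma
# above is out of scope here.
sub char_length {
    my ($s) = @_;
    return length($s);
}

my $smiley = "\x{263A}";    # one character, stored as three UTF-8 bytes

printf "chars=%d bytes=%d\n", char_length($smiley), byte_length($smiley);
```

With a file-level use bytes, as in Stack.pm, both helpers would report the byte count.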

-- CrawfordCurrie - 09 Jun 2010

After a long and difficult debugging session, the conclusion is that it's a lot nastier than just the use bytes.

The problem is, at heart, that Perl tries to be too smart. Specifically, if you have a string that is marked as utf8, then it will "infect" any other string it is combined with: the result is marked as utf8 too, even if neither string has any characters that need encoding.
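The flag propagation can be reproduced in a few lines. This is a minimal sketch, not the plugin code: utf8::upgrade is used here as a stand-in for whatever marked the JSON string as utf8, and the string contents are invented.

```perl
use strict;
use warnings;

# A pure-ASCII string that carries the utf8 flag. utf8::upgrade changes
# only the internal representation, never the characters.
my $flagged = '{"a":1}';
utf8::upgrade($flagged);

# A byte string holding one high-bit character (e-acute in iso-8859-1).
my $latin1 = "caf\xE9";
my $latin1_flag_before = utf8::is_utf8($latin1) ? 1 : 0;

# Concatenation upgrades the result as soon as one operand is flagged.
my $page = $flagged . $latin1;

printf "flagged=%d latin1=%d page=%d chars=%d\n",
    utf8::is_utf8($flagged) ? 1 : 0,
    $latin1_flag_before,
    utf8::is_utf8($page) ? 1 : 0,
    length($page);
```

The character length of the result is unchanged; only the internal storage (and therefore the byte count seen under use bytes) differs.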

The specific case here was a string generated by JSON::to_json in a plugin, which was being added to the head zone. The string was marked as utf8 despite not containing any characters that need encoding, and this ended up marking the entire output page as utf8. At the same time, the i18n code had added a bunch of language names to the output, and some of these names contained high-bit characters. When the output string was upgraded to utf8, each of these 8-bit chars was stored internally as 2 bytes. The length of the output string was taken inside a use bytes to compute the content-length, so it counted the 2-byte utf8 representation of each of these chars. However print is smart; it sees that the string doesn't contain any wide characters (nothing that can't fit in a single byte), and emits a single byte for each of these chars. As a result the content-length tells the browser to expect more bytes than are actually delivered, so it hangs waiting for them to arrive.
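The mismatch between the counted and the delivered bytes can be shown in isolation. The sketch below assumes an output handle with no encoding layer (an in-memory handle stands in for STDOUT) and uses an invented four-character body; it is not the code from Foswiki/Response.pm.

```perl
use strict;
use warnings;

my $body = "caf\xE9";    # 4 characters, one of them high-bit
utf8::upgrade($body);    # simulate the flag picked up from another string

# What a length taken under "use bytes" sees: the internal UTF-8 bytes.
my $claimed;
{
    use bytes;
    $claimed = length($body);
}

# What print delivers on a handle without an encoding layer: the string
# has no wide characters, so it is downgraded to one byte per character.
my $wire = '';
open(my $fh, '>', \$wire) or die "cannot open in-memory handle: $!";
binmode($fh);
print {$fh} $body;
close($fh);

printf "Content-Length would claim %d bytes, print delivered %d\n",
    $claimed, length($wire);
```

Each high-bit character adds one byte to the claimed length, which matches a report of 4 missing bytes on a page with four such characters.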

Since a print of a string that really does contain wide characters would trigger a "Wide character in print" warning anyway, there should be no problem with downgrading the character string when it is added to the output.
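One way to express that downgrade is with utf8::downgrade and its fail-ok argument. This is a sketch under assumptions, not the committed fix: to_byte_string is a hypothetical helper name, and where exactly the real change hooks into the output code is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: return a byte-string copy of $text when every
# character fits in one byte; otherwise return the text unchanged.
sub to_byte_string {
    my ($text) = @_;
    # A true second argument makes downgrade return false instead of
    # dying when the string contains wide characters.
    utf8::downgrade($text, 1);
    return $text;
}

my $page = "caf\xE9";
utf8::upgrade($page);    # the flagged, internally 5-byte form

my $fixed = to_byte_string($page);
my $content_length = do { use bytes; length($fixed) };

printf "utf8 flag=%d content-length=%d\n",
    utf8::is_utf8($fixed) ? 1 : 0, $content_length;
```

After the downgrade, the byte count and the number of bytes print emits agree, so the length computation no longer depends on whether use bytes is in effect.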

Also fixed the enabling of available languages from LocalSite.cfg. I question the whole language cache thing - that should be done in configure and added to Foswiki.cfg.

Note that this will also affect 1.0.9.

-- CrawfordCurrie - 09 Jun 2010

Babar suggested an improvement.

-- CrawfordCurrie - 09 Jun 2010

 
Topic revision: r11 - 04 Oct 2010, KennethLavrsen