Item9130: 'use bytes' causes a pregnant pause
Priority: Urgent
Current State: Closed
Released In: 1.1.0
Target Release: minor
Applies To: Engine
Component:
Branches:
Some background: for a while, pages on trunk have been delivered to the browser fairly snappily, but after all the content seemed to have been transmitted, there was a pregnant pause. I first noticed this on a server set to use utf-8, but have since confirmed it with iso-8859-1 as well.
The problem is that the computation of the Content-Length header is wrong: the result is 4 bytes too long when viewing Main/WebHome. When you
curl
the page, you get the result:
curl: (18) transfer closed with 4 bytes remaining to read
In Foswiki/Response.pm, where the Content-Length is computed, it says
use bytes
This appears to be wrong: when I comment that line out, the computation is correct and the pregnant pause disappears.
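The mismatch is easy to reproduce in isolation. This is a minimal standalone sketch (not Foswiki code): once a string has been internally upgraded to utf8, length() under a lexical use bytes counts the internal UTF-8 encoding bytes rather than the characters.

```perl
use strict;
use warnings;

my $s = "caf\x{e9}";           # 4 characters; e-acute is one character
my $char_len = length($s);      # 4

# Flip the internal utf8 flag on, as contact with a utf8-flagged
# string would do; the characters themselves are unchanged.
utf8::upgrade($s);
my $still_chars = length($s);   # still 4: only the internal
                                # representation changed

my $byte_len;
{
    use bytes;                  # lexically scoped to this block
    # length() now counts internal UTF-8 encoding bytes;
    # \x{e9} is stored as 2 bytes, so this is 5.
    $byte_len = length($s);
}

print "$char_len $still_chars $byte_len\n";   # prints "4 4 5"
```

That one-byte difference per high-bit character is exactly the 4-byte overshoot seen in the Content-Length above.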
There are other occurrences of
use bytes
in the code:
/home/foswiki/trunk/core/lib/Foswiki/Prefs/Stack.pm:28:use bytes;
/home/foswiki/trunk/core/lib/Foswiki/Users/TopicUserMapping.pm:211: use bytes;
/home/foswiki/trunk/core/lib/Foswiki/Render.pm:1689: use bytes;
/home/foswiki/trunk/core/lib/Foswiki/Users.pm:397: use bytes;
The occurrences in TopicUserMapping.pm, Render.pm and Users.pm are related to cUID extraction and make some sense. The one in Stack.pm looks dangerous, as it applies to the entire file, but may be required for the bitstring computations; marking for Gilmar's feedback to confirm this. If it's needed, it should be localised to the operations that need it, not applied to the whole file.
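Localising the pragma is straightforward, because use bytes is lexically scoped: wrapping it in a bare block (or a small sub) confines byte semantics to just the operations that need them. A sketch of the pattern (bit_count is a hypothetical helper, not the actual Stack.pm code):

```perl
use strict;
use warnings;

# Hypothetical helper: byte semantics apply only inside the block,
# so the rest of the file keeps normal character semantics.
sub bit_count {
    my ($mask) = @_;
    my $n;
    {
        use bytes;          # scoped: ends at the closing brace
        $n = length($mask); # counts bytes of the bitstring
    }
    return $n;
}

print bit_count("\xff\xff"), "\n";   # prints "2"
```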
--
CrawfordCurrie - 09 Jun 2010
After a long and difficult debug, the conclusion is that it's a lot nastier than just the use bytes.
The problem is, at heart, that Perl tries to be too smart. Specifically, if you have a string that is internally marked as utf8-encoded, Perl will "taint" (upgrade) any other string it comes into contact with, even if neither string contains any characters that need encoding.
The specific case here was a string generated by JSON::to_json in a plugin, which was being added to the head zone. The string was marked as utf8 despite not containing any encodable characters, and that ended up marking the entire output page as utf8. At the same time, the i18n code had added a number of language names to the output, some of which contain high-bit characters. When the output string was upgraded to utf8, those 8-bit characters acquired two-byte internal encodings. The length of the output string was then taken inside a
use bytes
to compute the Content-Length, so it counted the two-byte utf8 representation of those characters. However
print
is smart: it sees that the string doesn't actually contain any characters that need encoding, and emits a single byte for each of them. As a result the browser is told by the Content-Length to expect more bytes than are actually delivered, so it hangs waiting for them to arrive.
Since printing a string containing genuine wide characters would trigger a "Wide character in print" warning anyway, there should be no problem with downgrading the character string when it is added to the output.
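The whole failure chain can be sketched in a few lines (a standalone illustration, not the Foswiki code; the strings stand in for the JSON fragment and a language name, and an in-memory filehandle stands in for the socket):

```perl
use strict;
use warnings;

# An ASCII-only string with the utf8 flag set, as JSON::to_json produces.
my $json = "{}";
utf8::upgrade($json);

# A language name with a high-bit Latin-1 character.
my $lang = "Fran\x{e7}ais";

# Concatenation upgrades the whole result to utf8 internally.
my $page = $json . $lang;

my $claimed;
{
    use bytes;
    # \x{e7} is 2 bytes in the internal UTF-8 encoding: 2 + 9 = 11.
    $claimed = length($page);
}

# Capture what print actually emits, via an in-memory filehandle.
open my $fh, '>', \my $sent or die $!;
print $fh $page;    # all chars fit in one byte, so 10 bytes go out
close $fh;

print "$claimed vs ", length($sent), "\n";   # prints "11 vs 10"

# The fix described above: downgrade before measuring, so the
# byte count and the emitted bytes agree.
utf8::downgrade($page);
{
    use bytes;
    print length($page), "\n";               # prints "10"
}
```

The one-byte gap per high-bit character is what leaves the browser hanging: it waits for bytes the server never sends.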
Also fixed the enabling of available languages from LocalSite.cfg. I question the whole language cache thing; that should be done in
configure
and added to Foswiki.cfg.
Note that this will also affect 1.0.9.
--
CrawfordCurrie - 09 Jun 2010
Babar suggested an improvement.
--
CrawfordCurrie - 09 Jun 2010