Item10268: Error while Indexing Attachments (Control Characters)

Priority: Normal
Current State: Closed
Released In: n/a
Target Release: n/a
Applies To: Extension
Component: SolrPlugin
Branches: trunk
Reported By: OliverSchaub
Waiting For: Main.MichaelDaum
Last Change By: OliverSchaub
Indexing certain attachments (PDFs, for example) produced errors about control characters:
HTTP Status 400 - Illegal character ((CTRL-CHAR, code 12))
 at [row,col {unknown-source}]: [1,75608]</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>Illegal character ((CTRL-CHAR, code 12))
 at [row,col {unknown-source}]: [1,75608]</u></p><p><b>description</b> <u>The request sent by the client was syntactically incorrect (Illegal character ((CTRL-CHAR, code 12))
 at [row,col {unknown-source}]: [1,75608]).</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.29</h3></body></html> at /WebService/Solr.pm line 180
        WebService::Solr::_send_update('WebService::Solr=HASH(0xb8d0c08)', 'XML::Generator::overload=ARRAY(0xb091938)') called at /WebService/Solr.pm line 73
        WebService::Solr::add('WebService::Solr=HASH(0xb8d0c08)', 'WebService::Solr::Document=HASH(0xb0975b4)') called at /Plugins/SolrPlugin/Index.pm line 594
        Foswiki::Plugins::SolrPlugin::Index::add('Foswiki::Plugins::SolrPlugin::Index=HASH(0xae51b70)', 'WebService::Solr::Document=HASH(0xb0975b4)') called at /Plugins/SolrPlugin/Index.pm line 575

Other illegal characters mentioned were "code 8" (backspace!) and "Unicode 0". There might be more, so we decided to exclude ALL control characters from the index. Adding the following line to the subroutine "plainify" in "Index.pm" solved the problem for us:

$text =~ s/\p{C}/ /g;

-- OliverSchaub - 18 Jan 2011
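
Note that \p{C} matches the whole Unicode "Other" category, which includes tab, line feed and carriage return as well as the characters named in the errors above. A narrower variant is sketched below, assuming only the C0 control characters that XML 1.0 forbids need to go (codes 8, 11, 12 and 0 all fall in that range). The helper name strip_control_chars is hypothetical and is not part of SolrPlugin; the actual plainify() in Index.pm may look quite different.

```perl
use strict;
use warnings;

# Hypothetical helper (not SolrPlugin code): replace the C0 control
# characters that XML 1.0 does not allow with a space, but keep
# tab (0x09), line feed (0x0A) and carriage return (0x0D).
sub strip_control_chars {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F]/ /g;
    return $text;
}

# Sample text with a form feed (code 12), a backspace (code 8) and a NUL (code 0).
my $sample = "page one\x0Cpage two\x08\x00 end\tkept\n";
my $clean  = strip_control_chars($sample);
```

The trade-off is that this keeps line structure intact, while the \p{C} one-liner is broader and also removes format, private-use and unassigned characters.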

Found one possible cause: the use of substr() on an undecoded string. Have a try?

-- MichaelDaum - 08 Mar 2012
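
For readers unfamiliar with the pitfall mentioned above, here is a minimal sketch of what substr() does on an undecoded (byte) string versus a decoded (character) string. The sample data and variable names are made up for illustration; this is not the code from Index.pm, and it only shows the general problem, not that it is the cause of the control characters reported here.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 encoded bytes as they might be read from a file:
# "Grüße" is 5 characters but 7 bytes.
my $bytes = "Gr\xC3\xBC\xC3\x9Fe";

# substr() on the undecoded string counts bytes, so a cut at
# position 3 splits the two-byte sequence for "ü" in half.
my $broken = substr($bytes, 0, 3);    # "Gr" plus a dangling 0xC3 byte

# Decoding first makes substr() count characters instead.
my $chars    = decode('UTF-8', $bytes);
my $ok       = substr($chars, 0, 3);  # "Grü" as characters
my $ok_bytes = encode('UTF-8', $ok);  # valid UTF-8 again
```

The truncated byte string in $broken is no longer valid UTF-8, which is the kind of input that can confuse later processing steps.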

I'm not quite sure what exactly I should try here!

-- OliverSchaub - 29 Mar 2012

Do you still have the document that bailed out with the above error message? Try the latest SolrPlugin and tell me if the substr() fix I added cured it. Thanks.

-- MichaelDaum - 30 Mar 2012

Just saw this error again on another PDF. This time it was "CTRL-CHAR, code 11". We have the latest "official" SolrPlugin installed.

-- OliverSchaub - 21 May 2012
 
Topic revision: r10 - 20 Sep 2013, OliverSchaub