Item10268: Error while Indexing Attachments (Control Characters)
Priority: Normal
Current State: Closed
Released In: n/a
Target Release: n/a
Applies To: Extension
Component: SolrPlugin
Branches: trunk
Indexing certain attachments (PDFs, for example) produced errors about control characters:
HTTP Status 400 - Illegal character ((CTRL-CHAR, code 12))
at [row,col {unknown-source}]: [1,75608]</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>Illegal character ((CTRL-CHAR, code 12))
at [row,col {unknown-source}]: [1,75608]</u></p><p><b>description</b> <u>The request sent by the client was syntactically incorrect (Illegal character ((CTRL-CHAR, code 12))
at [row,col {unknown-source}]: [1,75608]).</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.29</h3></body></html> at /WebService/Solr.pm line 180
WebService::Solr::_send_update('WebService::Solr=HASH(0xb8d0c08)', 'XML::Generator::overload=ARRAY(0xb091938)') called at /WebService/Solr.pm line 73
WebService::Solr::add('WebService::Solr=HASH(0xb8d0c08)', 'WebService::Solr::Document=HASH(0xb0975b4)') called at /Plugins/SolrPlugin/Index.pm line 594
Foswiki::Plugins::SolrPlugin::Index::add('Foswiki::Plugins::SolrPlugin::Index=HASH(0xae51b70)', 'WebService::Solr::Document=HASH(0xb0975b4)') called at /Plugins/SolrPlugin/Index.pm line 575
Other illegal characters reported were "code 8" (backspace!) and "Unicode 0". There might be more, so we decided to exclude all control characters from the index.
Adding the following line to the subroutine "sub plainify" in "Index.pm" solved the problem for us:
$text =~ s/\p{C}/ /g;
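Note that \p{C} also matches tab, line feed, carriage return, format characters and unassigned code points, so the line above replaces more than the characters Solr rejected. A narrower variant is sketched below, assuming the goal is only to remove what XML 1.0 forbids (which covers the reported codes 0, 8 and 12). The helper name strip_xml_illegal is hypothetical and not part of SolrPlugin.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SolrPlugin.
# Replaces every character outside the XML 1.0 "Char" ranges with a space:
#   #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF
# Tab, line feed and carriage return are therefore kept.
sub strip_xml_illegal {
    my ($text) = @_;
    $text =~ s/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/ /g;
    return $text;
}

# Sample input containing form feed (12), backspace (8), NUL (0)
# and vertical tab (11), plus a legal tab and newline.
my $sample = "page one\x{0C}page two\x{08}\x{00}\x{0B}\tend\n";
my $clean  = strip_xml_illegal($sample);
```

Whether to keep tabs and newlines depends on what plainify is expected to return; if whitespace is collapsed later anyway, the original one-liner is simpler.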
--
OliverSchaub - 18 Jan 2011
Found one possible cause: the use of substr() on an undecoded string. Have a try?
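As a rough illustration of what substr() on an undecoded string does (a sketch with made-up sample data, not the actual SolrPlugin code): on raw UTF-8 bytes, substr() counts bytes and can cut a multi-byte character in half, while on a decoded string it counts characters.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 encoded bytes for "naïve text"; the "ï" occupies two bytes (C3 AF).
my $bytes = "na\xC3\xAFve text";

# substr() on the undecoded bytes counts bytes, so a length of 3
# stops in the middle of the two-byte sequence.
my $cut_bytes = substr($bytes, 0, 3);

# Decoding first makes substr() count characters instead.
my $chars     = decode('UTF-8', $bytes);
my $cut_chars = substr($chars, 0, 3);

# Encode again before handing the text to anything that expects bytes.
my $out = encode('UTF-8', $cut_chars);
```

Here $cut_bytes ends with a dangling \xC3 byte, whereas $out contains the complete character.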
--
MichaelDaum - 08 Mar 2012
I'm not quite sure what exactly I should try here!
--
OliverSchaub - 29 Mar 2012
Do you still have the document that bailed out with the above error message? Try the latest SolrPlugin and tell me if the substr() fix I added cured it. Thanks.
--
MichaelDaum - 30 Mar 2012
Just saw this error again on another PDF. This time it was "CTRL-CHAR, code 11".
We have the latest "official" SolrPlugin installed.
--
OliverSchaub - 21 May 2012