KinoSearchContrib Development

This is the topic to discuss development of KinoSearchContrib

If you need support, go to Support.KinoSearchContrib where you can ask questions and find answers to previously asked questions.

If you want to report a bug or make a feature request, go to Tasks.KinoSearchContrib where you can see already submitted issues and where you can submit a new bug report or feature request.

Active Items

| Id | Summary | Priority | Current State | Creation Date | Last Edit |
| Item56 | kinosearch - enable admin to index just a specified list of webs' attachments | Enhancement | Confirmed | 02 Nov 2008 - 05:27 | 27 Feb 2010 - 01:26 |
| Item8369 | LDAP Plugin seems to be interfering with the Index initialization script | Normal | Waiting for Feedback | 19 Dec 2009 - 06:16 | 11 Mar 2010 - 23:54 |
| Item5647 | Make the indexing process more robust | Urgent | Being Worked On | 21 May 2008 - 16:26 | 18 Mar 2010 - 01:49 |
| Item2308 | Kinosearch should set 'search' context for the kinosearch script | Low | New | 30 Oct 2009 - 09:40 | 16 Jul 2010 - 15:47 |
| Item5581 | Full text search over form fields | Enhancement | Confirmed | 28 Apr 2008 - 19:38 | 26 Mar 2011 - 23:50 |
| Item10769 | Sort out logging | Enhancement | New | 18 May 2011 - 17:02 | 18 May 2011 - 17:02 |
| Item10775 | When updating a topic, we reindex all attachments | Enhancement | New | 19 May 2011 - 09:30 | 19 May 2011 - 09:30 |
| Item10774 | Should not store .kinoupdate in webs data directory | Enhancement | New | 19 May 2011 - 09:32 | 19 May 2011 - 09:32 |
| Item11223 | duplicate results after kinoupdate via bug in KinoSearchContrib/Index.pm sub changedTopics | Urgent | New | 31 Oct 2011 - 14:43 | 31 Oct 2011 - 14:43 |
| Item11388 | kinoupdate seems broken with 1.1.4 | Urgent | New | 23 Dec 2011 - 00:09 | 23 Dec 2011 - 00:09 |
| Item8083 | Using 'TWiki::Store::SearchAlgorithms::Kino' with RcsLite doesn't work | Urgent | Waiting for Feedback | 24 Mar 2009 - 13:54 | 21 Apr 2012 - 20:29 |
| Item10543 | KinoSearchContrib incompatible with 1.1 when used as the stores search algorithm | Urgent | Confirmed | 25 Mar 2011 - 16:13 | 02 May 2012 - 14:27 |

Discussion

Installation examples: (just put them as you want below, and we will format it nicely and maybe merge some with the plugin documentation)

Installation on debian lenny (5.0.5 - Jul 2010)

Starting point:
  • Fresh installed debian lenny (5.0.5)
  • foswiki installed from the fosiki.com debian repository
  • everything up-to-date
My final aim was to install the KinoSearchContrib / KinoSearchPlugin to perform a full text search within topic attachments.

Those (KinoSearchContrib / KinoSearchPlugin) extensions have quite a lot of dependencies which are not automatically resolved by the debian packaging system.

Note1: there is a dependency chain of 3 foswiki components:

KinoSearchPlugin depends on -> KinoSearchContrib which depends on -> StringifierContrib

This means that if you are looking for help, dependencies, etc., you should check all 3 components!

Note2: the “core” KinoSearch component is itself an external Perl Module!

KinoSearchPlugin/KinoSearchContrib works in the following steps:
  • Convert “binary format” files into plain text
  • Scan the converted files and build an index
  • At search time, look for a match in that index
  • This means the index must be refreshed once in a while (cron) to stay up to date
This means that, first of all, you need the proper converter for each “binary file” format.

From http://foswiki.org/Extensions/StringifierContrib you learn that you need:

Converter for doc files: you need one of these 3:
  • antiword (in debian package antiword)
  • abiword (in debian package abiword)
  • wvHtml (in debian package wv)
Converter for pdf & ppt:
  • xpdf (in debian package xpdf-utils)
  • ppthtml (in debian package ppthtml)
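Before indexing, it can help to confirm which of these converters are actually on the PATH. The following is only a minimal sketch: the function name report_converters is made up for illustration, and the binary names (in particular wvHtml and pdftotext) are assumed to match what your packages install.

```shell
#!/bin/sh
# Report which converter programs can be found on the PATH.
# The tool names come from the lists above; adjust them to your setup.
report_converters() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: MISSING"
        fi
    done
}

report_converters antiword abiword wvHtml pdftotext ppthtml
```

Only one of the three doc converters has to be reported as found.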
Install the KinoSearchPlugin from the debian repo:

root ~# aptitude install foswiki-kinosearchplugin

One might think that’s it: I just have to create the index and it will work.

Unfortunately this is not the case. In fact, far from all of the dependencies are resolved.

1) In the porting of KinoSearch from TWiki to Foswiki, KinoSearchContrib was split up into KinoSearchPlugin + StringifierContrib. This means that to find the dependencies of KinoSearchPlugin you have to look at the sum of the dependencies of all 3 extensions!

2) Not every dependency declared in the foswiki extensions is automatically declared in the .deb packaging!
  1. Sometimes it is not declared at all, so even if a deb package exists it is not installed
  2. Sometimes, whether or not the dependency is declared, no deb package exists that contains the needed perl module/program. In this case, unfortunately, you do not get any error from apt-get/aptitude at package install time.
3) The “plugin-error-check” on System.InstalledPlugins -> Plugin Diagnostics does not really seem to work if the installation is performed via “debian”. That is, seeing “Error = none” does not mean that the extension is really working and that all the dependencies are resolved.

  • To check if a Perl module is installed (IO::File in the example):
root ~# perl -e 'use IO::File; print $IO::File::VERSION."\n"'

  • To check if a deb package is installed (wv in the example):
root ~# dpkg -l | grep wv

  • To check whether a program/module is contained in a debian package, and if so in which one:
    • Use apt-file
In the example, check whether the perl module CharsetDetector exists in any deb package:

root ~# apt-file find CharsetDetector|grep perl
    • Search in the debian packages from the web.
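Note that `dpkg -l | grep wv` also matches any other package with "wv" in its name or description, and it lists packages that were removed but not purged (state "rc"). A stricter check could look like the sketch below; the helper name deb_installed is made up for illustration, while dpkg-query and its `${Status}` field are standard dpkg features.

```shell
#!/bin/sh
# Succeed only if the named deb package is fully installed.
# If dpkg-query is unavailable or the package is unknown, this fails.
deb_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null \
        | grep -q 'install ok installed'
}

for pkg in wv antiword xpdf-utils; do
    if deb_installed "$pkg"; then
        echo "$pkg: installed"
    else
        echo "$pkg: not installed"
    fi
done
```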
To search for dependencies:
  • Extension Topic -> Contrib Info -> Dependencies
  • The same info is (normally?!) coded at the bottom of: /var/lib/foswiki/<extension_name>_installer
For example:

root ~# tail /var/lib/foswiki/KinoSearchContrib_installer

<<<< DEPENDENCIES >>>>

Foswiki::Contrib::StringifierContrib,>0,1,perl,Required for indexing attachments

KinoSearch,>0,1,cpan,Required

Error,>0,1,cpan,Required

Time::Local,>0,1,cpan,Required

IO::File,>0,1,cpan,Required
  • Look at the options in the “configure” application
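To collect the dependencies of several extensions, the DEPENDENCIES block of an installer file can be filtered with awk. This is a rough sketch that assumes the comma-separated layout shown in the tail output above (name, version condition, a flag, type, description) and assumes that any later section starts with another "<<<< " marker; list_deps is an illustrative name, not part of Foswiki.

```shell
#!/bin/sh
# Read an installer file on stdin and print "type<TAB>name" for every
# entry found after the "<<<< DEPENDENCIES >>>>" marker.
list_deps() {
    awk -F',' '
        /<<<< DEPENDENCIES >>>>/ { in_deps = 1; next }
        /^<<<< / && in_deps       { in_deps = 0 }
        in_deps && NF >= 4        { print $4 "\t" $1 }
    '
}

# Example (path as used above):
#   list_deps < /var/lib/foswiki/KinoSearchContrib_installer
```

Sorting the combined output of all three installer files by the first column separates the cpan entries from the others.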

Manual dependencies solving

The sum of all the dependencies is:

| Perl Module/Program | Installation Status | Deb Package y/n | Deb Package / CPAN |
| File::MMagic | installed | yes | libfile-mmagic-perl |
| Module::Pluggable | installed | yes | libmodule-pluggable-perl |
| HTML::TreeBuilder | not installed | yes | libhtml-tree-perl |
| Spreadsheet::ParseExcel | installed | yes | libspreadsheet-parseexcel-perl |
| Spreadsheet::XLSX | not installed | NO | CPAN |
| CharsetDetector | not installed | NO | CPAN |
| Encode | installed | yes | perl |
| Error | installed | yes | liberror-perl |
| ppthtml | not installed | yes | ppthtml |
| pdftotext | not installed | yes | xpdf-utils |
| antiword | not installed | yes | antiword |
| abiword | not installed | yes | abiword |
| wvWare | not installed | yes | wv |
| wvHtml | installed | yes | wv |
| docx2txt | installed | yes | foswiki ?? |
| pptx2txt | installed | yes | foswiki ?? |
| KinoSearch | not installed | NO | CPAN |
| Time::Local | installed | yes | libtime-local-perl |
| IO::File | installed | yes | perl-base |
| pptx2txt.pl | installed | yes | foswiki ?? |
| docx2txt.pl | installed | yes | foswiki ?? |
| xls2txt.pl | installed | yes | foswiki ?? |

Install with aptitude (or apt-get) all the components that are available from the debian distro:

root ~# aptitude install ppthtml xpdf-utils antiword abiword wv

root ~# aptitude install libhtml-tree-perl

Note: you do not really need to install all of antiword, abiword, and wv. antiword is the default, is the suggested choice for a linux installation, and works pretty well.

Use CPAN to install the perl modules that are not available from the debian distro, accepting the resolution of all the suggested dependencies. (Note: to use cpan you need to install make and gcc.)

root ~# cpan

install KinoSearch

install CharsetDetector

install Spreadsheet::XLSX

quit
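After the cpan session it is worth confirming that the modules really load, since a failed build is easy to miss in the output. A small sketch follows; missing_modules is an illustrative helper name, and the module list is simply the three CPAN entries from the table above.

```shell
#!/bin/sh
# Print the name of every module in the argument list that perl cannot load.
missing_modules() {
    for mod in "$@"; do
        perl -M"$mod" -e 1 >/dev/null 2>&1 || echo "$mod"
    done
}

# Non-interactive equivalent of the cpan session (run as root):
#   cpan KinoSearch CharsetDetector Spreadsheet::XLSX

missing=$(missing_modules KinoSearch CharsetDetector Spreadsheet::XLSX)
if [ -n "$missing" ]; then
    echo "still missing:"
    echo "$missing"
else
    echo "all CPAN modules load"
fi
```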

Fix Path

If you try to build the index now you still get an error.

System.KinoSearchPlugin ==> index

Foswiki detected an internal error - please check your Foswiki logs and webserver logs for more information.

Logfile cannot be opend in path-20100728.log.

You can see in configure that the paths needed for the log and the index do not exist!

{KinoSearchContrib}{LogDirectory} /var/lib/foswiki/pub/../kinosearch/logs

{KinoSearchContrib}{IndexDirectory} /var/lib/foswiki/pub/../kinosearch/index

You should create them by hand as the correct web user:

root ~# su - www-data

www-data ~$ mkdir -p /var/lib/foswiki/pub/../kinosearch/logs

www-data ~$ mkdir -p /var/lib/foswiki/pub/../kinosearch/index
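The same step can be scripted and checked in one go. The sketch below assumes that the logs and index directories share one parent, as in the configure values above; ensure_kino_dirs is an illustrative name. Unless pub is a symlink, /var/lib/foswiki/pub/../kinosearch is simply /var/lib/foswiki/kinosearch.

```shell
#!/bin/sh
# Create the logs/ and index/ directories under the given base directory
# and verify that the current user can write to both of them.
ensure_kino_dirs() {
    base=$1
    for sub in logs index; do
        dir="$base/$sub"
        mkdir -p "$dir" || return 1
        if [ ! -w "$dir" ]; then
            echo "not writable: $dir" >&2
            return 1
        fi
    done
}

# Example, run as the web server user (www-data here):
#   ensure_kino_dirs /var/lib/foswiki/kinosearch
```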

Create the index

System.KinoSearchPlugin ==> index

Now it works without error and it populates the /var/lib/foswiki/kinosearch/index directory.

Fix MS Office 2007 indexing

At this point with the default settings:

  • {KinoSearchContrib}{WordIndexer} = antiword
  • {StringifierContrib}{WordIndexer} = antiword
All the indexing and the full-text search of the attachments seem to work, except for MS Office 2007 files (I tested .docx files): I do not get any match, no matter how simple the .docx file is.

I tested from the console whether docx2txt.pl works:

root ~# /var/lib/foswiki/tools/docx2txt.pl ./Test5.docx

Failed to extract required information from <./Test5.docx>!

This is a symptom that unzip is not installed.

root ~# dpkg -l|grep unzip

root ~# aptitude install unzip

root ~# /var/lib/foswiki/tools/docx2txt.pl Test5.docx

root ~# ls

Test5.docx Test5.txt

root ~# cat Test5.txt

ciao ciao

OK, it works from the console, and if you now re-index kinosearch and run a search, the docx files are included!
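The reason unzip matters is that MS Office 2007 files are zip archives, which is easy to confirm from the first two bytes of the file. The following is a minimal sketch; is_zip_container is an illustrative helper name, and it only tests for the "PK" signature rather than validating the archive.

```shell
#!/bin/sh
# Succeed if the file starts with the two-byte zip signature "PK".
is_zip_container() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Warn early if a zip-based attachment cannot be unpacked on this host.
if ! command -v unzip >/dev/null 2>&1; then
    echo "unzip not found: zip-based office files cannot be converted"
fi

# Example:
#   is_zip_container Test5.docx && echo "Test5.docx is a zip container"
```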

Note: MS Office 2007 mime.types

This is not directly related to this extension, but I ran into another problem while testing it, so I am pasting my notes here.

With an out-of-the-box debian + debian foswiki installation, if you attach an MS Office 2007 file to a topic (a .docx, for example) and then try to open the attachment with an IE8 client, you get a download of a zip file instead of the docx file opening. (Note that the same test with firefox works! So you might think it is a client problem. Well, it is, but the solution is on the server.)

I found this explanation (http://www.webdeveloper.com/forum/showthread.php?t=162526) and a solution (see the “ubuntu/apache2” contribution) on the web.

So, to solve the problem, log in as root on the foswiki server:

root ~# cp /etc/mime.types /etc/mime.types.orig

Edit /etc/mime.types and add at the bottom the lines:

application/vnd.ms-word.document.macroEnabled.12 .docm

Restart apache:

root ~# /etc/init.d/apache2 restart
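If this edit is ever scripted, it should be idempotent so that repeated runs do not pile up duplicate entries. A minimal sketch follows; append_line_once is an illustrative helper name, and the example line is copied verbatim from above.

```shell
#!/bin/sh
# Append a line to a file unless an identical line is already present.
append_line_once() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null \
        || printf '%s\n' "$line" >> "$file"
}

# Example (as root, after backing up the file as shown above):
#   append_line_once /etc/mime.types \
#       'application/vnd.ms-word.document.macroEnabled.12 .docm'
```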

Note: foswiki itself has its own mime.types defined.

/var/lib/foswiki/data/mime.types

This contains a “resolution” for at least the docx pptx xlsx formats.

application/vnd.openxmlformats docx pptx xlsx

This does not seem to prevent the weird behavior with the IE8 client.

I did not change anything in

/var/lib/foswiki/data/mime.types

because it did not seem to have any effect, so I kept everything “as default as possible”.
Topic revision: r3 - 30 Jul 2010, CatiaLavalle