I've been fooling around more with the Perl Plucene module and, having created my index, I am now trying to search it and return results.
My code to create the index is here. Chances are you can skip this and read on:
#!/usr/bin/perl
use strict;
use warnings;
use Plucene::Document;
use Plucene::Document::Field;
use Plucene::Index::Writer;
use Plucene::Analysis::SimpleAnalyzer;

# Text to index, taken from the first command-line argument
my $content = $ARGV[0];

# Build a document with a single tokenized, stored field named "content"
my $doc = Plucene::Document->new;
$doc->add(Plucene::Document::Field->Text(content => $content));

my $analyzer = Plucene::Analysis::SimpleAnalyzer->new();

# Third param is 1 if creating new index, 0 if adding to existing
my $create = (-d "solutions") ? 0 : 1;
my $writer = Plucene::Index::Writer->new("solutions", $analyzer, $create);
$writer->add_document($doc);
my $doc_count = $writer->doc_count;
undef $writer; # close
It creates a folder called “solutions” and writes various files to it. I’m assuming these are the index files for the doc I created. Now I’d like to search my index, but I’m not coming up with anything. Here is my attempt, guided by the Plucene::Simple examples on CPAN. This is after I ran the above with the param “lol” from the command line.
#!/usr/bin/perl
use Plucene::Simple;
my $plucy = Plucene::Simple->open("solutions");
my @ids = $plucy->search("content : lol");
foreach (@ids) {
    print $_;
}
Nothing is printed, sadly )-=. I feel like querying the index should be simple, but perhaps my own stupidity is limiting my ability to do this.
Three things I discovered in time:
I will share two scripts: one to import a file into a new Plucene index and one to search through that index and retrieve it. A truly working example of Plucene is hard to find on the Internet. Also, I had tremendous trouble installing these modules through CPAN, so I ended up going to the CPAN site (just Google it), downloading the tarballs and putting them in my Perl lib myself (I’m on Strawberry Perl, Windows 7), however haphazard. Then I would try to run them and install from CPAN whatever dependencies they complained about. This is a sloppy way to do things, but it’s how I did it and now it works.
So what does this do? You call the script with two command line arguments of your choosing, and it creates a key-value pair of the form “second argument” => “first argument”. Think of this like the XMLs in the tutorial at the Apache site (http://lucene.apache.org/solr/api/doc-files/tutorial.html). The second argument is the field name.
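As a rough sketch of an import script along these lines (not the original listing), the following takes the value and the field name from the command line and adds one document to the index. The folder name "solutions" is carried over from the earlier script as an assumption, and the usage message and variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Plucene::Document;
use Plucene::Document::Field;
use Plucene::Index::Writer;
use Plucene::Analysis::SimpleAnalyzer;

# Usage (hypothetical script name): perl index.pl <value> <field-name>
my ($value, $field) = @ARGV;
die "Usage: $0 <value> <field-name>\n"
    unless defined $value && defined $field;

my $index_dir = "solutions";    # assumed folder name, reused from the earlier script

# One document with a single tokenized, stored field: field-name => value
my $doc = Plucene::Document->new;
$doc->add(Plucene::Document::Field->Text($field => $value));

# Create the index on the first run, append to it on later runs
my $create = (-d $index_dir) ? 0 : 1;
my $writer = Plucene::Index::Writer->new(
    $index_dir,
    Plucene::Analysis::SimpleAnalyzer->new(),
    $create,
);
$writer->add_document($doc);
undef $writer;    # releasing the writer closes the index
```

Called as, say, perl index.pl lol content, this would store the text "lol" under a field named "content".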
Anywho, this will make a folder in the directory the script was run in, and in that folder will be files made by Plucene. THIS IS YOUR INDEX!! All we need to do now is search that index using the power of Lucene, something made easy by Plucene. The script is the following:
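The listing itself is missing at this point, so what follows is only a minimal sketch of such a search script, not the original. It assumes the index lives in a folder named "solutions" and that the documents were indexed under a field named "content" (both are assumptions, as is the separator string). It goes through Plucene::Search::IndexSearcher with a hit collector rather than Plucene::Simple.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Plucene::QueryParser;
use Plucene::Analysis::SimpleAnalyzer;
use Plucene::Search::IndexSearcher;
use Plucene::Search::HitCollector;

# Usage (hypothetical script name): perl search.pl <term>
my $term = shift @ARGV;
die "Usage: $0 <term>\n" unless defined $term;

my $index_dir = "solutions";    # assumed index folder
my $field     = "content";      # assumed field name used at indexing time

# Parse the term with the same kind of analyzer that built the index;
# terms without an explicit field are looked up in $field
my $parser = Plucene::QueryParser->new({
    analyzer => Plucene::Analysis::SimpleAnalyzer->new(),
    default  => $field,
});
my $query = $parser->parse($term);

my $searcher = Plucene::Search::IndexSearcher->new($index_dir);

# The collector callback runs once per matching document number
my @docs;
my $hc = Plucene::Search::HitCollector->new(
    collect => sub {
        my ($self, $doc_num, $score) = @_;
        push @docs, $searcher->doc($doc_num);
    },
);
$searcher->search_hc($query => $hc);

# Collect "field => stored text" for every hit that has the field
my @lines;
for my $doc (@docs) {
    my $stored = $doc->get($field);
    next unless defined $stored;
    push @lines, "$field => " . $stored->string;
}
print join("\n--separator--\n", @lines), "\n" if @lines;
```

If the index was filled under a different field name, changing $field (or writing the query as field:term) should be enough to adapt it.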
You run this script by calling it from the command line with ONE argument. For example’s sake, let it be the same first argument you passed to the previous script. If you do that, you will see that it prints your second argument from the example before! So you have retrieved that value! And if you have other key-value pairs with the same value, it will print those too, with “—seperator—” between them!