The documentation says to look at this page: http://wiki.apache.org/solr/HowToContribute#Working_With_Patches
But it hasn’t been very helpful.
I’ve downloaded rich.patch (http://wiki.apache.org/solr/UpdateRichDocuments#Updating_a_Solr_Index_with_Rich_Documents_such_as_PDF_and_MS_Office) and changed into my Solr home directory. I tried to run the following command:
patch -p0 -i rich.patch
But it just asks me which file I want to patch. For example, it says:
can't find file to patch at input line 2681
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
Index: example/solr/conf/solrconfig.xml
When it prompted me for the file I wanted to patch, I just typed in the path to my solrconfig.xml file, “C:\xampp\solr\conf\solrconfig.xml”.
When I do this, it successfully updates my java/org folder to contain ExcelParse.java, PowerPointParser.java, etc. But when I try to post a Word document using “java -jar post.jar .”, I get the error:
FATAL: Solr returned an error #400 Bad Request
The functionality that rich.patch provided is built into Solr 1.4, so you should be able to parse and index rich documents out of the box, without any patches.
As Mauricio mentioned, check out ExtractingRequestHandler.
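To make that concrete, here is a minimal sketch of posting a Word document to ExtractingRequestHandler over HTTP. It assumes the handler is registered at /update/extract in your solrconfig.xml (as I believe it is in the 1.4 example config) and that Solr is listening on http://localhost:8983/solr; both are assumptions about your setup. The function names are made up for illustration, and this is an untested outline rather than a definitive client.

```python
import urllib.parse
import urllib.request

# Assumed base URL of the Solr example server; adjust to your install.
SOLR_BASE = "http://localhost:8983/solr"


def build_extract_url(doc_id, base=SOLR_BASE, commit=True):
    """Build the /update/extract URL for one document (hypothetical helper)."""
    # literal.id supplies the unique key for the extracted document.
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return base + "/update/extract?" + urllib.parse.urlencode(params)


def post_rich_document(path, doc_id, content_type="application/msword"):
    """Send the raw file bytes to Solr and return the HTTP status code."""
    with open(path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        build_extract_url(doc_id),
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": content_type},
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling post_rich_document("report.doc", "doc1") (a hypothetical file name and id) should return 200 if the handler accepts the file. If you still get a 400, the response body from Solr usually says which field or parameter it rejected.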
Also check out:
posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika
indexing-rich-files-into-solr-quickly-and-easily