The Archive Base

Editorial Team
Asked: May 11, 2026 (2026-05-11T05:37:58+00:00)

I need to develop an IFilter for Microsoft Search Server 2008 that performs prolonged

I need to develop an IFilter for Microsoft Search Server 2008 that performs prolonged computations to extract text. Extracting text from one file can take from 5 seconds to 12 hours.

One idea for doing this is to create a preprocessing application.

How do I design such an application? Specifically:

  • How do I connect the Search Server crawler to my application?
  • How do I feed extracted text into Search Server once extraction is complete?

1 Answer

  1. Added an answer on May 11, 2026 at 5:37 am

    First you will need to code the IFilter itself.

    This article is quite good, and it references some good articles too. See also IFilter.org and this set of articles.

    Next is the issue of how to pre-process. The easiest way I can think of is to create a FileSystemWatcher to kick off the pre-processing of each document.

    The pre-processor can parse the text from the document and store it somewhere.
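    FileSystemWatcher is a .NET class. As a rough, language-neutral illustration of the same idea, here is a minimal polling sketch in Python that compares two directory snapshots to find new or modified documents. The `snapshot` and `changed_since` helpers and the `.big` extension are hypothetical names made up for this example; a real watcher would react to change events instead of rescanning.

```python
from pathlib import Path


def snapshot(root, suffix):
    """Map every file under root with the given suffix to its modification time."""
    return {p: p.stat().st_mtime for p in Path(root).rglob(f"*{suffix}")}


def changed_since(previous, current):
    """Return files that are new, or whose modification time differs, in sorted order."""
    return sorted(p for p, mtime in current.items() if previous.get(p) != mtime)
```

    The pre-processor would take a snapshot on each pass, hand every path returned by `changed_since` to the slow extraction step, and keep the latest snapshot for the next pass.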

    That ‘somewhere’ becomes the next issue, and it is primarily a business decision. If it is okay to add to the directory that holds the documents, I would add an Index directory in each folder as documents are parsed and store a file such as [OriginalFilenameSansExtension]_index.txt inside.

    If that is not possible, create an Index folder on each drive and mirror the directory structure as needed. At the end of the day, all you need is for the IFilter to be able to determine, based on the filename of the file being indexed, where to look for the text document with its pre-processed content.
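    Both layouts come down to a pure function from the document path to the index file path. A minimal sketch in Python, assuming Windows-style paths; the function names and the `.big` extension in the usage note are hypothetical, and the `Index` folder name and `_index.txt` suffix are taken from the wording above:

```python
from pathlib import PureWindowsPath

INDEX_DIR_NAME = "Index"  # folder name assumed from the description above


def sibling_index_path(doc_path: str) -> PureWindowsPath:
    """Option 1: an Index folder inside the document's own folder."""
    doc = PureWindowsPath(doc_path)
    return doc.parent / INDEX_DIR_NAME / f"{doc.stem}_index.txt"


def mirrored_index_path(doc_path: str) -> PureWindowsPath:
    """Option 2: an Index folder at the drive root that mirrors the tree."""
    doc = PureWindowsPath(doc_path)
    # doc.parts[0] is the anchor (e.g. "D:\\"); parts[1:-1] are the folders.
    return PureWindowsPath(
        doc.anchor, INDEX_DIR_NAME, *doc.parts[1:-1], f"{doc.stem}_index.txt"
    )
```

    For `D:\docs\reports\scan.big` the first function gives `D:\docs\reports\Index\scan_index.txt` and the second gives `D:\Index\docs\reports\scan_index.txt`. Note that dropping the extension means two documents with the same base name in one folder would share an index file, so a real scheme may want to keep the extension in the name.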

    When the IFilter runs, Init is called. When that happens, simply load the text document and return its contents as the GetChunk, GetText and GetValue functions are called.
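    The calling pattern can be modelled without any COM plumbing. The sketch below is plain Python, not a real IFilter: the class and exception names are hypothetical, the two exceptions stand in for the FILTER_E_END_OF_CHUNKS and FILTER_E_NO_MORE_TEXT return codes, and only a single text chunk is modelled (GetValue is left out).

```python
class EndOfChunks(Exception):
    """Stands in for the FILTER_E_END_OF_CHUNKS return code."""


class NoMoreText(Exception):
    """Stands in for the FILTER_E_NO_MORE_TEXT return code."""


class PreprocessedTextFilter:
    def __init__(self, text: str):
        # A real filter would read this text from the index file during Init.
        self._text = text
        self._chunk_served = False
        self._offset = 0

    def get_chunk(self) -> None:
        """Expose the whole pre-processed text as one chunk, then signal the end."""
        if self._chunk_served:
            raise EndOfChunks
        self._chunk_served = True
        self._offset = 0

    def get_text(self, buffer_size: int) -> str:
        """Return the next piece of the current chunk, at most buffer_size characters."""
        if not self._chunk_served or self._offset >= len(self._text):
            raise NoMoreText
        piece = self._text[self._offset:self._offset + buffer_size]
        self._offset += len(piece)
        return piece


def drain(text_filter, buffer_size=4):
    """Call the filter the way an indexer would and collect all text."""
    pieces = []
    while True:
        try:
            text_filter.get_chunk()
        except EndOfChunks:
            break
        while True:
            try:
                pieces.append(text_filter.get_text(buffer_size))
            except NoMoreText:
                break
    return "".join(pieces)
```

    `drain` mimics the indexer's side: it keeps asking for chunks and, within each chunk, keeps asking for text until the filter says there is none left.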

    This solution ends up with an implicit dependency between the pre-processor and the IFilter, as each will hold its own way of ‘finding’ the index document.

    It should be possible to store the location of index documents in some shared configuration location.
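    One possible shape for that shared configuration is a small JSON file that both programs read at start-up. This is only a sketch in Python: the setting names, the defaults and the helper functions are all assumptions made up for the example.

```python
import json
from pathlib import Path

# Hypothetical defaults, used when the shared file is missing or incomplete.
DEFAULT_SETTINGS = {"index_dir_name": "Index", "index_suffix": "_index.txt"}


def load_settings(config_path):
    """Read the shared settings file, falling back to the defaults."""
    settings = dict(DEFAULT_SETTINGS)
    path = Path(config_path)
    if path.exists():
        settings.update(json.loads(path.read_text(encoding="utf-8")))
    return settings


def index_file_name(doc_name, settings):
    """Build the index file name for a document from the shared settings."""
    return Path(doc_name).stem + settings["index_suffix"]
```

    As long as the pre-processor and the filter both go through `load_settings`, a change to the naming convention only has to be made in one place.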

    Update

    How will the IFilter method be called under Search Server? Once created, the IFilter will have to be installed on the indexing server (i.e. the relevant DLL will have to be registered). Using this article as a guide: as part of your implementation, you will have given your filter a unique GUID for its CLSID. The registration process will then be similar to the steps from that article, quoted below, just using a different extension and GUID.

    STEP 1: COM REGISTRATION

    1. Add registry key: HKEY_CLASSES_ROOT\CLSID\ ThreadingModel : Both

    STEP 2 : REGISTER IFILTER WITH OS

    There are 4 steps to registering the filter-extension mapping with OS:

    1. HKEY_CLASSES_ROOT\<.ext>(Default) –>
    2. HKEY_CLASSES_ROOT\(Default) –>
    3. HKEY_CLASSES_ROOT\\PersistentHandler(Default) –>
    4. HKEY_CLASSES_ROOT\\PersistentHandler\PersistentAddinsRegistered\IID_IFilter\ (Default) –>

    Now we’re all set to register our product with WSS (Windows SharePoint Services) or MOSS (Microsoft Office SharePoint Server).

    STEP 3: REGISTER FILTER EXTENSION WITH MOSS

    1. Add the filter-extension to the File types crawled: Start -> Program -> Microsoft Office Server -> SharePoint 3.0 Central Administration -> -> Search Settings -> File Types -> New File Type (Add extension here)

    2. Add the following registry keys:

      [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Applications\\Gather\Portal_Content\Extensions\ExtensionList]

      [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\Filters.ext] Default = (value not set) Extension = FileTypeBucket REG_DWORD = 0x00000001 (1) MimeTypes =

      [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension.ext] Default REG_MULTI_SZ = IFIlter CLASSID

    3. Finally, recycle the Search Service by executing the following command from the command window:

      D:> net stop osearch

      D:> net start osearch

    Does the Search Server pass a URL and not the local file name? The LoadIFilter function is where you will have the pathname for the file. It is here that you create the instance of the IFilter that reads the indexed text instead of the actual file.

    What will I do if it calls IFilter::Init for a URL which is not indexed yet? If the pre-processed text file does not exist, you will not be able to index the document, so return one of the available error codes.

    A pre-processing application will need to extract the text from a document if that takes a long time. The text will need to be stored where the IFilter can access it when it comes to process the file during the LoadIFilter function (which is passed the URL/filepath of the file by the search application). Using the URL/filepath of the file, the IFilter must be able to determine where the previously extracted text is. The IFilter can then load that text and parse it instead of the ‘actual’ file, bypassing the need for long search crawl times.

    If you aren’t going to get the pre-processor to do entire sites, it would take multiple passes of the search crawler to get what you require. Assume the crawler is doing an incremental crawl every evening.

    The first day a file is added, the incremental crawl picks up the file and passes it to LoadIFilter. The function looks for pre-processed text for the file and cannot find any, so it adds the path to a config file (or list) and returns an error code. The file does not get added to the search results.

    The pre-processor, at a different time, looks at the config list, sees that there is a file to be processed and starts the work. When it finishes, it stores the text and removes the file from the config list. The next time the crawler runs, it will find the file and its stored text for parsing.
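    The two sides of that hand-off can be sketched as follows. This is a simplified Python model rather than IFilter code: the function names, the `TextNotReady` exception (standing in for the error code returned to the crawler) and the one-path-per-line pending file are assumptions for illustration, and there is no locking between the two processes.

```python
from pathlib import Path


class TextNotReady(Exception):
    """Stands in for the error code the filter returns to the crawler."""


def index_path(doc: Path) -> Path:
    return doc.parent / "Index" / f"{doc.stem}_index.txt"


def read_pending(pending_list: Path) -> set:
    if not pending_list.exists():
        return set()
    return set(pending_list.read_text(encoding="utf-8").splitlines())


def write_pending(pending_list: Path, pending: set) -> None:
    pending_list.write_text("\n".join(sorted(pending)), encoding="utf-8")


def filter_load(doc: Path, pending_list: Path) -> str:
    """Crawler side: return stored text, or queue the document and fail."""
    idx = index_path(doc)
    if idx.exists():
        return idx.read_text(encoding="utf-8")
    pending = read_pending(pending_list)
    pending.add(str(doc))
    write_pending(pending_list, pending)
    raise TextNotReady(str(doc))


def run_preprocessor(pending_list: Path, extract) -> None:
    """Pre-processor side: extract text for each queued document, then dequeue it."""
    pending = read_pending(pending_list)
    for doc_str in sorted(pending):
        doc = Path(doc_str)
        idx = index_path(doc)
        idx.parent.mkdir(parents=True, exist_ok=True)
        idx.write_text(extract(doc), encoding="utf-8")  # the slow step
        pending.discard(doc_str)
        write_pending(pending_list, pending)
```

    On the first crawl `filter_load` raises and queues the document; after `run_preprocessor` has been run with the real extraction function, the next crawl gets the stored text straight away.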

    This process is starting to get a bit complex, and I would worry about the crawler and the pre-processor having to communicate so well. Also, the incremental crawl may need the pre-processor to ‘touch’ the file once it has had its text extracted.
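    If ‘touching’ does turn out to be needed, it could be as small as resetting the file's timestamps. A minimal Python sketch, under the unverified assumption that the incremental crawl decides what to re-crawl from the modification time; the function name is hypothetical.

```python
import os
import time
from pathlib import Path


def touch_after_extraction(doc_path):
    """Set the document's access and modification times to now."""
    now = time.time()
    os.utime(doc_path, (now, now))
    return Path(doc_path).stat().st_mtime
```

    The pre-processor would call this right after it has written the extracted text, so that the next incremental crawl sees the document as changed.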

    At this point, it may be best to develop something and see what happens, as so far this is just a theoretical algorithm.

    Hope this is helpful.

© 2021 The Archive Base. All Rights Reserved