The Archive Base Latest Questions

Editorial Team
Asked: June 16, 2026

I am using Lucene.Net + custom crawler + Ifilter so that I can index

I am using Lucene.Net, a custom crawler, and IFilter so that I can index data stored inside blobs.

// Crawler: downloads each blob to local Azure instance storage, indexes it, then deletes the local copy.
foreach (var item in containerList)
{
    CloudBlobContainer container = BlobClient.GetContainerReference(item.Name);
    if (container.Name != "indexes")
    {
        // OfType (System.Linq) skips virtual directories, which cannot be cast to CloudBlob.
        foreach (CloudBlob blob in container.ListBlobs().OfType<CloudBlob>())
        {
            string localFile = path + blob.Name;
            blob.DownloadToFile(localFile);
            try
            {
                indexer.IndexBlobData(path, blob);
            }
            finally
            {
                // Remove the temporary copy even if indexing throws.
                System.IO.File.Delete(localFile);
            }
        }
    }
}

The code below is the indexer function, which uses IFilter:

public bool IndexBlobData(string path, CloudBlob blob)
{
    Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();
    try
    {
        // FilterReader needs a physical file path; the using block closes it even on failure.
        using (TextReader reader = new FilterReader(path + blob.Name))
        {
            doc.Add(new Lucene.Net.Documents.Field("url", blob.Uri.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.NOT_ANALYZED));
            doc.Add(new Lucene.Net.Documents.Field("content", reader.ReadToEnd(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED));
        }
        indexWriter.AddDocument(doc);
        return true;
    }
    catch (Exception)
    {
        // Any failure (missing IFilter, unreadable file) is reported as "not indexed".
        return false;
    }
}

My issue is that I don't want to download the file to instance storage. I want to pass the file directly to FilterReader, but it takes a physical path; passing an HTTP address doesn't work. Can anybody suggest a workaround? I don't want to download the same file from the blob again and then index it. I would prefer to download it once, keep it in main memory, and run the index filter on it directly.

I am using IFilter from here


1 Answer

  1. Editorial Team
    Added an answer on June 16, 2026 at 7:23 am

    It is not very clear what you mean by "I don't want to download same file again from blob and then index it, instead i will prefer download and keep it in main memory and directly use index filter". What is that main memory: Azure Blob storage, or local instance memory?

    The issue you are facing, however, cannot be worked around because of the nature of the IFilter interface. If you look a bit deeper into the source you are using from here, you will discover that under the covers it uses the IPersistFile COM interface. Unfortunately, this interface only works with local files and does not accept streams.

    What I would have suggested is to get a Stream from the blob and pass it to the reader instead of the physical path. However, as already said, IFilter uses COM interfaces that work only with physical paths. So with your current approach there is no way to skip downloading the blob.

    There is nothing scary about downloading blobs locally. If the storage account is in the same affinity group as the compute, the download will be extremely fast and the traffic will be free. Given that you use a small instance size, you will have 165GB of local storage, which is plenty. You can optimize your process a bit by keeping track of what has been indexed and what has not. You can use Azure Table storage for that: it is another extremely fast and cheap storage solution, and it is perfect for storing key-value pairs such as file name and ETag. Then, when you enumerate the blobs, first fetch the ETag for a blob and check the table to see whether that blob is already indexed. Download it only if it is not indexed, then add a new record to the table to mark the file as indexed.
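    The ETag bookkeeping described above could be sketched roughly as follows. This is only an illustration under stated assumptions: IndexedBlobLog and IncrementalCrawler are hypothetical names, the dictionary is an in-memory stand-in for the Azure Table, and the downloadAndIndex delegate stands for your existing download, IndexBlobData, and delete steps.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an Azure Table mapping blob name -> ETag of the indexed version.
// A real role would persist this in Table storage; a dictionary keeps the sketch runnable.
public sealed class IndexedBlobLog
{
    private readonly Dictionary<string, string> etagByBlobName = new Dictionary<string, string>();

    // True when this exact version (same ETag) of the blob has already been indexed.
    public bool IsIndexed(string blobName, string etag)
    {
        string recorded;
        return etagByBlobName.TryGetValue(blobName, out recorded) && recorded == etag;
    }

    // Record (or overwrite) the ETag once indexing has succeeded.
    public void MarkIndexed(string blobName, string etag)
    {
        etagByBlobName[blobName] = etag;
    }
}

public static class IncrementalCrawler
{
    // Calls downloadAndIndex only for blobs that are new or whose ETag changed.
    // Returns the number of blobs that were actually processed.
    public static int Crawl(
        IEnumerable<KeyValuePair<string, string>> blobNamesAndEtags,
        IndexedBlobLog log,
        Func<string, bool> downloadAndIndex)
    {
        int processed = 0;
        foreach (var blob in blobNamesAndEtags)
        {
            if (log.IsIndexed(blob.Key, blob.Value))
            {
                continue; // unchanged since the last crawl: skip the download
            }
            if (downloadAndIndex(blob.Key))
            {
                // Mark as indexed only after success, so failures are retried next time.
                log.MarkIndexed(blob.Key, blob.Value);
                processed++;
            }
        }
        return processed;
    }
}
```

    Keying the record on the ETag rather than on the file name alone means a blob that is overwritten with new content gets downloaded and indexed again, while untouched blobs are skipped.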

    Or don't use IFilter at all. I don't see any benefit in using IFilter on Azure. IFilters are only registered when the corresponding application is installed. For instance, if you want to process Office documents with IFilter, you have to install Microsoft Office on the VM (which you currently can't do, even if you have a license, because of license mobility restrictions for MS Office). If you want the IFilter for PDF, you have to install Adobe Acrobat Reader (which you can do via a startup task). And so on: some applications you can install, some you can't. Your Windows Azure VM instance is plain Windows with no IFilters at all. Imagine a basic installation of Windows Server 2008 R2 with no roles and no features added: that is your instance.
