The Archive Base

Editorial Team
Asked: June 10, 2026

I want to index PDF attachments using the Tire gem as a client for ElasticSearch. In my mapping, I exclude the attachment field from _source, so that the attachment is not stored in the index and not returned in the search results:

mapping :_source => { :excludes => ['attachment_original'] } do
  indexes :id, :type => 'integer'
  indexes :folder_id, :type => 'integer'
  indexes :attachment_file_name
  indexes :attachment_updated_at, :type => 'date'
  indexes :attachment_original, :type => 'attachment'
end 

I can still see the attachment content included in the search results when I run the following curl command:

curl -X POST "http://localhost:9200/user_files/user_file/_search?pretty=true" -d '{
  "query": {
    "query_string": {
      "query": "rspec"
    }
  }
}'

I have posted my question in this thread:

But I have just noticed that not only is the attachment included in the search results, but all the other fields, including the ones that are not mapped, are included too, as you can see here:

{
  "took": 20,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.025427073,
    "hits": [
      {
        "_index": "user_files",
        "_type": "user_file",
        "_id": "5",
        "_score": 0.025427073,
        "_source": {
          "user_file": {
            "id": 5,
            "folder_id": 1,
            "updated_at": "2012-08-16T11:32:41Z",
            "attachment_file_size": 179895,
            "attachment_updated_at": "2012-08-16T11:32:41Z",
            "attachment_file_name": "hw4.pdf",
            "attachment_content_type": "application/pdf",
            "created_at": "2012-08-16T11:32:41Z",
            "attachment_original": "JVBERi0xLjQKJeLjz9MKNyA"
          }
        }
      }
    ]
  }
}

attachment_file_size and attachment_content_type are not defined in the mapping, but are returned in the search results:

{
  "id": 5,
  "folder_id": 1,
  "updated_at": "2012-08-16T11:32:41Z",
  "attachment_file_size": 179895, <---------------------
  "attachment_updated_at": "2012-08-16T11:32:41Z",
  "attachment_file_name": "hw4.pdf",
  "attachment_content_type": "application/pdf", <-------
  "created_at": "2012-08-16T11:32:41Z",
  "attachment_original": "JVBERi0xLjQKJeLjz9MKNyA"
}

Here’s my full implementation:

  include Tire::Model::Search
  include Tire::Model::Callbacks

  def self.search(folder, params)
    tire.search() do
      query { string params[:query], default_operator: "AND"} if params[:query].present?
      #filter :term, folder_id: folder.id
      #highlight :attachment_original, :options => {:tag => "<em>"}
      raise to_curl
    end
  end

  mapping :_source => { :excludes => ['attachment_original'] } do
    indexes :id, :type => 'integer'
    indexes :folder_id, :type => 'integer'
    indexes :attachment_file_name
    indexes :attachment_updated_at, :type => 'date'
    indexes :attachment_original, :type => 'attachment'
  end

  def to_indexed_json
    to_json(:methods => [:attachment_original])
  end

  def attachment_original
    if attachment_file_name.present?
      path_to_original = attachment.path
      Base64.encode64(open(path_to_original) { |f| f.read })
    end
  end

Could somebody help me figure out why all the fields are included in the _source?

Edit: This is the output of running localhost:9200/user_files/_mapping

{
  "user_files": {
    "user_file": {
      "_source": {
        "excludes": [
          "attachment_original"
        ]
      },
      "properties": {
        "attachment_content_type": {
          "type": "string"
        },
        "attachment_file_name": {
          "type": "string"
        },
        "attachment_file_size": {
          "type": "long"
        },
        "attachment_original": {
          "type": "attachment",
          "path": "full",
          "fields": {
            "attachment_original": {
              "type": "string"
            },
            "author": {
              "type": "string"
            },
            "title": {
              "type": "string"
            },
            "name": {
              "type": "string"
            },
            "date": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "keywords": {
              "type": "string"
            },
            "content_type": {
              "type": "string"
            }
          }
        },
        "attachment_updated_at": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "created_at": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "folder_id": {
          "type": "integer"
        },
        "id": {
          "type": "integer"
        },
        "updated_at": {
          "type": "date",
          "format": "dateOptionalTime"
        }
      }
    }
  }
}

As you can see, for some reason all the fields are included in the mapping!

1 Answer

  1. Editorial Team
    Added an answer on June 10, 2026 at 12:36 am

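    In your to_indexed_json, you include the attachment_original method, so it is sent to elasticsearch. Because to_json also serializes every other attribute of the model, those attributes are sent as well, and elasticsearch dynamically maps any field it has not seen before. That's the reason why all your other properties are included in the mapping and, consequently, in _source.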

    See the ElasticSearch & Tire: Using Mapping and to_indexed_json question for more information on the topic.
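    To illustrate the point about to_indexed_json, here is a minimal sketch of a serializer that sends only the fields declared in the mapping. It is not taken from the question: UserFileSketch, INDEXED_FIELDS, and the sample attribute values are hypothetical stand-ins so the example runs without Rails or Tire, and the whitelist is an assumption based on the mapping block shown in the question.

```ruby
require 'json'
require 'base64'

# Hypothetical stand-in for the UserFile model, so the sketch runs without
# Rails or Tire. In the real model the attributes come from ActiveRecord.
class UserFileSketch
  # Assumed whitelist: the plain fields declared in the question's mapping.
  INDEXED_FIELDS = %w[id folder_id attachment_file_name attachment_updated_at].freeze

  def initialize(attributes, pdf_bytes)
    @attributes = attributes
    @pdf_bytes  = pdf_bytes
  end

  # Base64 payload for the attachment field, encoded as in the question.
  def attachment_original
    Base64.encode64(@pdf_bytes)
  end

  # Serialize only the whitelisted attributes plus the attachment payload,
  # so unmapped attributes never reach elasticsearch.
  def to_indexed_json
    doc = @attributes.select { |key, _value| INDEXED_FIELDS.include?(key) }
    doc['attachment_original'] = attachment_original
    JSON.generate(doc)
  end
end

record = UserFileSketch.new(
  {
    'id' => 5,
    'folder_id' => 1,
    'attachment_file_name' => 'hw4.pdf',
    'attachment_file_size' => 179_895,
    'attachment_content_type' => 'application/pdf',
    'attachment_updated_at' => '2012-08-16T11:32:41Z'
  },
  '%PDF-1.4'
)

puts record.to_indexed_json
```

    In an ActiveRecord model the same effect is usually obtained by passing :only alongside :methods to to_json. Note that attachment_original still has to be sent for the attachment to be indexed, so this sketch only addresses the unmapped fields, not whether the attachment is stripped from the stored _source.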

    It seems that Tire is indeed sending the proper mapping JSON to elasticsearch. My advice is to use Tire.configure { logger STDERR, level: "debug" } to inspect what is happening and try to pinpoint the problem on the raw level.
