Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 9311043
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 19, 20262026-06-19T01:15:56+00:00 2026-06-19T01:15:56+00:00

I have large mbox files and I am using third party API like mstor

  • 0

I have large mbox files and I am using third party API like mstor to parse messages from mbox file using hadoop. I have uploaded those files in hdfs. But the problem is that this API uses only local file system path , similar to shown below

MessageStoreApi store = new MessageStoreApi(“file location in locl file system”);

I could not find a constructor in this API that would initialize from stream . So I cannot read hdfs stream and initialize it.

Now my question is, should I copy my files from hdfs to local file system and initialize it from local temporary folder? As thats what I have been doing for now:

Currently My Map function receives path of the mbox files.

Map(key=path_of_mbox_file in_hdfs, value=null){

String local_temp_file = CopyToLocalFile(path in hdfs);
MessageStoreApi store = new MessageStoreApi(“local_temp_file”);
//process file

}

Or Is there some other solution? I am expecting some solution like what If I increase the block-size so that single file fits in one block and somehow if I can get the location of those blocks in my map function, as mostly map functions will execute on the same node where those blocks are stored then I may not have to always download to local file system? But I am not sure if that will always work 🙂

Suggestions , comments are welcome!

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-19T01:15:57+00:00Added an answer on June 19, 2026 at 1:15 am

    For local filesystem path-like access, HDFS offers two options: HDFS NFS (via NFSv3 mounts) and FUSE-mounted HDFS.

    The former is documented under the Apache Hadoop docs (CDH users may follow this instead)

    The latter is documented at the Apache Hadoop wiki (CDH users may find relevant docs here instead)

    The NFS feature is more maintained upstream than the FUSE option, currently.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

If I have large set of data in an array from like this, 1
We have large files with zlib-compressed binary data that we would like to memory
I have large number of files with .gif extension. I would like to move
Installing third-party components always take a long time specially if you have large ones,
i have large numbers of text files and i am in problem that i
I have large images displayed in a grouped tableview. I would like the images
I have large video files (~100GB) that are local on my machine. I have
i have large data in label field and using FOCUSABLE on it makes the
I have large batches of XHTML files that are manually updated. During the review
I have large file comprising ~100,000 lines. Each line corresponds to a cluster and

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.