Editorial Team
Asked: June 13, 2026


I am generating a CSV file in my map function, so that each map task generates one CSV file. This is a side effect, not the output of the mapper. I name the files something like filename_inputkey. However, when I run the application on a single-node cluster, only one file is generated. My input has 10 lines, and my understanding is that there will be 10 mapper tasks, so 10 files should be generated. Let me know if my thinking is wrong here.

Here is my GWASInputFormat class

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class GWASInputFormat extends FileInputFormat<LongWritable, GWASGenotypeBean>{

@Override
public RecordReader<LongWritable, GWASGenotypeBean> getRecordReader(org.apache.hadoop.mapred.InputSplit input, JobConf job, Reporter arg2) throws IOException {
    return (RecordReader<LongWritable, GWASGenotypeBean>) new GWASRecordReader(job, (FileSplit)input);
}

}

Here is GWASRecordReader

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;

public class GWASRecordReader implements RecordReader<LongWritable, GWASGenotypeBean>{

private LineRecordReader lineReader;
private LongWritable lineKey;
private Text lineValue;

@Override
public void close() throws IOException {
    if(lineReader != null) {
        lineReader.close();
    }
}

public GWASRecordReader(JobConf job, FileSplit split) throws IOException {
    lineReader = new LineRecordReader(job, split);
    lineKey = lineReader.createKey();
    lineValue = lineReader.createValue();
}

@Override
public LongWritable createKey() {
    return new LongWritable();
}

@Override
public GWASGenotypeBean createValue() {
    return new GWASGenotypeBean();
}

@Override
public long getPos() throws IOException {
    return lineReader.getPos();
}

@Override
public boolean next(LongWritable key, GWASGenotypeBean value) throws IOException {
    if(!lineReader.next(lineKey, lineValue)){
        return false;
    }

    String[] values = lineValue.toString().split(",");

    if(values.length !=32) {
        throw new IOException("Invalid Record ");
    }

    value.setPROJECT_NAME(values[0]);
    value.setRESEARCH_CODE(values[1]);
    value.setFACILITY_CODE(values[2]);
    value.setPROJECT_CODE(values[3]);
    value.setINVESTIGATOR(values[4]);
    value.setPATIENT_NUMBER(values[5]);
    value.setSAMPLE_COLLECTION_DATE(values[6]);
    value.setGENE_NAME(values[7]);
    value.setDbSNP_RefSNP_ID(values[8]);
    value.setSNP_ID(values[9]);
    value.setALT_SNP_ID(values[10]);
    value.setSTRAND(values[11]);
    value.setASSAY_PLATFORM(values[12]);
    value.setSOFTWARE_NAME(values[13]);
    value.setSOFTWARE_VERSION_NUMBER(values[14]);
    value.setTEST_DATE(values[15]);
    value.setPLATE_POSITION(values[16]);
    value.setPLATE_ID(values[17]);
    value.setOPERATOR(values[18]);
    value.setGENOTYPE(values[19]);
    value.setGENOTYPE_QS1_NAME(values[20]);
    value.setGENOTYPE_QS2_NAME(values[21]);
    value.setGENOTYPE_QS3_NAME(values[22]);
    value.setGENOTYPE_QS4_NAME(values[23]);
    value.setGENOTYPE_QS5_NAME(values[24]);
    value.setGENOTYPE_QS1_RESULT(values[25]);
    value.setGENOTYPE_QS2_RESULT(values[26]);
    value.setGENOTYPE_QS3_RESULT(values[27]);
    value.setGENOTYPE_QS4_RESULT(values[28]);
    value.setGENOTYPE_QS5_RESULT(values[29]);
    value.setSTAGE(values[30]);
    value.setLAB(values[31]);
    return true;
}

@Override
public float getProgress() throws IOException {
    return lineReader.getProgress();
}

}

Mapper class

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

import com.google.common.base.Strings;

public class GWASMapper extends MapReduceBase implements Mapper<LongWritable, GWASGenotypeBean, Text, Text> {

private static Configuration conf;

@SuppressWarnings("rawtypes")
public void setup(org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException {
    conf = context.getConfiguration();
    // Path[] otherFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
}


@Override
public void map(LongWritable inputKey, GWASGenotypeBean inputValue, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {



    checkForNulls(inputValue, inputKey.toString());




    output.collect(new Text(inputValue.getPROJECT_CODE()), new Text(inputValue.getFACILITY_CODE()));

}

private void checkForNulls(GWASGenotypeBean user, String inputKey) {

    String f1 = " does not have a value_fail";
    String p1 = "Must not contain NULLS for required fields";
    // have to initialize these two to some paths in hdfs

    String edtChkRptDtl = "/user/hduser/output6/detail" + inputKey + ".csv";
    String edtChkRptSmry = "/user/hduser/output6/summary" + inputKey + ".csv";
            ../

    List<String> errSmry = new ArrayList<String>();
    Map<String, String> loc = new TreeMap<String, String>();

    if(Strings.isNullOrEmpty(user.getPROJECT_NAME())) {
        loc.put("test", "PROJECT_NAME ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getRESEARCH_CODE())) {
        loc.put("test", "RESEARCH_CODE ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getFACILITY_CODE())) {
        loc.put("test", "FACILITY_CODE ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getPROJECT_CODE())) {
        loc.put("test", "PROJECT_CODE ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getINVESTIGATOR())) {
        loc.put("test", "INVESTIGATOR ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getPATIENT_NUMBER())) {
        loc.put("test", "PATIENT_NUMBER ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getSAMPLE_COLLECTION_DATE())) {
        loc.put("test", "SAMPLE_COLLECTION_DATE ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getGENE_NAME())) {
        loc.put("test", "GENE_NAME ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getSTRAND())) {
        loc.put("test", "STRAND ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getASSAY_PLATFORM())) {
        loc.put("test", "ASSAY_PLATFORM ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getSOFTWARE_NAME())) {
        loc.put("test", "SOFTWARE_NAME ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getTEST_DATE())) {
        loc.put("test", "TEST_DATE ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getPLATE_POSITION())) {
        loc.put("test", "PLATE_POSITION ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getPLATE_ID())) {
        loc.put("test", "PLATE_ID ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getOPERATOR())) {
        loc.put("test", "OPERATOR ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getGENOTYPE())) {
        loc.put("test", "GENOTYPE ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getSTAGE())) {
        loc.put("test", "STAGE ");
        errSmry.add("_fail");
    } else if(Strings.isNullOrEmpty(user.getLAB())) {
        loc.put("test", "LAB ");
        errSmry.add("_fail");
    }

    String customNullMsg = "Required Genotype column(s)";
    List<String> error = new ArrayList<String>();
    String message = null;

    if(!loc.isEmpty()) {
        for (Map.Entry<String, String> entry : loc.entrySet()) {
        message = "line:" + entry.getKey() + " column:" + entry.getValue() + " " + f1;
        error.add(message);
        }
    } else {
        message = "_pass";
        error.add(message);
    }

    int cnt = 0;
    if(!errSmry.isEmpty()) {

        // not able to understand this. Are we trying to get the occurances
        // if the last key that contains _fail
        for (String key : errSmry) {
        if(key.contains("_fail")) {
            cnt = Collections.frequency(errSmry, key);
            // ******************** Nikhil added this
            break;
        }
        }

        if(cnt > 0) {
        writeCsvFileSmry(edtChkRptSmry, customNullMsg, p1, "failed", Integer.toString(cnt));
        } else {
        writeCsvFileSmry(edtChkRptSmry, customNullMsg, p1, "passed", "0");
        }

    } else {
        writeCsvFileSmry(edtChkRptSmry, customNullMsg, p1, "passed", "0");
    }

    // loop the list and write out items to the error report file
    if(!error.isEmpty()) {
        for (String s : error) {
        //System.out.println(s);
        if(s.contains("_fail")) {
            String updatedFailmsg = s.replace("_fail", "");
            writeCsvFileDtl(edtChkRptDtl, "genotype", updatedFailmsg, "failed");
        }
        if(s.contains("_pass")) {
            writeCsvFileDtl(edtChkRptDtl, "genotype", p1, "passed");
        }
        }
    } else {
        writeCsvFileDtl(edtChkRptDtl, "genotype", p1, "passed");
    }
    // end loop
   }

 private void writeCsvFileDtl(String edtChkRptDtl, String col1, String col2, String col3) {
    try {
        if(conf == null) {
            conf = new Configuration();
        }
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path(edtChkRptDtl);
        if (!fs.exists(path)) {
            FSDataOutputStream out = fs.create(path);
            out.writeChars(col1);
            out.writeChar(',');
            out.writeChars(col2);
            out.writeChar(',');
            out.writeChars(col3);
            out.writeChar('\n');
            out.flush();
            out.close();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

private void writeCsvFileSmry(String edtChkRptSmry, String col1, String col2, String col3, String col4) {
    try {


        if(conf == null) {
            conf = new Configuration();
        }
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path(edtChkRptSmry);
        if (!fs.exists(path)) {
            FSDataOutputStream out = fs.create(path);
            out.writeChars(col1);
            out.writeChar(',');
            out.writeChars(col2);
            out.writeChar(',');
            out.writeChars(col3);
            out.writeChar(',');
            out.writeChars(col4);
            out.writeChar('\n');
            out.flush();
            out.close();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
}

Here is my driver class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class GWASMapReduce extends Configured implements Tool{

/**
 * @param args
 */
public static void main(String[] args) throws Exception {
    Configuration configuration = new Configuration();
    ToolRunner.run(configuration, new GWASMapReduce(), args);
}

@Override
public int run(String[] arg0) throws Exception {

    JobConf conf = new JobConf(new Configuration());
    conf.setInputFormat(GWASInputFormat.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setJarByClass(GWASMapReduce.class);
    conf.setMapperClass(GWASMapper.class);
    conf.setNumReduceTasks(0);
    FileInputFormat.addInputPath(conf, new Path(arg0[0]));
    FileOutputFormat.setOutputPath(conf, new Path(arg0[1]));
    JobClient.runJob(conf);
    return 0;
}
}
1 Answer

Editorial Team
Added an answer on June 13, 2026 at 9:58 pm

There will probably be only one Mapper task, with ten invocations of its map method. If you wish to write out one file per Mapper, you should do so in its configure method. If you wish to write out one file per input record, you should do so in its map method.
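The difference between per-task and per-record work can be sketched in plain Java without Hadoop on the classpath. This is only an illustration under stated assumptions: RecordingMapper and runTask are hypothetical stand-ins for an old-API Mapper and for the framework loop that drives it, and the "files" are just names collected in a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MapperLifecycleDemo {

    /** Hypothetical mapper that records the file names it would create. */
    static final class RecordingMapper {
        final List<String> filesCreated = new ArrayList<>();

        // Stand-in for configure(JobConf): runs once per mapper instance, i.e. once per task.
        void configure(String taskId) {
            filesCreated.add("per-task-" + taskId + ".csv");
        }

        // Stand-in for map(...): runs once per input record.
        void map(long key, String value) {
            filesCreated.add("per-record-" + key + ".csv");
        }
    }

    /** Simplified stand-in for how a single map task drives a mapper over its split. */
    static List<String> runTask(String taskId, List<String> records) {
        RecordingMapper mapper = new RecordingMapper();
        mapper.configure(taskId);
        long offset = 0; // a line reader uses the byte offset of each line as the key
        for (String record : records) {
            mapper.map(offset, record);
            offset += record.length() + 1; // +1 for the newline
        }
        return mapper.filesCreated;
    }

    public static void main(String[] args) {
        // One task, three records: one per-task name plus three per-record names.
        System.out.println(runTask("m0", Arrays.asList("a", "b", "c")));
    }
}
```

With a small input that fits in one split, there is one task, so anything done in configure happens once, while anything done in map happens once per line.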

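Edit: The above turned out to be unrelated to the problem. The issue is that in GWASRecordReader you do not set the key in the next method, so your map input key is always the same (0, the default value of the LongWritable that createKey returns). Every record therefore produces the same detail and summary file names, and because the write methods skip any path that already exists, only the first record's files are written. Simply add key.set(lineKey.get()); to the next method, and it should work.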
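The effect of the missing key.set call can be reproduced with a small plain-Java sketch. It does not use Hadoop: MutableLong, LineReader and WrappingReader are hypothetical stand-ins for LongWritable, LineRecordReader and GWASRecordReader, and the copyKey flag exists only so both behaviours can be compared side by side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class KeyCopyDemo {

    /** Hypothetical stand-in for LongWritable: a mutable key object that the caller reuses. */
    static final class MutableLong {
        private long value; // defaults to 0
        void set(long v) { value = v; }
        long get() { return value; }
    }

    /** Hypothetical stand-in for LineRecordReader: the key is the byte offset of each line. */
    static final class LineReader {
        private final Iterator<String> lines;
        private long offset = 0;

        LineReader(List<String> lines) { this.lines = lines.iterator(); }

        boolean next(MutableLong key, StringBuilder value) {
            if (!lines.hasNext()) {
                return false;
            }
            String line = lines.next();
            key.set(offset);
            value.setLength(0);
            value.append(line);
            offset += line.length() + 1; // +1 for the newline
            return true;
        }
    }

    /** Wrapper in the style of GWASRecordReader: it reads into its own lineKey and lineValue. */
    static final class WrappingReader {
        private final LineReader lineReader;
        private final MutableLong lineKey = new MutableLong();
        private final StringBuilder lineValue = new StringBuilder();
        private final boolean copyKey;

        WrappingReader(LineReader lineReader, boolean copyKey) {
            this.lineReader = lineReader;
            this.copyKey = copyKey;
        }

        boolean next(MutableLong key, StringBuilder value) {
            if (!lineReader.next(lineKey, lineValue)) {
                return false;
            }
            if (copyKey) {
                key.set(lineKey.get()); // the one-line fix: pass the inner key on to the caller
            }
            value.setLength(0);
            value.append(lineValue);
            return true;
        }
    }

    /** Drives the reader the way a map task would and returns the keys the mapper would see. */
    static List<Long> collectKeys(boolean copyKey) {
        LineReader inner = new LineReader(Arrays.asList("abc", "de", "fghi"));
        WrappingReader reader = new WrappingReader(inner, copyKey);
        MutableLong key = new MutableLong();
        StringBuilder value = new StringBuilder();
        List<Long> keys = new ArrayList<>();
        while (reader.next(key, value)) {
            keys.add(key.get());
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println("without key copy: " + collectKeys(false)); // [0, 0, 0]
        System.out.println("with key copy:    " + collectKeys(true));  // [0, 4, 7]
    }
}
```

Without the copy, every record arrives with key 0, so file names built from the key collide. With the copy, each record carries its own offset and gets its own file name.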
