The Archive Base

Editorial Team
Asked: June 9, 2026


The CSV file consists of boolean data for user preferences (userid,itemid).
The file gets checked by a preprocessor for any inconsistencies. I have also checked it manually, and the data seems to be consistent and in the correct format.
Two things to note:
- The job never fails if there is only one input file to the Hadoop job, that is, if all the preferences are exported into a single CSV with no duplicate entries for (userid,itemid).
- The job randomly fails when there are multiple CSV files in the Hadoop directory, that is, the initial dump of user preferences plus daily delta files for user preferences.

The job shouldn’t fail with an ArrayIndexOutOfBoundsException if the CSV data is consistent and correct throughout. Is it possible that the job might fail if there are duplicate entries for (userid,itemid) across delta files? Many of these entries are duplicated across several delta files because the preferences are boolean.

The logs don’t seem to output the piece of data which caused the error. Here is the log:

2012-08-09 15:03:22,652 INFO org.apache.hadoop.mapred.JobInProgress: job_201208021510_0221: nMaps=2 nReduces=1 max=-1
2012-08-09 15:03:22,652 INFO org.apache.hadoop.mapred.JobTracker: Job job_201208021510_0221 added successfully for user 'deploy' to queue 'default'
2012-08-09 15:03:22,652 INFO org.apache.hadoop.mapred.AuditLogger: USER=deploy  IP=127.0.0.1    OPERATION=SUBMIT_JOB    TARGET=job_201208021510_0221    RESULT=SUCCESS
2012-08-09 15:03:22,652 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201208021510_0221
2012-08-09 15:03:22,653 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201208021510_0221
2012-08-09 15:03:23,023 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /zenius/hadoop/tmp/mapred/system/job_201208021510_0221/jobToken
2012-08-09 15:03:23,027 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201208021510_0221 = 56518256. Number of splits = 2
2012-08-09 15:03:23,027 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201208021510_0221_m_000000 has split on node:/default-rack/localhost
2012-08-09 15:03:23,028 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201208021510_0221_m_000001 has split on node:/default-rack/localhost
2012-08-09 15:03:23,028 INFO org.apache.hadoop.mapred.JobInProgress: job_201208021510_0221 LOCALITY_WAIT_FACTOR=1.0
2012-08-09 15:03:23,028 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201208021510_0221 initialized successfully with 2 map tasks and 1 reduce tasks.
2012-08-09 15:03:25,787 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201208021510_0221_m_000003_0' to tip task_201208021510_0221_m_000003, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:03:31,794 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201208021510_0221_m_000003_0' has completed task_201208021510_0221_m_000003 successfully.
2012-08-09 15:03:31,795 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201208021510_0221_m_000000_0' to tip task_201208021510_0221_m_000000, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:03:31,796 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201208021510_0221_m_000000
2012-08-09 15:03:31,796 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201208021510_0221_m_000001_0' to tip task_201208021510_0221_m_000001, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:03:31,796 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201208021510_0221_m_000001
2012-08-09 15:03:37,800 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201208021510_0221_m_000001_0' has completed task_201208021510_0221_m_000001 successfully.
2012-08-09 15:03:37,801 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201208021510_0221_r_000000_0' to tip task_201208021510_0221_r_000000, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:03:49,807 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201208021510_0221_m_000000_0: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

2012-08-09 15:03:52,810 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a failed task task_201208021510_0221_m_000000
2012-08-09 15:03:52,810 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201208021510_0221_m_000000_1' to tip task_201208021510_0221_m_000000, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:03:52,810 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201208021510_0221_m_000000
2012-08-09 15:03:52,810 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201208021510_0221_m_000000_0'
2012-08-09 15:04:14,603 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201208021510_0221_m_000000_1: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

2012-08-09 15:04:17,606 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a failed task task_201208021510_0221_m_000000
2012-08-09 15:04:17,607 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201208021510_0221_m_000000_2' to tip task_201208021510_0221_m_000000, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:04:17,607 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201208021510_0221_m_000000
2012-08-09 15:04:17,607 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201208021510_0221_m_000000_1'
2012-08-09 15:04:35,618 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201208021510_0221_m_000000_2: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

2012-08-09 15:04:38,621 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a failed task task_201208021510_0221_m_000000
2012-08-09 15:04:38,621 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201208021510_0221_m_000000_3' to tip task_201208021510_0221_m_000000, for tracker 'tracker_localhost:localhost/127.0.0.1:50158'
2012-08-09 15:04:38,621 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201208021510_0221_m_000000
2012-08-09 15:04:38,621 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201208021510_0221_m_000000_2'
2012-08-09 15:04:56,632 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201208021510_0221_m_000000_3: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

2012-08-09 15:04:59,635 INFO org.apache.hadoop.mapred.TaskInProgress: TaskInProgress task_201208021510_0221_m_000000 has failed 4 times.
2012-08-09 15:04:59,635 INFO org.apache.hadoop.mapred.JobInProgress: TaskTracker at 'localhost' turned 'flaky'
2012-08-09 15:04:59,635 INFO org.apache.hadoop.mapred.JobInProgress: Aborting job job_201208021510_0221
2012-08-09 15:04:59,635 INFO org.apache.hadoop.mapred.JobInProgress: Killing job 'job_201208021510_0221'
2012-08-09 15:04:59,635 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_CLEANUP) 'attempt_201208021510_0221_m_000002_0' to tip...
1 Answer

  1. Editorial Team
    Added an answer on June 9, 2026 at 1:06 pm

    Nope, it definitely means that there is a bad line in the data somewhere: the index 1 in the exception suggests the mapper split a line into fields and found no second one. The most likely culprits are stray blank lines, a header line, “comment” lines, or some stray file like _SUCCESS in the same directory.
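
    Since the logs do not show the offending record, one way to hunt for it is to scan a local copy of the input directory yourself. The following is a minimal sketch, not Mahout's own parsing code: it assumes fields are separated by a comma or a tab and that both IDs are integers, and the names `find_bad_lines` and `scan_directory` are made up for this illustration.

```python
import os
import re

# Assumption: fields are separated by a comma or a tab.
DELIMITER = re.compile(r"[\t,]")


def _is_id(token):
    """True if the token parses as an integer ID (assumed ID format)."""
    try:
        int(token)
        return True
    except ValueError:
        return False


def find_bad_lines(lines):
    """Return (line_number, text) for lines lacking two leading integer fields."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        fields = DELIMITER.split(text)
        # Blank lines, comments and truncated rows have fewer than two fields;
        # header rows have two fields but they are not integers.
        if len(fields) < 2 or not _is_id(fields[0]) or not _is_id(fields[1]):
            bad.append((number, text))
    return bad


def scan_directory(path):
    """Map each regular file in `path` to its bad lines; clean files are omitted."""
    report = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            continue
        with open(full, encoding="utf-8", errors="replace") as handle:
            bad = find_bad_lines(handle)
        if bad:
            report[name] = bad
    return report


if __name__ == "__main__":
    sample = ["101,2001\n", "\n", "userid,itemid\n", "# comment\n", "103\n"]
    for number, text in find_bad_lines(sample):
        print(f"line {number}: {text!r}")
```

    Running `scan_directory` over every file in the directory, rather than only the CSVs you expect, also surfaces stray non-data files, which fits the observation that the job only fails once several files are present.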

© 2021 The Archive Base. All Rights Reserved