I have a large text file that contains data from the Uniform Crime Report.


Asked: June 14, 2026 (2026-06-14T08:18:42+00:00)


I have a large text file that contains data from the Uniform Crime Report. Ideally, I would like to import only the data and leave out the extraneous material in the file. The actual data is delimited by spaces, and the header information repeats each time the data runs onto another “page”. I first tried to import the data (and only the data) with the following code, adding my own headers manually:

  data <- read.fwf("2010SHRall.txt", 
        c(-4,3,8,2,4,5,6,5,4,3,3,4,4,3,3,4,6,5,3,6,26,3),   
        skip=5,       
        col.names=c("AGE","AGENCY","G","MO","HOM","INC","SIT","VA","VS","VR","VE","OA","OS","OR","OE","WEAP","REL","CIR","SUB","AGENCYNAME","STATE"), 
        strip.white=FALSE)

This works up to line 51 and then quits. I’m definitely a novice R programmer; I tried Googling the answer and searching Stack Overflow, but I am at a loss for where to go from here. Here is a link to the text file that I am trying to import. Again, I am trying to import the data and remove any rows that contain header information or other pieces that are not needed for the complete dataset.

Any help anyone could offer would be greatly appreciated.


1 Answer

Editorial Team · Answer 1 · 2026-06-14T08:18:43+00:00

This should probably work:

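# Read the whole file into a character vector, one element per line
text <- readLines('/tmp/2010SHRall.txt')
# Lines matching group.start open a block of data; lines matching group.end close it
group.start <- '^      AGENCY'
group.end <- '(^B)|(^0END OF GROUP)'
data <- character()
inside.group <- FALSE
for (line in text) {
  if (inside.group) {
    if (grepl(group.end, line))
      inside.group <- FALSE       # end marker: stop collecting
    else
      data <- append(data, line)  # ordinary line inside a group: keep it
  } else if (grepl(group.start, line)) {
    inside.group <- TRUE          # start marker: begin collecting on the next line
  }
}
# Parse only the kept lines as fixed-width fields and store the result
result <- read.fwf(textConnection(data),
         widths=c(-4,3,8,2,4,5,6,5,4,3,3,4,4,3,3,4,6,5,3,6,26,3),
         header=FALSE,
         col.names=c("AGE","AGENCY","G","MO","HOM","INC","SIT","VA","VS","VR","VE","OA","OS","OR","OE","WEAP","REL","CIR","SUB","AGENCYNAME","STATE"), 
         strip.white=TRUE)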

The loop keeps every line that falls between a line matching the group.start regular expression and the next line matching group.end, and discards the rest.
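
If the loop turns out to be slow on a large file (growing a vector one line at a time gets expensive), the same filtering can probably be done without a loop. The following is only a sketch: the sample lines in txt are made up for illustration and are not taken from the real file, and it assumes that start and end markers strictly alternate and that no data line itself matches either pattern. If that does not hold for your file, stick with the loop.

```r
# Made-up sample lines standing in for the real file (layout is an assumption)
txt <- c(
  "1 REPORT HEADER",
  "      AGENCY   G  MO",
  "    row one",
  "    row two",
  "0END OF GROUP",
  "1 REPORT HEADER",
  "      AGENCY   G  MO",
  "    row three",
  "B PAGE BREAK"
)

group.start <- "^      AGENCY"
group.end   <- "(^B)|(^0END OF GROUP)"

# Flag the marker lines in one pass each
is.start <- grepl(group.start, txt)
is.end   <- grepl(group.end, txt)

# Starts seen so far minus ends seen so far: positive while inside a group
# (only valid if start and end markers strictly alternate)
inside <- (cumsum(is.start) - cumsum(is.end)) > 0

# Keep lines inside a group, excluding the start marker lines themselves
data.lines <- txt[inside & !is.start]
print(data.lines)
```

With the real file, txt would come from readLines() and data.lines would be passed to read.fwf(textConnection(data.lines), ...) with the same widths and col.names as above.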
