I have a unique dataset, a portion of which can be reproduced using: data

Question

0

Asked: June 10, 20262026-06-10T22:51:36+00:00 2026-06-10T22:51:36+00:00

I have a unique dataset, a portion of which can be reproduced using: data

0

I have a unique dataset, a portion of which can be reproduced using:

data <- textConnection("SNP_Pres,Chr_N,BP_A1F,A1_Beta,A2_SE,ForSortSNP,SortOrder
rs122,13,100461219,C,T,rs122,6
1,16362,0.8701,-0.0048,0.0056,rs122,7
1,19509,0.546015137607046,-0.0033,0.0035,rs122,8
1,17218,0.1539,-0.004,0.013,rs122,9
rs142,13,61952115,G,T,rs142,6
1,16387,0.1295,0.0044,0.0057,rs142,7
1,17218,0.8454,0.006,0.013,rs142,9
rs160,13,100950452,C,T,rs160,6
1,16387,0.549,-0.0021,0.0035,rs160,7
1,19509,0.519102731537216,0.003,0.0027,rs160,8
rs298,13,66664221,C,G,rs298,6
1,19509,0.308290808358246,-0.0032,0.0033,rs298,8
1,17218,0.7227,0.022,0.01,rs298,9")
mydata <- read.csv(data, header = T, sep = ",", stringsAsFactors=FALSE)

It is formatted for use in a program that requires holding spots for missing data entries. In this case, a missing entry is indicated by a numeric skip in the Sort Order column. An entry is complete if the column descends 6 – 7 – 8 – 9, with a new entry beginning again with 6.

I need a way to read through the data file, and insert a row of zeros for each missing entry, so that the file looks like this:

data <- textConnection("SNP_Pres,Chr_N,BP_A1F,A1_Beta,A2_SE,ForSortSNP,SortOrder
rs122,13,100461219,C,T,rs122,6
1,16362,0.8701,-0.0048,0.0056,rs122,7
1,19509,0.546015137607046,-0.0033,0.0035,rs122,8
1,17218,0.1539,-0.004,0.013,rs122,9
rs142,13,61952115,G,T,rs142,6
1,16387,0.1295,0.0044,0.0057,rs142,7
0,0,0,0,0,rs142,8
1,17218,0.8454,0.006,0.013,rs142,9
rs160,13,100950452,C,T,rs160,6
1,16387,0.549,-0.0021,0.0035,rs160,7
1,19509,0.519102731537216,0.003,0.0027,rs160,8
0,0,0,0,0,rs160,9
rs298,13,66664221,C,G,rs298,6
0,0,0,0,0,rs289, 7
1,19509,0.308290808358246,-0.0032,0.0033,rs298,8
1,17218,0.7227,0.022,0.01,rs298,9")
mydata <- read.csv(data, header = T, sep = ",", stringsAsFactors=FALSE)

Ultimately, the last two columns, ForSortSNP and SortOrder will be deleted from the data file, but they are included now for convenience’s sake.
Any suggestions are greatly appreicated.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-10T22:51:38+00:00

Editorial Team

2026-06-10T22:51:38+00:00Added an answer on June 10, 2026 at 10:51 pm

Here is a solution using the expand.grid and merge functions.

grid <- with(mydata, expand.grid(ForSortSNP=unique(ForSortSNP), SortOrder=unique(SortOrder)))
complete <- merge(mydata, grid, all=TRUE, sort=FALSE)
complete[is.na(complete)] <- 0 # replace NAs with 0's
complete <- complete[order(complete$ForSortSNP, complete$SortOrder), ] # re-sort

0

Reply
Share
Share

- Report

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have a unique dataset, a portion of which can be reproduced using: data

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply