I have two big files with Start and End positions and two numeric sample columns.
File 1:
Start End Sample1 Sample2
1 60 1 4
100 200 2 1
201 250 1 4
300 450 1 1
File 2:
Start End Sample1 Sample2
40 60 1 1
70 180 1 1
240 330 2 1
340 450 1 4
500 900 1 4
980 1200 2 1
First, I would like to take the first Start and End positions from the first file and make a segment plot. The plot must also take Start-20 and End+20 into account for each position in the first file.
Then I would like to take the overlapping Start and End positions from the second file and draw them on the same plot. This way there will be one plot for each Start and End position in the first file, and positions without any overlaps will also be plotted individually.
The color of each segment will be based on the two sample numbers (for example, in both files, if they are 1 and 4 the segment will be red, if they are 1 and 1 the segment will be green, and so on).
I would really appreciate it if someone could help me understand how to write a function for this in R.
Thanks in advance.
PS: I have attached a drawing of the expected output. It shows only two of the results.
Below is the code I wrote, but it gives this error:
Error in match.names(clabs, names(xi)) :
names do not match previous names
Also, I need the dataset1 line segment to be red and the dataset2 line segments to be green. How would I implement that in the code below?
overlap_func <- function(dataset1,dataset2) {
  for(i in 1:nrow(dataset1))
  {
    loop_start <- dataset1[i,"Start"]
    loop_end <- dataset1[i,"End"]
    p <- dataset2[,c(1,2)]
    dataset1_pos <- data.frame(loop_start,loop_end)
    dataset2_filter <- p[p$Start >= (loop_start-(loop_start/2)) & p$End <= (loop_end+ (loop_end/2)), ]
    data_in_loop <- rbind(dataset1_pos,dataset2_filter)
    plot_function(data_in_loop,loop_start,loop_end)
  }
}
plot_function <- function(loop_data,start,end){
  pos <- 1:nrow(loop_data)
  dat1 <- cbind(pos,loop_data)
  colnames(dat1) <- c("pos","start","end")
  pdf(file=paste0("path where plots are generated","_",start,"-",end,"_","overlap.pdf"))
  plot(dat1$pos, type = 'n', xlim = range(c(start-(start/2), end+(end/2))))
  segments(dat1$start, dat1$pos, dat1$end, dat1$pos)
  dev.off()
}
df1 <- read.table(header=T, text="Start End Sample1 Sample2
1 60 1 4
100 200 2 1
201 250 1 4
300 450 1 1")
df2 <- read.table(header=T, text="Start End Sample1 Sample2
40 60 1 1
70 180 1 1
240 330 2 1
340 450 1 4
500 900 1 4
980 1200 2 1")
overlap_func(df1,df2)
Something like this??
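A minimal base-R sketch of the data-preparation step, under some assumptions. The `rbind()` error in the question comes from the two data frames having different column names (`loop_start`/`loop_end` versus `Start`/`End`), so the sketch keeps the original names on both sides. The helper names `pair_colour` and `collect_overlaps`, the `id1` and `source` columns, the grey fallback colour, and the choice to test overlaps against the padded window (Start-20 to End+20) are illustrative assumptions, not something the question fixes.

```r
# Example data copied from the question.
df1 <- read.table(header = TRUE, text = "Start End Sample1 Sample2
1 60 1 4
100 200 2 1
201 250 1 4
300 450 1 1")
df2 <- read.table(header = TRUE, text = "Start End Sample1 Sample2
40 60 1 1
70 180 1 1
240 330 2 1
340 450 1 4
500 900 1 4
980 1200 2 1")

# Assumed colour rule: only 1/4 -> red and 1/1 -> green come from the
# question; every other pair falls back to grey in this sketch.
pair_colour <- function(s1, s2) {
  key <- paste(s1, s2, sep = "_")
  ifelse(key == "1_4", "red", ifelse(key == "1_1", "green", "grey40"))
}

# For row i of dataset1, return that row plus every dataset2 row that
# overlaps the padded window [Start - pad, End + pad] (an assumption).
collect_overlaps <- function(dataset1, dataset2, i, pad = 20) {
  lo <- dataset1$Start[i] - pad
  hi <- dataset1$End[i] + pad
  hits <- dataset2[dataset2$Start <= hi & dataset2$End >= lo, , drop = FALSE]
  # Both pieces carry identical column names, so rbind() does not fail
  # with "names do not match previous names".
  out <- rbind(
    data.frame(id1 = i, source = "file1", dataset1[i, , drop = FALSE]),
    data.frame(id1 = rep(i, nrow(hits)),
               source = rep("file2", nrow(hits)), hits)
  )
  out$colour <- pair_colour(out$Sample1, out$Sample2)
  rownames(out) <- NULL
  out
}

# One table per row of the first file; rows without overlaps still get
# a one-row table of their own.
overlaps <- lapply(seq_len(nrow(df1)),
                   function(i) collect_overlaps(df1, df2, i))
```

Each element of `overlaps` can then be handed to a plotting function, or the list can be stacked into one long table with `do.call(rbind, overlaps)` and grouped on `id1`.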
Edit: After looking at your updated drawing, maybe this is what you want?
Edit: To save each facet's plot in a separate file, generate the plot once per group by splitting on id1. Set the path in fn accordingly.
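A rough sketch of that per-group saving step, written with base graphics rather than facets so that it needs no extra packages. It assumes a long table with one row per segment and an `id1` column marking which file-1 row each segment belongs to. The table `segs`, the helper `save_group_plots`, the `out_dir` argument, and the file-name pattern are hypothetical; `fn` is the output path mentioned above. Segments from the first file are drawn in red and those from the second file in green, as requested in the question.

```r
# Hypothetical long-format table: one row per segment, grouped by id1.
segs <- data.frame(
  id1    = c(1, 1, 1, 2, 2),
  source = c("file1", "file2", "file2", "file1", "file2"),
  Start  = c(1, 40, 70, 100, 70),
  End    = c(60, 60, 180, 200, 180)
)

# Write one PDF per id1 group and return the file paths.
save_group_plots <- function(segs, out_dir = tempdir(), pad = 20) {
  groups <- split(segs, segs$id1)
  vapply(names(groups), function(id) {
    g <- groups[[id]]
    ref <- g[g$source == "file1", ][1, ]  # the file-1 segment of this group
    # fn is the output path; point out_dir at the folder you want.
    fn <- file.path(out_dir,
                    paste0("overlap_", ref$Start, "-", ref$End, ".pdf"))
    pdf(file = fn)
    y <- seq_len(nrow(g))  # one horizontal track per segment
    plot(0, 0, type = "n",
         xlim = c(ref$Start - pad, ref$End + pad),
         ylim = c(0.5, nrow(g) + 0.5),
         xlab = "Position", ylab = "", yaxt = "n")
    # Parts of a segment outside the padded window are clipped.
    segments(g$Start, y, g$End, y, lwd = 3,
             col = ifelse(g$source == "file1", "red", "green"))
    dev.off()
    fn
  }, character(1))
}

files <- save_group_plots(segs)
```

By default the files go to `tempdir()`; pass a different `out_dir` to write them somewhere permanent.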