Hi, I have a data frame that looks like this:
    Seq H E T C
1 Seq_1 2 1 5 4
2 Seq_2 2 1 5 4
3 Seq_3 2 1 5 4
4 Seq_4 0 0 6 6
5 Seq_5 0 4 2 6
Here H, E, T and C are the counts of these features within each sequence.
I’m trying to build a line graph where each line should represent one sequence. The X-axis will be the features (H,E,T,C) and the Y-axis its corresponding count, so the lines will show the count’s variation within each sequence.
How should I do that? I’ve already tried a lot of things but couldn’t get it to work!
The trick to ggplot is that it expects data to be in “long” format. It’s often easiest to get it into this format with melt (for example, from the reshape2 package). Once melted, it becomes an exercise in specifying the plot as you want to view it.

Drawing one coloured line per sequence results in some overplotting for Seq_1 to Seq_3, whose counts are identical, so you may want to consider dropping colour and faceting instead.
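Here is a minimal sketch of both variants, rebuilt from the data in the question. It assumes melt comes from the reshape2 package, and the names df, long, feature, count, p_lines and p_facets are just illustrative choices, so adjust them to your own data.

```r
library(reshape2)  # assumed source of melt(); other packages provide one too
library(ggplot2)

# Recreate the data frame from the question
df <- data.frame(
  Seq = paste0("Seq_", 1:5),
  H = c(2, 2, 2, 0, 0),
  E = c(1, 1, 1, 0, 4),
  T = c(5, 5, 5, 6, 2),
  C = c(4, 4, 4, 6, 6)
)

# Wide -> long: one row per (sequence, feature) pair.
# "feature" and "count" are illustrative column names.
long <- melt(df, id.vars = "Seq",
             variable.name = "feature", value.name = "count")

# Variant 1: one coloured line per sequence.
# group = Seq is needed because the x-axis is a discrete factor.
p_lines <- ggplot(long, aes(x = feature, y = count,
                            group = Seq, colour = Seq)) +
  geom_line()

# Variant 2: no colour, one panel per sequence
p_facets <- ggplot(long, aes(x = feature, y = count, group = Seq)) +
  geom_line() +
  facet_wrap(~ Seq)

print(p_lines)
print(p_facets)
```

melt keeps the features in their original column order (H, E, T, C), so the x-axis follows the layout of the data frame rather than alphabetical order.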
Faceting obviously becomes less useful when you have 100s of sequences to review.