Editorial Team
Asked: May 14, 2026

I have the following data set that I am trying to plot with ggplot2,

I have the following data set that I am trying to plot with ggplot2. It is a time series of three experiments (A1, B1 and C1), each with three replicates.

I am trying to add a stat which detects and removes outliers before returning a smoother (mean and variance?). I have written my own outlier function (not shown), but I expect there is already a function to do this; I just have not found it.

I’ve looked at stat_sum_df("median_hilow", geom = "smooth") from some examples in the ggplot2 book, but I couldn’t tell from the Hmisc help doc whether it removes outliers or not.

Is there a function to remove outliers like this in ggplot, or where would I amend my code below to add my own function?

EDIT: I just saw this (How to use Outlier Tests in R Code) and noticed that Hadley recommends using a robust method such as rlm. I am plotting bacterial growth curves, so I don’t think a linear model is best, but any advice on other models or on using robust models in this situation would be appreciated.

library(ggplot2)

data = data.frame (day = c(1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7), od = 
c(
0.1,1.0,0.5,0.7
,0.13,0.33,0.54,0.76
,0.1,0.35,0.54,0.73
,1.3,1.5,1.75,1.7
,1.3,1.3,1.0,1.6
,1.7,1.6,1.75,1.7
,2.1,2.3,2.5,2.7
,2.5,2.6,2.6,2.8
,2.3,2.5,2.8,3.8), 
series_id = c(
"A1", "A1", "A1","A1",
"A1", "A1", "A1","A1",
"A1", "A1", "A1","A1",
"B1", "B1","B1", "B1",
"B1", "B1","B1", "B1",
"B1", "B1","B1", "B1",
"C1","C1", "C1", "C1",
"C1","C1", "C1", "C1",
"C1","C1", "C1", "C1"),
replicate = c(
"A1.1","A1.1","A1.1","A1.1",
"A1.2","A1.2","A1.2","A1.2",
"A1.3","A1.3","A1.3","A1.3",
"B1.1","B1.1","B1.1","B1.1",
"B1.2","B1.2","B1.2","B1.2",
"B1.3","B1.3","B1.3","B1.3",
"C1.1","C1.1","C1.1","C1.1",
"C1.2","C1.2","C1.2","C1.2",
"C1.3","C1.3","C1.3","C1.3"))

> data
   day   od series_id replicate
1    1 0.10        A1      A1.1
2    3 1.00        A1      A1.1
3    5 0.50        A1      A1.1
4    7 0.70        A1      A1.1
5    1 0.13        A1      A1.2
6    3 0.33        A1      A1.2
7    5 0.54        A1      A1.2
8    7 0.76        A1      A1.2
9    1 0.10        A1      A1.3
10   3 0.35        A1      A1.3
11   5 0.54        A1      A1.3
12   7 0.73        A1      A1.3
13   1 1.30        B1      B1.1
... etc...

This is what I have so far and is working nicely, but outliers are not removed:

r <- ggplot(data = data, aes(x = day, y = od))
r + geom_point(aes(group = replicate, color = series_id)) + # add points
   geom_line(aes(group = replicate, color = series_id)) + # add lines
   geom_smooth(aes(group = series_id))  # add smoother, average of each replicate

EDIT: I just added two charts below showing examples of the outlier problems that I’m having from the real data rather than the example data above.

The first plot shows series p26s4: around day 32 something really weird went on in two of the replicates, producing 2 outliers.

The second plot shows series p22s5: on day 18 something weird went on with the reading, likely machine error I think.

At the moment I am eyeballing the data to check that the growth curves look OK. After taking Hadley’s advice and setting family = "symmetric", I am confident that the loess smoother does a decent job of ignoring the outliers.
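To make the effect of family = "symmetric" concrete, here is a minimal base-R sketch. It uses synthetic data (an assumption, not the real p26s4 or p22s5 series) with a simulated bad reading on one day, and compares the default loess fit with the robust one. The flagging rule at the end is a hypothetical rule of thumb, not something ggplot2 or loess exposes directly.

```r
# Synthetic growth data (an assumption, not the asker's real series):
# 20 days x 3 replicates following a logistic-shaped curve.
day   <- rep(1:20, times = 3)
truth <- 2.5 / (1 + exp((8 - day) / 2))
od    <- truth + 0.02 * sin(1.7 * seq_along(day))  # small deterministic wiggle

# Simulate a bad reading on day 15 in two of the three replicates.
outlier_idx <- which(day == 15)[1:2]
od[outlier_idx] <- 0.2
growth <- data.frame(day = day, od = od)

# Default least-squares loess vs. the outlier-resistant variant.
fit_plain  <- loess(od ~ day, data = growth, family = "gaussian")
fit_robust <- loess(od ~ day, data = growth, family = "symmetric")

# Compare both smoothers with the known true value on the bad day.
at_15      <- data.frame(day = 15)
true_15    <- 2.5 / (1 + exp((8 - 15) / 2))
err_plain  <- abs(predict(fit_plain,  newdata = at_15) - true_15)
err_robust <- abs(predict(fit_robust, newdata = at_15) - true_15)
print(c(plain = err_plain, robust = err_robust))

# Hypothetical flagging rule: drop points whose residual from the robust
# fit exceeds 6 times the median absolute residual.
res          <- residuals(fit_robust)
flagged      <- abs(res) > 6 * median(abs(res))
growth_clean <- growth[!flagged, ]
```

If explicit removal is preferred over a robust smoother, a filtered data frame such as growth_clean could be passed to the smoothing layer only (via its data argument), so the points and lines still show every reading.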

[Figure: series p26s4, with outliers in two replicates around day 32]

[Figure: series p22s5, with an anomalous reading on day 18]

@Peter/@hadley, the next thing I’d like to do is fit a logistic, Gompertz or Richards growth curve to this data instead of a loess, and calculate the growth rate in the exponential stage. Eventually I plan to use the grofit package in R (http://cran.r-project.org/web/packages/grofit/index.html), but for now I’d like to plot these manually using ggplot2 if possible. Any pointers would be much appreciated.
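One possible starting point for the logistic case is nls() with the self-starting SSlogis model from the stats package. This is a sketch only: the data are synthetic (an assumption), and the two derived rates follow from the logistic form Asym / (1 + exp((xmid - day) / scal)) rather than from anything in grofit.

```r
# Synthetic logistic growth data (an assumption): 20 days x 3 replicates.
day <- rep(1:20, times = 3)
od  <- 2.5 / (1 + exp((8 - day) / 2)) + 0.02 * sin(1.7 * seq_along(day))
growth <- data.frame(day = day, od = od)

# Fit od = Asym / (1 + exp((xmid - day) / scal)); SSlogis supplies
# its own starting values, so none are given here.
fit <- nls(od ~ SSlogis(day, Asym, xmid, scal), data = growth)
p   <- coef(fit)
print(p)

# Early in the curve the logistic behaves like exp(day / scal), so the
# per-day rate in the exponential stage is 1 / scal.
rate_exponential <- 1 / p[["scal"]]

# The steepest absolute slope occurs at day = xmid and equals Asym / (4 * scal).
max_slope <- p[["Asym"]] / (4 * p[["scal"]])

# Fitted curve on a fine grid, ready to be drawn as a line.
curve_df    <- data.frame(day = seq(1, 20, by = 0.25))
curve_df$od <- predict(fit, newdata = curve_df)
```

The curve_df data frame could then be added to an existing plot with geom_line(data = curve_df, aes(x = day, y = od)), fitting one model per series. The stats package also provides SSgompertz for the Gompertz form.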

1 Answer

    Editorial Team added an answer on May 14, 2026 at 5:55 am

    Have you tried the family = "symmetric" argument to geom_smooth (which will in turn get passed on to loess)? This will make the loess smooth resistant to outliers.

    The syntax would be:

    geom_smooth(method = loess, method.args = list(family = "symmetric"))
    

    However, looking at your data, why do you think a linear fit is not adequate? You only have 4 x values, and there certainly doesn’t seem to be strong evidence for a departure from linearity.
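If a linear fit is acceptable, a robust version is available through rlm in the MASS package, which the question's edit mentions. The sketch below uses only series A1 from the question's example data and compares ordinary least squares with rlm. It is an illustration of the idea, not a recommendation for the real growth curves.

```r
# Series A1 from the question: three replicates at days 1, 3, 5 and 7.
a1 <- data.frame(
  day = rep(c(1, 3, 5, 7), times = 3),
  od  = c(0.10, 1.00, 0.50, 0.70,
          0.13, 0.33, 0.54, 0.76,
          0.10, 0.35, 0.54, 0.73)
)

# Ordinary least squares vs. a robust linear fit (Huber M-estimator by default).
fit_lm  <- lm(od ~ day, data = a1)
fit_rlm <- MASS::rlm(od ~ day, data = a1)

# rlm stores the final weight given to each observation; the suspicious
# reading (od = 1.00 on day 3, row 2) should receive the smallest weight.
print(round(fit_rlm$w, 2))

# Side-by-side coefficients of the two fits.
print(rbind(lm = coef(fit_lm), rlm = coef(fit_rlm)))
```

In ggplot2 the same model can be used as the smoother by passing the function itself, for example geom_smooth(method = MASS::rlm, aes(group = series_id)).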
