I am trying to analyze some probability data with R. The data I have gives the frequency of certain outcomes (A and B) for a given probability p and what I want is a model that will allow me to estimate p from only the frequency data.
Right now I am just running a linear regression (something like lm(p ~ A + B)), which works more or less, but I know that this is not the “right way” to do it. In particular, for some values of A or B my current model returns values that do not lie within the interval [0, 1], i.e. values that are not valid probabilities.
I am pretty sure there is a way to do this, but I can’t for the life of me figure out what the model was called or how to run it in R. Can anybody give me a hint?
You cannot just run lm(p ~ A + B), as there is no model relating your count variables A and B to the probabilities: lm() fits a linear regression, which models an unbounded real variable as a linear combination of real variables (for which you can substitute count variables).
The easiest model for probabilities is logistic regression, which uses a logistic function to map unbounded real values to the bounded interval [0, 1]. You can fit a logistic regression in R using glm(), and there are a number of add-on packages for special cases; see e.g. this rseek.org search for logistic regression.
Also, CrossValidated is a good site for modeling questions such as this.
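As a minimal sketch of the glm() approach, the example below fits a logistic-type model on simulated data and predicts p from A and B. Everything about the data is an assumption made for illustration: the data frame d, the sample size, and the way A and B are generated are hypothetical, not taken from the question. The choice of family = quasibinomial is also an assumption; it is one common way to model a response that is itself a proportion in [0, 1] rather than a 0/1 outcome.

```r
# Simulated stand-in for the real data (hypothetical, for illustration only)
set.seed(1)
n <- 200
p <- runif(n, 0.05, 0.95)                 # known probabilities
A <- rbinom(n, size = 50, prob = p)       # count of outcome A
B <- rbinom(n, size = 50, prob = 1 - p)   # count of outcome B
d <- data.frame(p = p, A = A, B = B)

# Linear model from the question, kept for comparison
fit_lm <- lm(p ~ A + B, data = d)

# Logistic link: the linear predictor is mapped into (0, 1).
# quasibinomial is an assumed choice for a proportion-valued response.
fit <- glm(p ~ A + B, family = quasibinomial, data = d)

# Predict on the probability scale, including extreme counts
new_d <- data.frame(A = c(0, 25, 50), B = c(50, 25, 0))
pred_lm  <- predict(fit_lm, newdata = new_d)
pred_glm <- predict(fit, newdata = new_d, type = "response")

print(cbind(new_d, pred_lm, pred_glm))
```

With type = "response", predict() returns values on the probability scale, so they stay inside the unit interval by construction, whereas the lm() predictions have no such bound. If the real data record successes and failures per observation, a binomial model with a two-column response such as cbind(successes, failures) may be a better fit; which form applies depends on how p, A and B were actually collected.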