This is a very basic example, but while doing some data analysis I keep finding myself writing very similar SQL count queries, like the ones below, to generate probability tables.
My tables are defined such that a value of 0 implies that an event did not take place while a value of 1 implies that the event did take place.
> sqldf("select count(distinct Date) from joinedData where C_O_Above_prevHigh = 0 and C_O_Below_prevLow = 0")
count(distinct Date)
1 1081
> sqldf("select count(distinct Date) from joinedData where C_O_Above_prevHigh = 0 and C_O_Below_prevLow = 0 and E_halfGap = 1")
count(distinct Date)
1 956
> sqldf("select count(distinct Date) from joinedData where C_O_Above_prevHigh = 1 OR C_O_Below_prevLow = 1 and E_halfGap = 1")
count(distinct Date)
1 504
In the above example, my predictor variables are C_O_Above_prevHigh and C_O_Below_prevLow, and my outcome variable is E_halfGap. In some cases there may be more predictor variables, e.g. Time.
Rather than doing the above and manually entering all my queries with different permutations, is there anything available in R or some other application that will:
1) output the potential probability paths based on my predictors?
2) allow me to choose how to split the paths?
I appreciate your input.
If you want all totals and subtotals, you can use GROUP BY CUBE in SQL (but it is not available in SQLite) or addmargins in R. If you want to build a decision tree, you can use the rpart package, or check the machine learning or graphical models task views.
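
As a rough illustration of the addmargins route, here is a minimal sketch in base R. The data frame below is a made-up stand-in for joinedData (random 0/1 values, not the real data), and it assumes one row per Date; if dates can repeat, you would probably want to deduplicate first, for example with unique() on the Date and indicator columns, to mimic count(distinct Date).

```r
# Toy stand-in for joinedData: hypothetical random values, one row per Date
set.seed(1)
n <- 200
joinedData <- data.frame(
  Date               = seq(as.Date("2012-01-02"), by = "day", length.out = n),
  C_O_Above_prevHigh = rbinom(n, 1, 0.3),
  C_O_Below_prevLow  = rbinom(n, 1, 0.3),
  E_halfGap          = rbinom(n, 1, 0.7)
)

# Counts for every combination of the two predictors and the outcome,
# replacing one hand-written count query per combination
counts <- xtabs(~ C_O_Above_prevHigh + C_O_Below_prevLow + E_halfGap,
                data = joinedData)

# Add a "Sum" level along every dimension (all totals and subtotals)
with_totals <- addmargins(counts)

# Conditional probabilities P(E_halfGap | predictors):
# each cell is divided by the total of its predictor combination
probs <- prop.table(counts, margin = c(1, 2))

# Flatten the 3-way tables for easier reading
print(ftable(with_totals))
print(ftable(round(probs, 3)))
```

Adding another predictor such as Time should only require adding it to the xtabs formula and to the margin argument of prop.table; which variables go in margin is what determines how the paths are split.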