I am having trouble developing some queries on the fly for our clients, and I sometimes find myself asking: “Would it be better to start with a subset of the data I know I’m looking for, then import it into a program like Excel and process it there using similar functions, such as Pivot Tables?”
One instance in particular I am struggling with is the following example:
I have an online member enrollment system. For simplicity’s sake, let’s assume the data captured is: Member ID, sign-up date, referral code, and state.
A sample member table may look like the following:
MemberID | Date | Ref | USState
=====================================
1 | 2011-01-01 | abc | AL
2 | 2011-01-02 | bcd | AR
3 | 2011-01-03 | cde | CA
4 | 2011-02-01 | abc | TX
and so on.
Ultimately, the kinds of queries I want to build and run against this data set include:
“Show me a list of all referral codes and the number of sign-ups each had in each month, in a single result set.”
For example:
Ref | 2011-01 | 2011-02 | 2011-03 | 2011-04
==============================================
abc | 1 | 1 | 0 | 0
bcd | 1 | 0 | 0 | 0
cde | 1 | 0 | 0 | 0
To be honest, I have no idea how to build this type of query in MySQL. I imagine that, if it can be done at all, it would require a LOT of code: joins, subqueries, and unions.
Similarly, another sample query might be how many members signed up in each state, by month:
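For what it’s worth, this first report does not necessarily need joins or unions. A minimal sketch using conditional aggregation (one SUM(CASE ...) per month column, grouped by Ref) is below. It assumes a table named member with the sample columns, and it runs against Python’s built-in sqlite3 module only so that it is self-contained; in MySQL, DATE_FORMAT(Date, '%Y-%m') would take the place of SQLite’s strftime('%Y-%m', Date).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: the table is
# called "member" and holds the four sample columns from the question).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member (MemberID INTEGER PRIMARY KEY, Date TEXT, Ref TEXT, USState TEXT)"
)
conn.executemany(
    "INSERT INTO member VALUES (?, ?, ?, ?)",
    [
        (1, "2011-01-01", "abc", "AL"),
        (2, "2011-01-02", "bcd", "AR"),
        (3, "2011-01-03", "cde", "CA"),
        (4, "2011-02-01", "abc", "TX"),
    ],
)

# Conditional aggregation: each CASE yields 1 for a sign-up in that month and
# 0 otherwise, so SUM() counts sign-ups per referral code per month.
# The month list is written into the SQL, so a new month means a new column.
PIVOT_SQL = """
SELECT Ref,
       SUM(CASE WHEN strftime('%Y-%m', Date) = '2011-01' THEN 1 ELSE 0 END) AS "2011-01",
       SUM(CASE WHEN strftime('%Y-%m', Date) = '2011-02' THEN 1 ELSE 0 END) AS "2011-02",
       SUM(CASE WHEN strftime('%Y-%m', Date) = '2011-03' THEN 1 ELSE 0 END) AS "2011-03",
       SUM(CASE WHEN strftime('%Y-%m', Date) = '2011-04' THEN 1 ELSE 0 END) AS "2011-04"
FROM member
GROUP BY Ref
ORDER BY Ref
"""

rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)  # e.g. ('abc', 1, 1, 0, 0)
```

The trade-off is that the month columns are fixed in the query text; if the set of months changes from report to report, the SQL has to be regenerated.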
USState | 2011-01 | 2011-02 | 2011-03 | 2011-04
==============================================
AL | 1 | 0 | 0 | 0
AR | 1 | 0 | 0 | 0
CA | 1 | 0 | 0 | 0
TX | 0 | 1 | 0 | 0
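The state-by-month report can also be split along the lines of the Excel idea at the top: let the database do the counting with a plain GROUP BY, and do only the pivot on the client. The sketch below again uses Python’s sqlite3 as a stand-in for MySQL and assumes the same hypothetical member table; the dictionary-building loop plays the role an Excel Pivot Table would, filling months with no sign-ups with 0.

```python
import sqlite3
from collections import defaultdict

# Report window; months outside it are ignored in the pivot step.
MONTHS = ["2011-01", "2011-02", "2011-03", "2011-04"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member (MemberID INTEGER PRIMARY KEY, Date TEXT, Ref TEXT, USState TEXT)"
)
conn.executemany(
    "INSERT INTO member VALUES (?, ?, ?, ?)",
    [
        (1, "2011-01-01", "abc", "AL"),
        (2, "2011-01-02", "bcd", "AR"),
        (3, "2011-01-03", "cde", "CA"),
        (4, "2011-02-01", "abc", "TX"),
    ],
)

# Server side: one aggregated row per (state, month) pair that has sign-ups.
long_rows = conn.execute(
    """
    SELECT USState, strftime('%Y-%m', Date) AS ym, COUNT(*) AS signups
    FROM member
    GROUP BY USState, ym
    ORDER BY USState, ym
    """
).fetchall()

# Client side: pivot the long rows into a state x month grid, defaulting to 0.
grid = defaultdict(lambda: dict.fromkeys(MONTHS, 0))
for state, ym, signups in long_rows:
    if ym in MONTHS:
        grid[state][ym] = signups

for state in sorted(grid):
    print(state, [grid[state][m] for m in MONTHS])
```

Because only aggregated rows leave the database, the client-side pivot stays small even when the member table itself is large.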
I suppose my question is twofold:
1) Is it best to build these out with the necessary data from within a MySQL GUI such as Navicat, or to just import the entire subset of data into Excel and work from there?
2) If I were to go the MySQL route, what is the proper way to build the subsets of data in the examples above? (Note that the queries could become far more complex, such as “Show how many sign-ups came in for each month by each state, and grouped by each agent as well (each agent has 50 possible rows).”)
Thank you in advance for your assistance.
I am a proponent of doing this kind of querying on the server side, at least to get just the data you need.
You should create a time-periods table. It can be as detailed as you like, even down to individual days.
This gives you almost limitless ability to group and query data in all sorts of interesting ways.
Getting the data for the original referral counts by month query you mentioned would be quite simple…
The result set would be in rows like
ref = abc, year = 2011, month = 1, referralcount = 1
as opposed to a column for every month. I am assuming that, since getting a larger set of data and manipulating it in Excel was an option, changing the layout of this data wouldn’t be difficult.
Check out this previous answer that goes into a little more detail about the concept with different examples: SQL query for Figuring counts by month
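As a hedged illustration of the time-periods idea (a sketch, not necessarily the exact query the answer had in mind), the code below assumes a hypothetical time_period table with one row per month (year, month, inclusive start_date, exclusive end_date) next to the sample member table. It runs on SQLite through Python’s sqlite3 so it is self-contained. Cross-joining the distinct referral codes with the periods and then LEFT JOINing the members is what yields a row, with a count of 0, for months that had no sign-ups, in the ref/year/month/referralcount layout described above.

```python
import sqlite3


def month_periods(year, first_month, last_month):
    """Yield (year, month, start_date, end_date) rows; end_date is exclusive."""
    for m in range(first_month, last_month + 1):
        next_year, next_month = (year, m + 1) if m < 12 else (year + 1, 1)
        yield (year, m, f"{year}-{m:02d}-01", f"{next_year}-{next_month:02d}-01")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member (MemberID INTEGER PRIMARY KEY, Date TEXT, Ref TEXT, USState TEXT)"
)
conn.executemany(
    "INSERT INTO member VALUES (?, ?, ?, ?)",
    [
        (1, "2011-01-01", "abc", "AL"),
        (2, "2011-01-02", "bcd", "AR"),
        (3, "2011-01-03", "cde", "CA"),
        (4, "2011-02-01", "abc", "TX"),
    ],
)

# Hypothetical time-periods table: January through April 2011.
conn.execute(
    "CREATE TABLE time_period (year INTEGER, month INTEGER, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO time_period VALUES (?, ?, ?, ?)", list(month_periods(2011, 1, 4))
)

# Every (ref, period) pair is produced by the CROSS JOIN; the LEFT JOIN keeps
# pairs with no matching members, and COUNT(m.MemberID) counts them as 0.
# ISO 'YYYY-MM-DD' strings compare correctly as text in SQLite.
rows = conn.execute(
    """
    SELECT r.Ref AS ref, p.year, p.month, COUNT(m.MemberID) AS referralcount
    FROM (SELECT DISTINCT Ref FROM member) AS r
    CROSS JOIN time_period AS p
    LEFT JOIN member AS m
           ON m.Ref = r.Ref
          AND m.Date >= p.start_date
          AND m.Date <  p.end_date
    GROUP BY r.Ref, p.year, p.month
    ORDER BY r.Ref, p.year, p.month
    """
).fetchall()

for row in rows:
    print(row)
```

Going down to days would only mean filling time_period with one row per day instead of one per month; the join condition stays the same.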