I have pmacct running, summarizing network traffic on an hourly basis into a postgres database.
I need to write a script/query to move that data in a different format into a mysql database. I want to do as much of the data processing as possible using SQL, as this dataset is going to rapidly grow.
I have a perl script running to add an additional field (agent_id) to track the zone the data is in (local/national/international), which will show as either 0, 1, or 2.
The relevant fields from the schema of the table I’m pulling this data from are:
ip_src, ip_dst, agent_id, bytes, stamp_updated, processed
The schema I want to insert the data into is:
ip, local_down_mb, nat_down_mb, int_down_mb, local_up_mb, nat_up_mb, int_up_mb, timestamp
As I’m only looking for the traffic where the source or destination is one of my ranges, I have a query at present which gets the upload data out of the postgres database in the form I want:
SELECT DISTINCT ip_src, agent_id, SUM(bytes), stamp_updated FROM acct
WHERE ip_src <<= '192.168.0.0/22'
OR ip_src <<= '10.1.2.0/24'
OR ip_src <<= '1.2.3.4/32'
GROUP BY ip_src, agent_id, stamp_updated
ORDER BY ip_src, agent_id, stamp_updated
A sample output of that query is:
    ip_src    | agent_id |    sum    |    stamp_updated
--------------+----------+-----------+---------------------
 10.1.2.134   |        2 |      3192 | 2012-09-13 21:20:01
 10.1.2.134   |        2 |      3192 | 2012-09-13 22:20:01
 10.1.2.134   |        2 |      3192 | 2012-09-13 23:20:01
 10.2.3.252   |        2 |       448 | 2012-09-11 06:00:01
 10.2.3.252   |        2 |       448 | 2012-09-11 07:20:01
 10.2.3.252   |        2 |       448 | 2012-09-11 08:20:01
 10.2.3.252   |        2 |      8112 | 2012-09-11 09:20:01
At this stage, I know I could run the same query for ip_dst and then handle the rest manually when reinserting the data into mysql in the new format. That would mean matching the ip source and ip destination rows for each timestamp, then using the combination of agent_id and whether the row came from ip_src or ip_dst to work out whether the traffic was inbound or outbound, and whether it was local, national, or international.
What I’d like, however, is a query which will do all of that for me. The limit of my SQL knowledge is having gone through the W3C website tutorials months ago, which has gotten me to the point where I can write a query as above, but not much further.
From what I can tell, what I need some help with is writing a join between the two sets of results, one for ip_src and one for ip_dst, and then doing some magic to use the information of which direction traffic was going in conjunction with the agent_id to have an output which will match the schema of the mysql database.
Is there someone who can either (very kindly) write what query they think might work for accomplishing this, or at least point me towards relevant documentation and give me a head start on what functions I might need to use to make this work?
I made assumptions about transforming the byte counts to rounded up megabytes in the final output based on the column names.
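As a rough sketch of how those assumptions could fit together in a single PostgreSQL query (not tested against real pmacct data): instead of joining the ip_src and ip_dst result sets, stack them with UNION ALL and tag each row with a direction, then pivot with SUM(CASE ...) into the six columns. This assumes agent_id 0 = local, 1 = national, 2 = international, that traffic whose source is in one of the ranges counts as upload and traffic whose destination is in one of them counts as download, and that 1 MB = 1048576 bytes. The "direction" column and the alias "t" are made up for illustration, and the processed column is ignored.

```sql
-- Sketch only. Assumed mapping: agent_id 0 = local, 1 = national, 2 = international.
-- Assumed unit: 1 MB = 1048576 bytes, rounded up with ceil().
SELECT ip,
       ceil(sum(CASE WHEN direction = 'down' AND agent_id = 0 THEN bytes ELSE 0 END) / 1048576.0) AS local_down_mb,
       ceil(sum(CASE WHEN direction = 'down' AND agent_id = 1 THEN bytes ELSE 0 END) / 1048576.0) AS nat_down_mb,
       ceil(sum(CASE WHEN direction = 'down' AND agent_id = 2 THEN bytes ELSE 0 END) / 1048576.0) AS int_down_mb,
       ceil(sum(CASE WHEN direction = 'up'   AND agent_id = 0 THEN bytes ELSE 0 END) / 1048576.0) AS local_up_mb,
       ceil(sum(CASE WHEN direction = 'up'   AND agent_id = 1 THEN bytes ELSE 0 END) / 1048576.0) AS nat_up_mb,
       ceil(sum(CASE WHEN direction = 'up'   AND agent_id = 2 THEN bytes ELSE 0 END) / 1048576.0) AS int_up_mb,
       stamp_updated AS "timestamp"
FROM (
    -- Upload: the source address is in one of my ranges.
    SELECT ip_src AS ip, 'up'::text AS direction, agent_id, bytes, stamp_updated
    FROM acct
    WHERE ip_src <<= '192.168.0.0/22'
       OR ip_src <<= '10.1.2.0/24'
       OR ip_src <<= '1.2.3.4/32'
    UNION ALL
    -- Download: the destination address is in one of my ranges.
    SELECT ip_dst AS ip, 'down'::text AS direction, agent_id, bytes, stamp_updated
    FROM acct
    WHERE ip_dst <<= '192.168.0.0/22'
       OR ip_dst <<= '10.1.2.0/24'
       OR ip_dst <<= '1.2.3.4/32'
) AS t
GROUP BY ip, stamp_updated
ORDER BY ip, stamp_updated;
```

UNION ALL is used rather than a join so that an ip with only uploads (or only downloads) for a given timestamp still gets a row, with zeros in the other columns. A flow between two addresses that are both in the ranges is counted once as upload for the source and once as download for the destination. Each output row lines up with the mysql column list, so the result could be exported and loaded with whatever transfer step is already in place.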