I generate reports through a web system that was built for the company I do most of my work for. Basically the web system tracks every hour I spend doing billable work. It generates CSV reports.
I am importing these reports into my own local MySQL database so that I can generate invoices from them. When I invoice, I put two weeks’ worth of line items (from the report) into the invoice, then mark those line items as invoiced.
My problem is this: every time I download the report, it’s a complete report of all my work history. It’s cumbersome to edit the CSV every time and remove everything I’m not interested in invoicing before importing it into my MySQL database, especially since in many cases it can be months before I can invoice for a particular line item.
I’ve looked at the MySQL pages for UPDATE, REPLACE, INSERT … ON DUPLICATE KEY UPDATE, etc. My head is about to explode and I don’t understand what I’m reading.
The goal is this: I want to be able to import the report today, invoice for a bunch of it, and mark it as such. Then I want to be able to download a new report, say tomorrow, with new content in it, and have only the new content imported. Old data shouldn’t be imported since it:
a) already exists in the database and,
b) may have been modified explicitly in the database, for example by marking "invoiced" in one of the columns, etc.
Umm.. help?
EDIT:
Ok, so if my CSV contains the following:
7,8,9,
4,5,6,
1,2,3,
And I import that into my database, then my table contains:
7,8,9,
4,5,6,
1,2,3,
and I can make edits/changes etc to the database.
I generate a new report later and my csv looks like this:
16,17,18,
13,14,15,
10,11,12,
7,8,9,
4,5,6,
1,2,3,
Now, I want to bring only the changes into my DB, as in, I want to import only:
16,17,18,
13,14,15,
10,11,12,
so that my db now looks like my csv:
16,17,18,
13,14,15,
10,11,12,
7,8,9,
4,5,6,
1,2,3,
But I don't want to edit/change/touch those last 3 lines, because I may have a good reason for editing them or adding new data into them in my db (after importing them from the csv of course).
EDIT2:
I got it to work by flipping the order of my csv rows. New rows were being added to the top, meaning when I imported into my table, the id for the newest row was 1, which wasn’t helping the solutions below work well.
The code I’m using, that now works:
-- Rows in the latest report (lineitems_temp) that have no match in lineitems
SELECT lineitems_temp.*
FROM lineitems_temp
LEFT OUTER JOIN lineitems
  ON lineitems_temp.id = lineitems.id
WHERE lineitems.id IS NULL
ORDER BY lineitems_temp.id DESC;
(This shows the line items in lineitems_temp, which is the latest report, that aren’t present in lineitems, which is the old report. I think the same mechanism could also be used to find lines that have been edited, to help keep things in sync manually.)
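As a rough illustration of that last idea, here is a minimal sketch of a query that lists rows present in both tables whose values differ. It is not a definitive implementation: it uses Python's built-in sqlite3 module in place of MySQL so that it runs on its own, and the hours column and the sample values are made up for illustration. Only the table names lineitems and lineitems_temp come from the query above.

```python
import sqlite3

# In-memory SQLite database standing in for the local MySQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "hours" is a hypothetical column, used only for this example.
    CREATE TABLE lineitems      (id INTEGER PRIMARY KEY, hours REAL);
    CREATE TABLE lineitems_temp (id INTEGER PRIMARY KEY, hours REAL);
    INSERT INTO lineitems      VALUES (1, 2.0), (2, 3.5), (3, 1.0);
    INSERT INTO lineitems_temp VALUES (1, 2.0), (2, 4.0), (3, 1.0), (4, 5.0);
""")

# An inner join keeps only ids found in both tables; the WHERE clause then
# keeps the pairs whose values disagree.
changed = conn.execute("""
    SELECT t.id, l.hours AS local_hours, t.hours AS report_hours
    FROM lineitems_temp AS t
    INNER JOIN lineitems AS l ON t.id = l.id
    WHERE t.hours <> l.hours
    ORDER BY t.id
""").fetchall()

print(changed)
```

Note that <> never matches when either side is NULL, so columns that can be NULL would need an explicit IS NULL check as well.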
Let's say you want to load the result into table “a”.
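1. Load the new CSV into a temp table (b).
2. Do a left outer join from b to a, and keep only the rows where a's id is NULL.
(A plain inner join returns only the rows that exist in both a and b.
A left outer join returns every row in b, with NULLs in a's columns wherever a has no match, so filtering on those NULLs leaves the rows that are in b but not in a. This is what we want.)
3. Insert the result from step 2 into a, either directly in the same query (INSERT ... SELECT) or by way of a temp table.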
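The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it uses Python's built-in sqlite3 module in place of MySQL so that it runs on its own, and it assumes both tables share an id column that identifies a line item. The table names lineitems (a) and lineitems_temp (b) are taken from the question; the hours and invoiced columns and the sample rows are made up for illustration. The INSERT ... SELECT statement relies only on a LEFT OUTER JOIN and an IS NULL filter, so it should carry over to MySQL with little or no change.

```python
import sqlite3

# In-memory SQLite database standing in for the local MySQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "hours" and "invoiced" are hypothetical columns for this example.
    CREATE TABLE lineitems      (id INTEGER PRIMARY KEY, hours REAL,
                                 invoiced INTEGER DEFAULT 0);
    CREATE TABLE lineitems_temp (id INTEGER PRIMARY KEY, hours REAL);
""")

# Table "a": rows from an earlier import. Row 1 has since been marked invoiced.
conn.executemany(
    "INSERT INTO lineitems (id, hours, invoiced) VALUES (?, ?, ?)",
    [(1, 2.0, 1), (2, 3.5, 0)],
)

# Step 1: load the complete new report into the temp table "b".
conn.executemany(
    "INSERT INTO lineitems_temp (id, hours) VALUES (?, ?)",
    [(1, 2.0), (2, 3.5), (3, 1.0), (4, 4.0)],
)

# Steps 2 and 3: the left outer join plus the IS NULL filter selects rows
# that exist only in the temp table, and INSERT ... SELECT copies just those.
# Rows already in lineitems are never touched.
conn.execute("""
    INSERT INTO lineitems (id, hours)
    SELECT t.id, t.hours
    FROM lineitems_temp AS t
    LEFT OUTER JOIN lineitems AS l ON t.id = l.id
    WHERE l.id IS NULL
""")
conn.commit()

rows = conn.execute(
    "SELECT id, hours, invoiced FROM lineitems ORDER BY id"
).fetchall()
print(rows)
# [(1, 2.0, 1), (2, 3.5, 0), (3, 1.0, 0), (4, 4.0, 0)]
```

Row 1 keeps its invoiced flag because the statement only inserts rows and never updates existing ones.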