OK, I’m trying to write a SQL query that combines entries based on some logic. Eventually I need this to cascade, but I’m running into trouble with rows being combined more than I want them to be.
Let’s see if I can illustrate. I have a table:
CREATE TABLE IF NOT EXISTS doc_lines (
id bigint(20) NOT NULL AUTO_INCREMENT,
file_id bigint(20) NOT NULL,
line int(4) NOT NULL,
end_line int(4) NOT NULL,
type VARCHAR(100),
text VARCHAR(5120) NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY file_id (file_id, line)
);
INSERT INTO doc_lines VALUES (1, 1, 1, 1, NULL, 'abcdefg');
INSERT INTO doc_lines VALUES (2, 1, 2, 2, NULL, 'hijkl');
INSERT INTO doc_lines VALUES (3, 1, 3, 3, NULL, 'mn');
INSERT INTO doc_lines VALUES (4, 1, 4, 4, NULL, 'op');
INSERT INTO doc_lines VALUES (5, 1, 5, 5, NULL, 'qrs');
INSERT INTO doc_lines VALUES (6, 1, 6, 6, NULL, 'tuv.');
INSERT INTO doc_lines VALUES (7, 1, 7, 7, NULL, 'wxy');
INSERT INTO doc_lines VALUES (8, 1, 8, 8, NULL, 'zab');
I’m trying to combine the values of “text” when two lines in a row match a certain condition.
My existing query is something like the following:
UPDATE doc_lines AS a
JOIN doc_lines AS b ON a.file_id = b.file_id AND a.end_line + 1 = b.line
SET a.end_line = b.end_line, b.type = 'DELETE', a.text = CONCAT(a.text, ' ', TRIM(b.text))
WHERE b.text REGEXP '^[a-z]$';
I then follow it up with a:
DELETE FROM doc_lines WHERE type = 'DELETE';
The problem I’m having is that line 1 matches line 2, which flags line 2 for deletion.
Line 2 matches line 3, which flags line 3 for deletion.
Line 3 matches line 4, which flags line 4 for deletion.
And so on.
As a result I end up deleting more lines than I want, because a line that absorbed its successor is itself flagged for deletion.
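To see the overlap directly, the same join can be run as a plain SELECT. This is only a diagnostic sketch: the join and the pattern are copied from the UPDATE above, and the column aliases keeper_line and merged_line are made up for illustration.

```sql
-- List every (keeper, merged) pair that the UPDATE's join would produce.
-- keeper_line and merged_line are illustrative aliases, not schema columns.
SELECT a.line AS keeper_line,
       b.line AS merged_line
FROM doc_lines AS a
JOIN doc_lines AS b
  ON a.file_id = b.file_id
 AND a.end_line + 1 = b.line
WHERE b.text REGEXP '^[a-z]$'
ORDER BY a.file_id, a.line;
```

Any line number that shows up in both columns is being used as a keeper and flagged for deletion in the same statement, which is exactly the chain described above.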
At first I thought I could do this to make it skip every other line:
UPDATE doc_lines AS a
JOIN doc_lines AS b ON a.file_id = b.file_id AND a.end_line + 1 = b.line
SET a.end_line = b.end_line, b.type = 'DELETE', a.text = CONCAT(a.text, ' ', TRIM(b.text))
WHERE b.text REGEXP '^[a-z]$' AND (a.type IS NULL OR a.type <> 'DELETE');
But an update to one row doesn’t seem to take effect until the whole statement has finished, so this query doesn’t behave any differently.
As a result I thought, “Well, why not handle all the odd lines, then all the even?”, so I updated my query appropriately:
UPDATE doc_lines AS a
JOIN doc_lines AS b ON a.file_id = b.file_id AND a.end_line + 1 = b.line
SET a.end_line = b.end_line, b.type = 'DELETE', a.text = CONCAT(a.text, ' ', TRIM(b.text))
WHERE b.text REGEXP '^[a-z]$' AND a.line % 2 = 0;
The problem with this is that I need to run the query more than once, because eventually I want lines 1-6 combined into one entry and lines 7-8 into another (using my example). Each subsequent run combines each line with the line after it, when it matches.
Eventually, though, I hit the same situation as with my original query: a line that gets flagged for deletion has itself been used to flag other lines for deletion.
Even if I rotate odd and even on the line, the id, or the end_line, at some point there always seems to be an overlap.
Any ideas? Is there a way to process every other entry in a table based on its position rather than its actual value?
OK, I figured out something that works.
If anyone has anything better, or more efficient… please let me know.
My solution was to create a temporary table that I join to table a.
With that join I run the update for the even rows and then delete, run it for the odd rows and then delete, then recreate the temporary table and repeat.
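The temporary table itself is not shown in this post, so the following is only a guess at one round of the procedure. The names line_seq and seq are hypothetical, and the sketch assumes the AUTO_INCREMENT values come out consecutive in the ORDER BY order of the INSERT ... SELECT.

```sql
-- Hypothetical helper table: numbers the current rows 1, 2, 3, ... in line order,
-- so "every other row" depends on position, not on the value of line.
DROP TEMPORARY TABLE IF EXISTS line_seq;
CREATE TEMPORARY TABLE line_seq (
  seq INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  id  BIGINT NOT NULL
);
INSERT INTO line_seq (id)
SELECT id FROM doc_lines ORDER BY file_id, line;

-- Even pass: only rows in even positions may absorb the row that follows them.
UPDATE doc_lines AS a
JOIN line_seq AS s ON s.id = a.id
JOIN doc_lines AS b ON a.file_id = b.file_id AND a.end_line + 1 = b.line
SET a.end_line = b.end_line,
    b.type = 'DELETE',
    a.text = CONCAT(a.text, ' ', TRIM(b.text))
WHERE b.text REGEXP '^[a-z]$' AND s.seq % 2 = 0;
DELETE FROM doc_lines WHERE type = 'DELETE';

-- Odd pass: the same statement with the parity flipped.
UPDATE doc_lines AS a
JOIN line_seq AS s ON s.id = a.id
JOIN doc_lines AS b ON a.file_id = b.file_id AND a.end_line + 1 = b.line
SET a.end_line = b.end_line,
    b.type = 'DELETE',
    a.text = CONCAT(a.text, ' ', TRIM(b.text))
WHERE b.text REGEXP '^[a-z]$' AND s.seq % 2 = 1;
DELETE FROM doc_lines WHERE type = 'DELETE';
```

Within one pass a keeper and the row it absorbs always sit in positions of opposite parity, so no row is both merged into and deleted by the same statement. The whole round would be repeated, rebuilding line_seq each time, until neither UPDATE changes any rows.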
It’s a lot of steps, but it works.
I’d definitely be interested in a more elegant or efficient method!
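One possible direction for a single-pass version, assuming MySQL 8.0 or later (window functions and common table expressions), is to treat this as a "gaps and islands" problem: mark each row that starts a new group, number the groups with a running sum, and concatenate per group. This is an untested sketch rather than a drop-in replacement. The names flagged, grouped, starts_group, and grp are invented for illustration, and the pattern is copied from the queries above.

```sql
WITH flagged AS (
  SELECT id, file_id, line, end_line, text,
         -- 0 when the row matches the pattern and directly follows the previous
         -- row of the same file; 1 when it starts a new group.
         CASE WHEN text REGEXP '^[a-z]$'
                   AND LAG(end_line) OVER w = line - 1
              THEN 0 ELSE 1 END AS starts_group
  FROM doc_lines
  WINDOW w AS (PARTITION BY file_id ORDER BY line)
),
grouped AS (
  SELECT id, file_id, line, end_line, text,
         -- Running count of group starts gives each group its own number.
         SUM(starts_group) OVER (PARTITION BY file_id ORDER BY line) AS grp
  FROM flagged
)
SELECT file_id,
       MIN(line)     AS line,
       MAX(end_line) AS end_line,
       GROUP_CONCAT(TRIM(text) ORDER BY line SEPARATOR ' ') AS text
FROM grouped
GROUP BY file_id, grp
ORDER BY file_id, line;
```

This only selects the combined rows; writing them back (for example into a new table) is left out. Two differences from the UPDATE are worth checking before relying on it: every piece is trimmed here, including the first, and GROUP_CONCAT truncates its result at group_concat_max_len, which may need raising for long text values.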