My question is about using SQL, driven by a script, to give each group of duplicated values a shared group id. I’ve been doing it by hand for a while and realized that, given the size of the database (a couple thousand elements), it would take ages.
Here is my DB structure:
id | db_question           | db_keywords           | answer_id | db_answer                    |
-----------------------------------------------------------------------------------------------
0  | Why is Mars red?      | [why,mars,red]        | 0         | Mars is red because blah     |
1  | How is Mars red?      | [how,mars,red]        | 0         | Mars is red because blah     |
2  | What makes Mars red?  | [what,makes,mars,red] | 0         | Mars is red because blah     |
3  | Is Mars very rocky?   | [is,mars,rocky]       | 0         | Yes Mars is rocky blahbla    |
4  | Does Mars have rocks? | [mars,have,rocks]     | 0         | Yes Mars is rocky blahbla    |
5  | What is the Sun?      | [what,is,sun]         | 0         | The Sun is our solar blah    |
6  | What is a star?       | [what,is,star]        | 0         | A star is a ball of hot blah |
Now, as you can see, there can be multiple questions for one answer, so the database has duplicates in the db_answer column. I would like each distinct db_answer to have a single answer_id, which is repeated wherever that answer is used more than once. To illustrate, I’d like my db to look like this:
id | db_question           | db_keywords           | answer_id | db_answer                    |
-----------------------------------------------------------------------------------------------
0  | Why is Mars red?      | [why,mars,red]        | 1         | Mars is red because blah     |
1  | How is Mars red?      | [how,mars,red]        | 1         | Mars is red because blah     |
2  | What makes Mars red?  | [what,makes,mars,red] | 1         | Mars is red because blah     |
3  | Is Mars very rocky?   | [is,mars,rocky]       | 2         | Yes Mars is rocky blahbla    |
4  | Does Mars have rocks? | [mars,have,rocks]     | 2         | Yes Mars is rocky blahbla    |
5  | What is the Sun?      | [what,is,sun]         | 3         | The Sun is our solar blah    |
6  | What is a star?       | [what,is,star]        | 4         | A star is a ball of hot blah |
I have looked extensively for scripts that do this, but haven’t had any luck. To show what I’ve been trying so far, this is the SQL I have been running by hand, once for each answer group I wanted to give an id:
UPDATE elements SET answer_id = '1' WHERE db_answer = 'Mars is red because blah'
This would be pretty easy with a PHP script:
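A minimal sketch of such a script, written here in Python with the standard sqlite3 module rather than PHP. It assumes the table is called elements (as in the UPDATE above) and that the columns are named db_question, answer_id and db_answer; the helper name assign_answer_ids and the in-memory demo data are only for illustration.

```python
import sqlite3


def assign_answer_ids(conn):
    """Give every distinct db_answer its own answer_id, starting at 1."""
    cur = conn.cursor()
    # List each distinct answer once, ordered by the first row that uses it.
    cur.execute("SELECT db_answer FROM elements GROUP BY db_answer ORDER BY MIN(id)")
    answers = [row[0] for row in cur.fetchall()]
    for answer_id, answer in enumerate(answers, start=1):
        # Bound parameters keep quotes in the answer text from breaking the SQL.
        cur.execute(
            "UPDATE elements SET answer_id = ? WHERE db_answer = ?",
            (answer_id, answer),
        )
    conn.commit()
    return len(answers)


# Demo on an in-memory copy of the sample rows (keyword column left out).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE elements (id INTEGER PRIMARY KEY, db_question TEXT,"
    " answer_id INTEGER DEFAULT 0, db_answer TEXT)"
)
conn.executemany(
    "INSERT INTO elements (id, db_question, db_answer) VALUES (?, ?, ?)",
    [
        (0, "Why is Mars red?", "Mars is red because blah"),
        (1, "How is Mars red?", "Mars is red because blah"),
        (2, "What makes Mars red?", "Mars is red because blah"),
        (3, "Is Mars very rocky?", "Yes Mars is rocky blahbla"),
        (4, "Does Mars have rocks?", "Yes Mars is rocky blahbla"),
        (5, "What is the Sun?", "The Sun is our solar blah"),
        (6, "What is a star?", "A star is a ball of hot blah"),
    ],
)
distinct_count = assign_answer_ids(conn)
result = conn.execute("SELECT id, answer_id FROM elements ORDER BY id").fetchall()
print(distinct_count, result)
```

The same two statements (one SELECT for the distinct answers, one UPDATE per answer) should carry over to other database drivers with little more than a change of placeholder syntax.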
However, I think it might be wise to store the answers in a separate table and just keep the answer_id in the elements table. That way you avoid unnecessarily duplicating information.
EDIT:
As @mdoyle suggested, I think it would be best to use four tables:
The relationship between the answers table and the questions table is one-to-many (one answer may apply to many questions), so you have two tables. This assumes that each question can have one and only one answer. If this isn’t the case, and there is a possibility that one question might have two acceptable answers, then the relationship becomes many-to-many (continue reading for how to set up tables for a many-to-many relationship).
The relationship between the questions table and the keywords table is many-to-many (many questions may use many keywords), so you have three tables. One holds the questions (one row per question), one holds the keywords (one row per keyword) and the third ties the two together. The question_keywords table will have multiple rows with the same questionID and multiple rows with the same keywordID. So if questionID 5 has three keywords, then there will be three entries in the question_keywords table with a questionID of 5.
For any one-to-one relationships, you are generally safe just making an additional column in the same table, so you will have one table for that relationship.
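One possible layout for the four tables described above, sketched with Python's sqlite3 module so it can be run as-is. Only the table names and the questionID and keywordID columns come from the text; the other column names (answerID, question, answer, keyword) and the VARCHAR lengths are assumptions chosen for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE answers (
    answerID INTEGER PRIMARY KEY,
    answer   VARCHAR(255) NOT NULL UNIQUE   -- each answer text stored once
);
CREATE TABLE questions (
    questionID INTEGER PRIMARY KEY,
    question   VARCHAR(255) NOT NULL,
    -- one-to-many: many questions may point at the same answer
    answerID   INTEGER NOT NULL REFERENCES answers(answerID)
);
CREATE TABLE keywords (
    keywordID INTEGER PRIMARY KEY,
    keyword   VARCHAR(50) NOT NULL UNIQUE   -- each keyword stored once
);
CREATE TABLE question_keywords (
    -- many-to-many: one row per (question, keyword) pair
    questionID INTEGER NOT NULL REFERENCES questions(questionID),
    keywordID  INTEGER NOT NULL REFERENCES keywords(keywordID),
    PRIMARY KEY (questionID, keywordID)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that now exist in the database.
table_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(table_names)
```

The composite primary key on question_keywords keeps the same keyword from being attached to the same question twice.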
NOTE: Feel free to change the lengths of the VARCHAR columns. I picked values that might be OK, based on your examples, but if the questions and/or answers can be longer, then you may need to increase this size.
After creating these tables, you can populate them by doing something like this:
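A hedged sketch of the population step, again using Python's sqlite3 module. It assumes the elements columns are named db_question, db_keywords and db_answer, that keywords are stored as bracketed, comma-separated strings as in the sample rows, and that the four tables use the hypothetical column names shown in the comments. INSERT OR IGNORE is SQLite syntax; other databases spell it differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE elements (id INTEGER PRIMARY KEY, db_question TEXT,
                       db_keywords TEXT, answer_id INTEGER, db_answer TEXT);
CREATE TABLE answers (answerID INTEGER PRIMARY KEY, answer VARCHAR(255) UNIQUE);
CREATE TABLE questions (questionID INTEGER PRIMARY KEY, question VARCHAR(255),
                        answerID INTEGER REFERENCES answers(answerID));
CREATE TABLE keywords (keywordID INTEGER PRIMARY KEY, keyword VARCHAR(50) UNIQUE);
CREATE TABLE question_keywords (questionID INTEGER, keywordID INTEGER,
                                PRIMARY KEY (questionID, keywordID));
""")
conn.executemany(
    "INSERT INTO elements VALUES (?, ?, ?, 0, ?)",
    [
        (0, "Why is Mars red?", "[why,mars,red]", "Mars is red because blah"),
        (1, "How is Mars red?", "[how,mars,red]", "Mars is red because blah"),
        (2, "What makes Mars red?", "[what,makes,mars,red]", "Mars is red because blah"),
        (3, "Is Mars very rocky?", "[is,mars,rocky]", "Yes Mars is rocky blahbla"),
        (4, "Does Mars have rocks?", "[mars,have,rocks]", "Yes Mars is rocky blahbla"),
        (5, "What is the Sun?", "[what,is,sun]", "The Sun is our solar blah"),
        (6, "What is a star?", "[what,is,star]", "A star is a ball of hot blah"),
    ],
)

# 1. One row per distinct answer; answerID is generated by the database.
conn.execute(
    "INSERT INTO answers (answer) "
    "SELECT db_answer FROM elements GROUP BY db_answer ORDER BY MIN(id)"
)

# 2. One row per question, linked to its answer by matching the answer text.
conn.execute("""
    INSERT INTO questions (questionID, question, answerID)
    SELECT e.id, e.db_question, a.answerID
    FROM elements AS e JOIN answers AS a ON a.answer = e.db_answer
""")

# 3. Keywords and the link table; the "[a,b,c]" strings are split in Python.
for question_id, raw in conn.execute("SELECT id, db_keywords FROM elements").fetchall():
    for word in raw.strip("[]").split(","):
        word = word.strip()
        if not word:
            continue
        conn.execute("INSERT OR IGNORE INTO keywords (keyword) VALUES (?)", (word,))
        keyword_id = conn.execute(
            "SELECT keywordID FROM keywords WHERE keyword = ?", (word,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO question_keywords VALUES (?, ?)",
            (question_id, keyword_id),
        )
conn.commit()

# Row counts per table, for a quick sanity check.
counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("answers", "questions", "keywords", "question_keywords")
}
mars_answer_ids = {
    row[0]
    for row in conn.execute("SELECT answerID FROM questions WHERE questionID IN (0, 1, 2)")
}
print(counts)
```

Comparing the row counts against the original table (one question row per elements row, one answer row per distinct db_answer) is a quick way to check the result before retiring the old table.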
Once you have done this, and verified that the four tables are populated correctly, you no longer need to use the elements table at all. Just use those four tables (questions, answers, keywords, and question_keywords).