I need to store reviews from different sources in a table.
Fields:
- `productId` char(14)
- `user` varchar(128)
- `Source` varchar(128)
- `content` text
Use cases:
- Find all reviews for a product
- Insert or update a review
I'm having trouble with case 2, because I need to find out whether the review already exists (a review with the same productId, user and Source).
Question: Is it a good idea to create a primary key or unique index on productId + user + Source?
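This is a case where natural keys become bad; varchar(128) is just too big for a PK in my book. It forces you to have a big fat (very wide) PK or index in the review table. I'd do it this way: give products, users, and sources their own auto-increment int IDs, and have Reviews store those three IDs alongside its own auto-increment ReviewID as the PK.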
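A minimal sketch of what such a layout might look like, run against an in-memory SQLite database from Python so it can be tried directly. The lookup table names and the `ProductCode`, `UserName`, `SourceName` and `Content` columns are assumptions for illustration; the syntax for auto-increment columns differs between database engines.

```python
import sqlite3

# Hypothetical names throughout; SQLite stands in for the real database.
SCHEMA = """
CREATE TABLE Products (
    ProductID   INTEGER PRIMARY KEY AUTOINCREMENT,
    ProductCode CHAR(14) NOT NULL UNIQUE        -- the original char(14) value
);
CREATE TABLE Users (
    UserID   INTEGER PRIMARY KEY AUTOINCREMENT,
    UserName VARCHAR(128) NOT NULL UNIQUE
);
CREATE TABLE ProductSources (
    ProductSourceID INTEGER PRIMARY KEY AUTOINCREMENT,
    SourceName      VARCHAR(128) NOT NULL UNIQUE
);
CREATE TABLE Reviews (
    ReviewID        INTEGER PRIMARY KEY AUTOINCREMENT,
    ProductID       INTEGER NOT NULL REFERENCES Products(ProductID),
    UserID          INTEGER NOT NULL REFERENCES Users(UserID),
    ProductSourceID INTEGER NOT NULL REFERENCES ProductSources(ProductSourceID),
    Content         TEXT
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the lookup tables and the Reviews table."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
```

Other tables that need to point at a review can then carry a single `ReviewID` column rather than three wide ones.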
If you really only want one review per product + user + source, you could put a unique index on ProductID+UserID+ProductSourceID.
You could also consider making ProductID+UserID+ProductSourceID the PK. However, if another table needs an FK to Reviews, you then have to drag ProductID+UserID+ProductSourceID around with it. I prefer to FK to ReviewID.
In any case, the int+int+int version of ProductID+UserID+ProductSourceID (auto-increment IDs) is way better than the char(14)+varchar(128)+varchar(128) version, in terms of both disk storage and cache memory usage. It is also much easier for the database to use and store fixed-width int+int+int index values than char(14)+varchar(128)+varchar(128) ones.
Also, with auto-increment PKs, users can change their UserName (marriage/divorce) without breaking all the FKs. It also forces your ProductSource values to be standardized rather than free text that is impossible to join to.
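The unique index on the three ID columns is also what makes the "insert or update" use case simple: attempt the insert and fall back to an update when the database rejects a duplicate. The following is only a sketch under assumptions: it uses SQLite from Python, a hypothetical index name, and a hypothetical `save_review` helper. Many engines have a dedicated upsert statement that could replace the try/except.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reviews (
    ReviewID        INTEGER PRIMARY KEY AUTOINCREMENT,
    ProductID       INTEGER NOT NULL,
    UserID          INTEGER NOT NULL,
    ProductSourceID INTEGER NOT NULL,
    Content         TEXT
);
-- One review per product + user + source, enforced by the database.
CREATE UNIQUE INDEX UX_Reviews_Product_User_Source
    ON Reviews (ProductID, UserID, ProductSourceID);
""")


def save_review(conn, product_id, user_id, source_id, content):
    """Insert the review, or update its content if it already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Reviews (ProductID, UserID, ProductSourceID, Content) "
                "VALUES (?, ?, ?, ?)",
                (product_id, user_id, source_id, content),
            )
    except sqlite3.IntegrityError:
        # The unique index rejected a duplicate key, so update instead.
        with conn:
            conn.execute(
                "UPDATE Reviews SET Content = ? "
                "WHERE ProductID = ? AND UserID = ? AND ProductSourceID = ?",
                (content, product_id, user_id, source_id),
            )


save_review(conn, 1, 1, 1, "first version")
save_review(conn, 1, 1, 1, "edited version")
rows = conn.execute(
    "SELECT ProductID, UserID, ProductSourceID, Content FROM Reviews"
).fetchall()
```

Because ProductID is the leading column of the index, the same index can also serve the "find all reviews for a product" query.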
EDIT based on OP’s comment:
I’m not sure how the IDs complicate insertions. However, if you are unable or unwilling to change the PKs of the other tables, then a hash is the best way to go, but I would not make it the PK. Never make a hash a PK: there can be collisions, which would prevent insertion of legitimate data. Use an auto-generated INT as the PK and add a hash column.
I'd do it this way: create a new column in Reviews called “ReviewHash” and add an index on it. You could include the productid, user, and source columns as “covered columns” if you expect many collisions (multiple different rows that have the same hash value).
Also, write the WHERE so that it compares the stored ReviewHash column with the hash computed from the values you are searching for, and also compares productid, user, and source directly. This allows an index to be used on the ReviewHash column, and by also checking productid, user, and source, it eliminates any invalid matches if there was a hash collision.
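A rough sketch of the hash-column lookup, again using SQLite from Python. The `review_hash` function (CRC32 over the three values) is only a stand-in for whatever YourHashFunction is in practice, and the index name and helper functions are hypothetical.

```python
import sqlite3
import zlib


def review_hash(product_id: str, user: str, source: str) -> int:
    """Stand-in for YourHashFunction: CRC32 over the three key values."""
    key = "\x1f".join((product_id, user, source)).encode("utf-8")
    return zlib.crc32(key)


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Reviews (
        ReviewID   INTEGER PRIMARY KEY AUTOINCREMENT,
        productid  CHAR(14)     NOT NULL,
        user       VARCHAR(128) NOT NULL,
        source     VARCHAR(128) NOT NULL,
        content    TEXT,
        ReviewHash INTEGER      NOT NULL
    );
    CREATE INDEX IX_Reviews_ReviewHash ON Reviews (ReviewHash);
    """)
    return conn


def add_review(conn, product_id, user, source, content, hash_fn=review_hash):
    """Store a review together with the hash of its natural key."""
    with conn:
        conn.execute(
            "INSERT INTO Reviews (productid, user, source, content, ReviewHash) "
            "VALUES (?, ?, ?, ?, ?)",
            (product_id, user, source, content, hash_fn(product_id, user, source)),
        )


def find_review(conn, product_id, user, source, hash_fn=review_hash):
    """Look up by hash (indexable), then re-check the real values."""
    return conn.execute(
        "SELECT ReviewID, content FROM Reviews "
        "WHERE ReviewHash = ? AND productid = ? AND user = ? AND source = ?",
        (hash_fn(product_id, user, source), product_id, user, source),
    ).fetchone()


conn = make_db()
add_review(conn, "P0000000000001", "alice", "amazon", "Great")
add_review(conn, "P0000000000001", "bob", "amazon", "Meh")
found = find_review(conn, "P0000000000001", "bob", "amazon")
```

The extra equality checks are what keep the lookup correct even if two different reviews happen to share a hash value.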
If instead you write the query so that the WHERE applies YourHashFunction to the table's productid, user, and source columns and compares the result to the hash value, then an index can’t be used, and the query must apply YourHashFunction to every row in the table. Also, if you leave off the checks for productid, user, and source, you will get results where the hashes work out the same but the actual values differ.
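To see the difference between the two query shapes, one option is to ask the database for its query plan. The sketch below does this with SQLite's `EXPLAIN QUERY PLAN` from Python; other engines have their own equivalent. YourHashFunction is registered here as a CRC32-based stand-in, and the index name is hypothetical. The exact plan wording varies between SQLite versions, so no output is shown.

```python
import sqlite3
import zlib


def your_hash_function(product_id, user, source):
    """Stand-in for YourHashFunction: CRC32 over the three key values."""
    key = "\x1f".join((product_id, user, source)).encode("utf-8")
    return zlib.crc32(key)


conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL under the name used in the text.
conn.create_function("YourHashFunction", 3, your_hash_function)
conn.executescript("""
CREATE TABLE Reviews (
    ReviewID   INTEGER PRIMARY KEY AUTOINCREMENT,
    productid  CHAR(14)     NOT NULL,
    user       VARCHAR(128) NOT NULL,
    source     VARCHAR(128) NOT NULL,
    content    TEXT,
    ReviewHash INTEGER      NOT NULL
);
CREATE INDEX IX_Reviews_ReviewHash ON Reviews (ReviewHash);
""")


def plan(conn, sql, params):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


key = ("P0000000000001", "alice", "amazon")
wanted = your_hash_function(*key)

# Hash computed once from the search values, compared to the stored column.
indexed_plan = plan(
    conn,
    "SELECT ReviewID FROM Reviews "
    "WHERE ReviewHash = ? AND productid = ? AND user = ? AND source = ?",
    (wanted, *key),
)

# Hash function applied to the table's columns: evaluated for every row.
slow_plan = plan(
    conn,
    "SELECT ReviewID FROM Reviews "
    "WHERE YourHashFunction(productid, user, source) = ?",
    (wanted,),
)

print(indexed_plan)
print(slow_plan)
```

The first plan should mention a search using the ReviewHash index, while the second should report a scan of the whole table.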