Hi, I am trying to write a Django web app that sits on top of a legacy database, so I cannot change the existing model fields much and have to add functionality through new models.
I have many Documents with user-given, comma-separated tags. I want to group “related” documents based on shared tags.
```python
from django.db import models

# Model to express the legacy table
class Document(models.Model):
    id = models.BigIntegerField(primary_key=True)
    metadata_id = models.CharField(max_length=384)
    tags_as_csv = models.TextField()

# New model: one row per tag_text extracted from tags_as_csv
class Tagdb(models.Model):
    tagid = models.BigIntegerField(primary_key=True)
    # on_delete is required since Django 2.0; CASCADE is one reasonable choice
    referencing_document = models.ForeignKey(Document, on_delete=models.CASCADE)
    tag_text = models.TextField(blank=True)
```
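The question doesn't show how Tagdb gets populated. Here is a minimal plain-Python sketch of that extraction, assuming tags are separated by commas and that the spaces around each comma are not part of the tag (which is what the sample rows in the question suggest). `extract_tag_rows` is a hypothetical helper, not part of Django; each tuple it returns would map to one `Tagdb(tagid=..., referencing_document_id=..., tag_text=...)` instance, which could then be saved with `Tagdb.objects.bulk_create(...)`.

```python
def extract_tag_rows(documents, start_tagid=1):
    """Split (document_id, tags_as_csv) pairs into (tagid, document_id, tag_text) rows.

    Hypothetical helper: tagids are numbered sequentially from start_tagid,
    whitespace around each tag is stripped and empty fragments are skipped.
    """
    rows = []
    tagid = start_tagid
    for document_id, tags_as_csv in documents:
        for fragment in tags_as_csv.split(","):
            tag_text = fragment.strip()
            if not tag_text:
                continue  # e.g. "a,,b" or a trailing comma
            rows.append((tagid, document_id, tag_text))
            tagid += 1
    return rows


# Document 1 is from the question; 2-4 assume CSV values matching the sample Tagdb rows.
rows = extract_tag_rows([
    (1, "raw-data , high temperature , important"),
    (2, "important , processed-data"),
    (3, "important"),
    (4, "processed-data"),
])
print(rows[:3])  # [(1, 1, 'raw-data'), (2, 1, 'high temperature'), (3, 1, 'important')]
```

Numbering tagids in Python keeps the sketch simple; with a real database you would more likely let an auto-incrementing key assign them.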
So a Document would contain:

```
Document:
    id = 1
    metadata_id = "a1ee3df3600c6f77a6e851781f7e70c6"
    tags_as_csv = "raw-data , high temperature , important"
```
The Tagdb table would have entries such as:

| id | referencing_document | tag_text |
|----|----------------------|----------|
| 1 | 1 | "raw-data" |
| 2 | 1 | "high temperature" |
| 3 | 1 | "important" |
| 4 | 2 | "important" |
| 5 | 2 | "processed-data" |
| 6 | 3 | "important" |
| 7 | 4 | "processed-data" |
Now I want to retrieve all the Document objects that share tags with a given parent document, which I am doing with the following get_queryset method:
```python
def get_queryset(self, **kwargs):
    parent_document = Document.objects.get(id=self.kwargs['slug'])
    tags_in_parent_document = [x.tag_text for x in Tagdb.objects.filter(referencing_document=parent_document.id)]
    # This will contain the ids of all Documents that match at least one of the tags
    queryset_with_duplicates = []
    for tag in tags_in_parent_document:
        # referencing_document_id reads the FK column without fetching each Document
        queryset_with_duplicates.extend([x.referencing_document_id for x in Tagdb.objects.filter(tag_text__icontains=tag)])
    # Make sure we have only unique ids
    queryset_unique = set(queryset_with_duplicates)
    # Get all the Document objects
    queryset = Document.objects.filter(id__in=queryset_unique)
    return queryset
```
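The per-tag loop can usually be collapsed into one query by following the reverse relation from Document to Tagdb. Because the ForeignKey has no related_name, Django's default query name for it is `tagdb`, so something like `Document.objects.filter(tagdb__tag_text__in=tags_in_parent_document).exclude(id=parent_document.id).distinct()` should return each related document once (drop the `.exclude(...)` if the parent itself should be included, as it is in the loop above). That version uses exact tag matches instead of `icontains`, which also matches substrings, e.g. "important" inside "unimportant". To show what such a query does at the SQL level, here is a self-contained sketch using Python's built-in sqlite3 module and the sample Tagdb rows; the simplified `tagdb` table is an assumption, not the real legacy schema.

```python
import sqlite3

# The sample Tagdb rows from the question: (tagid, referencing_document_id, tag_text).
SAMPLE_ROWS = [
    (1, 1, "raw-data"),
    (2, 1, "high temperature"),
    (3, 1, "important"),
    (4, 2, "important"),
    (5, 2, "processed-data"),
    (6, 3, "important"),
    (7, 4, "processed-data"),
]


def make_sample_db():
    """In-memory stand-in for the tagdb table (simplified, not the legacy schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tagdb ("
        "tagid INTEGER PRIMARY KEY, referencing_document_id INTEGER, tag_text TEXT)"
    )
    conn.executemany("INSERT INTO tagdb VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def related_document_ids(conn, parent_id):
    """Ids of documents sharing at least one exact tag with parent_id, parent excluded."""
    rows = conn.execute(
        """
        SELECT DISTINCT other.referencing_document_id
        FROM tagdb AS parent
        JOIN tagdb AS other ON other.tag_text = parent.tag_text
        WHERE parent.referencing_document_id = ?
          AND other.referencing_document_id <> ?
        ORDER BY other.referencing_document_id
        """,
        (parent_id, parent_id),
    ).fetchall()
    return [doc_id for (doc_id,) in rows]


conn = make_sample_db()
print(related_document_ids(conn, 1))  # [2, 3]: both share "important" with document 1
```

DISTINCT plays the role of the `set()` in the original loop, but the database does the deduplication in a single round trip.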
My question is: is there a better way? Can I somehow get all the Documents that contain all the tags in the parent document, and filter out the duplicates (since multiple documents contain the same tag)?
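If "contain all the tags" is meant literally (only documents that carry every tag of the parent), a common approach is to count how many distinct parent tags each candidate matches and keep those where that count equals the number of parent tags. An untested ORM version would be along the lines of `tags = set(tags_in_parent_document)` followed by `Document.objects.filter(tagdb__tag_text__in=tags).annotate(shared=Count("tagdb__tag_text", distinct=True)).filter(shared=len(tags))`, with `Count` imported from `django.db.models`. The same GROUP BY / HAVING logic in plain SQL, again as a sqlite3 sketch over the sample rows with a simplified `tagdb` table (an assumption, not the legacy schema):

```python
import sqlite3

# The sample Tagdb rows from the question: (tagid, referencing_document_id, tag_text).
SAMPLE_ROWS = [
    (1, 1, "raw-data"),
    (2, 1, "high temperature"),
    (3, 1, "important"),
    (4, 2, "important"),
    (5, 2, "processed-data"),
    (6, 3, "important"),
    (7, 4, "processed-data"),
]


def make_sample_db():
    """In-memory stand-in for the tagdb table (simplified, not the legacy schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tagdb ("
        "tagid INTEGER PRIMARY KEY, referencing_document_id INTEGER, tag_text TEXT)"
    )
    conn.executemany("INSERT INTO tagdb VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def documents_with_all_tags(conn, parent_id):
    """Ids of other documents that carry every distinct tag of parent_id."""
    rows = conn.execute(
        """
        SELECT other.referencing_document_id
        FROM tagdb AS other
        WHERE other.tag_text IN (
                SELECT tag_text FROM tagdb WHERE referencing_document_id = ?)
          AND other.referencing_document_id <> ?
        GROUP BY other.referencing_document_id
        -- keep a document only if it matched as many distinct tags as the parent has
        HAVING COUNT(DISTINCT other.tag_text) = (
                SELECT COUNT(DISTINCT tag_text) FROM tagdb WHERE referencing_document_id = ?)
        ORDER BY other.referencing_document_id
        """,
        (parent_id, parent_id, parent_id),
    ).fetchall()
    return [doc_id for (doc_id,) in rows]


conn = make_sample_db()
print(documents_with_all_tags(conn, 3))  # [1, 2]: both have "important"
print(documents_with_all_tags(conn, 1))  # []: no other document has all three tags
```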
You’d better create two additional models: one for Tag and one for the link between Tag and Document. If that is unacceptable for some reason, you can use something like:
Also, add model methods for getting and setting tags; this will ease any future refactoring.
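One possible shape for those methods, sketched on a plain dataclass so it runs without Django; on the real Document model the same two methods would sit on the model class. `get_tags`/`set_tags` are hypothetical names, and writing tags back with " , " as the separator is an assumption based on the sample `tags_as_csv` value. On the actual model, `set_tags` would also need to save the document and resynchronize its Tagdb rows.

```python
from dataclasses import dataclass


@dataclass
class LegacyDocument:
    """Stand-in for the Document model so this sketch runs without Django."""

    id: int
    metadata_id: str
    tags_as_csv: str

    def get_tags(self):
        """Return the tags as a list, trimming spaces and skipping empty entries."""
        return [t.strip() for t in self.tags_as_csv.split(",") if t.strip()]

    def set_tags(self, tags):
        """Store tags back in the CSV column, using the " , " separator seen in the sample."""
        cleaned = [t.strip() for t in tags if t.strip()]
        if any("," in t for t in cleaned):
            raise ValueError("a tag cannot contain a comma in a comma-separated column")
        self.tags_as_csv = " , ".join(cleaned)


doc = LegacyDocument(1, "a1ee3df3600c6f77a6e851781f7e70c6",
                     "raw-data , high temperature , important")
print(doc.get_tags())  # ['raw-data', 'high temperature', 'important']
doc.set_tags(["important", " processed-data "])
print(doc.tags_as_csv)  # important , processed-data
```

Callers that only go through `get_tags`/`set_tags` would not need to change if the tags later move into separate Tag and link models; only the method bodies would.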