I was recently asked to create a program to find best matches in text fragment. I have successfully written this program but I do have a question about its time complexity.
Problem is defined as following.
given a query, find occurrences of the query words in document and highlight the best tokens.
The time that my program takes
O(m + n + p)
here
m = length of the document in characters
n = length of the query in characters
p = number of total matches in the document
In this case the biggest term is always going to be "m" because in most cases documents are going to be larger then the query itself.
Can I safely deduce that time complexity of my program is O(m)?
No, you can’t. According to the Big-O notation your function
mis an upper bound on the actual time your algorithm takes to run, if there’s a constantMsuch as the real time will always be less or equals toM*m. Take a case where the document has size zero (an empty document) but someone queries it with a positive number of characters. The upper bound in this case will be0(plus a constant), but the actual time the program will take to run might be greater than that. So your program can not be said to beO(m).In other words, “most cases” isn’t enough: you must prove that your algorithm will perform within that upper bound in all cases.
Update: The same can be said for
p: common sense sayspis always smaller thanm, but that’s only true if the search terms don’t overlap. Take for instance the documentaaaaaa(m=6) and the search termsa,aaandaaa(n=3). In this case, there are 6 occurences ofa, 5 ofaaand 4 ofaaa, sop = 15. Even though it’s a very unlikely scenario (same for the empty document) it’s still required that you takepinto account in your complexity analysis. So your program must really be described asO(m + n + p)as you originally stated.