In Python, I am running a number of different string processing functions in a program. The user enters a term in a form, and the term is processed through several functions. These include stemming, stop word removal, punctuation removal, spell checking and getting synonyms.
Stemming is done using the stemming package,
stop word and punctuation removal using string.replace() and regular expressions,
spell checking using PyEnchant,
and getting synonyms using the Big Huge Thesaurus API.
The term is then sent to an API. The results are returned and put through a hard-coded sorting process. After all that, the results are output to the user. The whole process takes over 10 seconds, which is too long. I’m wondering whether the fact that I am using so many extensions, and therefore importing them, is causing the long delay.
I hope this isn’t against the Stack Overflow rules, but I’m new to Python and this is the kind of thing I need to know.
Very unlikely. If you import just once and then call the functions in a loop, the loop should take most of the time. (Or are you firing up a Python process per word/sentence?)
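One rough way to check this yourself is to time an import directly. The sketch below uses the standard-library `decimal` module purely as a stand-in for whichever package you suspect, and `timed_import` is a hypothetical helper name. Python caches imported modules in `sys.modules`, so within one process a second import of the same module should be close to free.

```python
import importlib
import time


def timed_import(module_name):
    """Import a module by name and return (module, seconds taken)."""
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    return module, time.perf_counter() - start


# "decimal" is only a placeholder; substitute the package you want to check.
first, first_secs = timed_import("decimal")
second, second_secs = timed_import("decimal")

print(f"first import:  {first_secs:.6f}s")
print(f"second import: {second_secs:.6f}s")
```

If the first number is small compared with your 10 seconds, imports are probably not the problem, unless a new Python process is started for every request.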
As a rule of thumb, computer programs tend to spend 90% of their time executing 10% of the code. That part is worth optimizing. Things like import statements are usually not. To find out where your program is spending its time, use a profiler.
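For example, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. The functions `slow_api_call`, `clean` and `process` are made-up stand-ins for your own pipeline, and the `time.sleep` call only simulates a slow network request; it is an assumption about where the time might go, not a measurement of your program.

```python
import cProfile
import io
import pstats
import time


def slow_api_call(term):
    # Hypothetical stand-in for a remote API request (simulated delay).
    time.sleep(0.05)
    return [term + "s", term + "ing"]


def clean(term):
    # Hypothetical stand-in for the cheap local string processing.
    return term.strip().lower()


def process(term):
    return sorted(slow_api_call(clean(term)))


profiler = cProfile.Profile()
profiler.enable()
result = process("  Walk ")
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

You can also profile a whole script without changing it by running `python -m cProfile -s cumulative your_script.py`, where `your_script.py` is a placeholder for your own file. The functions at the top of the report are the ones worth looking at first.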