I'm looking for a way to automatically produce an abstract (basically the first few sentences or paragraphs of a blog entry) to display in a list of articles, which are written in Markdown. Currently, I'm doing something like this:
    def abstract(article, paras=3):
        # Keep only the first `paras` newline-separated lines of the article.
        return '\n'.join(article.split('\n')[:paras])
to just grab the first few lines' worth of text, but I'm not totally happy with the results.
What I'm really looking for is about 1/3 of a screenful of formatted text to display in the list of entries. With the algorithm above, the amount of text pulled varies wildly: abstracts of only a line or two are frequently mixed in with more ideally sized ones.
Is there a library that's good at this kind of thing? If not, do you have any suggestions to improve the output?
EDIT:
You can do something like this:
This uses the textwrap module's wrapping algorithm to get the ideal text length. It breaks the text into screen-width lines and uses them to calculate how much text the desired number of lines covers.
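As a rough sketch of that idea (not necessarily the code originally posted here), something along these lines should work. The names `width` and `max_lines` are my own assumptions for the screen width in characters and the number of lines you want to keep. Note that it works on the raw text, so Markdown formatting such as lists or headings gets flattened.

```python
import textwrap

def abstract(article, width=80, max_lines=5):
    """Return roughly the first `max_lines` screen lines of `article`."""
    # Collapse newlines and repeated spaces so wrapping is uniform.
    text = " ".join(article.split())
    # Wrap into screen-width lines; never split inside a word or at a
    # hyphen, so rejoining the lines with spaces reproduces the text.
    lines = textwrap.wrap(text, width=width,
                          break_long_words=False, break_on_hyphens=False)
    if len(lines) <= max_lines:
        return text
    # Keep only as much text as fits in the desired number of lines.
    return " ".join(lines[:max_lines]) + " ..."
```

Tuning `width` and `max_lines` to your layout should get you close to the "1/3 of a screenful" you are after, regardless of how the article itself is broken into lines.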
For example, applying this algorithm to the Python Wikipedia article:
gives you this output:
Without further details, it's hard to help you. But if your problem is that taking the first few lines gives too much text for some entries, you may want to have a look at the textwrap module.
For example, if you only want 100-character abstracts, you can do the following:
That will also replace newlines with spaces, which might be desirable depending on your requirements.
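A minimal sketch of this, assuming Python 3.4+ where `textwrap.shorten` is available (the function name `short_abstract` and the `" ..."` placeholder are just illustrative choices, and this may differ from the snippet originally posted):

```python
import textwrap

def short_abstract(article, width=100):
    # shorten() collapses runs of whitespace (including newlines) into
    # single spaces, then truncates at a word boundary so the result,
    # placeholder included, fits within `width` characters.
    return textwrap.shorten(article, width=width, placeholder=" ...")
```

If the text already fits in `width`, it comes back with only the whitespace normalized and no placeholder added.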