I am writing a Dexterity content type which contains plain text and HTML fields. I want to have a custom SearchableText() method which exposes these fields to portal_catalog and Plone full text search.
I assume that for plain text I can simply join the strings with spaces. But how should I preprocess HTML content when exposing it in SearchableText()?
For converting data in Plone there is a tool called portal_transforms, which is quite capable at converting content between MIME types, for example HTML to plain text (depending on your OS / installation it may also be able to convert .doc, .pdf etc.).
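A minimal sketch of how that could look, assuming the usual `convertTo(target_mimetype, data, mimetype=...)` call on the tool and `getData()` on the result. The helper names `html_to_plain_text` and `searchable_text` are made up for illustration, and the tool is passed in as an argument so the snippet does not depend on a running Plone site. Inside Plone you would typically obtain it with `getToolByName(context, 'portal_transforms')` from `Products.CMFCore.utils`.

```python
def html_to_plain_text(transforms, html):
    """Convert an HTML string to plain text via a portal_transforms-like tool.

    `transforms` is assumed to be the portal_transforms tool (or anything
    offering the same convertTo() method).
    """
    stream = transforms.convertTo('text/plain', html, mimetype='text/html')
    if stream is None:
        # Assumption: no transform path was found, so index nothing
        # for this field instead of failing.
        return ''
    # Collapse runs of whitespace left behind by removed markup.
    return ' '.join(stream.getData().split())


def searchable_text(transforms, plain_values, html_values):
    """Build one string for SearchableText from plain and HTML field values."""
    parts = [value.strip() for value in plain_values if value]
    parts.extend(
        html_to_plain_text(transforms, value) for value in html_values if value
    )
    # Drop empty pieces and join the rest with single spaces.
    return ' '.join(part for part in parts if part)
```

In a custom SearchableText() method you could then call something like `searchable_text(transforms, [self.title], [self.body.output])`, where `title` and `body` stand in for your own field names and `.output` assumes the HTML field stores a rich text value.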
For indexing fields in Dexterity I suggest using collective.dexteritytextindexer (note that it has no through-the-web (TTW) support at the moment).
-> http://pypi.python.org/pypi/collective.dexteritytextindexer
-> https://github.com/collective/collective.dexteritytextindexer
cheers