Looking for ideas on how to go about working with a large set of .html files (50,000+) that are preexisting.
What I am trying to accomplish is to be able to keep the same hierarchy, links etc… but add content to them via php or whatever etc..
To be able to organize or route them into a php application or my own cms. I have been thinking ways to do this and come up with a few ideas but hoping to get some feedback from some pros.
I really only need to create something that will get the data once and add it to a cms that I develop. I have a script that I wrote to be able to index and search etc.. but how to go about using the contents and adding to them while keeping structure has me swinging a little. Any thoughts?
UPDATE: I found some tips that made this easier for me as well
Use a .htacess file and ad:
AddType application/x-httpd-php .html .htm
Then I can add includes into those files, I am note sure where to put this in the config file yet but will update when I know.
If you were to use a CMS like WordPress, assuming your content follows some sort of regular structure, you could recreate the HTML frame as templates and then write a script that parses each HTML file and creates a post based on criteria like the HTML file name.
You can even match the file name by setting the slug on post creation, though you’ll need a server rewrite rule to drop the .htm(l) if you intend to keep all the backlinks to the site.
I’m oversimplifying of course, as with 50,000+ files it isn’t a trivial job to move all the data.
Perhaps automate it using something like this: http://wordpress.org/extend/plugins/import-html-pages/