I am very new to PHP, and need some suggestions on a good starting point for a project I am working on.
I have a website and a newsletter, both of which are composed of news article summaries (the website is similar to older versions of Digg, but with no user submissions). A Word document is drafted with each entry in this format:
Category
Article title
News article summary
Link to the website the article is found on
Once the Word document is populated with all of the article summaries, I would like to be able to copy all of the text from the document, paste it into a single textarea field in an HTML form, and have PHP somehow pick out the separate story summaries and store them in a database, so they can later be pulled onto the website.
The only way I can think of to do this is to add descriptive tags in the Word document such as:
<begin_category>Category<end_category>
<begin_title>Article Title<end_title>
and so on, and then have PHP recognize these tags (preg_match?) and pull the information from them. My questions then are: what is the best way to go about programming this? Are there any concepts I should be researching? How do I tell PHP to look for these tags and pull everything in between them? Is this a terrible way to go about this? Am I better off just having a form with separate fields for all of the items (category, title, summary, link) and submitting each summary one by one?
The only reason I want to be able to post the entire document and have it populate is to save time. The Word document must be written up regardless for the newsletter.
Any direction would be much appreciated; things I should be googling, articles I should be reading, etc.
The first thing I would consider is that text pasted from Word can be very messy if you get the encoding wrong, so if you have any issues, make sure your HTML form, your PHP file (they might be one and the same) and the database storage all use the same encoding.
For example, save the PHP source as UTF-8, use an appropriate Unicode character set for the data storage, and declare UTF-8 in a Content-Type meta tag in the head of your HTML.
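As a rough sketch of what that could look like, the page below declares UTF-8 in three places: the HTTP header, the meta tag and the form's accept-charset attribute. The function name render_paste_form, the field name "document" and the target script import.php are placeholders of mine, not anything your setup requires.

```php
<?php
// Hypothetical helper: builds the paste form as a UTF-8 HTML page.
// "import.php" and the field name "document" are placeholder names.
function render_paste_form(string $action = 'import.php'): string
{
    $action = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');

    return <<<HTML
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Paste newsletter</title>
</head>
<body>
<form method="post" action="{$action}" accept-charset="UTF-8">
<textarea name="document" rows="30" cols="100"></textarea>
<input type="submit" value="Import">
</form>
</body>
</html>
HTML;
}

// Send the matching HTTP header (only possible before any output), then the page.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
echo render_paste_form();
```

On the database side the equivalent step is choosing a Unicode character set for the table and for the connection.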
If you are always in control of the input this is not such an issue. You can often clear out the Word mess by pasting into Notepad first, and then copying from Notepad to your form. Better still would be to draft in Notepad so that you are working with plain ASCII text, but if you get the encoding matched across the board you should be good to go.
You could use tags as you have suggested and parse the content out with a small function built on preg_match_all(), using a pattern that captures everything between each begin/end pair.
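A minimal sketch of such a function, assuming the tags look exactly like the <begin_title>...<end_title> pairs in your question. The name extract_tag is my own invention.

```php
<?php
// Hypothetical helper: returns every piece of text found between
// <begin_NAME> and <end_NAME>, in document order.
function extract_tag(string $text, string $name): array
{
    $open  = preg_quote("<begin_{$name}>", '/');
    $close = preg_quote("<end_{$name}>", '/');

    // (.*?) is non-greedy, so each match stops at the nearest end tag.
    // The s flag lets "." span line breaks; the u flag treats the input as UTF-8.
    // preg_match_all() returns 0 for no matches and false on error (e.g. bad UTF-8).
    if (!preg_match_all("/{$open}(.*?){$close}/su", $text, $matches)) {
        return [];
    }

    // Drop stray whitespace and line breaks around each captured value.
    return array_map('trim', $matches[1]);
}

$sample = "<begin_title>First story<end_title>\n<begin_title>Second story<end_title>";
print_r(extract_tag($sample, 'title'));
```

The non-greedy quantifier is the important part: a greedy (.*) would swallow everything from the first begin tag to the last end tag in the document.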
You would call such a function once per tag, but this presumes that the input is even and that each entry has all of the required sections, as you are expecting four arrays of the same length as output.
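One way this could look, with the length check made explicit so an entry with a missing section fails loudly instead of silently shifting the arrays. The function name parse_document and the tag names summary and link are assumptions; your question only spells out category and title.

```php
<?php
// Hypothetical helper: collects one array per tag name and verifies
// that all of the arrays have the same number of items.
function parse_document(string $text, array $names): array
{
    $parsed = [];
    foreach ($names as $name) {
        $quoted  = preg_quote($name, '/');
        $pattern = '/<begin_' . $quoted . '>(.*?)<end_' . $quoted . '>/su';
        preg_match_all($pattern, $text, $matches);
        $parsed[$name] = array_map('trim', $matches[1] ?? []);
    }

    // If any entry lacks a section, the counts differ and the arrays no longer line up.
    if (count(array_unique(array_map('count', $parsed))) > 1) {
        throw new InvalidArgumentException(
            'Every entry needs all of these sections: ' . implode(', ', $names)
        );
    }

    return $parsed;
}

$text = <<<'TXT'
<begin_category>Tech<end_category>
<begin_title>First story<end_title>
<begin_summary>Summary one.<end_summary>
<begin_link>http://example.com/1<end_link>
<begin_category>Science<end_category>
<begin_title>Second story<end_title>
<begin_summary>Summary two.<end_summary>
<begin_link>http://example.com/2<end_link>
TXT;

$parsed = parse_document($text, ['category', 'title', 'summary', 'link']);
echo count($parsed['title']), " entries found\n";
```

In the real form handler the text would come from the posted textarea (for example $_POST['document'], if that is what the field is called) rather than from a hard-coded string.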
You would then be able to loop over the contents by index and push each entry to a database.
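A sketch of that last step, assuming four parallel arrays keyed by category, title, summary and link. The table name articles, its column names, and the connection details in the comments are all placeholders, and PDO with prepared statements is simply the approach I would reach for.

```php
<?php
// Turns the four parallel arrays into one associative row per article.
function build_rows(array $parsed): array
{
    $rows = [];
    foreach ($parsed['title'] as $i => $title) {
        $rows[] = [
            'category' => $parsed['category'][$i],
            'title'    => $title,
            'summary'  => $parsed['summary'][$i],
            'link'     => $parsed['link'][$i],
        ];
    }
    return $rows;
}

// Inserts each row with a prepared statement and returns how many were saved.
// The table "articles" and its columns are assumed names.
function save_rows(PDO $pdo, array $rows): int
{
    $stmt = $pdo->prepare(
        'INSERT INTO articles (category, title, summary, link)
         VALUES (:category, :title, :summary, :link)'
    );
    foreach ($rows as $row) {
        $stmt->execute($row); // array keys map onto the named placeholders
    }
    return count($rows);
}

$parsed = [
    'category' => ['Tech'],
    'title'    => ['First story'],
    'summary'  => ['Summary one.'],
    'link'     => ['http://example.com/1'],
];
$rows = build_rows($parsed);

// With a real connection (placeholder credentials) this would become:
// $pdo = new PDO('mysql:host=localhost;dbname=news;charset=utf8mb4', 'user', 'password');
// save_rows($pdo, $rows);
```

Prepared statements matter here because pasted summaries will contain quotes and other characters that would break, or be abused in, a query built by string concatenation.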
As an alternative to all of that, you might consider uploading the actual file (removing the copy-and-paste step) and parsing it server side. Or is it possible to scrape the data from your website?