I want to parse a site using PHP's DOMDocument, since it is faster and easier to use. Some of you have convinced me! One question, since I am a PHP newbie 😉: how can I apply the XPath expressions I gathered in the code?
Example: http://buergerstiftungen.de/cps/rde/xchg/SID-F8780E81-ABF20567/buergerstiftungen/hs.xsl/db.htm
Goal: to fetch the results (approx. 213 different records) and parse them in order to get a database dump that I can save into a local MySQL DB.
By the way, see two result pages:
http://buergerstiftungen.de/cps/rde/xchg/SID-F8780E81-ABF20567/buergerstiftungen/hs.xsl/db_20302.htm
http://buergerstiftungen.de/cps/rde/xchg/SID-F8780E81-ABF20567/buergerstiftungen/hs.xsl/db_20289.htm
As you can see, there is a lot of information stored.
Well, I tried to write a scraper in Perl, but I had no luck. Perl is very hard for newbies. Afterwards I tried to write a parser in PHP, which is a bit easier. But the site (see the detail result pages) is a bit complex. How can I parse these pages in order to get the dataset into a local MySQL database? Then I have more options for retrieval.
I want to have the data locally (on my openSUSE Linux 11.3 system) in a MySQL database.
Well, I have three parts:
- fetching
- parsing
- storing (in MySQL: that is creating a MySQL-dump)
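For the fetching part, here is a minimal sketch of how the links to the detail pages could be collected from the overview page. It assumes (I have not verified this) that the overview page links to each record with an href ending in db_<number>.htm, like the two result pages above. The function name collect_detail_links and the sample HTML are made up for illustration.

```php
<?php
// Hypothetical helper: returns all hrefs in $html that look like db_<number>.htm.
function collect_detail_links($html)
{
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = array();
    foreach ($xpath->query("//a[contains(@href, 'db_')]") as $a) {
        $href = $a->getAttribute('href');
        if (preg_match('/db_\d+\.htm$/', $href)) {
            $links[$href] = true; // array keys drop duplicate links
        }
    }
    return array_keys($links);
}

// Made-up sample standing in for the overview page.
$sample = '<html><body>
  <a href="db_20302.htm">A</a>
  <a href="db_20289.htm">B</a>
  <a href="db_20289.htm">B again</a>
  <a href="db.htm">overview</a>
</body></html>';

$links = collect_detail_links($sample);
print_r($links);
```

On the real site, $sample would be the string returned by curl_exec() for the overview page, and relative links would still have to be turned into full URLs before fetching them.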
Since I have very little experience with XPath, I use the XPather tool in my Mozilla browser. But I am not sure how I should apply the expressions; see the data I gathered below.
Perhaps some of you can help me here and show me how to apply them in parser code.
I would love to hear from you.
See here some details:
For the results (approx. 213 different records), see the two result pages above. I gathered some XPath data:
Example: Bürgerstiftung Wiesloch
http://buergerstiftungen.de/cps/rde/xchg/SID-A7DCD0D1-702CE0FA/buergerstiftungen/hs.xsl/db_20289.htm
/html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='marginalblock']/div[1]/p
1. Gründungsgeschichte (founding history)
/html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='contentblock']/div/p[1]/strong
2. Kurzvorstellung/Ziele (brief introduction/goals)
/html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='contentblock']/div/p[2]/span[2]/span/b
3. Projekte (projects)
/html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='contentblock']/div/p[3]/span[2]/span/strong
Kontakt (contact):
/html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='marginalblock']/div[1]/h6
Question: how do I apply the gathered XPaths with libxml in order to get the parser part up and running? I am an XPath starter!
Looking forward to hearing from you!
zero
PS: if I have to add more info, or if I have to ask more properly, please let me know! Sorry for being the newbie! ;-)
PPS, and an update: I have the MySQL part. It can look like this:
CREATE TABLE IF NOT EXISTS `address` (
`id` int NOT NULL auto_increment,
`name` varchar(255) default NULL,
`contact-details` varchar(255) default NULL,
`street` varchar(255) default NULL,
`postal-code` varchar(30) default NULL,
`town` varchar(255) default NULL,
`phone` varchar(30) default NULL,
`email` varchar(255) default NULL,
`homepage` varchar(255) default NULL,
`summary` text, -- longer free text would be cut off in a varchar(30)
`projects` text,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Something like this would fit my needs.
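For the storing part, here is a rough sketch of how one parsed record could be turned into a parameterized INSERT for the table above. The helper name build_insert and the sample record (including the postal code) are made up; with a real connection the statement would be handed to PDO::prepare() and execute(), as hinted in the comment at the end.

```php
<?php
// Hypothetical helper: builds a parameterized INSERT for one record.
// Column names are wrapped in backticks because some contain a hyphen.
function build_insert($table, array $row)
{
    $columns = array();
    $placeholders = array();
    foreach (array_keys($row) as $column) {
        $columns[] = '`' . str_replace('`', '``', $column) . '`';
        $placeholders[] = '?';
    }
    $sql = 'INSERT INTO `' . str_replace('`', '``', $table) . '` ('
         . implode(', ', $columns) . ') VALUES ('
         . implode(', ', $placeholders) . ')';
    return array($sql, array_values($row));
}

// Made-up sample record; real values would come from the parser.
$record = array(
    'name'        => 'Sample Foundation',
    'postal-code' => '00000',
    'town'        => 'Wiesloch',
);

list($sql, $params) = build_insert('address', $record);
echo $sql, "\n";

// With a database connection (DSN, user and password are placeholders):
// $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');
// $pdo->prepare($sql)->execute($params);
```

Using placeholders instead of pasting the values into the SQL string means quotes or umlauts in the scraped text cannot break the statement.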
Update: many, many thanks, Lenzai, for the quick answer.
You suggest trying something like this:
$url = "http://...";
$xpath_query = "/html/body/...";
// The XPaths I gathered:
// /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='marginalblock']/div[1]/p
// /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='contentblock']/div/p[1]/strong
// /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='contentblock']/div/p[2]/span[2]/span/b
// /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='contentblock']/div/p[3]/span[2]/span/strong
// /html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']/div[@id='marginalblock']/div[1]/h6
$ch = curl_init($url);
// Return the page as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$res = curl_exec($ch);
curl_close($ch);
$dom = new DOMDocument();
// @ suppresses the warnings that malformed HTML would otherwise trigger
@$dom->loadHTML($res);
$xpath = new DOMXPath($dom);
// Take the first node that matches the query and print its text
$node = $xpath->query($xpath_query)->item(0);
echo $node->nodeValue;
I have cURL enabled here, so that is no problem. And the XPaths should go
into this line: $xpath_query="/html/body/...";
Question: should I enter all the XPaths mentioned above, from 1. to 3. and so forth? What does this look like in the end? Can you help me here? I am very, very new to PHP.
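One way this could look, as a sketch only: instead of a single $xpath_query, the XPaths go into an array keyed by a field name, and each one is run against the same DOMXPath object. The field names, the function extract_fields and the sample HTML are made up for illustration. The sample only mimics the div structure of the XPaths above, so the real pages may well behave differently.

```php
<?php
// Hypothetical helper: runs every XPath in $queries against $html and
// returns the trimmed text of the first match (or null if nothing matched).
function extract_fields($html, array $queries)
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = array();
    foreach ($queries as $field => $query) {
        $nodes = $xpath->query($query);
        $node = ($nodes !== false && $nodes->length > 0) ? $nodes->item(0) : null;
        $result[$field] = $node ? trim($node->nodeValue) : null;
    }
    return $result;
}

// The common prefix of all gathered XPaths.
$base = "/html/body/div[@id='main']/div[@id='wrapper']/div[@id='inner']";
$queries = array(
    'contact_heading' => $base . "/div[@id='marginalblock']/div[1]/h6",
    'contact_details' => $base . "/div[@id='marginalblock']/div[1]/p",
    'history_heading' => $base . "/div[@id='contentblock']/div/p[1]/strong",
);

// Made-up sample page with the same nesting of div ids.
$sample = '<html><body><div id="main"><div id="wrapper"><div id="inner">'
    . '<div id="contentblock"><div><p><strong>1. History</strong></p></div></div>'
    . '<div id="marginalblock"><div><h6>Contact:</h6><p>Sample Foundation</p></div></div>'
    . '</div></div></div></body></html>';

$fields = extract_fields($sample, $queries);
print_r($fields);
```

For the real pages, $sample would be replaced by the string returned from curl_exec(), and the remaining XPaths from the list can be added as further array entries.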
Looking forward to hearing from you! Many, many thanks for any and all help!
zero
Try something like this.
You just need to enable cURL in your php.ini.