I need to put WHOIS data into a table, with fields such as:
- registrant,
- created date,
- expire date etc.
I have a script that extracts data from WHOIS servers, but the output format differs for each domain extension.
For example, for .com domains the registrant details come as a single address block, while for .org domains they come as separate fields: registrant name, street1, street2, street3, etc.
So I'm not able to extract the registrant details as a unit to put in the database.
I heard somewhere that if the data came as XML it would be possible to extract it. Can somebody help me get around this? Thanks!
Actually, the problem is a bit larger than that.
The WHOIS service is defined by RFC 3912. It is a very basic request protocol that does not define the format of the answer at all. So the answers often reflect the format of the database holding the data, and you may get a different syntax from each database. Since WHOIS can be used for whatever content the operator wants, you cannot make many assumptions about the format of the answer you will get. Fortunately, you can expect to receive parseable content, and similarly formatted answers for each request to the same server.
So you need to develop parsing logic for each server, and you will have to do it in a very empirical manner.
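As a rough illustration of per-server parsing, here is a minimal Python sketch. It assumes the answers are made of "Label: value" lines, which many servers use but not all. The server names and labels in `FIELD_MAPS` are hypothetical placeholders; the real ones have to be read off each server's actual output.

```python
from collections import defaultdict

# Hypothetical label mappings, for illustration only. The real labels
# must be taken from each server's actual answers.
FIELD_MAPS = {
    "registry-a.example": {
        "Registrant Name": "registrant",
        "Creation Date": "created",
        "Expiration Date": "expires",
    },
    "registry-b.example": {
        "owner": "registrant",
        "created": "created",
        "expire": "expires",
    },
}


def parse_key_values(text):
    """Collect 'Label: value' lines. A label may repeat, so keep lists."""
    fields = defaultdict(list)
    for line in text.splitlines():
        # Split on the first colon only, so values such as timestamps
        # keep their own colons.
        label, sep, value = line.partition(":")
        if sep and value.strip():
            fields[label.strip()].append(value.strip())
    return fields


def normalize(server, text):
    """Map one server's labels onto a common set of column names."""
    raw = parse_key_values(text)
    record = {}
    for label, column in FIELD_MAPS[server].items():
        if label in raw:
            # Repeated labels (e.g. several address lines) are joined.
            record[column] = " ".join(raw[label])
    return record
```

Each new server then only needs a new entry in the mapping, or its own parser function if its answers are not line-oriented.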
However, here are a few tips for your development that come from the RFC:
- Send the request over TCP to port 43, as a single line terminated by the ASCII characters CR+LF.
- Treat the closing of the TCP connection, and only that, as the sign that the answer is complete.
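The two tips above can be sketched in Python roughly as follows. The function names `build_request` and `whois_query` are my own, and the lenient UTF-8 decoding is an assumption, since RFC 3912 does not specify a character set.

```python
import socket


def build_request(query: str) -> bytes:
    """A WHOIS request is a single line terminated by CR+LF."""
    return query.encode("ascii") + b"\r\n"


def whois_query(server: str, query: str, port: int = 43,
                timeout: float = 10.0) -> str:
    """Send one query to a WHOIS server and return the raw answer."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_request(query))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                # The server closed the connection: the answer is complete.
                break
            chunks.append(data)
    # Assumption: decode leniently, because the charset is not specified.
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, `whois_query("whois.verisign-grs.com", "example.com")` should return the raw registry answer for a .com name, which you can then feed to your server-specific parser.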
About domain names specifically: because DNS names are restricted to ASCII, names containing other characters (accented letters, for example) are encoded with Punycode, so you should be prepared to meet such encoded strings in WHOIS answers too. Internationalized Domain Names have existed since 2003, so you will need to support Unicode. The algorithms for converting names are complex; RFC 3490 should give you some useful details about this.
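If you happen to use Python, you may not need to implement the conversion yourself: its built-in `idna` codec follows RFC 3490. A minimal sketch, with helper names of my own choosing:

```python
def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII (Punycode) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Convert an ASCII (Punycode) domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


print(to_ascii("bücher.example"))           # xn--bcher-kva.example
print(to_unicode("xn--bcher-kva.example"))  # bücher.example
```

Storing the ASCII form in the database and converting only for display keeps lookups consistent with what the WHOIS servers return.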
Good luck!