I have a simple app that gets all the links from a web page. I'm using libxml2 to parse the HTML
and extract the links inside it,
and Qt's QNetworkAccessManager for the HTTP requests.
Now the problem is how to automatically detect the host name of the links. For example, if I have:
<a href="thelink.html" >
or
<a href="../../../thelink.html" >
or
<a href="../foo/boo/thelink.html" >
I need to convert it to a full absolute URL, like this
(just an example):
<a href="http://www.myhost.com/thelink.html" >
or
<a href="http://www.myhost.com/foo/boo/thelink.html" >
or
<a href="http://www.myhost.com/m/thelink.html" >
Is there any way to do it programmatically, without doing the string manipulation manually?
If you know Perl, the method there is described as "Return a relative URL if possible",
from http://search.cpan.org/~rse/lcwa-1.0.0/lib/lwp/lib/URI/URL.pm:
$url->rel([$base])
Code example that doesn't work (Qt):
http://qt.digia.com/support/
QString s("/About-us/");
QString base("http://qt.digia.com");
QString urlForReq;
if (!s.startsWith("http:"))
{
    QString uu = QUrl(s).toString();
    QString rurl = baseUrl.resolved(QUrl(s)).toString();
    urlForReq = rurl;
}
The resulting urlForReq value is "/About-us/".
I have not verified whether this approach completely follows the algorithm mentioned by @sftrabbit, but you can use
QUrl::resolved to convert your relative URLs to absolute URLs.
I cannot reproduce the problem with the code example from the question. The only issue is that the
baseUrl object is missing from the code.