I’ve confirmed that R calls of XML functions such as htmlParse and readHTML send a blank user agent string to the server.
?XML::htmlParse tells me under isURL that “The libxml parser handles the connection to servers, not the R facilities”. Does that mean there is no way to set the user agent?
(I did try options(HTTPUserAgent="test") but that is not being applied.)
XML::htmlParse uses the libxml facilities (i.e. NanoHTTP) to fetch HTTP content using the GET method. By default, NanoHTTP does not send a User-Agent header. There is no libxml API to pass a User-Agent string to NanoHTTP, although one can pass arbitrary header strings to lower-level NanoHTTP functions, like xmlNanoHTTPMethod. Hence, making this possible in the XML package would require significant source code modification.
Alternatively, options(HTTPUserAgent="test") sets the User-Agent header for functions that use the R facilities for HTTP requests. For example, one could fetch the page with download.file, and the User-Agent should then appear in the server's (Apache-style) access log entry.
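As a rough sketch of that workaround: the helper below sets the HTTPUserAgent option, downloads the page with download.file, and restores the previous option value afterwards. The function name fetch_with_user_agent, its arguments, and the example URL are made up for illustration and are not part of R or the XML package. The downloaded file can then be handed to htmlParse as a local path, so libxml never makes the HTTP request itself.

```r
# Hypothetical helper (not part of base R or the XML package): download a URL
# with R's own HTTP facilities so that the HTTPUserAgent option is applied.
fetch_with_user_agent <- function(url,
                                  destfile = tempfile(fileext = ".html"),
                                  agent = "test") {
  old <- options(HTTPUserAgent = agent)  # options() returns the previous value
  on.exit(options(old), add = TRUE)      # restore it when the function exits
  status <- download.file(url, destfile, quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  destfile                               # path of the downloaded file
}

# Example usage (placeholder URL; parsing requires the XML package):
# path <- fetch_with_user_agent("http://example.com/", agent = "test")
# doc  <- XML::htmlParse(path)
```

Restoring the option with on.exit keeps the change local to the one request, so other code in the session keeps R's default user agent.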