25 June 2007

Google politics

One of the well-known problems in the current implementation of RSS support in SIP Communicator was the inability to retrieve feeds from news.google.com (and you must admit, news.google.com is quite a source of news ;) ).

A brief look at the exception returned by the plugin points to the problem: the server is sending an HTTP/403 (Forbidden) response code instead of an HTTP/200 (OK) response code. This indicates that, for some reason still unknown to me, the news.google.com server totally dislikes our client. Having found that out, I started searching for the culprit.

After examining a few Wireshark dumps, I had a clear idea of what the HTTP request we were sending looked like. I tried it once more by connecting manually to the news.google.com server on port 80 with telnet and typing in the request (not very complicated since HTTP is, as you know, a plain-text protocol). The result was the same HTTP/403. I then tried the simplest "equivalent" HTTP request:

GET /?output=rss HTTP/1.0
Host: news.google.com
Connection: close


To my surprise, I received a nice XML file along with the necessary headers. So I started changing and adding headers to get as close as possible to the request SIP Communicator was issuing while still getting a valid response. After about a dozen tries, I found out that the User-Agent SIP Communicator was sending along with the HTTP request (Java/1.6.0 on my machine) appears to be on the blacklist of the news.google.com servers. I verified this assumption by changing the User-Agent for lynx and wget. As soon as the original User-Agent was replaced by another value (be it a totally dummy User-Agent like 'Kikiriki' or a well-known User-Agent like 'Mozilla/4.0'), things worked like a charm.
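
The telnet experiment can be repeated from code. What follows is only a minimal sketch, not what the plugin does: the class name UserAgentProbe and its two helper methods are made up for illustration. It builds the same bare HTTP/1.0 request as above with one extra User-Agent header and prints the status line the server sends back for each candidate value.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of SIP Communicator.
class UserAgentProbe {

    /** Builds the minimal HTTP/1.0 request from the post plus a User-Agent header. */
    static String buildRequest(String host, String path, String userAgent) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "User-Agent: " + userAgent + "\r\n"
             + "Connection: close\r\n"
             + "\r\n"; // an empty line ends the header section
    }

    /** Sends the raw request to port 80 and returns the first line of the response. */
    static String fetchStatusLine(String host, String request) throws IOException {
        try (Socket socket = new Socket(host, 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            return in.readLine(); // the status line, e.g. one containing 200 or 403
        }
    }

    public static void main(String[] args) throws IOException {
        String host = "news.google.com";
        // The three User-Agent values mentioned in the post.
        for (String agent : new String[] {"Java/1.6.0", "Kikiriki", "Mozilla/4.0"}) {
            String request = buildRequest(host, "/?output=rss", agent);
            System.out.println(agent + " -> " + fetchStatusLine(host, request));
        }
    }
}
```

Running main needs network access, and what the server answers today may well differ from what I saw; the point is only that the User-Agent line is the single thing that changes between requests.
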
For now I've fixed that by adding

    System.setProperty("http.agent",
        System.getProperty("sip-communicator.version"));

to the start() method of the RssActivator.
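
Two details of the http.agent property are worth keeping in mind. System.setProperty throws a NullPointerException when the value is null, which would happen if sip-communicator.version is not defined. Also, as far as the Java networking documentation describes it, the JVM reads http.agent only once and appends its own "Java/<version>" to whatever value is given, so the header ends up looking like "<value> Java/1.6.0". A slightly more defensive variant could look like the sketch below; the AgentConfig class, the chooseAgent helper and the "SIP-Communicator" fallback string are my assumptions here, not existing code.

```java
// Hypothetical helper class, not part of SIP Communicator.
class AgentConfig {

    // Assumed product token; used alone when no version is available.
    static final String FALLBACK_AGENT = "SIP-Communicator";

    /** Returns a non-empty User-Agent value built from the (possibly missing) version. */
    static String chooseAgent(String version) {
        if (version == null || version.trim().isEmpty()) {
            return FALLBACK_AGENT;
        }
        return FALLBACK_AGENT + "/" + version.trim();
    }

    /** Sets http.agent; should run before the first HTTP connection is opened. */
    static String configure() {
        String agent = chooseAgent(System.getProperty("sip-communicator.version"));
        System.setProperty("http.agent", agent); // never null, so no NullPointerException
        return agent;
    }

    public static void main(String[] args) {
        System.out.println("http.agent = " + configure());
    }
}
```

Calling configure() at the very beginning of start() would keep the original behaviour when the version property is present and still produce a usable value when it is not.
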

I still have some problems, but I don't know whether they are caused by my pretty lame internet connection or by other bugs in the code.
Right now I'm trying to figure out whether there are any other bugs to catch and how to implement icons for RSS contacts. But more on this later today :)
