
Access Any Website Or Forum Without Registering

Tuesday, June 22, 2010

Let's imagine the following scenario: we are happily searching on Google and suddenly find something really interesting. However, when we try to open the URL, a different page is shown, asking us to kindly register. Now we can ask ourselves: why can Google see this page? Is it registered at every forum and web page on the web?

The answer is quite simple: many websites and forums block unregistered users, but they usually don't block Googlebot. Consequently, these pages can be found using the Google search engine or the Google cache. The same applies to other search engines.
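To make the idea concrete, here is a minimal sketch of how a site might decide which page to serve based only on the User-Agent header. The function name `page_for` and the returned strings are hypothetical, chosen for illustration; real sites use their own logic and often stricter checks.

```python
def page_for(user_agent: str) -> str:
    """Pick a page for a visitor, trusting whatever the User-Agent claims."""
    # Hypothetical rule: anything that calls itself Googlebot gets the content.
    if "googlebot" in user_agent.lower():
        return "full content"
    # Everyone else is sent to the registration wall.
    return "please register"


if __name__ == "__main__":
    print(page_for("Googlebot/2.1 (+http://www.googlebot.com/bot.html)"))
    print(page_for("Mozilla/5.0"))
```

A check like this looks only at a string the visitor controls, which is why the rest of the post is possible at all.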

Probably the reader is now thinking: "If I spoof my user agent to that of Googlebot, then I can freely browse any website or forum without registering". First, this is not true for every website. However, there are plenty of popular sites out there that cloak content which is normally available only to paying members or to users who have completed a free registration (it usually doesn't work with paid porn websites!). Furthermore, doing it may also be against the terms of service of the site you are visiting.

To prevent this, some websites also use other mechanisms to determine whether you are a bot (e.g. IP address checks, User-Agent cloaking, JavaScript and cookie detection, and referer detection).
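As an illustration of the IP address mechanism, the sketch below shows what a stricter site could do on its side: look up the hostname for the visitor's address, require it to end in googlebot.com, and confirm that the hostname resolves back to the same address. The names `is_real_googlebot` and `visitor_addresses` are hypothetical, and the `reverse` and `forward` parameters are an assumption added so the logic can be tried without real DNS lookups. The second helper reads REMOTE_ADDR and HTTP_X_FORWARDED_FOR from a WSGI-style dictionary, which uses the same key names as PHP's $_SERVER.

```python
import socket


def is_real_googlebot(ip, reverse=None, forward=None):
    """Reverse-then-forward DNS check for an address claiming to be Googlebot."""
    # Default resolvers use the standard library; callers may pass their own.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        host = reverse(ip)
        # Assumption: only the googlebot.com domain is accepted here.
        if not host.endswith(".googlebot.com"):
            return False
        # The hostname must resolve back to the address we started from.
        return ip in forward(host)
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) count as "not Google".
        return False


def visitor_addresses(environ):
    """Return the connecting address plus any addresses a proxy forwarded."""
    addresses = [environ.get("REMOTE_ADDR", "")]
    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    addresses += [part.strip() for part in forwarded.split(",") if part.strip()]
    return addresses
```

A site that runs both checks on every address returned by `visitor_addresses` would not be fooled by a changed User-Agent alone.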

So, how do you beat all 5 major types of cloaking?

  • Beat IP Delivery: Use Google Translate as a proxy, translating from any language to the website's original language. The goal here is to use Google’s IP address, so that if a site is cloaking using IP delivery, it will still assume you are Google. Other proxies may not fall into the same C-block range as Google’s, making you less likely to succeed, although they may work as well.

    Notice: If you use Google Translate, Google's IP address will show up in the REMOTE_ADDR field of the $_SERVER variable in PHP. Your own IP address is still accessible in an extra header, HTTP_X_FORWARDED_FOR, but the vast majority of sites do not check for it.

  • Beat User-Agent Cloaking: First grab the Firefox add-on called ‘User Agent Switcher’ and install it. Now go to Tools > User Agent Switcher > Options and then again to Options.

    Select User Agent from the left sidebar and click Add. Now in the description field type:

    crawl-66-249-66-1.googlebot.com

    The only thing the webmaster can do is check whether the IP matches, because Google doesn't post a public list of IP addresses for webmasters to whitelist:

    host crawl-66-249-66-1.googlebot.com
    crawl-66-249-66-1.googlebot.com has address 66.249.66.1

    Then, in the user agent field, type:

    Googlebot/2.1 (+http://www.googlebot.com/bot.html).

    Note: You can use any other search engine as well.

  • Beat Javascript Detection: Check your Firefox preferences and simply disable JavaScript.

  • Beat Cookie Detection: Turn off cookies in Firefox settings (Tools > Options > Content and Privacy).

  • Beat Referer Detection: Just go to about:config and type “referer” in the filter box. Change the value of network.http.sendRefererHeader to 0.
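The browser settings above boil down to a request that carries the Googlebot user agent and no cookies or Referer header. As a rough sketch, the same request can be described with Python's standard library; the helper name `build_request` and the example URL are hypothetical, and nothing is sent until the request is passed to `urllib.request.urlopen`. This does not cover IP delivery, which depends on where the request comes from rather than on its headers.

```python
import urllib.request

# The user agent string given in the User-Agent step above.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"


def build_request(url):
    """Describe a request that presents itself with the Googlebot user agent."""
    # Only the User-Agent header is set: no Cookie and no Referer are added.
    # urllib never runs JavaScript, so that check is sidestepped as well.
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})


if __name__ == "__main__":
    req = build_request("http://example.com/some-thread")
    # header_items() lists the headers that would accompany the request.
    print(req.header_items())
```

The default opener behind `urlopen` keeps no cookie jar, so cookies set by the site are not returned on later requests unless one is added explicitly.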

Good browsing!