
SMF – Google only indexes the imode, wap and wap2 pages

I had a problem with my SMF forum: Google failed to index any of the regular HTML pages, even though the sitemap was correctly generated and submitted.

The Google index contained only the ?imode, ?wap and ?wap2 links, and no link to the normal (full) version. Leaving aside explanations such as “Google needs mobile content” or “Google prefers a simpler web page over a complex one”, which solve nothing, I found the actual cause: every normal page contained the tag:

<meta name="robots" content="noindex"/>

Ha!

So actually Google works!
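A quick way to confirm this on any page is to look for the tag in the returned HTML. The snippet below is only a minimal sketch: the function name hasRobotsNoindex is made up for illustration, and the regex matching is a shortcut rather than a full HTML parser.

```php
<?php
// Hypothetical helper: reports whether an HTML document carries a
// robots meta tag whose content includes "noindex".
function hasRobotsNoindex(string $html): bool
{
    // Collect every <meta ...> tag. A regex is enough for a quick check,
    // although a real HTML parser would be more robust.
    if (!preg_match_all('/<meta\b[^>]*>/i', $html, $matches)) {
        return false;
    }
    foreach ($matches[0] as $tag) {
        $isRobots = preg_match('/\bname\s*=\s*["\']?robots\b/i', $tag) === 1;
        $hasNoindex = preg_match('/\bcontent\s*=\s*["\'][^"\']*\bnoindex\b/i', $tag) === 1;
        if ($isRobots && $hasNoindex) {
            return true;
        }
    }
    return false;
}

var_dump(hasRobotsNoindex('<meta name="robots" content="noindex"/>'));       // bool(true)
var_dump(hasRobotsNoindex('<meta name="robots" content="index, follow"/>')); // bool(false)
```

To test a live page, the HTML could be fetched first (for example with file_get_contents, if allow_url_fopen is enabled) and passed to the function.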

The trouble came from porting the Apache mod_rewrite rule of the PrettyURL module to nginx. I started from

.... ./index.php?pretty;board=$1.$2 [L,QSA] ....

and ended up with:

..... index.php?pretty\;board=$2.$ .....

Notice the escape character ‘\’ that nginx needs before the semicolon; otherwise the semicolon is interpreted as the end of the directive and the rule fails.

The trouble is that nginx apparently carries the ‘\’ through to the FCGI process, so a key like “pretty\board” ended up in the $_GET array, causing code like:

// Right, let's only index normal stuff!
if (count($_GET) > 1) {
  $session_name = session_name();
  foreach ($_GET as $k => $v) {
    if (!in_array($k, array('board', 'start', $session_name))) {
      $context['robot_no_index'] = true;
    }
  }
}

to always set robot_no_index to true for normal pages, because the unexpected key is not in the list of allowed parameters.
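The effect is easy to reproduce outside the forum. The sketch below wraps the same check in a standalone function; the name shouldNoindex, the session name PHPSESSID and the exact malformed key (pretty\) are assumptions chosen for illustration, not values taken from the forum code.

```php
<?php
// Simplified stand-in for the check quoted above: returns true when the
// request would be marked noindex. The function name is hypothetical.
function shouldNoindex(array $get, string $sessionName): bool
{
    // A single parameter is always considered "normal".
    if (count($get) <= 1) {
        return false;
    }
    foreach (array_keys($get) as $key) {
        // Any key outside the allowed list flips the flag.
        if (!in_array($key, array('board', 'start', $sessionName))) {
            return true;
        }
    }
    return false;
}

// Only allowed keys: the page stays indexable.
var_dump(shouldNoindex(array('board' => '1.0', 'start' => '20'), 'PHPSESSID')); // bool(false)

// A stray key with a trailing backslash (illustrative guess): noindex is set.
var_dump(shouldNoindex(array('board' => '1.0', 'pretty\\' => ''), 'PHPSESSID')); // bool(true)
```

Any unknown key has the same effect, which is why a single leftover character in the rewritten query string was enough to hide every normal page from the index.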

The solution was to rewrite the nginx rules using ‘&’ as the separator instead of ‘;’:

..... index.php?pretty&board=$2.$ .....
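To see why the separator matters, the two query strings can be compared once they are split into keys. This is a rough sketch that assumes the application treats ‘;’ as an alternative to ‘&’ when parsing the query string; the helper name queryKeys and the sample value 1.0 are hypothetical.

```php
<?php
// Hypothetical helper: splits a query string on both '&' and ';' and
// returns the resulting parameter names.
function queryKeys(string $query): array
{
    // Treat ';' like '&' (assumption), then let PHP parse the pairs.
    parse_str(strtr($query, array(';' => '&')), $params);
    return array_keys($params);
}

print_r(queryKeys('pretty&board=1.0'));   // keys: pretty, board
print_r(queryKeys('pretty\;board=1.0'));  // keys: pretty\, board
```

With ‘&’ the marker arrives as the clean key pretty, which the Pretty URLs module presumably recognises; with the escaped semicolon the backslash stays attached to the key and it becomes an unknown parameter.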

Luckily the site is crawled regularly, so I hope everything will be sorted out in a few weeks.
