Wednesday, December 10, 2008

URL rewriting in a shared environment (like Crystaltech)

Today I had a shocker. I found out that our website, www.ActiveUnlimited.com, which I'd spent months making search engine friendly with friendly URLs such as http://www.activeunlimited.com/swimming-lessons, wasn't being indexed by Google, because our 404 error page wasn't handling requests from the Googlebot properly.

We were seriously restricted in our implementation. Crystaltech doesn't provide IIS7 hosting, so we couldn't use HTTP modules, and because we are in a shared environment CT won't let us configure IIS to send all requests through ASP.NET. The only option available was a custom error page, 404.aspx. We wanted our URLs to have no extensions, so web.config redirects weren't going to work unless we set up loads of dummy folders with dummy default.aspx files in them, which really isn't an option.

To cut a very long, painful story short, here is the code that did the URL rewriting on our 404.aspx page:

protected void Page_Load(object sender, EventArgs e)
{
    HttpContext myContext = HttpContext.Current;
    string _s = HttpContext.Current.Server.UrlDecode(Request.QueryString.ToString().ToLower());
    _s = _s.Replace("404;http://", "");
    if (_s.IndexOf("/") != -1)
    {
        _s = _s.Substring(_s.IndexOf("/") + 1);
        object _o = DAL.URLRewrites[_s];
        if (_o != null)
        {
            Response.StatusCode = 301;
            myContext.Server.Transfer("/" + _o.ToString());
            Response.End();
        }
    }
}

Here DAL.URLRewrites is a hashtable that maps friendly page names to actual server addresses.
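
For illustration only, here is a rough sketch of what such a lookup table could look like. The real DAL isn't shown in this post, so the class name RewriteTable, the Resolve helper and the target address below are all hypothetical, and it uses a generic Dictionary rather than a Hashtable so that the key comparison can be made case-insensitive:

```csharp
using System;
using System.Collections.Generic;

public static class RewriteTable
{
    // Hypothetical entry: friendly name on the left, real page on the right.
    // The actual mappings would be loaded by the data access layer.
    private static readonly Dictionary<string, string> Map =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "swimming-lessons", "lessons.aspx?activity=swimming" },
        };

    // Returns the real address for a friendly name, or null when there is no
    // mapping (mirroring what a Hashtable indexer returns for a missing key).
    public static string Resolve(string friendlyName)
    {
        string target;
        return Map.TryGetValue(friendlyName, out target) ? target : null;
    }
}
```

A null result corresponds to the "_o != null" check in the page code: no mapping means the request is a genuine 404.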

The problem was happening because, for some reason I still don't understand, requests from Fiddler, IE, Opera, Chrome and so on were generating a query string of "/swimming-lessons", but requests from the Googlebot were generating "swimming-lessons?" and so were not being found in the hashtable. I only found this out by using the Check Server Headers tool over on SEOChat.com and dumping the query string into the Response.StatusDescription field. I amended the 404.aspx page code as below and it seems to have fixed all our problems:

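protected void Page_Load(object sender, EventArgs e)
{
    HttpContext myContext = HttpContext.Current;

    // IIS passes the failed URL in the query string, prefixed with "404;".
    string _s = myContext.Server.UrlDecode(Request.QueryString.ToString().ToLower());
    _s = _s.Replace("404;http://", "");

    // Drop everything up to and including the first "/".
    if (_s.IndexOf("/") != -1)
        _s = _s.Substring(_s.IndexOf("/") + 1);

    // Googlebot requests arrive with a stray "?", so strip it.
    _s = _s.Trim('?');

    if (_s != "")
    {
        object _o = DAL.URLRewrites[_s];
        if (_o != null)
        {
            // Known friendly name: report success and serve the real page.
            Response.StatusCode = 200;
            Response.Status = "200 OK";
            Response.StatusDescription = "OK";
            myContext.Server.Transfer("/" + _o.ToString());
            return;
        }
    }

    // No mapping found, so this really is a missing page.
    Response.StatusCode = 404;
}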
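
The string handling in the amended handler above can also be pulled out into a small helper so that it can be checked outside of IIS. This is only a sketch under a few assumptions: the RewriteKey class and Normalize method are names made up for the example, it uses System.Net.WebUtility.UrlDecode instead of Server.UrlDecode so that it runs without an HttpContext, and the first sample input in Main is an assumed shape for the full query string rather than one captured from our server.

```csharp
using System;
using System.Net;

public static class RewriteKey
{
    // Reduces the query string handed to the custom 404 page to the
    // friendly page name used as the lookup key.
    public static string Normalize(string rawQuery)
    {
        if (string.IsNullOrEmpty(rawQuery))
            return "";

        string s = WebUtility.UrlDecode(rawQuery.ToLower());

        // Remove the "404;http://" prefix that IIS adds.
        s = s.Replace("404;http://", "");

        // Drop everything up to and including the first "/".
        int slash = s.IndexOf('/');
        if (slash != -1)
            s = s.Substring(slash + 1);

        // Strip the stray "?" seen on Googlebot requests.
        return s.Trim('?');
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumed full form, then the two variants described in the post.
        Console.WriteLine(RewriteKey.Normalize("404;http://www.activeunlimited.com/swimming-lessons"));
        Console.WriteLine(RewriteKey.Normalize("/swimming-lessons"));
        Console.WriteLine(RewriteKey.Normalize("swimming-lessons?"));
    }
}
```

All three inputs should reduce to the same key, "swimming-lessons", which is the point of the fix: browser and Googlebot requests end up looking for the same entry.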