SEO for Single Page Backbone Applications
When we were developing the front-end for Coursecycle, we decided to make it a single-page application backed by Backbone.js so that we could provide the most streamlined experience for our users. Unfortunately, one of the downsides to this approach is that our site cannot be crawled by search engine spiders, such as Googlebot, because they are unable to execute the JavaScript that renders our pages.
In order to address this issue, we decided to set up a prerender service that would generate static pages for bots. The following is the series of events that occurs:
1. nginx handles all incoming HTTP requests.
2. If the User-Agent of the HTTP request matches a list of known bots, the request is proxied to the prerender service, which uses PhantomJS as a headless browser to execute any JavaScript associated with the route. A static copy of the page, containing all the injected content that a user would see in their web browser, is then returned to the bot.
3. Requests from regular users are proxied as usual to the Ruby application server. To handle concurrent requests, we use Puma, a multi-threaded Ruby web server.
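In our root location block, we pass every request through a try_files directive. If the requested URI maps to a static file under the public directory, nginx serves it directly; otherwise the request falls through to the @prerender named location, which decides where to send it.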
    location / {
        root /var/www/coursecycle/public;
        try_files $uri @prerender;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
The following @prerender location is mostly adapted from the official prerender.io nginx.conf file with a slight modification: we found that it was far easier to flag Googlebot directly through $http_user_agent than to rely on the _escaped_fragment_ argument alone.
    location @prerender {
        set $prerender 0;

        # Known crawlers and link-preview bots get the prerendered page.
        if ($http_user_agent ~* "baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|Googlebot") {
            set $prerender 1;
        }

        # So does any request that carries the _escaped_fragment_ argument.
        if ($args ~ "_escaped_fragment_") {
            set $prerender 1;
        }

        # Never prerender requests made by the prerender service itself.
        if ($http_user_agent ~ "Prerender") {
            set $prerender 0;
        }

        # Never prerender static assets.
        if ($uri ~ "\.(js|css|xml|less|png|jpg|jpeg|gif|pdf|doc|txt|ico|rss|zip|mp3|rar|exe|wmv|avi|ppt|mpg|mpeg|tif|wav|mov|psd|ai|xls|mp4|m4a|swf|dat|dmg|iso|flv|m4v|torrent)") {
            set $prerender 0;
        }

        # proxy_pass with a variable is resolved at request time, so a resolver is required.
        resolver 8.8.8.8;

        if ($prerender = 1) {
            set $prerender "coursecycle.com:3100";
            # Hand the full original URL to the prerender service as the request path.
            rewrite .* /$scheme://$host$request_uri? break;
            proxy_pass http://$prerender;
        }

        if ($prerender = 0) {
            proxy_pass http://coursecycle;
            # rewrite .* /index.html break;
        }
    }
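To make the order of those checks easier to reason about (and to unit-test), here is a rough Ruby sketch that mirrors the same rules. It is not part of our nginx setup; the helper names prerender? and prerender_path are made up for illustration, and the patterns are simply copied from the config above.

```ruby
# Hypothetical helpers that mirror the nginx rules; names are illustrative only.

# Same bot list as the nginx config, matched case-insensitively (like ~*).
BOT_PATTERN = /baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|googlebot/i

# Same static-asset extensions as the nginx config, matched case-sensitively (like ~).
STATIC_EXTENSIONS = %w[
  js css xml less png jpg jpeg gif pdf doc txt ico rss zip mp3 rar exe wmv
  avi ppt mpg mpeg tif wav mov psd ai xls mp4 m4a swf dat dmg iso flv m4v torrent
].freeze
STATIC_PATTERN = /\.(#{STATIC_EXTENSIONS.join('|')})/

# Returns true when the request should be sent to the prerender service.
# Later rules override earlier ones, just like the sequence of `if` blocks.
def prerender?(user_agent:, uri:, query: "")
  ua = user_agent.to_s
  flag = false
  flag = true  if ua =~ BOT_PATTERN
  flag = true  if query.to_s.include?("_escaped_fragment_")
  flag = false if ua.include?("Prerender")
  flag = false if uri =~ STATIC_PATTERN
  flag
end

# Builds the path that the rewrite rule hands to the prerender service:
# the full original URL, prefixed with a slash.
def prerender_path(scheme:, host:, request_uri:)
  "/#{scheme}://#{host}#{request_uri}"
end
```

For example, prerender?(user_agent: "Googlebot", uri: "/assets/app.js") is false because the static-asset rule runs last and overrides the bot match.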
The last step was to generate a sitemap.xml file and submit it to Google Webmaster Tools so that Google would know how to index the site. There are plenty of tutorials on how to do this online so we won't discuss it further, but we ended up writing a quick Ruby script that adds all the URLs we wanted indexed to an XML file, then placing it at /sitemap.xml so Googlebot could find it.
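A minimal sketch of what such a script might look like is below. This is not the exact script we ran: the build_sitemap helper, the base URL scheme, and the example paths are all assumptions made for illustration.

```ruby
# Hypothetical sitemap generator; helper names and example paths are illustrative.

XML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&apos;"
}.freeze

# Escapes the characters that are not allowed to appear raw inside XML text.
def xml_escape(text)
  text.gsub(/[&<>"']/, XML_ESCAPES)
end

# Returns a sitemap document (as a String) with one <url> entry per path.
def build_sitemap(base_url, paths, lastmod: Time.now.utc)
  lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  ]
  paths.each do |path|
    lines << "  <url>"
    lines << "    <loc>#{xml_escape(base_url + path)}</loc>"
    lines << "    <lastmod>#{lastmod.strftime('%Y-%m-%d')}</lastmod>"
    lines << "  </url>"
  end
  lines << "</urlset>"
  lines.join("\n") + "\n"
end

# Example usage (paths are made up; adjust the output location to your public root):
#   paths = ["/", "/courses/1", "/courses/2"]
#   File.write("public/sitemap.xml", build_sitemap("http://coursecycle.com", paths))
```

Because the file lives in the public directory and ends in .xml, nginx serves it directly through try_files instead of sending it to the prerender service.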
For a couple of days we weren't sure whether it was working, as none of our pages had been indexed, but then we noticed a spike in CPU activity.
We initially thought it was a runaway process of some sort, but it turned out that our PhantomJS server was just working hard to serve up all the pages that Googlebot was now indexing!
As of this posting, Googlebot is still indexing a large number of our pages, but we're pretty happy with how this is turning out!














