How to make Ajax links crawlable with GWT and Google App Engine

If you care about SEO, you know that using GWT has a downside: much of your app’s content is generated dynamically via JavaScript, and is therefore invisible to search engines. You might have dozens of pages of awesome content, but all the Googlebot sees is the static HTML page that hosts your app. This can be a real problem.

Fortunately, Google has proposed a scheme for making Ajax content crawlable:

  • change your hrefs to use “bang notation”: www.example.com/ajax.html#!key=value
  • when Googlebot requests the corresponding URL of the form www.example.com/ajax.html?_escaped_fragment_=key=value, you return a static HTML snapshot of the Ajax content
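To make the mapping between the two URL forms concrete, here is a small sketch that turns a #! URL into the _escaped_fragment_ URL Googlebot asks for. The class and method names are hypothetical, and the set of escaped characters (control characters and space, #, %, &, + and non-ASCII bytes) is an assumption based on Google’s crawling specification, so check it against the spec before relying on it.

```java
import java.nio.charset.StandardCharsets;

// Maps a "pretty" #! URL to the "ugly" _escaped_fragment_ URL that Googlebot requests.
class EscapedFragment
{
    static String toEscapedFragmentUrl(String prettyUrl)
    {
        int bang = prettyUrl.indexOf("#!");
        if (bang < 0) return prettyUrl;   // not a #! URL; leave it unchanged

        String base = prettyUrl.substring(0, bang);
        String fragment = prettyUrl.substring(bang + 2);
        // append to an existing query string if there is one
        String separator = base.contains("?") ? "&" : "?";
        return base + separator + "_escaped_fragment_=" + escape(fragment);
    }

    // Percent-escapes control characters and space, '#', '%', '&', '+', and
    // non-ASCII bytes (UTF-8); everything else, including '=', is left as-is.
    static String escape(String fragment)
    {
        StringBuilder sb = new StringBuilder();
        for (byte b : fragment.getBytes(StandardCharsets.UTF_8))
        {
            int c = b & 0xFF;
            boolean mustEscape = c <= 0x20 || c == '#' || c == '%' || c == '&' || c == '+' || c >= 0x7F;
            if (mustEscape)
            {
                sb.append(String.format("%%%02X", c));
            }
            else
            {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }
}
```

So www.example.com/ajax.html#!a=1&b=2 becomes www.example.com/ajax.html?_escaped_fragment_=a=1%26b=2; the & has to be escaped so it isn’t mistaken for a separator between query parameters.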

So then the problem becomes one of generating static content from your Ajax links. You could do that by hand if your site is small and changes infrequently. More likely you’ll want a way to automate this.

Google recommends a “headless browser” approach, i.e. rendering the pages with something like HtmlUnit. That’s a fine solution, but if you’re running on App Engine, rendering every page within a single request is almost guaranteed to fail because of the request timeout. So if you want to run on App Engine, you’re probably going to have to spider your own pages ahead of time and pre-generate your HTML content.

My solution to this problem is to break the spidering up into small chunks and farm them out as tasks on App Engine’s Task Queues. Whenever I update my app’s content, I submit a job that spiders the landing page looking for Ajax links. For each link that’s found, I submit a task that recursively spiders that link (taking care not to get into loops). Each task saves its HTML content into the datastore, and that cached static content is what gets returned to Googlebot.
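The loop-prevention rule is the subtle part, so here is an in-memory simulation of it. Everything in it is a stand-in: a Deque plays the role of the Task Queue, a map of page-to-links plays the role of HtmlUnit rendering, and a map of timestamps plays the role of the datastore. CrawlSimulation is a hypothetical name; it mirrors the rule used by the AjaxCacher code in this post (queue a link only if it hasn’t been cached since the crawl started), but it is only a sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory simulation of the task-per-page crawl. A Deque stands in for the
// App Engine task queue, a page -> links map stands in for HtmlUnit rendering,
// and a page -> timestamp map stands in for CachedAjaxLink.dateCached.
class CrawlSimulation
{
    final Map<String, List<String>> linkGraph;
    final Map<String, Long> dateCached = new HashMap<>();
    final Deque<String> taskQueue = new ArrayDeque<>();
    int tasksRun = 0;

    CrawlSimulation(Map<String, List<String>> linkGraph)
    {
        this.linkGraph = linkGraph;
    }

    // One task: "render" a single page, cache it, and queue its uncached links.
    void runTask(String page, long crawlRequestTimestamp, long now)
    {
        tasksRun++;
        dateCached.put(page, now);
        Set<String> queuedFromThisPage = new HashSet<>();
        for (String link : linkGraph.getOrDefault(page, List.of()))
        {
            if (!queuedFromThisPage.add(link)) continue;   // duplicate link on this page
            Long cachedAt = dateCached.get(link);
            // loop prevention: skip links already cached since this crawl began
            if (cachedAt == null || cachedAt < crawlRequestTimestamp)
            {
                taskQueue.add(link);
            }
        }
    }

    // Start from the landing page and run tasks until the queue is empty.
    void crawl(String landingPage, long crawlRequestTimestamp)
    {
        long clock = crawlRequestTimestamp;   // fake clock, ticks once per task
        taskQueue.add(landingPage);
        while (!taskQueue.isEmpty())
        {
            runTask(taskQueue.poll(), crawlRequestTimestamp, ++clock);
        }
    }
}
```

For example, on a small cycle where home links to a and b, a links back to home and to b, and b links back to a, every page ends up cached and the crawl stops. Page b is rendered twice, because two tasks find it before either has cached it; that extra render is the price of not coordinating tasks, but the timestamp check still rules out an infinite loop.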

Suddenly my “simple” solution is sounding quite complicated, but it gets the job done reliably. Here’s some code to make it clearer.

I use a CachedAjaxLink data object to persist the static content:

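public class CachedAjaxLink implements Serializable
{
    @Id
    private String href;          // URL fragment, e.g. "!key=value"
    private String cachedContent; // HTML snapshot returned to Googlebot
    private Date dateCached;      // when the snapshot was taken

    public String getHref() { return href; }
    public void setHref(String href) { this.href = href; }
    public String getCachedContent() { return cachedContent; }
    public void setCachedContent(String cachedContent) { this.cachedContent = cachedContent; }
    public Date getDateCached() { return dateCached; }
    public void setDateCached(Date dateCached) { this.dateCached = dateCached; }
}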

Then I use an AjaxCacher, which crawls a given link, stores the result as a CachedAjaxLink, and queues a Task Queue task for each new link it finds:

public class AjaxCacher
{
    protected static final Logger log = Logger.getLogger(AjaxCacher.class.getName());
    protected static final DAO dao = new DAO();

    public static final long PUMP_TIME = 5000;
    protected WebClient webClient;
    protected String crawlServletUrl;

    public AjaxCacher(String crawlServletUrl)
    {
        this.crawlServletUrl = crawlServletUrl;
        webClient = Holder.get();
    }

    public void crawl(URL urlToCrawl, Date crawlRequestTimestamp)
    {
        // URLs we've already queued from this page; stored as Strings because
        // java.net.URL's equals() and hashCode() can trigger DNS lookups
        Set<String> queuedURLs = new HashSet<String>();
        queuedURLs.add(urlToCrawl.toString());

        try
        {
            HtmlPage page = webClient.getPage(urlToCrawl);

            // App Engine hack: the runtime is single-threaded, so HtmlUnit can't run
            // JavaScript in the background; pump the event loop manually so GWT's
            // asynchronous code has time to render the page
            webClient.getJavaScriptEngine().pumpEventLoop(PUMP_TIME);

            String pageContent = page.asXml();

            CachedAjaxLink cachedAjaxLink = new CachedAjaxLink();
            cachedAjaxLink.setHref(urlToCrawl.getRef());  // fragment, e.g. "!key=value"
            cachedAjaxLink.setCachedContent(pageContent);
            cachedAjaxLink.setDateCached(new Date());  // time actually cached
            dao.updateCachedAjaxLink(cachedAjaxLink);

            List<HtmlAnchor> anchors = page.getAnchors();
            for (HtmlAnchor anchor : anchors)
            {
                // only care about Ajax (fragment) links
                if (!anchor.getHrefAttribute().startsWith("#")) continue;

                URL newUrl = new URL(urlToCrawl, anchor.getHrefAttribute());

                // don't queue multiple requests for the same URL (add() returns false for duplicates)
                if (!queuedURLs.add(newUrl.toString())) continue;

                // prevent loops: only queue links not cached since this crawl started
                CachedAjaxLink link = dao.getCachedAjaxLink(newUrl.getRef());
                if (link == null || link.getDateCached().getTime() < crawlRequestTimestamp.getTime())
                {
                    queueCrawlRequest(newUrl.toString(), crawlRequestTimestamp);
                }
            }

        } catch (IOException e)
        {
            log.log(Level.SEVERE, e.getMessage(), e);
        }
        finally
        {
            webClient.closeAllWindows();
        }
    }

    /**
     * submits a crawl request to the queue; TaskQueueServlet will then handle the request asynchronously
     */
    public void queueCrawlRequest(String urlToCrawl, Date timeStamp)
    {
        Queue queue = QueueFactory.getDefaultQueue();
        TaskOptions options = TaskOptions.Builder.url(crawlServletUrl);
        options.param("encodedUrl", ServerUtils.encodeURL(urlToCrawl));
        options.param("timeStamp", ServerUtils.fromDate(timeStamp));
        options.method(TaskOptions.Method.GET);
        queue.add(options);
    }

    /**
     * try to cache a copy of the WebClient in ThreadLocal for faster startups on Google App Engine
     */
    public static class Holder
    {
        private static final ThreadLocal<WebClient> holder = new ThreadLocal<WebClient>()
        {
            protected synchronized WebClient initialValue()
            {
                WebClient result = new WebClient(BrowserVersion.FIREFOX_3);
                result.setWebConnection(new UrlFetchWebConnection(result));
                return result;
            }
        };

        public static WebClient get()
        {
            return holder.get();
        }
    }
}

Finally, I use a TaskQueueServlet to handle the queued tasks:

public class TaskQueueServlet extends HttpServlet
{
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException
    {
        doPost(req, res);
    }

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException
    {
        String encodedUrl = req.getParameter("encodedUrl");
        if (encodedUrl == null)
        {
            throw new IllegalArgumentException("missing param: encodedUrl");
        }

        String timeStamp = req.getParameter("timeStamp");
        if (timeStamp == null)
        {
            throw new IllegalArgumentException("missing param: timeStamp");
        }

        String decodedUrl = ServerUtils.decodeURL(encodedUrl);
        URL urlToCrawl = new URL(decodedUrl);

        // taskQueuePath (a context init-param) is where new crawl tasks are posted, i.e. this servlet's URL
        AjaxCacher cacher = new AjaxCacher(getServletContext().getInitParameter("taskQueuePath"));
        cacher.crawl(urlToCrawl, ServerUtils.toDate(timeStamp));
    }
}
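The code above fills the cache but doesn’t show the other half: answering Googlebot’s _escaped_fragment_ requests. Here is a minimal sketch of that lookup, assuming the key is what AjaxCacher stores (URL.getRef() of the #! URL, which includes the leading “!”) and that the servlet container has already decoded the query parameter, as getParameter() does. SnapshotLookup is a hypothetical name, and a plain Map stands in for the DAO.

```java
import java.util.Map;
import java.util.Optional;

// Maps Googlebot's _escaped_fragment_ value back to the href key AjaxCacher
// stores and returns the cached snapshot. A Map stands in for the datastore.
class SnapshotLookup
{
    private final Map<String, String> cachedContentByHref;

    SnapshotLookup(Map<String, String> cachedContentByHref)
    {
        this.cachedContentByHref = cachedContentByHref;
    }

    // escapedFragment: the already-decoded _escaped_fragment_ parameter, or null.
    // Returns the cached HTML, or empty for normal requests and cache misses.
    Optional<String> find(String escapedFragment)
    {
        if (escapedFragment == null) return Optional.empty();   // ordinary browser request
        String href = "!" + escapedFragment;   // getRef() of "...#!key=value" is "!key=value"
        return Optional.ofNullable(cachedContentByHref.get(href));
    }
}
```

In a real app this would likely sit in a servlet filter or at the top of the servlet that serves the host page: call find(req.getParameter("_escaped_fragment_")), write the snapshot as text/html on a hit, and otherwise serve the normal GWT host page.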

Thanks, Google, for making it so easy. ;-)

You can grab all of this code from my gwtquickstarter library. It’s the library that powers the best typing tutor on the web.

Posted in Google App Engine, GWT, Technical

Top 10 Ways to Improve Your Typing

Everyone loves top-10 lists, so here’s the Quick Brown Frog list of Top-10 Ways to Improve Your Typing.

1. Slow Down

This may seem counterintuitive, but most new typists need to actually slow down in order to see an improvement in their typing skills. Your Words-Per-Minute score also takes into account the number of typos and corrections you make. Typing more slowly tends to produce fewer typos, and therefore a higher overall score.
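To see why slowing down can raise your score, here is a small sketch of one common way typing tests compute net WPM: a “word” counts as 5 keystrokes, and each uncorrected error per minute is subtracted from the gross rate. This convention is an assumption for illustration; Quick Brown Frog’s own scoring may differ, for example in how it treats corrections. NetWpm is a hypothetical name.

```java
// One common net-WPM convention (an assumption, not necessarily Quick Brown Frog's formula):
// gross WPM = (characters / 5) / minutes; net WPM = gross WPM - uncorrected errors / minutes.
class NetWpm
{
    static double grossWpm(int charactersTyped, double minutes)
    {
        return (charactersTyped / 5.0) / minutes;   // 5 keystrokes count as one "word"
    }

    static double netWpm(int charactersTyped, int uncorrectedErrors, double minutes)
    {
        return grossWpm(charactersTyped, minutes) - uncorrectedErrors / minutes;
    }
}
```

With these formulas, typing 300 characters in one minute with 15 errors nets 45 WPM, while typing 275 characters with only 2 errors nets 53 WPM: the slower, more accurate run wins.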

2. Pretend you don’t have a backspace key

In the bad old days of manual typewriters there was no such thing as backspacing to correct a typo; you had to apply some white-out or else start over from scratch. Some typists have become overly dependent upon their backspace key. If you find yourself making lots of mistakes, pretend that you don’t have a backspace key, and that every character you type must be perfect. This may slow you down in the short term, but it will pay dividends as you learn the keyboard better.

3. Unlearn Bad Habits

Most people learn to type in a very ad hoc manner. Often we learn the location of the keys through hunt-and-peck, rather than by deliberately practicing proper typing technique. This often means typing keys with the wrong fingers. When these habits are learned early, they’re very hard to break. Make a concerted effort to use the proper finger and hand position for each key-press.

4. Use Both Shift Keys

Every keyboard comes with two shift keys for a good reason: certain key combinations are meant to be typed with two hands. This is something that even experienced typists often get wrong. It’s not OK to use only the left or right shift key for all your typing!

Our typing lessons clearly explain which shift key to use with each letter or symbol. As each key is introduced, its corresponding shift key is also given. Generally speaking, if the key is on the left side of the keyboard you should use the right shift key with it, and vice versa.
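The rule of thumb is simple enough to write down as code. This sketch covers only the letter keys on a standard QWERTY keyboard and uses the usual touch-typing split between the hands (an assumption; number-row and symbol keys are left out because courses disagree about a few of them). ShiftKeyRule is a hypothetical name.

```java
// Opposite-hand Shift rule for QWERTY letters: a letter typed by the left hand
// takes the right Shift key, and vice versa. The hand split is the usual
// touch-typing assignment (assumption); digits and symbols are not covered.
class ShiftKeyRule
{
    enum Side { LEFT, RIGHT }

    private static final String LEFT_HAND_LETTERS = "qwertasdfgzxcvb";
    private static final String RIGHT_HAND_LETTERS = "yuiophjklnm";

    static Side shiftKeyFor(char letter)
    {
        char lower = Character.toLowerCase(letter);
        if (LEFT_HAND_LETTERS.indexOf(lower) >= 0) return Side.RIGHT;
        if (RIGHT_HAND_LETTERS.indexOf(lower) >= 0) return Side.LEFT;
        throw new IllegalArgumentException("not a QWERTY letter: " + letter);
    }
}
```

So a capital A (left hand) pairs with the right Shift key, while a capital J (right hand) pairs with the left one.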

5. Practice on Unfamiliar Text

If you’re trying to break some old habits, typing commonly used words and phrases may actually be counterproductive, since you’ll just be reinforcing your improper technique. Muscle memory tends to take over when typing familiar things such as your email address, or website URLs that you commonly use.

Instead, try typing unfamiliar text so that muscle memory doesn’t get in the way of unlearning your old technique. Quick Brown Frog offers a huge selection of practice text with new content added every week. You can also practice with unfamiliar content by using our random words feature.

6. Use the Home Row

There’s a reason that the Home Row is the very first lesson taught by every typing course: it forms the basis of proper finger placement, and allows you to easily reach all the keys of the keyboard.

Always start with your fingers above the Home Row: asdf for the left hand, and jkl; for the right. Always return your fingers to this position after typing keys on the other rows.

7. Strive for Economy of Movement

“Economy of movement” is a fancy way of saying “try to move your hands as little as possible”. If you’re a “two finger typist”, chances are you’re having to move your hands all over the place to reach the keys.

The idea with economy of movement is that you want to make use of all of your fingers, and both hands, rather than favoring your “strong” fingers (typically your index and middle fingers).

It’s no coincidence that this is the exact same concept that musicians use when learning their instruments.

8. Relax and Stretch

Like any physical activity, typing requires you to use your muscles. Your muscles work best when you’re relaxed. Take breaks frequently, and stretch your muscles.

9. Watch the screen, not the keyboard

Many new typists spend way too much time looking at the keyboard rather than the screen. Besides slowing down how quickly you learn where the keys are, this habit encourages bad posture, as your head constantly tilts up and down between the screen and the keyboard.

Instead, watch the screen to see what you’re typing. That might mean that you have to slow down your typing initially until you learn all the keys, but it will pay dividends in the long run.

10. Practice daily

Like any skill, developing good typing ability requires a certain amount of dedication and practice. As with practicing a musical instrument, short-but-daily sessions are better than long marathon sessions done infrequently. Try to set aside a certain amount of time each day to spend practicing with Quick Brown Frog, and you’ll be a typing expert in no time!

Posted in Typing

Add a badge to your blog with your typing score!

Did you get an awesome words-per-minute score on your Quick Brown Frog typing speed test? Well, now you can share it with the world by posting a Quick Brown Frog badge on your blog or website:

We just added a new feature to all our typing practice sessions: at the end of the lesson you’ll be shown the badge and the HTML code that generates it. Simply copy and paste the blue HTML code into your blog or website to add the badge for all to see.

Posted in Typing

New Feature: create a typing practice from random English words

We’ve been busy adding new features over the Christmas holidays. The first of these is already available for you to use: you can now create a practice typing session from a set of random English words:

type random words

We’ve had numerous requests from users for a feature that would generate random typing lessons based on actual words. You can now generate an instant practice session simply by choosing the number of words you wish to type.
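Under the hood, a random-words practice can be as simple as drawing the requested number of words from a word list. Here’s a minimal sketch; the class name, the word list passed in, and the choice to allow repeated words are all assumptions rather than a description of how Quick Brown Frog does it. Passing in the Random makes a given practice reproducible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Builds a practice text by drawing wordCount words (repeats allowed) from wordList.
class RandomWordPractice
{
    static String generate(List<String> wordList, int wordCount, Random random)
    {
        if (wordList.isEmpty()) throw new IllegalArgumentException("word list is empty");
        List<String> words = new ArrayList<>(wordCount);
        for (int i = 0; i < wordCount; i++)
        {
            words.add(wordList.get(random.nextInt(wordList.size())));
        }
        return String.join(" ", words);   // words separated by single spaces
    }
}
```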

[Screenshot: random word typing practice]

We’re planning to extend this feature in the future to automatically create lessons targeting the letters you need to practice (the ones you make the most typos with).
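One way such a targeted lesson could work: give each word a weight of 1 plus the user’s typo count for every letter it contains, then draw words with probability proportional to that weight, so words full of troublesome letters come up more often. The weighting formula and the TargetedPractice name are hypothetical; this is a sketch of the idea, not the planned implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

// Weighted word picker: words containing frequently mistyped letters are drawn more often.
class TargetedPractice
{
    // 1 (so every word keeps some chance) plus the typo count of each letter occurrence.
    static int weight(String word, Map<Character, Integer> typoCounts)
    {
        int w = 1;
        for (char c : word.toLowerCase().toCharArray())
        {
            w += typoCounts.getOrDefault(c, 0);
        }
        return w;
    }

    // Picks one word with probability weight(word) / (sum of all weights).
    static String pick(List<String> wordList, Map<Character, Integer> typoCounts, Random random)
    {
        int total = 0;
        for (String word : wordList) total += weight(word, typoCounts);
        int r = random.nextInt(total);   // 0 <= r < total; throws if wordList is empty
        for (String word : wordList)
        {
            r -= weight(word, typoCounts);
            if (r < 0) return word;
        }
        throw new IllegalStateException("unreachable");
    }
}
```

With no typo history every word has weight 1, so this falls back to the plain random-words behavior.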

Enjoy!

Posted in Typing

Quick Brown Frog is now available in the Chrome Web Store

The best typing tutor on the web is now available for sale in the Chrome Web Store.

It was surprisingly easy to get into the store if you’ve got an existing webapp running:

  • pay Google $5 (one-time developer fee; not per-app)
  • create a 16-line JSON manifest file
  • take some screen shots and create an icon (the hardest part, really)
  • bundle it into a zip
  • checkmark a few boxes, add some descriptive text, click “publish”

All you are really doing is bundling up some meta-data so that Chrome users can see your app as being “installed”. Even though my app is very much server-dependent, and has its own concept of user accounts and payments, Google is happy to have it in the store. And I’m happy for the potential extra customers.

It’s obvious that the Chrome Web Store is a great boon for developers. However, it’s unclear whether users will actually find it useful, much less flock to it.

My guess is that once we start seeing more apps that really use HTML5 features like local storage, it might take off. I hope it does.

Posted in Business/Marketing, Chrome Web Store, Technical

Patrick McKenzie launches Appointment Reminder

Patrick McKenzie, a solo entrepreneur whom I admire greatly, has launched his second project: Appointment Reminder.

Appointment Reminder is a service for appointment-based businesses (think: hair salons, medical offices, law firms, or anyone that regularly schedules appointments with clients). It sends out reminders to clients automatically, via phone, SMS or email. Fewer forgotten appointments = increased revenue.

He’s leveraging Twilio, an API that I’m just dying to find an excuse to use.

Congrats Patrick, and good luck!

Posted in Technical

UI Refresh- new logo and other eye candy

I spent some time (and some $$) prettying up the Quick Brown Frog user interface. I bought a logo from Logosamurai. Not bad for $67.

[Image: Quick Brown Frog logo]

Then I added some gauges to display WPM and accuracy in real time. They’re part of the Google Visualization API, and they’re awesome. That is, when they work. They don’t seem to work in IE8, so I’ve disabled them for that browser.

And they have some pretty odd resizing behavior: if you don’t explicitly specify a width attribute, the gauges will shrink every time you update the gauge value. It took me quite a bit of experimentation to figure that one out, but thankfully it seems to be working now.

[Image: speed test gauges]

You can check out the changes in this typing speed test.

Posted in Business/Marketing