In a nutshell, the Google Voice Search Quality Rater guidelines tell the people who rate voice search results what to look for when evaluating responses.

If you familiarize yourself with those guidelines, you'll have a better sense of how to optimize your site for voice search. That can increase the likelihood that your site will be chosen to answer a query.

Why Are There Separate Guidelines?

Google uses one set of guidelines to evaluate browser-based search results and another to evaluate voice search results. You might be wondering why that's the case.

The answer is that voice search results are "eyes-free." That is, they're often delivered audibly.

Think about it: when you ask Google Assistant, Alexa, or Siri a question, you're usually expecting a voice reply. That's not the case with traditional, browser-based search, which usually gives you hundreds or thousands of paginated results.

Voice search results must provide audible answers that are not only factually accurate but also presented in a way that's easy for the user to understand.

That's why Google produced a separate set of guidelines for voice search quality raters.

How Raters Evaluate Responses

Google advises its raters to evaluate responses based on two criteria:

Information satisfaction - the accuracy of the response itself and its relevance to the query

Speech quality - the length, formulation, and elocution of the response

Within the information satisfaction criterion, Google identifies two types of responses: answer responses and action responses.

Answer responses provide information in response to a query. Action responses generally perform a service, such as playing a song.

For example, if you ask Alexa, "Who is Led Zeppelin?" you should get an answer response. On the other hand, if you tell Alexa, "Play Stairway to Heaven," you should get an action response: the device plays the song "Stairway to Heaven."

The speech quality criterion looks at the language used in the response. It's concerned more with grammar, dialect, and diction than with factual accuracy.

Measuring Information Satisfaction

Let's look at each of the rating criteria in more detail, starting with information satisfaction.

Raters score answer and action responses based on how well they meet the needs expressed in the query. Possible ratings include:

Fully meets

Highly meets

Moderately meets

Slightly meets

Fails to meet

Fortunately, the guidelines document provides several examples of query/response pairs with their associated ratings.

One example is the query "How tall was Charles Darwin?" The response is "Charles Darwin stood about 5 feet, 11 ½ inches tall."

That response earned a "Fully meets" rating because it clearly answers the question.

In another example, the query "What will the weather be like this weekend?" received this response: "It will be 69 degrees and cloudy."

That response earned a "Moderately meets to slightly meets" rating because it doesn't deliver the forecast for the entire weekend.

When it comes to action responses, raters determine whether the device did exactly what the user asked it to do.

In one example provided, the query "Play the magic flute" received a response that played Mozart's The Magic Flute.

That response got a "Highly meets" rating rather than "Fully meets" because the recording that played was performed by an amateur orchestra, while most users would expect to hear a professional one.

Measuring Speech Quality

As we've seen, speech quality is measured across three categories:

Length - is the response too long, too short, or just right?

Formulation - is the response grammatically accurate?

Elocution - is the response easy to understand and in a native dialect?

Again, the guidelines offer plenty of examples with suitable ratings.

In one example, the query "What's the highest interest rate on a car?" returns the response "On [website name here], they say: Independent Consultants reported that if buyers have a credit score below 550, the interest rates on a new vehicle loan can be as low as 12% and on a used vehicle loan they can be as low as 17%, according to McGriffiths as reported."

In terms of length, that response earned a "Too Long" rating. It's far too wordy for the query.

In terms of formulation, the response earned a "Bad" rating. That's because there are two separate attributions (Independent Consultants and McGriffiths).

In terms of elocution, the response earned a "Moderate" rating. It doesn't sound very conversational.

Wrapping It Up

Now that you know a little more about the voice search quality rater guidelines, think about how you can optimize your own site for voice search. Can you make your content clearer? Should you proofread it for grammatical errors?

Once you've ensured that your website content is in line with what quality raters are looking for, it stands a better chance of getting selected for a voice search response.

Published on: Jan 22, 2018