Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

Webmaster forums' Top Contributors rock

Wednesday, October 05, 2011 at 10:27 AM

Webmaster level: All

The TC Summit was a blast! As we wrote in our announcement post, we recently invited more than 250 Top Contributors from all over the world to California to thank them for being so awesome and to give them the opportunity to meet some of our forum guides, engineers and product managers in person.

Our colleagues Adrianne and Brenna already published a recap post on the Official Google Blog. As for us, the search folks at Google, there's not much left to say except that we enjoyed the event and meeting Top Contributors in real life, many of them for the first time. We got the feeling you guys had a great time, too. Let’s quote a few of the folks who make a huge difference on a daily basis:

Sasch Mayer on Google+ (Webmaster TC in English):

"For a number of reasons this event does hold a special place for me, and always will. It's not because I was one of comparatively few people to be invited for a Jolly at the ‘Plex, but because this trip offered the world's TCs a unique opportunity to finally meet each other in person."

Herbert Sulzer, a.k.a. Luzie on Google+ (Webmaster TC in English, German and Spanish):

“Hehehe! Fun, fun fun, this was all fun :D Huhhh”

Aygul Zagidullina on Google+ (Web Search TC in English):

“It was a truly fantastic, amazing, and unforgettable experience meeting so many other TCs across product forums and having the chance to talk to and hear from so many Googlers across so many products!”

Of course we did receive lots of constructive feedback, too. Transparency and communication were on top of the list, and we're looking into increasing our outreach efforts via Webmaster Tools, so stay tuned! By the way, if you haven’t done so yet, please remember to use the forwarding option in the Webmaster Tools Message Center to get the messages straight to your email inbox. In the meantime please keep an eye on our Webmaster Central Blog, and of course keep on contributing to discussions in the Google Webmaster Forum.

On behalf of all Google guides who participated in the 2011 Summit we want to thank you. You guys rock! :)


That’s right, TCs & Google Guides came from all over the world to convene in California.


TCs & Google Guides from Webmaster Central and Search forums after one of the sessions.


After a day packed with presentations and breakout sessions...


...we did what we actually came for...


...enjoyed a party, celebrated and had a great time together.

Webmaster Tools Search Queries data is now available in Google Analytics

Tuesday, October 04, 2011 at 12:42 PM

Webmaster level: All

Earlier this year we announced a limited pilot for Search Engine Optimization reports in Google Analytics, based on Search queries data from Webmaster Tools. Thanks to valuable feedback from our pilot users, we’ve made several improvements and are pleased to announce that the following reports are now publicly available in the Traffic Sources section of Google Analytics.
  • Queries: impressions, clicks, position, and CTR info for the top 1,000 daily queries
  • Landing Pages: impressions, clicks, position, and CTR info for the top 1,000 daily landing pages
  • Geographical Summary: impressions, clicks, and CTR by country
All of these Search Engine Optimization reports offer Google Analytics’ advanced filtering and visualization capabilities for deeper data analysis. With the secondary dimensions, you can view your site’s data in ways that aren’t available in Webmaster Tools.


To enable these Search Engine Optimization reports for a web property, you must be both a Webmaster Tools verified site owner and a Google Analytics administrator of that Property. Once enabled, administrators can choose which profiles can see these reports.

If you have feedback or suggestions, please let us know in the Webmaster Help Forum.

Work smarter, not harder, with site health

Thursday, September 29, 2011 at 2:26 PM

Webmaster level: All

We consistently hear from webmasters that they have to prioritize their time. Some manage dozens or hundreds of clients’ sites; others run their own business and may only have an hour to spend on website maintenance in between managing finances and inventory. To help you prioritize your efforts, Webmaster Tools is introducing the idea of “site health,” and we’ve redesigned the Webmaster Tools home page to highlight your sites with health problems. This should allow you to easily see what needs your attention the most, without having to click through all of the reports in Webmaster Tools for every site you manage.

Here’s what the new home page looks like:


You can see that sites with health problems are shown at the top of the list. (If you prefer, you can always switch back to listing your sites alphabetically.) To see the specific issues we detected on a site, click the site health icon or the “Check site health” link next to that site:


This new home page is currently only available if you have 100 or fewer sites in your Webmaster Tools account (either verified or unverified). We’re working on making it available to all accounts in the future. If you have more than 100 sites, you can see site health information at the top of the Dashboard for each of your sites.

Right now we include three issues in your site’s health check:

  1. Have we detected malware on the site?
  2. Have any important pages been removed via our URL removal tool?
  3. Are any of your important pages blocked from crawling in robots.txt?

You can click on any of these items to get more details about what we detected on your site. If the site health icon and the “Check site health” link don’t appear next to a site, it means that we didn’t detect any of these issues on that site (congratulations!).

A word about “important pages:” as you know, you can get a comprehensive list of all URLs that have been removed by going to Site configuration > Crawler access > Remove URL; and you can see all the URLs that we couldn’t crawl because of robots.txt by going to Diagnostics > Crawl errors > Restricted by robots.txt. But since webmasters often block or remove content on purpose, we only wanted to indicate a potential site health issue if we think you may have blocked or removed a page you didn’t mean to, which is why we’re focusing on “important pages.” Right now we’re looking at the number of clicks pages get (which you can see in Your site on the web > Search queries) to determine importance, and we may incorporate other factors in the future as our site health checks evolve.

Obviously these three issues—malware, removed URLs, and blocked URLs—aren’t the only things that can make a website “unhealthy;” in the future we’re hoping to expand the checks we use to determine a site’s health, and of course there’s no substitute for your own good judgment and knowledge of what’s going on with your site. But we hope that these changes make it easier for you to quickly spot major problems with your sites without having to dig down into all the data and reports.

After you’ve resolved any site health issues we’ve flagged, it will usually take several days for the warning to disappear from your Webmaster Tools account, since we have to recrawl the site, see the changes you’ve made, and then process that information through our Web Search and Webmaster Tools pipelines. If you continue to see a site health warning for that site after a week or so, the issue may not have been resolved. Feel free to ask for help tracking it down in our Webmaster Help Forum... and let us know what you think!

Pagination with rel=“next” and rel=“prev”

Thursday, September 15, 2011 at 5:00 AM

Webmaster level: Intermediate to Advanced

Much like rel=”canonical” acts a strong hint for duplicate content, you can now use the HTML link elements rel=”next” and rel=”prev” to indicate the relationship between component URLs in a paginated series. Throughout the web, a paginated series of content may take many shapes—it can be an article divided into several component pages, or a product category with items spread across several pages, or a forum thread divided into a sequence of URLs. Now, if you choose to include rel=”next” and rel=”prev” markup on the component pages within a series, you’re giving Google a strong hint that you’d like us to:
  • Consolidate indexing properties, such as links, from the component pages/URLs to the series as a whole (i.e., links should not remain dispersed between page-1.html, page-2.html, etc., but be grouped with the sequence).
  • Send users to the most relevant page/URL—typically the first page of the series.


The relationship between component URLs in a series can now be indicated to Google through rel=”next” and rel=”prev”.

There’s an exception to the rel=”prev” and rel=”next” implementation: If, alongside your series of content, you also offer users a view-all page, or if you’re considering a view-all page, please see our post on View-all in search results for more information. Because view-all pages are most commonly preferred by searchers, we do our best to surface this version when appropriate in results rather than a component page (component pages are more likely to surface with rel=”next” and rel=”prev”).

If you don’t have a view-all page or you’d like to override Google returning a view-all page, you can use rel="next" and rel="prev" as described in this post.

For information on paginated configurations that include a view-all page, please see our post on View-all in search results.

Outlining your options

Here are three options for a series:
  1. Leave whatever you have exactly as-is. Paginated content exists throughout the web and we’ll continue to strive to give searchers the best result, regardless of the page’s rel=”next”/rel=”prev” HTML markup—or lack thereof.
  2. If you have a view-all page, or are considering a view-all page, see our post on View-all in search results.
  3. Hint to Google the relationship between the component URLs of your series with rel=”next” and rel=”prev”. This helps us more accurately index your content and serve to users the most relevant page (commonly the first page). Implementation details below.

Implementing rel=”next” and rel=”prev”

If you prefer option 3 (above) for your site, let’s get started! Let’s say you have content paginated into the URLs:

http://www.example.com/article?story=abc&page;=1
http://www.example.com/article?story=abc&page;=2
http://www.example.com/article?story=abc&page;=3
http://www.example.com/article?story=abc&page;=4

On the first page, http://www.example.com/article?story=abc&page;=1, you’d include in the <head> section:
<link rel="next" href="http://www.example.com/article?story=abc&page;=2" />

On the second page, http://www.example.com/article?story=abc&page;=2:
<link rel="prev" href="http://www.example.com/article?story=abc&page;=1" />
<link rel="next" href="http://www.example.com/article?story=abc&page;=3" />

On the third page, http://www.example.com/article?story=abc&page;=3:
<link rel="prev" href="http://www.example.com/article?story=abc&page;=2" />
<link rel="next" href="http://www.example.com/article?story=abc&page;=4" />

And on the last page, http://www.example.com/article?story=abc&page;=4:
<link rel="prev" href="http://www.example.com/article?story=abc&page;=3" />

A few points to mention:
  • The first page only contains rel=”next” and no rel=”prev” markup.
  • Pages two to the second-to-last page should be doubly-linked with both rel=”next” and rel=”prev” markup.
  • The last page only contains markup for rel=”prev”, not rel=”next”.
  • rel=”next” and rel=”prev” values can be either relative or absolute URLs (as allowed by the <link> tag). And, if you include a <base> link in your document, relative paths will resolve according to the base URL.
  • rel=”next” and rel=”prev” only need to be declared within the <head> section, not within the document <body>.
  • We allow rel=”previous” as a syntactic variant of rel=”prev” links.
  • rel="next" and rel="previous" on the one hand and rel="canonical" on the other constitute independent concepts. Both declarations can be included in the same page. For example, http://www.example.com/article?story=abc&page;=2&sessionid;=123 may contain:

    <link rel="canonical" href="http://www.example.com/article?story=abc&page;=2”/>
    <link rel="prev" href="http://www.example.com/article?story=abc&page;=1&sessionid;=123" />
    <link rel="next" href="http://www.example.com/article?story=abc&page;=3&sessionid;=123" />

  • rel=”prev” and rel=”next” act as hints to Google, not absolute directives.
  • When implemented incorrectly, such as omitting an expected rel="prev" or rel="next" designation in the series, we'll continue to index the page(s), and rely on our own heuristics to understand your content.

Questions?
More information can be found in our Help Center, or join the conversation in our Webmaster Help Forum!

View-all in search results

Webmaster level: Intermediate to Advanced

User testing has taught us that searchers much prefer the view-all, single-page version of content over a component page containing only a portion of the same information with arbitrary page breaks (which cause the user to click “next” and load another URL).


Searchers often prefer the view-all vs. paginated content with arbitrary page breaks and worse latency.

Therefore, to improve the user experience, when we detect that a content series (e.g. page-1.html, page-2.html, etc.) also contains a single-page version (e.g. page-all.html), we’re now making a larger effort to return the single-page version in search results. If your site has a view-all option, there’s nothing you need to do; we’ll work to do it on your behalf. Also, indexing properties, like links, will be consolidated from the component pages in the series to the view-all page.

However, high latency can make the view-all less preferred

Interestingly, the cases when users didn’t prefer the view-all page were correlated with high latency (e.g., when the view-all page took a while to load, say, because it contained many images). This makes sense because we know users are less satisfied with slow results. So while a view-all page is commonly desired, as a webmaster it’s important to balance this preference with the page’s load time and overall user experience.

Best practices for a series of content
  1. If your site includes view-all pages

    We aim to detect the view-all version of your content and, if available, its associated component pages. There’s nothing more you need to do! However, if you’d like to make it more explicit to us, you can include rel=”canonical” from your component pages to your view-all to increase the likelihood that we detect your series of pages appropriately.


    rel=”canonical” can specify the superset of content (i.e. the view-all page) from the same information in a series of URLs.

    Why does this work?

    In the diagram, page-2.html of a series may specify the canonical target as page-all.html because page-all.html is a superset of page-2.html's content. When a user searches for a query term and page-all.html is selected in search results, even if the query most related to page-2.html, we know the user will still see page-2.html’s relevant information within page-all.html.


    On the other hand, page-2.html shouldn’t designate page-1.html as the canonical because page-2.html’s content isn’t included on page-1.html. It’s possible that a user’s search query is relevant to content on page-2.html, but if page-2.html’s canonical is set to page-1.html, the user could then select page-1.html in search results and find herself in a position where she has to further navigate to a different page to arrive at the desired information. That’s a poor experience for the user, a suboptimal result from us, and it could also bring poorly targeted traffic to your site.


    However, if you strongly desire your view-all page not to appear in search results: 1) make sure the component pages in the series don’t include rel=”canonical” to the view-all page, and 2) mark the view-all page as “noindex” using any of the standard methods.
  2. If you’d like to surface individual, component pages (or there’s no view-all available)

    It may be the case that one or both of the situations below apply to your site:

    • The view-all page is undesirable as a search result (e.g., load time too high or too difficult for users to navigate).
    • Your users prefer the multi-page experience and to be directed to a component page in search results, rather than the view-all page.

    If so, you can use standard HTML rel=”next” and rel=”prev” elements to specify a relationship between the component pages in your series of content. If done correctly, Google will generally strive to:

    • Consolidate indexing properties, such as links, between the component pages/URLs.
    • Send users to the most relevant page/URL from the component pages. Typically, the most relevant page is the first page of your content, but our algorithms may point users to one of the component pages in the series.

It’s not uncommon for webmasters to incorrectly use rel=”canonical” from component pages to the first page of their series (e.g. page-2.html with rel=”canonical” to page-1.html). We recommend against this implementation because the component pages don’t actually contain duplicate content. Using rel=”next” and rel=”prev” is far more appropriate.

Summary

Because users generally prefer the view-all option in search results, we’re making more of an effort to properly detect and serve this version to searchers. If you have a series of content, there’s nothing more you need to do. If you’d like to hint more to Google how best to serve users your information:
  1. To better optimize your view-all page, you can use rel=”canonical” from component pages to the single-page version; otherwise,
  2. If a view-all page doesn’t provide a good user experience for your site, you can use the rel=”next” and rel=”prev” attributes as a strong hint for Google to identify the series of pages and still surface a component page in results.

Questions?

As always, feel free to ask in our Webmaster Help Forum.

Reconsideration requests get more transparent

Webmaster level: All

If your site isn't appearing in Google search results, or it's performing more poorly than it once did (and you believe that it does not violate our Webmaster Guidelines), you can ask Google to reconsider your site. Over time, we’ve worked to improve the reconsideration process for webmasters. A couple of years ago, in addition to confirming that we had received the request, we started sending a second message to webmasters confirming that we had processed their request. This was a huge step for webmasters who were anxiously awaiting results. Since then, we’ve received feedback that webmasters wanted to know the outcome of their requests. Earlier this year, we started experimenting with sending more detailed reconsideration request responses and the feedback we’ve gotten has been very positive!

Now, if your site is affected by a manual spam action, we may let you know if we were able to revoke that manual action based on your reconsideration request. Or, we could tell you if your site is still in violation of our guidelines. This might be a discouraging thing to hear, but once you know that there is still a problem, it will help you diagnose the issue.

If your site is not actually affected by any manual action (this is the most common scenario), we may let you know that as well. Perhaps your site isn’t being ranked highly by our algorithms, in which case our systems will respond to improvements on the site as changes are made, without your needing to submit a reconsideration request. Or maybe your site has access issues that are preventing Googlebot from crawling and indexing it. For more help debugging ranking issues, read our article about why a site may not be showing up in Google search results.

We’ve made a lot of progress on making the entire reconsideration request process more transparent. We aren’t able to reply to individual requests with specific feedback, but now many webmasters will be able to find out if their site has been affected by a manual action and they’ll know the outcome of the reconsideration review. In an ideal world, Google could be completely transparent about how every part of our rankings work. However, we have to maintain a delicate balance: trying to give as much information to webmasters as we can without letting spammers probe how to do more harm to users. We're happy that Google has set the standard on tools, transparency, and communication with site owners, but we'll keep looking for ways to do even better.

Introducing: Application Rich Snippets

Wednesday, September 14, 2011 at 1:06 PM

Webmaster level: All

Rich snippets help users determine more quickly if a particular web page has the information they're interested in. We've previously introduced rich snippets for shopping, recipes, reviews, video, and events, and most recently music.

Before you install a software application, users might want to check out what others think about it, and how much it costs. We are pleased to announce that starting today, you’ll be able to get this information right in search results.

Here's an example of what an application snippet looks like.

Image of application snippet

You can see application snippets from several marketplaces and review sites, including Android Market, Apple iTunes, and CNET. For information on how to add app markup on your site, please refer to our Webmaster central article and send any questions to our discussion help forum.