How to test if pages are indexed and crawler friendly

Self inflicted page removal from Google search results

The majority of the time, websites succumb to stupid mistakes or a lack of knowledge (self-inflicted SEO penalties), not to algorithm filters or manual spam actions.

Below are the most common mistakes Webmasters make when optimizing a site for the Google search results. Follow the simple steps below, or continue browsing the SEO help articles.



How to test if pages are indexed

I am sure you are all familiar with the site:yoursite.com command. If you type site:yoursite.com into Google search and no results appear, it is possibly a penalty from the Webspam team, or you are blocking the crawler from crawling your site.

Firstly, navigate to Webmaster Tools. Under Messages you should find a message from the Webspam team informing you of the type of penalty; if there is no message, there is no penalty.

While in Webmaster Tools, use "Fetch as Googlebot" to see if your index page fetches. If it fails to fetch, you have a server issue (a firewall, etc.) or the page/site is blocked.
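
If you don't have Webmaster Tools access handy, you can roughly approximate a Googlebot fetch from your own machine. The sketch below, in Python, requests your index page while presenting a Googlebot-style user-agent string; the URL is a placeholder, and the real crawler sends its own headers and fetches from Google's servers, so treat this only as a first check. If the request fails here but succeeds in a normal browser, something such as a firewall rule is likely filtering by user-agent.

from urllib.request import Request, urlopen
from urllib.error import URLError

# A rough stand-in for "Fetch as Googlebot": request the page while
# presenting a Googlebot-style user-agent (an assumption for testing;
# the real crawler verifies itself via Google's own infrastructure).
req = Request(
    "http://www.yoursite.com/",  # placeholder: replace with your own URL
    headers={"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
)
try:
    resp = urlopen(req, timeout=10)
    print("Fetched OK, status", resp.getcode())
except URLError as err:
    print("Fetch failed:", err)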

If the page does not fetch, it could be due to a robots meta tag, the robots.txt file, the header response (a redirect), server errors (DNS), and so on.

Checking robots meta tag

You now need to check your crawler access, starting with the head section of your document; right-click the mouse to view your source code. You could be blocking the spider via your robots meta tag.

The noindex,nofollow value tells all crawlers not to index the page and not to follow its links.

<meta name="robots" content="noindex,nofollow">

The index,follow value allows all crawlers to index your page and to follow all of its links.

<meta name="robots" content="index,follow">
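
If you would rather not eyeball the source by hand, a short script can report any robots meta tag a page carries. This is a minimal sketch using Python's standard library; the URL is a placeholder for your own page.

from html.parser import HTMLParser
from urllib.request import urlopen

class RobotsMetaFinder(HTMLParser):
    # Report every meta name="robots" tag found in the page source.
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            print("robots meta tag found:", attrs.get("content"))

html = urlopen("http://www.yoursite.com/").read().decode("utf-8", "replace")  # placeholder URL
RobotsMetaFinder().feed(html)

If the script prints nothing, the page carries no robots meta tag, and crawlers fall back to their default behaviour of indexing the page and following its links.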

Testing the robots.txt file

The next step is to test your robots.txt file. The file is basically used to disallow crawlers from certain directories and files. Some websites do not use one, and there is a possibility you don't even have one, which is no big issue: if no robots.txt is found, the spider will by default crawl your entire site.

How do I check if I've got a robots.txt file? You simply browse to your URL followed by /robots.txt (the robots.txt resides in the root of your domain, where your index page resides):

www.yoursite.com/robots.txt

Example of a robots.txt that allows all spiders to crawl all files and directories:

User-agent: *
Disallow:

Example of a robots.txt that disallows all spiders from crawling all files and directories. The slash blocks all files and directories from all robots:

User-agent: *
Disallow: /

Here is an excellent free on-line tool to use: a robots checker. The tool will point out all errors, and the directories and files that are blocked from crawlers. You simply type in your full URL followed by /robots.txt (don't forget the /robots.txt). Example: http://yoursite.com/robots.txt
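
Python's standard library also ships a robots.txt parser, so you can test locally which of your URLs the file blocks. A minimal sketch, with placeholder URLs:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://www.yoursite.com/robots.txt")  # placeholder: your domain here
rp.read()

# "*" asks about the rules that apply to any crawler; swap in
# "Googlebot" to test Google's crawler specifically.
for url in ("http://www.yoursite.com/", "http://www.yoursite.com/private/page.html"):
    print(url, "->", "allowed" if rp.can_fetch("*", url) else "BLOCKED")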

Testing header response, redirects

Redirects are a common mistake, and they are widely used by hackers who gain access to the .htaccess file that usually resides in the root of your domain. 301 permanent redirects can result in your index page and other pages being removed from the index (the spider drops pages that are redirected). Your header response needs to return 200 OK. An excellent tool to test your page is web-sniffer.net, a free on-line tool.
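
You can also check the status code yourself. The sketch below, using Python's standard library, sends a HEAD request without following redirects, so a 301 or 302 shows up instead of being silently followed; the URL is a placeholder.

import http.client
from urllib.parse import urlparse

def check_status(url):
    parsed = urlparse(url)
    conn_cls = http.client.HTTPSConnection if parsed.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parsed.netloc, timeout=10)
    conn.request("HEAD", parsed.path or "/")
    resp = conn.getresponse()
    print(resp.status, resp.reason)  # you want 200 OK here
    if resp.status in (301, 302):
        print("Redirects to:", resp.getheader("Location"))
    conn.close()

check_status("http://www.yoursite.com/")  # placeholder URL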

Testing server errors, DNS

This free on-line DNS tool will check the health and configuration of your server, also providing a mail and DNS report. It also gives a short summary with suggestions for fixes and improvements:
Check DNS health
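
As a quick first check before reaching for a full report, you can confirm the domain even resolves. A minimal sketch with Python's standard library (placeholder domain):

import socket

try:
    ip = socket.gethostbyname("www.yoursite.com")  # replace with your domain
    print("DNS resolves to", ip)
except socket.gaierror as err:
    # if this fails, Googlebot cannot reach the site at all
    print("DNS lookup failed:", err)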

Testing rel=canonical

rel=canonical is used in the head section of your document and tells Google the preferred version of similar pages. Make sure your index page does not point to one of your internal pages, and, for that matter, make sure you use rel=canonical correctly. It won't be the cause of your site being de-indexed, but it could cause you crawling issues.

Example of rel=canonical:

<link rel="canonical" href="http://www.yourpage.com">
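
To confirm what a page actually declares, the same kind of script used for the robots meta tag works for rel=canonical. A minimal sketch with a placeholder URL; check that the printed address is the page you intend as the preferred version:

from html.parser import HTMLParser
from urllib.request import urlopen

class CanonicalFinder(HTMLParser):
    # Print the canonical URL the page declares in its head section.
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            print("canonical points to:", attrs.get("href"))

html = urlopen("http://www.yoursite.com/").read().decode("utf-8", "replace")  # placeholder URL
CanonicalFinder().feed(html)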

Still no rankings, what now?

"Help, I tested using your methods and I don't have a penalty in Webmaster Tools!! What's causing my rankings to drop? Please help me, my pages are however still indexed".

If the site lost rankings but is still indexed, it's a sign of an algorithm filter.

Algorithm updates tend to be more difficult to spot than penalties. The reason being, with manual penalties you receive a message in Webmaster Tools giving you direction for a fix; the same does not apply with algorithm filters. Your site simply vanishes from search but remains indexed.

In an instance like this you need to navigate to your Webmaster account (Webmaster Tools) to view when your traffic dropped. You can do that by viewing your traffic graph (queries, impressions, and clicks) and establishing the date when the drop occurred. Then simply check which algorithm update you fall under.

When confused or in doubt, email us for assistance. We don't charge for small issues, nor do we charge for a basic site review, and we don't request login or personal info, only the URL.

What type of SEO service do you offer?

Cyber SEO offers general search engine optimization on existing and new sites as well as:

  • Website Reviews (basic site reviews are free)
  • Algorithm Filter Help
  • Manual WebSpam, Panda, and Penguin Penalty Help
  • Reconsideration Requests
  • Disavow File uploads and link removals
  • Self Inflicted Filters/Penalties
  • And lastly, Hacking issues

Please feel free to pop us an email for personal assistance or advertising.

Google Webmaster SEO Help

Please feel free to browse our Google Webmaster SEO Help Articles, helping you restore rankings in the search results.

Webmaster Help
Hacking Help
Mobile Help
Related Help Articles