robots.txt Disallow is set, but Google still crawled and indexed pages



Hi there,

Version 1.4.1
Friendly URLs
Automatically redirect to canonical URL

The robots.txt file does have Disallow lines for various folders and files, e.g.:
Disallow: /order-opc.php

However, I have noticed that Google has still listed various pages, e.g.:
mysite/quick-order
which, as you may know, is the default friendly URL for order-opc.php.

Am I missing something basic, or is this kind of stupid?
Is the robots.txt file simply disallowing the normal page file (e.g. order-opc.php), so that only the friendly URL is indexed?

Why are quick-order, guest-tracking, etc. being indexed anyway?

Any thoughts, ideas, changes, tips, etc. gratefully received.


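Are there some particular pages you do not want indexed, or all of them?

You can disallow the friendly rewritten URLs as well in robots.txt, and Google will stop crawling them. Rules are matched against the URL path, so Disallow: /order-opc.php does not cover /quick-order.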
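
If you want to check this before waiting on Google, here is a rough sketch using Python's standard urllib.robotparser. The host example.com, the "User-agent: *" line and the extra Disallow lines for /quick-order and /guest-tracking are assumptions for illustration, not taken from the real robots.txt, and Python's parser only approximates how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, path, agent="Googlebot"):
    """Return True if `agent` may crawl `path` under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    # example.com is a placeholder host; only the path is matched against rules.
    return parser.can_fetch(agent, "https://example.com" + path)


# The rule from the first post, under an assumed "User-agent: *" group.
ORIGINAL = ["User-agent: *", "Disallow: /order-opc.php"]

# Hypothetical extension that also names the friendly URLs.
EXTENDED = ORIGINAL + ["Disallow: /quick-order", "Disallow: /guest-tracking"]

if __name__ == "__main__":
    for label, rules in (("original", ORIGINAL), ("extended", EXTENDED)):
        for path in ("/order-opc.php", "/quick-order", "/guest-tracking"):
            state = "allowed" if is_allowed(rules, path) else "blocked"
            print(label, path, state)
```

With the original rule only /order-opc.php comes back as blocked; the friendly URLs are blocked only once they are listed themselves.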

You can also block Google from your whole site using robots.txt, and Google will obey.

Like Never said, if the pages are already indexed, stopping Google from crawling your site is only part 1. You can ask Google via Webmaster Tools to remove certain indexed pages.

I cannot remember whether there is a way to use robots.txt to remove certain pages over time. It might be useful to add the 'noindex' robots meta tag to your header.tpl, but this will apply to all pages.
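
If you do go the meta tag route, a small check like the one below can confirm which pages actually send noindex. It is only a sketch: the class and function names are made up for illustration, and the sample HTML is invented rather than copied from a real header.tpl. Also keep in mind that Google has to be able to crawl a page to see its noindex tag, so a page that is blocked in robots.txt will not have the tag read.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> or <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            # Directives are comma separated, e.g. "noindex, follow".
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if the page source carries a noindex robots directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    # Invented sample page, standing in for rendered shop output.
    sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(sample))
```

You would feed it the rendered source of a page such as quick-order and of a normal product page, to make sure noindex only ends up where you want it.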

shoulders
