Search the Community
Showing results for tags 'crawl'.
-
Hello. I would like to know how to add the noindex, nofollow tag to the following pages, which do not need to be indexed (Google does not like them to be): the cart, registration/checkout, my account, and the site's internal search results pages. Is there a way to do this from the PrestaShop BO, or do I have to modify robots.txt or the .htaccess file? I look forward to your reply. Thanks in advance.
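As a rough illustration of the logic being asked for, here is a minimal Python sketch that decides which robots meta tag a page should carry. It is not PrestaShop code: the function names are hypothetical, and the controller names are simply borrowed from the robots.txt listings elsewhere on this page. In a real shop the equivalent check would live in the theme's header template or in a module. Note that a page has to remain crawlable (not blocked in robots.txt) for a search engine to see its noindex tag at all.

```python
# Controllers that should stay out of the index. These names are taken from
# the robots.txt listings in this thread and are only an assumption about
# how a given shop names its pages.
NOINDEX_CONTROLLERS = {
    "cart",
    "order",
    "order-opc",
    "authentication",
    "my-account",
    "search",
}


def robots_directive(controller: str) -> str:
    """Return the robots directive for a page served by `controller`."""
    if controller in NOINDEX_CONTROLLERS:
        return "noindex, nofollow"
    return "index, follow"


def robots_meta_tag(controller: str) -> str:
    """Render the <meta> tag that would go into the page <head>."""
    return f'<meta name="robots" content="{robots_directive(controller)}">'


if __name__ == "__main__":
    for name in ("cart", "search", "product"):
        print(name, "->", robots_meta_tag(name))
```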
-
Hello, I have just put together a small module for editing robots.txt under PrestaShop 1.6 and 1.7. After installing the module, you get a configuration panel that simply lets you add rules to PrestaShop's robots.txt file. Once your changes are saved, remember to regenerate robots.txt from the "SEO & URL" tab of your PrestaShop. This module does not de-index your pages; plenty of other modules do that. Search Google or visit PrestaShop Addons to find one that suits you. P.S.: I'm in a bit of a bind over supporting my modules both on the PrestaShop forum and from my own site; for the free modules it amounts to duplicated effort. I was wondering whether it would be ethically acceptable to post the link to my site, or whether I should instead keep answering requests on the various forum topics (albeit with serious delays). For now I'll stick to trying to follow the forum, but you'll have to forgive my response times; giving things away is nice for the community, but it doesn't put food on the table! everpsrobots.zip
-
Hi all, I want to block all bots from crawling my site until the design is finished. I generated a robots.txt file from the BO and confirmed via FTP that the file is on my server, but the test in Google Webmaster Tools came back as "allowed". Here's my robots.txt:
# robots.txt automaticaly generated by PrestaShop e-commerce open-source solution
# http://www.prestashop.com - http://www.prestashop.com/forums
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
# For more information about the robots.txt standard, see:
# http://www.robotstxt.../wc/robots.html
# All bots
User-agent: *
Disallow: /
# Private pages
Disallow: /*orderby=
Disallow: /*orderway=
Disallow: /*tag=
Disallow: /*id_currency=
Disallow: /*search_query=
Disallow: /*back=
Disallow: /*utm_source=
Disallow: /*utm_medium=
Disallow: /*utm_campaign=
Disallow: /*n=
Disallow: /*controller=addresses
Disallow: /*controller=address
Disallow: /*controller=authentication
Disallow: /*controller=cart
Disallow: /*controller=discount
Disallow: /*controller=footer
Disallow: /*controller=get-file
Disallow: /*controller=header
Disallow: /*controller=history
Disallow: /*controller=identity
Disallow: /*controller=images.inc
Disallow: /*controller=init
Disallow: /*controller=my-account
Disallow: /*controller=order
Disallow: /*controller=order-opc
Disallow: /*controller=order-slip
Disallow: /*controller=order-detail
Disallow: /*controller=order-follow
Disallow: /*controller=order-return
Disallow: /*controller=order-confirmation
Disallow: /*controller=pagination
Disallow: /*controller=password
Disallow: /*controller=pdf-invoice
Disallow: /*controller=pdf-order-return
Disallow: /*controller=pdf-order-slip
Disallow: /*controller=product-sort
Disallow: /*controller=search
Disallow: /*controller=statistics
Disallow: /*controller=attachment
Disallow: /*controller=guest-tracking
# Directories
Disallow: /*classes/
Disallow: /*config/
Disallow: /*download/
Disallow: /*mails/
Disallow: /*modules/
Disallow: /*translations/
Disallow: /*tools/
# Files
Disallow: /*en/password-recovery
Disallow: /*en/address
Disallow: /*en/addresses
Disallow: /*en/authentication
Disallow: /*en/cart
Disallow: /*en/discount
Disallow: /*en/order-history
Disallow: /*en/identity
Disallow: /*en/my-account
Disallow: /*en/order-follow
Disallow: /*en/order-slip
Disallow: /*en/order
Disallow: /*en/search
Disallow: /*en/quick-order
Disallow: /*en/guest-tracking
Disallow: /*en/order-confirmation
I'm worried that if my site is crawled and I then edit or delete certain CMS/product/category pages, I'll end up with too many 404s before my site is actually launched. I attached the exact test result message that Google Webmaster Tools returns when testing the robots.txt file's configuration. Any advice? I'd appreciate an expert PrestaShop SEO's opinion on this issue. Thanks! -Alex
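One way to sanity-check a file like this independently of Google's tester is to parse it locally. The sketch below uses Python's standard `urllib.robotparser` on just the first two directives of the file above; `example.com` and the `is_allowed` helper are placeholders. As far as I know this parser only does plain prefix matching, so the wildcard lines are left out here on purpose.

```python
from urllib.robotparser import RobotFileParser

# Only the blanket rule from the posted file; the wildcard rules are omitted
# because this stdlib parser treats paths as plain prefixes.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host.
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/index.php"))
```

If a local check like this reports the URL as blocked while the online tester still says "allowed", it may be worth checking whether the tester is reading an older cached copy of the file or a different hostname than the one the file was uploaded to.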
- 18 replies
- Tagged with: google webmaster tool, robots.txt (and 3 more)
-
I would like to stop bots from crawling a certain directory on my server. It is not possible to add this manually to the robots.txt file in the way you can add information to the .htaccess file via the BO. Besides that, it's not possible to overwrite the robots.txt file via FTP, and even if it were, the manually added line would be removed when generating a new robots.txt file via the BO. Does anyone know a good solution for this? Some additional detail on why: the directory that I don't want crawled contains a Flash banner that shows on virtually every page of the shop, and the text of the Flash file is indexed by Google. To be precise, the text of the Flash banner is shown as the description of every indexed page in Google, even though I have a unique meta description for virtually every page. Normally the meta description should be used as the description of an indexed page, but Google seems to treat the Flash text as more important. I thought that disallowing crawling of the directory should resolve the problem. If you have any other ideas, they would be welcome too! Thanks in advance for any replies.
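One possible workaround for the "regeneration wipes my line" problem is a small script, run after each regeneration, that re-appends any custom rules that are missing. The following is a minimal sketch under stated assumptions: the `/flash/` directory, the marker comment, and the function name are all hypothetical, and it assumes the generated file ends inside the `User-agent: *` group, so that appended `Disallow` lines attach to that group.

```python
# Marker comment written once above the re-added rules (hypothetical text).
MARKER = "# Custom rules (re-added after regeneration)"


def merge_custom_rules(generated: str, custom_rules: list[str]) -> str:
    """Append any custom rules missing from a freshly generated robots.txt.

    The function is idempotent: running it twice adds nothing the second time.
    """
    lines = generated.rstrip("\n").split("\n")
    existing = {line.strip() for line in lines}
    missing = [rule for rule in custom_rules if rule not in existing]
    if not missing:
        return generated
    if MARKER not in existing:
        lines.append(MARKER)
    lines.extend(missing)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    generated = "User-agent: *\nDisallow: /classes/\n"
    # "/flash/" stands in for the banner directory; the real path is not
    # given in the post.
    print(merge_custom_rules(generated, ["Disallow: /flash/"]), end="")
```

In practice the script would read the robots.txt from disk, pass its contents through `merge_custom_rules`, and write the result back, which requires write access to the file.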
- 4 replies
- Tagged with: add, robots.txt (and 4 more)
-
I renamed categories and products to use better keywords, but the Google crawler is still trying to reach the old pages and hitting 404 errors. I thought this was because they had been indexed before and that it would go away after some time. But a few days ago I submitted the website to Bing, and now I see the same old links there with 404 and 503 errors. These links are not in the sitemap. Where do they come from, and how do I fix this? PrestaShop™ 1.5.4.1, Google sitemap module 2.3.5
- 4 replies
- Tagged with: seo, google index (and 5 more)
-
Hello all, I keep getting an 'Increase in not found errors' message in my Webmaster Tools, and when I look at the details, all the errors come from language versions of my site that are not available. I am using PrestaShop 1.5.4.1 for my online store and have only two languages (English, Spanish) enabled. However, Google tries to crawl the Italian, Dutch, French, and German versions of my site, which causes errors because those languages are not enabled. How do I go about fixing this issue? I want to block Google from crawling languages that do not exist. Attached is the error page I am seeing. Here is the message I receive from Google: Google detected a significant increase in the number of URLs that return a 404 (Page Not Found) error. Investigating these errors and fixing them where appropriate ensures that Google can successfully crawl your site's pages.
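If blocking via robots.txt is the route taken, the rules for the disabled languages could be generated rather than typed by hand. This is a minimal sketch that assumes language URLs use a `/<iso code>/` prefix (for example `/it/`), which the post does not state; the function name and the language codes are illustrative only.

```python
def disallow_rules_for_languages(all_codes, enabled_codes):
    """Build one Disallow line per language prefix that is not enabled."""
    enabled = set(enabled_codes)
    return [f"Disallow: /{code}/" for code in all_codes if code not in enabled]


if __name__ == "__main__":
    # Codes guessed from the languages named in the post.
    rules = disallow_rules_for_languages(
        ["en", "es", "it", "nl", "fr", "de"],
        ["en", "es"],
    )
    print("\n".join(rules))
```

Rules like these only stop compliant crawlers from requesting the URLs; it is probably still worth finding where the links to the disabled languages come from, for example an old sitemap or a language switcher.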
-
Hello, In Google Webmaster Tools I have a number of duplicate meta descriptions. For example, "List of manufacturers available on our site" is reported for both of these:
/fabricantes
/manufacturer.php/
The .htaccess generated by PrestaShop contains this:
RewriteRule ^fabricantes$ /manufacturer.php [QSA,L]
How can it report a duplicate meta description if the URL is being redirected? What can I do so that manufacturer.php is not indexed but fabricantes is? Another question: an excerpt from the robots.txt generated by PS 1.4.7.0 contains this:
# http://www.robotstxt.org/wc/robots.html
# GoogleBot specific
User-agent: Googlebot
Disallow: /*orderby=
Disallow: /*orderway=
Disallow: /*tag=
Disallow: /*id_currency=
Disallow: /*search_query=
Disallow: /*id_lang=
Disallow: /*back=
Disallow: /*utm_source=
Disallow: /*utm_medium=
Disallow: /*utm_campaign=
Disallow: /*n=
Supposedly it will not index anything containing orderby=, for example. The problem is that Google Webmaster Tools flags the following as duplicates, under "Our promotional products":
/promocion?orderby=price&orderway=asc
/promocion?orderby=price&orderway=desc
Isn't it supposed not to index these? I hope someone can lend me a hand. Many thanks. P.S. In addition, under URL parameters in Google I have this:
orderway 175 08/03/2012 Sorts content. No URLs. Edit / Reset
orderby 175 08/03/2012 Sorts content. No URLs. Edit / Reset
In other words, it should not be crawling URLs with orderway and orderby.
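To check whether a wildcard rule such as `Disallow: /*orderby=` really covers the reported URLs, the matching can be reproduced with a few lines of Python. This sketch follows the Google-style pattern rules as I understand them (`*` matches any run of characters, a trailing `$` anchors the end, everything else is a prefix match); the function names are my own and this is not any crawler's actual implementation.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(rule_path: str, url_path: str) -> bool:
    """Return True if the rule pattern matches from the start of url_path."""
    return rule_to_regex(rule_path).match(url_path) is not None


if __name__ == "__main__":
    for url in (
        "/promocion?orderby=price&orderway=asc",
        "/promocion?orderby=price&orderway=desc",
        "/promocion",
    ):
        print(url, rule_matches("/*orderby=", url))
```

Under these rules both sorted URLs are matched by the pattern, so the rule itself looks fine. A robots.txt rule controls crawling rather than indexing, though, so URLs that were crawled before the rule existed, or that are linked from elsewhere, can still appear in reports for a while.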
-
I don't know what to do about this issue. Can anyone please help me? My web hosting company suspended my site and sent me this explanation: "System administration has identified your account as using higher resources on the server housing your account. This is impacting other users, and we may be forced to suspend or have already suspended your site in order to stabilize the server. We noticed that your site (PrestaShop-based site), VARIABLE_1, is being heavily 'crawled' by search engines. Search engines tend to mimic the effect of hundreds of visitors going through every portion of your site, often all at once. You may wish to implement a robots.txt file in order to reduce this effect. This file contains instructions for well behaving 'robots' on how to crawl your site. You can find more information about this here: http://www.robotstxt.org/. The basic format would be as follows to block robots from the following (example) directories:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
To use this effectively, you will need to review your site and see what parts might be the most intensive. An alternative to blocking a search engine is to request their robots to not crawl through your site as quickly as they normally would. It is an unofficial extension to the robots.txt standard but one that most popular search engines use. This is an example of how you request that robots fetch pages only every ten seconds:
User-agent: *
Crawl-delay: 10
This is especially useful for parts of your sites like forums or 'tag clouds' that, while useful to human visitors, are troublesome in terms of how robots aggressively pass through them repeatedly. You can also use your access logs to see how search engines are hitting your site. Let us know if you need help finding your logs in our control panel and we'll be glad to help. If your site is currently suspended, please contact us to lift the suspension in order to implement the above recommendation. As always, feel free to contact us with any further questions." Would these recommendations really help? I have had many problems with PrestaShop SEO lately.
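To see what a compliant crawler would read from the host's suggestion, the two quoted snippets can be combined into one group and parsed with Python's standard `urllib.robotparser`. Combining them is my own assumption, and `ExampleBot` and `example.com` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The host's two example snippets merged into a single group (an assumption;
# the quoted message shows them separately).
HOST_EXAMPLE = """\
User-agent: *
Crawl-delay: 10
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(HOST_EXAMPLE.splitlines())

# Seconds a compliant bot is asked to wait between requests.
delay = parser.crawl_delay("ExampleBot")
# Whether a URL under one of the listed directories is off limits.
blocked = not parser.can_fetch("ExampleBot", "http://example.com/tmp/cache.html")

print("crawl delay:", delay)
print("/tmp/ blocked:", blocked)
```

Whether this helps depends on which crawlers are causing the load. As far as I know, Googlebot ignores `Crawl-delay`, so the access logs the host mentions are the place to confirm which bots are responsible before relying on it.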