X-Cart: shopping cart software


keystone 06-22-2007 12:33 PM

gsite crawler taking forever!!!
 
I started GSiteCrawler this morning around 9:30 am, after importing my robots.txt to make sure it wasn't going to crawl my entire site. I ended up with around 6,400 files waiting and I'm now down to 4,500, but it is already 4:30 pm! Is it supposed to take that long? Does anybody know which settings I should have adjusted, or have any other tips?
Thanks

nevets1219 06-22-2007 12:41 PM

Re: gsite crawler taking forever!!!
 
We have something like
Quote:

*?=productcode&=1
/*.html?js=*
/*?=productcode&=1
/?
/?*
/?=productcode&=1
/?sort=*
/admin/
/antibot_image.php
/cart.php
/catalog/
/error_message.php
/files/
/giftcert.php
/giftreg_manage.php
/giftregs.php
/help.php
/home.php?cat=*
/icon.php
/image.php
/include/
/modules/
/offers.php
/orders.php
/payment/
/product.php
/product.php*
/product_image.php
/register.php
/search.php
/shop_closed.html
/sql/
/upgrade/
/var/
?*
Some are probably duplicates and some are not necessary, but we figure this works so far :) This includes more than what is in our robots.txt.

It takes less than 5 minutes (usually 2-3) for it to go through and process everything (it generated 300ish links). You may want to throttle it on shared hosting, though, since it can be pretty intense on your website. Also, if your host locks your site for CPU overuse, GSiteCrawler doesn't realize that and continues to crawl.
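
For anyone wondering how a list like the one above cuts the crawl down, here is a rough Python sketch. It is not GSiteCrawler's own code: it assumes that `*` is a wildcard and that a pattern excludes a URL when it matches anywhere in the path plus query string, which may not be exactly how GSiteCrawler behaves. The sample URLs are made up.

```python
import re

# A few patterns copied from the list above.
PATTERNS = [
    "/admin/",
    "/cart.php",
    "/product.php*",
    "/home.php?cat=*",
    "/?sort=*",
    "/*.html?js=*",
]

def to_regex(pattern):
    # Escape every literal piece, then let each "*" match any run of characters.
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))

def is_excluded(url_path, patterns=PATTERNS):
    # ASSUMPTION: substring-style matching, not GSiteCrawler's documented behavior.
    return any(to_regex(p).search(url_path) for p in patterns)

# Hypothetical URLs, for illustration only.
for path in ["/product.php?productid=123", "/home.php?cat=5", "/candles/votives.html"]:
    print(path, "excluded" if is_excluded(path) else "crawled")
```

Under those assumptions the dynamic product.php and home.php?cat= URLs are skipped and only the static-looking pages are left to crawl, which is why the link count stays small.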

Jon 06-22-2007 02:57 PM

Re: gsite crawler taking forever!!!
 
Some crawlers don't respect the base href tag, so having CDSEO installed can cause problems with those crawlers.

If you find that the URLs they come up with are "looping", it means that the base href tag is not being followed.
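
To make the "looping" concrete, here is a small Python sketch of how a relative link resolves with and without the base href. The host, page URL, and link are hypothetical examples, not taken from any real CDSEO site.

```python
from urllib.parse import urljoin

BASE_HREF = "http://www.example.com/"         # hypothetical <base href> value
PAGE_URL = "http://www.example.com/candles/"  # hypothetical page being crawled
RELATIVE_LINK = "candles/votives/"            # hypothetical relative link on that page

def resolve(link, page_url, base_href=None):
    # A crawler that honors <base href> resolves against it;
    # one that ignores it resolves against the page's own URL.
    return urljoin(base_href or page_url, link)

# Crawler that follows the base href: one stable URL.
correct = resolve(RELATIVE_LINK, PAGE_URL, BASE_HREF)

# Crawler that ignores it: every hop nests the path one level deeper.
looped = []
url = PAGE_URL
for _ in range(3):
    url = resolve(RELATIVE_LINK, url)
    looped.append(url)

print(correct)
print("\n".join(looped))
```

Each "new" URL found this way looks unique to the crawler, so the queue of waiting files keeps growing instead of shrinking.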

You can solve this problem with the following code in .htaccess:

Code:

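RewriteEngine on
# Match extensionless paths (no dot) that lack a trailing slash
RewriteCond %{REQUEST_URI} ^/[^\.]+[^/]$
# Permanently redirect them to the same path with a trailing slash added
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1/ [R=301,L]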
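
To check which requests that rule would redirect, the condition can be replayed outside Apache. This is only a rough Python sketch: it mimics the regular expression and the redirect target, not mod_rewrite itself, and the host and paths are made-up examples.

```python
import re

# Same expression as the RewriteCond above (the backslash before the dot is optional).
CONDITION = re.compile(r"^/[^.]+[^/]$")

def redirect_target(host, path):
    """Return the 301 target the rule would produce, or None if it does not apply."""
    if CONDITION.match(path):
        return "http://" + host + path + "/"
    return None

# Hypothetical host and request paths, for illustration only.
for path in ["/candles", "/candles/", "/candles/votives", "/product.php"]:
    print(path, "->", redirect_target("www.example.com", path))
```

Paths that already end in a slash, and anything with a file extension such as .php or .html, are left alone.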


nfc5382 06-24-2007 08:21 AM

Re: gsite crawler taking forever!!!
 
Take a look at http://www.xml-sitemaps.com/standalone-google-sitemap-generator.html . It can be automated on your server and works fine with CDSEO.

keystone 07-05-2007 07:26 AM

Re: gsite crawler taking forever!!!
 
Is this code in my .htaccess file the same as what you suggested? If so I already have it there.

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.uscandleco\.com [NC]
RewriteRule ^(.*)$ http://www.uscandleco.com/$1 [R=301,L]
RewriteCond %{SERVER_PORT} 80
#RewriteRule ^admin/ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
</IfModule>

I ran GSiteCrawler again overnight, and when I checked it there were 150,000 files waiting and it was still going. I shut it down. I'm assuming it was just looping again. Any help would be great.

Jon 07-07-2007 10:41 PM

Re: gsite crawler taking forever!!!
 
The code isn't the same; please try the code I provided.


All times are GMT -8.