gsite crawler taking forever!!!

 
#1
06-22-2007, 12:33 PM
keystone (X-Adept)
Join Date: Jul 2006
Location: USA
Posts: 787

I started GSiteCrawler this morning around 9:30am. I imported my robots.txt to make sure it wasn't going to crawl my entire site. I ended up with around 6400 files waiting and I'm now down to 4500, but it is 4:30pm! Is it supposed to take that long? Does anybody know what settings I should have adjusted, or have any other tips?
Thanks
__________________
www.uscandleco.com - X-Cart Version 4.7.11 Gold Plus php7.3
mods:
reCaptcha
running on UNIX

www.keystonecandle.com X-Cart Gold Plus - Version 4.7.11 php7.2
mods:
reCaptcha
cdseo pro
running on UNIX

#2
06-22-2007, 12:41 PM
nevets1219 (eXpert)
Join Date: Jun 2006
Posts: 351

We have something like
Quote:
*?=productcode&=1
/*.html?js=*
/*?=productcode&=1
/?
/?*
/?=productcode&=1
/?sort=*
/admin/
/antibot_image.php
/cart.php
/catalog/
/error_message.php
/files/
/giftcert.php
/giftreg_manage.php
/giftregs.php
/help.php
/home.php?cat=*
/icon.php
/image.php
/include/
/modules/
/offers.php
/orders.php
/payment/
/product.php
/product.php*
/product_image.php
/register.php
/search.php
/shop_closed.html
/sql/
/upgrade/
/var/
?*
Some are probably duplicates and some are not necessary, but we figure this works so far. This includes more than what is in our robots.txt.

It takes less than 5 minutes (about 2-3 minutes) for it to go through and process our site (it generated 300ish links). You may want to limit it on shared hosting, though, since it can be pretty intense on your website. Also, if your host locks your site for CPU overuse, GSiteCrawler doesn't realize that and continues to crawl.
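
A rough way to sanity-check an exclusion list like the one above before starting a long crawl is to test a few sample URLs against it. The sketch below is plain Python, not anything GSiteCrawler provides. It assumes `*` matches any run of characters and that each pattern is anchored at the start of the path (robots.txt-style), which may differ from how GSiteCrawler actually interprets its filters. Only a few of the patterns are included, and the sample paths are made up.

```python
import re

# A few patterns taken from the list above (assumed semantics: "*" = any run
# of characters, pattern anchored at the start of the path, prefix match).
EXCLUDES = [
    "/admin/",
    "/cart.php",
    "/product.php*",
    "/?sort=*",
    "/home.php?cat=*",
    "/*.html?js=*",
]


def pattern_to_regex(pattern):
    """Turn a wildcard pattern into a compiled regex; only "*" is special."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + ".*".join(parts))


def is_excluded(path, patterns=EXCLUDES):
    """Return True if the path (including query string) matches any pattern."""
    return any(pattern_to_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    # Hypothetical sample paths, for illustration only.
    for sample in ["/admin/login.php", "/product.php?productid=12",
                   "/Candle-Wax.html", "/Candle-Wax.html?js=n"]:
        print(sample, "->", "skip" if is_excluded(sample) else "crawl")
```

If a URL you expected to be skipped comes back as "crawl", that pattern is a likely reason the queue keeps growing.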
__________________
4.1.8

#3
06-22-2007, 02:57 PM
Jon (X-Guru)
Join Date: Oct 2002
Location: Vancouver, Canada
Posts: 4,200

Some crawlers don't respect the base href tag, so having CDSEO installed can cause problems with those crawlers.

If you find that the URLs they are coming up with are "looping", it means that the base href tag is not being followed.
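
To make the "looping" symptom concrete, here is a small Python sketch of how a relative link resolves with and without the base href. The host, the page URL and the link are hypothetical, and it assumes the server keeps returning a page for the ever-longer URLs, which is what lets a crawler's queue grow without end.

```python
from urllib.parse import urljoin

# Hypothetical values for illustration; not taken from any real site.
BASE_HREF = "http://www.example.com/"
RELATIVE_LINK = "Candles/Votives/"


def resolve(page_url, href, honor_base=True):
    """Resolve href the way a crawler would, with or without <base href>."""
    base = BASE_HREF if honor_base else page_url
    return urljoin(base, href)


def crawl_chain(start_url, href, honor_base, depth=3):
    """Follow the same relative link repeatedly and record each URL visited."""
    urls = [start_url]
    for _ in range(depth):
        urls.append(resolve(urls[-1], href, honor_base))
    return urls


if __name__ == "__main__":
    start = "http://www.example.com/Candles/Pillars/"
    print("base href honored:")
    for url in crawl_chain(start, RELATIVE_LINK, honor_base=True):
        print("  ", url)
    print("base href ignored:")
    for url in crawl_chain(start, RELATIVE_LINK, honor_base=False):
        print("  ", url)
```

When the base href is ignored, each hop appends the link to the current path, so every page yields a "new" URL and the crawl never settles.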

You can solve this problem with the following code in .htaccess:

Code:
RewriteEngine on
RewriteCond %{REQUEST_URI} ^/[^\.]+[^/]$
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1/ [R=301,L]
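
For anyone unsure which requests that rule touches, the condition can be tried out in Python. This is only a sketch of the matching logic, not a replacement for testing on the server: it reuses the same regular expression, and the host and paths are made-up examples. The rule redirects any path that contains no dot before its last character and does not end in a slash to the same path with a trailing slash.

```python
import re
from typing import Optional

# Same pattern as the RewriteCond above: a leading slash, then one or more
# non-dot characters, then a final character that is not a slash.
NEEDS_SLASH = re.compile(r"^/[^.]+[^/]$")


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the 301 target the rule would produce, or None if no redirect."""
    if NEEDS_SLASH.match(path):
        return f"http://{host}{path}/"
    return None


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only.
    for path in ["/Candles/Pillars", "/Candles/Pillars/", "/product.php", "/"]:
        print(path, "->", redirect_target("www.example.com", path))
```

Paths with a file extension such as /product.php are left alone, so only the extensionless, directory-style URLs are redirected.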

#4
06-24-2007, 08:21 AM
nfc5382 (X-Adept)
Join Date: Nov 2002
Posts: 481

Take a look at http://www.xml-sitemaps.com/standalone-google-sitemap-generator.html. It can be automated on your server, and it works fine with CDSEO.
__________________
-----------------------
x-cart v4.7.6 [LIVE]
x-cart v4.0.18 [retired 2004-2016]
x-cart v3.5.13 [retired]
x-cart v3.4.14 [retired]

#5
07-05-2007, 07:26 AM
keystone (X-Adept)
Join Date: Jul 2006
Location: USA
Posts: 787

Is this code in my .htaccess file the same as what you suggested? If so I already have it there.

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.uscandleco\.com [NC]
RewriteRule ^(.*)$ http://www.uscandleco.com/$1 [R=301,L]
RewriteCond %{SERVER_PORT} 80
#RewriteRule ^admin/ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
</IfModule>

I ran GSiteCrawler again overnight, and when I checked it there were 150000 files waiting and it was still going. I shut it down. I'm assuming it was just looping again. Any help would be great.

#6
07-07-2007, 10:41 PM
Jon (X-Guru)
Join Date: Oct 2002
Location: Vancouver, Canada
Posts: 4,200

The code isn't the same. Please try the code I provided.

All times are GMT -8.