How Can I download all PDFs from a website?

#1
04-11-2020, 09:25 AM
johngwms
Senior Member
Join Date: Mar 2013
Location: North Wales
Posts: 187

I NEED YOUR HELP, PLEASE

I thought this task would be straightforward.

I need to scrape my suppliers' websites for PDFs (data sheets, manuals, etc.) so that, as their distributor, I can load them onto my site. Doing this manually, one page at a time, just takes far too long.

Has anyone come across a scraping tool that will do this by using the sitemap, or by generating one?
__________________
John Legg
www.TheDebugStore.com

5.4.1.7 Business, Crisp White Skin Template
Backorder-Preorder Module
X-Cart to Zoho Creator (in progress)
Zoho Creator <-> Zoho Inventory for order processing
NGINX hosted - XC virtual server

#2
04-11-2020, 11:57 PM
PhilJ
X-Guru
Join Date: Nov 2002
Location: UK
Posts: 3,769

Try HTTrack: https://www.httrack.com

#3
04-12-2020, 01:16 AM
johngwms

Cheers, Phil

#4
04-12-2020, 01:22 AM
johngwms

I don't think this is going to work, as I need to download resources such as PDFs and images from the website so that I can then upload them to my site.

#5
04-12-2020, 01:27 AM
PhilJ

It does that. Use HTTrack's filters (scan rules) to limit what gets downloaded:
https://www.httrack.com/html/filters.html

https://www.google.com/search?q=httrack+pdf+files+only

#6
04-12-2020, 06:23 AM
cflsystems
Veteran
Join Date: Apr 2007
Posts: 13,934

You can do this straight from the command line with wget, without any GUI scraping tool. It is available out of the box on most Linux distributions; on macOS and Windows you need to install it first.


wget https://www.donain.com


With options


wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://www.domain.com

--mirror - make the download recursive, with unlimited depth and timestamping (shorthand for -r -N -l inf --no-remove-listing).
--no-parent - never ascend above the directory of the start URL.
--convert-links - rewrite links so they work properly in the local offline copy.
--page-requisites - download everything needed to display each page (images, CSS and JS files), so the local offline copy keeps its original styling.
--adjust-extension - append .html to HTML pages and .css to stylesheets whose URLs don't already end that way.


But there are tons of other options

wget --help
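Since the question is about PDFs only, wget's accept list (--accept, short form -A) can restrict a recursive download to PDF files instead of mirroring the whole site. A minimal sketch, assuming the supplier's PDFs are linked from ordinary HTML pages on the same host and their URLs end in .pdf; the function name download_site_pdfs and its arguments are hypothetical:

```shell
#!/bin/sh
# Sketch: crawl a site from its start page and keep only the PDFs.
# download_site_pdfs is a hypothetical helper name, not part of wget.
#   $1 = start URL, $2 = output directory, $3 = optional depth (default 5)
download_site_pdfs() {
  # --recursive          follow links found in downloaded HTML pages
  # --level              how many links deep to follow
  # --no-parent          never climb above the start URL's directory
  # --no-directories     save every file directly into the output directory
  # --accept=pdf         keep only files whose names end in pdf
  #                      (HTML pages are still fetched so their links can be followed)
  # --wait/--random-wait pause between requests to go easy on the server
  wget --recursive --level="${3:-5}" --no-parent --no-directories \
       --accept=pdf --wait=1 --random-wait \
       --directory-prefix="$2" "$1"
}
```

For example, download_site_pdfs https://www.domain.com/ supplier-pdfs would put the PDFs into a supplier-pdfs folder. Keep in mind that wget honours robots.txt during recursive downloads, does not follow links to other hosts unless --span-hosts is given, and will not match PDFs served from URLs that don't end in .pdf (for example a download.php?id=... link). With --no-directories, files that share a name get .1, .2 suffixes rather than overwriting each other.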
__________________
Steve Stoyanov
CFLSystems.com
Web Development

#7
04-12-2020, 06:29 AM
johngwms

Thanks, Steve

The URL donain.com does not seem to exist, and searching for wget only seems to turn up references for very old versions of Windows.

Any idea if it is available for Win 10?

#8
04-12-2020, 06:33 AM
cflsystems

There is no "domain.com"; you are supposed to replace it with the URL of the site you want to download. Sorry if that wasn't clear. The command is "wget". Just google how to install it on Windows, for example: https://www.addictivetips.com/windows-tips/install-and-use-wget-in-windows-10/


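Basically, running it for your whole site with the options from my previous post will look like this:


wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://www.thedebugstore.com/


This creates a new directory, "www.thedebugstore.com", inside the directory you run the command from and puts all the files in there. Without --mirror (or --recursive), plain "wget https://www.thedebugstore.com/" fetches only the home page and saves it as index.html.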
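The original question asked about working from a sitemap. If a supplier publishes one, wget can take the page list directly with --input-file instead of crawling from the home page. A minimal sketch, assuming the sitemap has already been saved locally, is plain uncompressed XML (not a sitemap index or .gz file) and lists each page in its own <loc> element; sitemap_urls and fetch_pdfs_from_list are hypothetical helper names:

```shell
#!/bin/sh
# Sketch: collect PDFs starting from a supplier's sitemap.
# Assumes a local, uncompressed sitemap file (not a sitemap index) with
# one URL per <loc>...</loc> element. Helper names are hypothetical.

# Print every <loc> URL in the sitemap file $1, one per line.
# XML entities such as &amp; inside URLs are left as-is.
sitemap_urls() {
  grep -o '<loc>[^<]*</loc>' "$1" | sed -e 's|^<loc>||' -e 's|</loc>$||'
}

# Visit each page listed in the URL file $1, follow its links one level
# deep, and keep only files ending in pdf, saved flat into directory $2.
fetch_pdfs_from_list() {
  wget --input-file="$1" --recursive --level=1 --no-directories \
       --accept=pdf --wait=1 --directory-prefix="$2"
}
```

If the sitemap lives at /sitemap.xml (its location is often given by a Sitemap: line in robots.txt), the steps would be: wget https://www.domain.com/sitemap.xml, then sitemap_urls sitemap.xml > pages.txt, then fetch_pdfs_from_list pages.txt supplier-pdfs. Using the sitemap avoids a deep crawl, since each listed page is visited once and only its direct links are checked for PDFs.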

#9
04-12-2020, 06:37 AM
johngwms

Found a source, which seems to work. Just follow the instructions here:

https://builtvisible.com/download-your-website-with-wget/

I have installed it and it is working in a command window.

#10
06-03-2020, 09:33 AM
hadder
Join Date: Apr 2020
Posts: 1

Quote:
Originally Posted by johngwms
Found a source, which seems to work. Just follow the instructions here:

https://builtvisible.com/download-your-website-with-wget/

I have installed it and it is working in a command window.

Great that you got it working for your needs.

wget is a great tool for backing up the valuable material on your website. Microsoft would do well to include such a wonderful tool with Windows 10.
__________________
X-Cart v.5.4.1

All times are GMT -8.

X-Cart forums © 2001-2020