Project LID Homepage: Distributable Modules
Updates
Update 8/15/2023: Program released.
Description
lccWebCrawler was created to:
- crawl (search) through one to many websites
- crawl through any links that point to the same site
- search using different options, including:
  - regex
  - link starts with
  - link contains
- skip certain content (like graphics, videos, etc.), only crawling 'text content'
- provide one to many filters per report set
- produce separate reports for each report set
- produce auto reports on links/pages crawled
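
The filtering ideas in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not lccWebCrawler's actual code or configuration format: the names `LinkFilter`, `is_same_site`, `is_text_content`, and `build_reports`, the filter mode strings, and the `example.edu` URLs are all hypothetical.

```python
import re
from dataclasses import dataclass
from urllib.parse import urljoin, urlparse


@dataclass(frozen=True)
class LinkFilter:
    """Hypothetical filter; mode is 'regex', 'starts_with', or 'contains'."""
    mode: str
    pattern: str

    def matches(self, link: str) -> bool:
        if self.mode == "regex":
            return re.search(self.pattern, link) is not None
        if self.mode == "starts_with":
            return link.startswith(self.pattern)
        if self.mode == "contains":
            return self.pattern in link
        raise ValueError(f"unknown filter mode: {self.mode}")


def is_same_site(start_url: str, link: str) -> bool:
    """True when the link resolves to the same host as the start URL."""
    resolved = urljoin(start_url, link)  # handles relative links like "/about"
    return urlparse(resolved).netloc.lower() == urlparse(start_url).netloc.lower()


def is_text_content(content_type: str) -> bool:
    """Treat only text/* responses (for example text/html) as crawlable."""
    return content_type.split(";")[0].strip().lower().startswith("text/")


def build_reports(links, report_sets):
    """Map each report set name to the links matching any of its filters."""
    return {
        name: [link for link in links if any(f.matches(link) for f in filters)]
        for name, filters in report_sets.items()
    }


# Made-up links and two made-up report sets, one filter each.
links = [
    "mailto:info@example.edu",
    "https://example.edu/about",
    "https://example.edu/files/map.pdf",
]
report_sets = {
    "emails": [LinkFilter("starts_with", "mailto:")],
    "pdfs": [LinkFilter("regex", r"\.pdf$")],
}
reports = build_reports(links, report_sets)
```

In this sketch each report set keeps its own list of matches, which mirrors the idea of producing a separate report per report set; a link may appear in more than one report if it matches filters in several sets.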
Example Report (Email Addresses Pulled From Mailto Links):
Installation
Documentation
Disclaimer
The programs, scripts and documentation are provided AS IS without warranty of any kind. Lower Columbia College further disclaims all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the programs, scripts and documentation remains with you. In no event shall Lower Columbia College, its authors, or anyone else involved in the creation, production, or delivery of the programs, scripts or documentation be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the programs, scripts or documentation, even if Lower Columbia College has been advised of the possibility of such damages.