Dino Cajic

Showing posts from January, 2017

Java Web Crawler using JSoup

A much-needed program for business applications is the web crawler. There are a few paid programs that accomplish this task. I wanted to create a web crawler that is expandable; currently it traverses the website and gathers the different links. Future updates will provide capabilities such as:

- Checking broken image links
- Gathering useful information from each page
- Checking broken links
- Checking for repetitive information

Each one of these is crucial for SEO, and now there's going to be an automated way to check for each.

To begin, download the website crawler from https://github.com/dinocajic/java-website-crawler and create a project in your IDE. Copy the files from the src folder into the project. Open the CrawlSite.java file and change the websiteAddress property. That's it. Compile and run.

To run through the program: Main.java instantiates the CrawlSite class. In CrawlSite.java, the output.txt file is created so that the links can be stored in it. The UR...
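
The traversal described above (start at websiteAddress, follow the links on each page, record each one once) can be sketched roughly as follows. This is not the code from the repository, only a minimal illustration of the idea. The class name CrawlSketch and the fetchLinks hook are hypothetical, and the in-memory map stands in for real HTTP requests. With JSoup, the hook could plausibly be built from Jsoup.connect(url).get().select("a[href]") together with absUrl("href") on each element.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class CrawlSketch {

    /**
     * Breadth-first traversal: visit each page once and collect every
     * same-site link found along the way.
     *
     * fetchLinks is a hypothetical hook (not from the original project)
     * that returns the absolute URLs linked from a given page.
     */
    public static List<String> crawl(String websiteAddress,
                                     Function<String, List<String>> fetchLinks) {
        Set<String> seen = new LinkedHashSet<>(); // keeps discovery order, blocks repeats
        Deque<String> queue = new ArrayDeque<>(); // pages still waiting to be visited
        seen.add(websiteAddress);
        queue.add(websiteAddress);

        while (!queue.isEmpty()) {
            String page = queue.remove();
            for (String link : fetchLinks.apply(page)) {
                // Stay on the site being crawled and skip links already recorded.
                if (link.startsWith(websiteAddress) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // A tiny in-memory "site" (made-up URLs) instead of live requests.
        Map<String, List<String>> fakeSite = Map.of(
            "https://example.com/", List.of("https://example.com/a", "https://other.org/"),
            "https://example.com/a", List.of("https://example.com/", "https://example.com/b"),
            "https://example.com/b", List.of());

        List<String> links = crawl("https://example.com/",
                                   url -> fakeSite.getOrDefault(url, List.of()));
        links.forEach(System.out::println);
    }
}
```

The set of seen links is what keeps the crawler from looping forever when two pages link to each other. The returned list could then be written to a file such as output.txt, for example with java.nio.file.Files.write.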