A web crawler is a much-needed tool for many business applications. A few paid programs already do the job, but I wanted to create a crawler that is expandable. Currently it traverses a website and gathers the links it finds. Future updates will add capabilities such as:
- Checking broken image links
- Gathering useful information from each page
- Checking broken links
- Checking for repetitive information
Each of these is crucial for SEO, and the goal is an automated way to check every one of them.
To begin, download the website-crawler from https://github.com/dinocajic/java-website-crawler
Create a project in your IDE and copy the files from the src folder into it. Open the CrawlSite.java file and change the websiteAddress property to the site you want to crawl. That's it. Compile and run.
To walk through the program: Main.java instantiates CrawlSite. In CrawlSite.java, the output.txt file is created so that the links can be stored in it. The URL is then passed to the storeLinksFromPage() method and the fun begins. If you want to traverse only a certain number of links, set the stopAfter property to a number that you're comfortable with (e.g. 500). Each link is written to output.txt and also added to a linked list of visited pages so that the crawler doesn't have to visit it again.
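The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the class name VisitLogSketch and its record() method are hypothetical, and a generic Writer stands in for output.txt so the example runs without touching the disk. Only the stopAfter idea and the visited pages linked list come from the description.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.LinkedList;
import java.util.List;

public class VisitLogSketch {
    // Pages already recorded, kept in a linked list as in the description.
    private final List<String> visitedPages = new LinkedList<>();
    private final Writer output;   // stands in for output.txt (assumption)
    private final int stopAfter;   // maximum number of links to record

    public VisitLogSketch(Writer output, int stopAfter) {
        this.output = output;
        this.stopAfter = stopAfter;
    }

    /**
     * Records a link once. Returns false when the limit has been
     * reached or the link was already recorded.
     */
    public boolean record(String url) {
        if (visitedPages.size() >= stopAfter || visitedPages.contains(url)) {
            return false;
        }
        visitedPages.add(url);
        try {
            output.write(url + System.lineSeparator());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public int count() {
        return visitedPages.size();
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        VisitLogSketch log = new VisitLogSketch(out, 2);
        System.out.println(log.record("https://example.com/"));        // true
        System.out.println(log.record("https://example.com/"));        // false (duplicate)
        System.out.println(log.record("https://example.com/about"));   // true
        System.out.println(log.record("https://example.com/contact")); // false (limit reached)
    }
}
```

A linked list keeps the sketch close to the description, but contains() scans the whole list on every call. For large sites a HashSet would make the duplicate check much cheaper.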
All of the elements with the "a" tag are grabbed and the link is extracted from each one's "href" attribute. The method then goes through each of the links. Once it has made sure that a link goes to another page, the link is inserted into the visited pages linked list (for future use). If that page hasn't been visited yet, the method calls itself recursively to start gathering the links from the new page.
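To make the recursion concrete, here is a small self-contained sketch of the same idea. It is not the repository's implementation: the class LinkCrawlSketch and the methods extractLinks() and crawl() are hypothetical names, the pages come from an in-memory map instead of real HTTP requests, and a simplified regular expression stands in for a proper HTML parser. "Goes to another page" is assumed here to mean a site-relative path starting with "/".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkCrawlSketch {
    // Simplified matcher for <a ... href="...">; a real HTML parser is more robust.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the href value of every "a" tag found in the HTML. */
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /** Visits url, then recurses into every unvisited page it links to. */
    public static void crawl(String url, Map<String, String> pages, Set<String> visited) {
        if (!visited.add(url)) {
            return; // already visited, so stop here
        }
        String html = pages.get(url);
        if (html == null) {
            return; // nothing to fetch for this address
        }
        for (String link : extractLinks(html)) {
            // Assumption: only site-relative paths count as "another page";
            // anchors (#top) and mailto: links are skipped.
            if (link.startsWith("/")) {
                crawl(link, pages, visited);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> pages = new HashMap<>();
        pages.put("/", "<a href=\"/about\">About</a> <a href=\"#top\">Top</a>");
        pages.put("/about", "<a href=\"/\">Home</a> <a href=\"/contact\">Contact</a>");
        pages.put("/contact", "<a href=\"mailto:me@example.com\">Mail</a>");

        Set<String> visited = new LinkedHashSet<>();
        crawl("/", pages, visited);
        System.out.println(visited); // [/, /about, /contact]
    }
}
```

Because every page is marked as visited before its links are followed, pages that link back to each other (such as "/" and "/about" here) do not cause infinite recursion. On very deep sites a recursive crawl can still exhaust the call stack, in which case an explicit queue is the usual alternative.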