
Java Web Crawler using JSoup

A much-needed program for business applications is the infamous web crawler. A few paid programs already accomplish this task, but I wanted to create a web crawler that is expandable. Currently it traverses a website and gathers its links. Future updates will add capabilities such as:

  • Checking broken image links
  • Gathering useful information from each page
  • Checking broken links
  • Checking for repetitive information
Each of these is crucial for SEO, and now there's going to be an automated way to check for each one.

To begin, download the website crawler from https://github.com/dinocajic/java-website-crawler.

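Create a project in your IDE and copy the files from the src folder into it. Make sure the jsoup library is on the project's classpath, since the crawler depends on it. Open CrawlSite.java and change the websiteAddress property to the site you want to crawl. That's it: compile and run.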

To walk through the program: Main.java instantiates the CrawlSite class. CrawlSite.java creates the output.txt file in which the links are stored. The website URL is then passed to the storeLinksFromPage() method, and the fun begins. If you want to traverse only a certain number of links, set the stopAfter property to a number you're comfortable with (e.g. 500). Each link is written to output.txt and also added to a linked list of visited pages so that the crawler doesn't visit it again.
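The bookkeeping in the paragraph above (an output file, a linked list of visited pages, and a stopAfter cap) can be sketched as follows. This is a minimal illustration rather than the repository's code: the `VisitLog` class and its `record()` method are hypothetical, and writing through a `java.io.Writer` is an assumption so the same logic can target output.txt (for example via a `FileWriter`) or an in-memory buffer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.LinkedList;

// Hypothetical helper (not from the original repository) mirroring the bookkeeping
// described in the post: each new link is written out and remembered as visited.
class VisitLog {
    private final LinkedList<String> visited = new LinkedList<>();
    private final Writer out;      // e.g. a FileWriter opened on output.txt
    private final int stopAfter;   // maximum number of links to record; <= 0 means no limit

    VisitLog(Writer out, int stopAfter) {
        this.out = out;
        this.stopAfter = stopAfter;
    }

    // Records the link and returns true only if it is new and the limit has not been reached.
    boolean record(String link) {
        if (limitReached() || visited.contains(link)) {
            return false;
        }
        visited.add(link);
        try {
            out.write(link + System.lineSeparator());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    // True once stopAfter links have been recorded.
    boolean limitReached() {
        return stopAfter > 0 && visited.size() >= stopAfter;
    }

    int count() {
        return visited.size();
    }
}
```

In a crawler, `record()` would be called before fetching each page, and a `false` result means the page should be skipped. The `LinkedList` mirrors the post's description, but `contains()` scans the whole list; a `HashSet<String>` alongside it (or instead of it) gives constant-time lookups on larger sites.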

All elements with an "a" tag are selected, and each link is extracted from its "href" attribute. The method then goes through each of the links. Once it confirms that a link points to another page, it inserts the link into the visited-pages linked list (for future use). Also, if that page hasn't been visited yet, the method recursively calls itself to gather the links from within the new page.
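The recursive traversal described above can be sketched as a depth-first walk. To keep the example self-contained and runnable without network access, the page download is replaced by a hypothetical `LinkSource` interface, and `RecursiveCrawler`, `visit()` and `toSameSitePage()` are illustrative names, not the repository's. Treating "the link goes to another page" as "an http(s) URL on the same host" is an assumption. A JSoup-backed `LinkSource` could build its list from `Jsoup.connect(pageUrl).get().select("a[href]")`, taking `absUrl("href")` from each element so relative links become absolute.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for "download the page and return the href of every <a> tag".
// Implementations are expected to return absolute URLs (JSoup's absUrl("href") does this).
interface LinkSource {
    List<String> linksOn(String pageUrl);
}

// Minimal depth-first crawler sketch; class and method names are illustrative only.
class RecursiveCrawler {
    private final LinkSource source;
    private final String startUrl;
    private final String host;        // only links on the start page's host are followed
    private final int stopAfter;      // upper bound on the number of pages visited
    private final LinkedList<String> visited = new LinkedList<>();

    RecursiveCrawler(LinkSource source, String startUrl, int stopAfter) {
        this.source = source;
        this.startUrl = startUrl;
        this.host = URI.create(startUrl).getHost();
        this.stopAfter = stopAfter;
    }

    // Crawls from the start URL and returns the pages in the order they were visited.
    List<String> crawl() {
        visit(startUrl);
        return new ArrayList<>(visited);
    }

    private void visit(String pageUrl) {
        if (visited.size() >= stopAfter || visited.contains(pageUrl)) {
            return;                   // limit reached or page already seen
        }
        visited.add(pageUrl);
        for (String link : source.linksOn(pageUrl)) {
            String page = toSameSitePage(link);
            if (page != null) {
                visit(page);          // recurse into the newly found page
            }
        }
    }

    // Returns the link minus any #fragment if it is an http(s) URL on the same host, else null.
    private String toSameSitePage(String link) {
        int hash = link.indexOf('#');
        String candidate = hash >= 0 ? link.substring(0, hash) : link;
        try {
            URI uri = new URI(candidate);
            String scheme = uri.getScheme();
            boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return web && host.equalsIgnoreCase(uri.getHost()) ? candidate : null;
        } catch (URISyntaxException e) {
            return null;              // skip malformed hrefs
        }
    }
}
```

Each newly found page adds a stack frame, so a very large site can exhaust the call stack before stopAfter is reached; replacing the recursion with an explicit queue (a breadth-first crawl) avoids that.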
