Monday, 14 August 2017

Jsoup

Jsoup: HTML Parser



The Jsoup library provides features for parsing HTML pages in Java.
Please refer to https://jsoup.org/ for API details.

Jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. With it you can:

1. scrape and parse HTML from a URL, file, or string
2. find and extract data, using DOM traversal or CSS selectors
3. manipulate the HTML elements, attributes, and text
4. clean user-submitted content against a safe white-list, to prevent XSS attacks
5. output tidy HTML
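
The features listed above can be sketched in a few lines each. The class below is only an illustration, not part of Jsoup: the class name JsoupFeatureSketch, its method names, and the sample HTML strings are made up for this post. It also assumes a recent Jsoup release, where the white-list class is called Safelist (older releases named it Whitelist) and where selectFirst is available.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;

// Hypothetical helper class, written only to illustrate the feature list.
public class JsoupFeatureSketch {

    // Features 1 and 2: parse HTML from a string, then extract data
    // with a CSS selector. Returns "" when nothing matches.
    static String firstText(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element match = doc.selectFirst(cssQuery); // null if no match
        return match == null ? "" : match.text();
    }

    // Feature 3: manipulate attributes. Adds rel="nofollow" to every
    // link that has an href, then returns the HTML of the body.
    static String markLinksNoFollow(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("a[href]").attr("rel", "nofollow");
        return doc.body().html();
    }

    // Feature 4: clean untrusted input against a safe list of tags.
    // Safelist.basic() is an assumption about which tags should be allowed.
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        System.out.println(firstText("<div><p class=\"intro\">Hello</p></div>", "p.intro"));
        System.out.println(markLinksNoFollow("<p><a href=\"/x\">x</a></p>"));
        System.out.println(sanitize("<p onclick=\"steal()\">Hi<script>alert(1)</script></p>"));
    }
}
```

All three methods work on strings, so the sketch can be tried without a network connection.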

Example:

package com.test.main;

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class Test1 {
    public static void main(String[] args) throws IOException {
        // Fetch the page over HTTP and parse it into a DOM tree.
        Document doc = Jsoup.connect("https://jsoup.org/").get();
        // Select every <div> element using a CSS selector.
        Elements divs = doc.select("div");
        // Print the combined inner HTML of all matched elements.
        System.out.println(divs.html());
    }
}
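
A common follow-up to selecting elements is reading their attributes, for example collecting every link on a page. The sketch below is one way to do that, under a few assumptions: the class name LinkSketch, the method absoluteLinks, and the example.com URLs are invented for illustration. It parses a string rather than a live page so that it runs offline, and it passes a base URI to Jsoup.parse so that relative links can be resolved to absolute ones.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, written only for illustration.
public class LinkSketch {

    // Returns the absolute URL of every <a href> in the given HTML.
    // baseUri is used to resolve relative links such as "guide.html".
    static List<String> absoluteLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> links = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            links.add(a.absUrl("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/download\">Download</a> <a href=\"guide.html\">Guide</a>";
        for (String link : absoluteLinks(html, "https://example.com/docs/")) {
            System.out.println(link);
        }
    }
}
```

When the document comes from Jsoup.connect(url).get(), as in the example above, the base URI is taken from the fetched URL, so absUrl should work there without passing one explicitly.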
