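Parsing XML with Java is simple. If you want to parse a fragment of HTML tags, you only have to wrap those tags in a root element (so the fragment becomes a well-formed XML document) and then you can use Java's built-in DOM parser (DocumentBuilder). This works only if the HTML is itself well-formed XML, with every tag closed and properly nested.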
Code
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
public class NewMain {
    public static void main(String[] args) throws Exception {
        String html = "<h1>Headline</h1><p><b>Hello World.</b><b>This is a test.</b></p>";
        // Wrap the fragment in a single root element so it is a well-formed XML document.
        html = "<root>" + html + "</root>";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Parse the string into an in-memory DOM tree.
        Document document = builder.parse(new InputSource(new StringReader(html)));
        // Collect every <b> element, at any depth, in document order.
        NodeList elementsByTagName = document.getElementsByTagName("b");
        for (int i = 0; i < elementsByTagName.getLength(); i++) {
            Node element = elementsByTagName.item(i);
            String text = element.getTextContent();
            System.out.println(text);
        }
    }
}
Result
Hello World.
This is a test.
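If you want to work with real-world HTML in Java, I recommend the jsoup project instead. An XML parser is too strict to serve as a good HTML parser: it rejects input that browsers accept, such as an unclosed <br> tag or an entity like &nbsp; that XML does not define.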
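To see how strict the XML parser is, here is a minimal sketch that feeds it a few fragments and reports whether each one parses. The class name StrictParseDemo and the helper parsesAsXml are hypothetical names chosen for this illustration; the parser calls are the same JAXP ones used in the code above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StrictParseDemo {
    // Hypothetical helper: returns true if the fragment parses as XML
    // once it is wrapped in a <root> element, false otherwise.
    public static boolean parsesAsXml(String fragment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));
            return true;
        } catch (SAXException e) {
            // The input is not well-formed XML.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesAsXml("<b>Hello World.</b>"));          // true
        System.out.println(parsesAsXml("<p>Line one<br>Line two</p>"));  // false: <br> is never closed
        System.out.println(parsesAsXml("<p>a&nbsp;b</p>"));              // false: &nbsp; is not declared in XML
    }
}
```

The second and third fragments are ordinary HTML, yet both fail here, which is why a lenient HTML parser is the better tool for pages you do not control.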