[Jsoup Study Notes] Example program: list all links

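This example program shows how to fetch a page from a URL, extract all of its links, images, and other auxiliary content (such as stylesheet and icon imports), and examine their URLs and text.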

To run the program below, pass exactly one URL as its command-line argument.

    package org.jsoup.examples;

    import org.jsoup.Jsoup;
    import org.jsoup.helper.Validate;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    import java.io.IOException;

    /**
     * Example program to list links from a URL.
     */
    public class ListLinks {
        public static void main(String[] args) throws IOException {
            Validate.isTrue(args.length == 1, "usage: supply url to fetch");
            String url = args[0];
            print("Fetching %s...", url);

            // Fetch and parse the page, then select the three groups of elements.
            Document doc = Jsoup.connect(url).get();
            Elements links = doc.select("a[href]");      // anchors with an href
            Elements media = doc.select("[src]");        // any element with a src (img, script, ...)
            Elements imports = doc.select("link[href]"); // stylesheets, icons, ...

            print("\nMedia: (%d)", media.size());
            for (Element src : media) {
                if (src.tagName().equals("img"))
                    print(" * %s: <%s> %sx%s (%s)",
                            src.tagName(), src.attr("abs:src"), src.attr("width"), src.attr("height"),
                            trim(src.attr("alt"), 20));
                else
                    print(" * %s: <%s>", src.tagName(), src.attr("abs:src"));
            }

            print("\nImports: (%d)", imports.size());
            for (Element link : imports) {
                print(" * %s <%s> (%s)", link.tagName(), link.attr("abs:href"), link.attr("rel"));
            }

            print("\nLinks: (%d)", links.size());
            for (Element link : links) {
                print(" * a: <%s> (%s)", link.attr("abs:href"), trim(link.text(), 35));
            }
        }

        // Print a formatted line to standard output.
        private static void print(String msg, Object... args) {
            System.out.println(String.format(msg, args));
        }

        // Shorten s to at most width characters, ending with "." when it was cut.
        private static String trim(String s, int width) {
            if (s.length() > width)
                return s.substring(0, width - 1) + ".";
            else
                return s;
        }
    }

org/jsoup/examples/ListLinks.java
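
The `abs:` prefix used in calls such as `link.attr("abs:href")` asks jsoup for the attribute value as an absolute URL, resolved against the page's base URI. As a rough illustration of what that resolution means, the sketch below uses only `java.net.URI` from the JDK rather than jsoup itself, so it approximates the idea and is not jsoup's implementation. The class name `AbsUrlSketch`, the helper `resolve`, and the example URLs are all hypothetical.

```java
import java.net.URI;

final class AbsUrlSketch {
    // Hypothetical helper: resolve a possibly relative link against the page URL.
    static String resolve(String baseUrl, String link) {
        return URI.create(baseUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String base = "https://example.com/news/index.html"; // assumed page URL

        // A relative link is resolved against the directory of the page.
        System.out.println(resolve(base, "item?id=363"));          // https://example.com/news/item?id=363
        // A root-relative link keeps only the scheme and host of the page.
        System.out.println(resolve(base, "/faq.html"));            // https://example.com/faq.html
        // An absolute link is returned unchanged.
        System.out.println(resolve(base, "https://example.org/")); // https://example.org/
    }
}
```

Printing absolute URLs is what makes the program's output usable outside the page it came from, since a bare relative `href` cannot be followed without knowing the base.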

Sample output:

    Fetching

    Media: (38)
     * img: <> 18x18 ()
     * img: <> 10x1 ()
     * img: <> x ()
     * img: <> 0x10 ()
     * script: <?s=1138>
     * img: <> 15x1 ()
     * img: <> x ()
     * img: <> 25x1 ()
     * img: <> x (Analytics by Mixpan.)

    Imports: (2)
     * link <> (stylesheet)
     * link <> (shortcut icon)

    Links: (141)
     * a: <> ()
     * a: <> (Hacker News)
     * a: <> (new)
     * a: <> (comments)
     * a: <> (leaders)
     * a: <> (jobs)
     * a: <> (submit)
     * a: <?fnid=JKhQjfU7gW> (login)
     * a: <?for=1094578&dir=up&whence=%6e%65%77%73> ()
     * a: <?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+readwriteweb+%28ReadWriteWeb%29&utm_content=Twitter> (Facebook speeds up PHP)
     * a: <?id=mcxx> (mcxx)
     * a: <?id=1094578> (9 comments)
     * a: <?for=1094649&dir=up&whence=%6e%65%77%73> ()
     * a: <> ("Tough. Django produces XHTML.")
     * a: <?id=andybak> (andybak)
     * a: <?id=1094649> (3 comments)
     * a: <?for=1093927&dir=up&whence=%6e%65%77%73> ()
     * a: <?fnid=p2sdPLE7Ce> (More)
     * a: <> (Lists)
     * a: <> (RSS)
     * a: <> (Bookmarklet)
     * a: <> (Guidelines)
     * a: <> (FAQ)
     * a: <> (News News)
     * a: <?id=363> (Feature Requests)
     * a: <> (Y Combinator)
     * a: <> (Apply)
     * a: <> (Library)
     * a: <> ()
     * a: <?from=yc> ()
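
The text in parentheses at the end of each `img` and `a` line has been passed through the program's `trim` helper (width 20 for `alt` text, 35 for link text), which is why one label in the output appears cut off as `Analytics by Mixpan.`. The standalone sketch below repeats that helper so its behavior can be tried in isolation. The class name `TrimSketch` is hypothetical, and the full string "Analytics by Mixpanel" is an assumed input chosen only to reproduce the truncated label.

```java
final class TrimSketch {
    // Same logic as the trim helper in ListLinks: keep width - 1 characters
    // and append "." when the input is longer than width.
    static String trim(String s, int width) {
        if (s.length() > width)
            return s.substring(0, width - 1) + ".";
        else
            return s;
    }

    public static void main(String[] args) {
        // 21 characters, longer than 20, so it is cut to 19 characters plus "."
        System.out.println(trim("Analytics by Mixpanel", 20)); // Analytics by Mixpan.
        // Shorter than the limit, so it is returned unchanged
        System.out.println(trim("Hacker News", 35));           // Hacker News
    }
}
```

Note that a trimmed string is exactly `width` characters long, and the trailing "." is the only sign that anything was removed.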
