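The following example shows how to use a regular expression to find and print every hyperlink in the page at a given URL that looks like this: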
<a href=http://www.9iyyzm.com>
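First, the URL is taken from the command line, an input stream is opened on it, and the page content is read and stored as a string in htmlString. A regular expression is then built from "(<a\s*href=[^>]*>)", and finally htmlString is searched for every string that matches it.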
import java.io.*;
import java.net.*;
import java.util.regex.*;

public class GetHref {
    public static void main(String[] args) {
        InputStream in = null;
        PrintWriter out = null;
        String htmlString = null;
        try {
            // Check the arguments
            if ((args.length != 1) && (args.length != 2))
                throw new IllegalArgumentException("Wrong number of args");
            // Set up the streams
            URL url = new URL(args[0]); // Create the URL
            in = url.openStream(); // Open a stream to it
            if (args.length == 2) // Optionally copy the page to a file
                out = new PrintWriter(new FileWriter(args[1]));
            BufferedReader bin = new BufferedReader(new InputStreamReader(in));
            String line;
            StringBuilder sb = new StringBuilder();
            while ((line = bin.readLine()) != null) {
                if (out != null) out.println(line);
                // Keep the line break so text on adjacent lines does not run together
                sb.append(line).append('\n');
            }
            htmlString = sb.toString();
        }
        // On exceptions, print error message and usage message.
        catch (Exception e) {
            System.err.println(e);
            System.err.println("Usage: java GetHref <URL> [<filename>]");
            return; // The page was not read, so there is nothing to search
        }
        finally { // Always close the streams, no matter what.
            try { if (in != null) in.close(); } catch (IOException e) {}
            if (out != null) out.close();
        }
        // Match every opening <a href=...> tag, up to its closing '>'
        Pattern p = Pattern.compile("(<a\\s*href=[^>]*>)");
        Matcher m = p.matcher(htmlString);
        while (m.find()) {
            // Print each capture group of the current match
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println(m.group(i));
            }
        }
    }
}
Program output:
C:\java>java GetHref http://127.0.0.1:8080/zz3zcwbwebhome/index.jsp
<a href=mailto:hi@javaweb.cc>
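The matching step can also be tried on its own, without a network connection. The following is only a minimal sketch: the class name HrefDemo, the helper findAnchorTags, and the sample HTML string are hypothetical, and the pattern "(<a\s*href=[^>]*>)" is assumed to be the one the program uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefDemo {
    // Assumed pattern: an opening <a href=...> tag, up to its closing '>'
    private static final Pattern ANCHOR = Pattern.compile("(<a\\s*href=[^>]*>)");

    // Hypothetical helper: collects every match of group 1 in the given text
    static List<String> findAnchorTags(String html) {
        List<String> tags = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        // Sample input made up for illustration
        String html = "<p><a href=http://www.9iyyzm.com>home</a> "
                + "<b>bold</b> <a href=mailto:hi@javaweb.cc>mail</a></p>";
        for (String tag : findAnchorTags(html)) {
            System.out.println(tag);
        }
    }
}
```

With this sample string the sketch should print the two opening anchor tags and skip the closing </a> tags and the <b> element, because [^>]* stops at the first '>' after href=.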