A Brief Look at XML Parsing in Java

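I haven't written a blog post in a long while. My computer broke last month and I only just got it back, and for the past couple of days I have been busy with a project that parses XML files. So here is what I have to share.

Parsing XML comes down to breaking the file apart. First, read the tag name of each node. Next, check whether the node carries parameters (attributes); if it does, walk through them, splitting each one at the "=" sign so that the left side becomes the parameter name and the right side its value. After that, check whether the node contains "innerText". There is one rule here: a node that ends with "/>" and yet also contains "innerText" is treated as a syntax error. Finally, if the innerText under the node itself contains nodes, parse it into child nodes. That is essentially all there is to XML parsing. Below is part of the parsing code I wrote, with some explanation.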

    // Requires: import java.util.regex.Pattern;
    // Node, XMLContentException and the nodeHandler field belong to the rest of
    // the parser and are not shown in this post.

    // Handles the child nodes contained in document: parses the first node,
    // adds it to baseNode when more text follows, and recurses on the remainder.
    private synchronized Node parser(Node baseNode, String document) {
        // Empty content: nothing to parse
        if (document == null) {
            return null;
        }
        if (document.indexOf('<') == -1 && document.indexOf('>') == -1) {
            // The content is plain innerText: nothing to parse
            return null;
        }
        if (document.indexOf('<') == -1 && document.indexOf('>') != -1) {
            // Malformed content: '>' without '<'
            throw new XMLContentException();
        }
        if (document.indexOf('<') != -1 && document.indexOf('>') == -1) {
            // Malformed content: '<' without '>'
            throw new XMLContentException();
        }
        if (document.indexOf("<!--") != -1 && document.indexOf("-->") == -1) {
            // Malformed content: comment opened but never closed
            throw new XMLContentException();
        }
        if (document.indexOf("<!--") == -1 && document.indexOf("-->") != -1) {
            // Malformed content: comment closed but never opened
            throw new XMLContentException();
        }

        // Checks that the text after '<' is a valid tag name: it starts with a
        // letter, ends with a letter or digit, and is at least 1 character long
        String regExTagStart = " *[A-Za-z]([\\w.\\-:]*[\\dA-Za-z])? *";
        Pattern regexTagStart = Pattern.compile(regExTagStart);

        document = document.substring(document.indexOf('<') + 1).trim();

        // Index used to cut out the tag name
        int endIndex = -1;
        // Whether the current node carries parameters
        boolean hasParams = true;
        // If the tag contains no space (for example <root>), it is a plain tag
        // without parameters
        if (document.substring(0, document.indexOf('>')).indexOf(' ') == -1) {
            endIndex = document.indexOf('>');
            hasParams = false;
        } else {
            endIndex = document.indexOf(' ');
        }

        // Get the name of the current node and validate it
        String tag = document.substring(0, endIndex);
        if (!regexTagStart.matcher(tag).matches()) {
            // Validation failed
            throw new XMLContentException();
        }

        // Create the node object that stores the current node
        Node node = new Node(tag);
        node.addLisener(nodeHandler);

        // If the node carries parameters, extract their names and values
        if (hasParams) {
            document = document.substring(document.indexOf(' ')).trim();
            // The rest of the opening tag, up to and including '>'
            String tagInline = document.substring(0, document.indexOf('>') + 1).trim();
            if (tagInline.indexOf("/>") != -1
                    && document.indexOf('>') == document.indexOf("/>") + 1) {
                document = document.substring(document.indexOf("/>"));
            } else if (tagInline.indexOf("/>") == -1) {
                document = document.substring(document.indexOf('>'));
            }

            // Matches the rest of the opening tag: name="value" pairs, then "/>" or ">"
            Pattern regExInline = Pattern.compile(
                    "(\\w+ *= *\"[^\\n\\f\\r\"]*\"[\\n\\r\\t ]*)*/?>$");
            if (!regExInline.matcher(tagInline).matches()) {
                throw new XMLContentException();
            }

            // Walk through all parameters and add them to the node;
            // stop when no '=' is left
            while (tagInline.indexOf('=') != -1) {
                boolean paramIsKeyword = false;
                String paramName = tagInline.substring(0, tagInline.indexOf('=')).trim();
                tagInline = tagInline.substring(tagInline.indexOf('=') + 1);
                String paramValue = tagInline.substring(tagInline.indexOf('"') + 1);
                paramValue = paramValue.substring(0, paramValue.indexOf('"'));
                // Skip past the opening quote, the value and the closing quote
                tagInline = tagInline.substring(
                        tagInline.indexOf('"') + paramValue.length() + 2);

                // The keyword parameters "name" and "value" go into dedicated properties
                if (paramName.equalsIgnoreCase("name")) {
                    paramIsKeyword = true;
                    node.setName(paramValue);
                }
                if (paramName.equalsIgnoreCase("value")) {
                    paramIsKeyword = true;
                    node.setValue(paramValue);
                }
                // Any other parameter goes into the node's parameter list
                if (!paramIsKeyword) {
                    node.addParam(paramName, paramValue);
                }
            }
        }

        // If the node ends with "/>", it has no innerText
        if (document.indexOf("/>") != -1
                && document.indexOf('>') == document.indexOf("/>") + 1) {
            // The node is complete; if text remains, look for the next node
            document = document.substring(document.indexOf('>') + 1);
            if (document.length() > 0) {
                baseNode.addNode(node);
                node = parser(baseNode, document);
            }
            return node;
        }

        // Normalize the closing tag by removing the spaces inside it. Without
        // this, a closing tag such as "</ tag >" would never be found and the
        // node could not be terminated.
        String closeTag = "</" + tag + ">";
        document = document.replaceFirst("</[ ]*" + Pattern.quote(tag) + "[ ]*>", closeTag);
        int closeIndex = document.indexOf(closeTag);
        if (closeIndex == -1) {
            // The node is never closed
            throw new XMLContentException();
        }

        // Get the innerText of the current node
        String innerText = document.substring(document.indexOf('>') + 1, closeIndex);
        node.setInnerText(innerText);

        // The node is complete; if text remains, look for the next node
        document = document.substring(closeIndex + closeTag.length()).trim();
        if (document.length() > 0) {
            baseNode.addNode(node);
            node = parser(baseNode, document);
        }
        return node;
    }
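For anyone who wants to try the parameter loop on its own, here is a minimal sketch of the same idea: cut at "=", then cut at the surrounding double quotes. The class name `AttributeSplitSketch` and the method `splitAttributes` are made up for this illustration and are not part of the parser above. It also finds the closing quote with `indexOf(char, fromIndex)` instead of computing the offset from the value length, which is a small variation rather than the original logic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class AttributeSplitSketch {

    // Splits the text between the tag name and the closing '>' into
    // name/value pairs using only indexOf and substring.
    static Map<String, String> splitAttributes(String tagInline) {
        Map<String, String> params = new LinkedHashMap<>();
        String rest = tagInline;
        while (rest.indexOf('=') != -1) {
            // Everything left of '=' is the parameter name
            String name = rest.substring(0, rest.indexOf('=')).trim();
            rest = rest.substring(rest.indexOf('=') + 1);

            // The value sits between the next two double quotes
            int open = rest.indexOf('"');
            int close = (open == -1) ? -1 : rest.indexOf('"', open + 1);
            if (close == -1) {
                throw new IllegalArgumentException("missing quotes for parameter " + name);
            }
            params.put(name, rest.substring(open + 1, close));

            // Continue after the closing quote
            rest = rest.substring(close + 1);
        }
        return params;
    }

    public static void main(String[] args) {
        // prints {name=db, value=local=1, port=3306}
        System.out.println(splitAttributes("name=\"db\" value=\"local=1\" port = \"3306\" />"));
    }
}
```

Because each quoted value is consumed before the loop looks for the next "=", an "=" inside a value (as in `value="local=1"`) does not confuse the split.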

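The code relies almost entirely on indexOf and substring to cut out tags and their parameters, which is a bit ugly, though I am planning to rewrite it so that it parses with regular expressions. Enough chatter; let's analyze it.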
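As a rough idea of what a regex-based version of the opening-tag step might look like, here is a small sketch. It is not the planned rewrite itself, only one possible shape for it: the class `RegexTagSketch`, the nested `OpenTag` holder and the two patterns are assumptions made for this example, and it handles a single opening tag, not a whole document.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, for illustration only.
class RegexTagSketch {

    // Group 1: tag name; group 2: raw parameter text; group 3: "/" if self-closing
    private static final Pattern OPEN_TAG = Pattern.compile(
            "<\\s*([A-Za-z][\\w.:-]*)((?:\\s+[\\w.:-]+\\s*=\\s*\"[^\"]*\")*)\\s*(/?)>");

    // Group 1: parameter name; group 2: parameter value without the quotes
    private static final Pattern ATTRIBUTE = Pattern.compile(
            "([\\w.:-]+)\\s*=\\s*\"([^\"]*)\"");

    // Simple holder for the pieces of one opening tag
    static final class OpenTag {
        final String name;
        final Map<String, String> params;
        final boolean selfClosing;

        OpenTag(String name, Map<String, String> params, boolean selfClosing) {
            this.name = name;
            this.params = params;
            this.selfClosing = selfClosing;
        }
    }

    // Parses the opening tag at the start of text; anything after it is ignored.
    static OpenTag parseOpenTag(String text) {
        Matcher tag = OPEN_TAG.matcher(text);
        if (!tag.lookingAt()) {
            throw new IllegalArgumentException("not an opening tag: " + text);
        }
        Map<String, String> params = new LinkedHashMap<>();
        Matcher attr = ATTRIBUTE.matcher(tag.group(2));
        while (attr.find()) {
            params.put(attr.group(1), attr.group(2));
        }
        return new OpenTag(tag.group(1), params, !tag.group(3).isEmpty());
    }

    public static void main(String[] args) {
        OpenTag t = parseOpenTag("<item id=\"7\" label=\"a > b\"/>");
        System.out.println(t.name + " " + t.params + " selfClosing=" + t.selfClosing);
    }
}
```

One thing this form gets for free is that a ">" inside a quoted value no longer ends the tag early, since the value is matched as a unit before the closing ">" is looked for.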
