The static type hole caused by array covariance

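In the earlier example of bytecode that fails Java bytecode verification, we saw that the JVM verifies the .class files it loads in order to guarantee type safety. But there is one case in Java that neither the compiler nor the JVM's bytecode verifier can detect, and that only shows up when the code actually runs: the hole in the static type system caused by array covariance.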

As in the previous post, we use ASM to generate the bytecode:

Java code

import java.io.FileOutputStream;

import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class TestASM implements Opcodes {
    public static void main(String[] args) throws Exception {
        ClassWriter cw = new ClassWriter(0);
        cw.visit(
            V1_5,               // class format version
            ACC_PUBLIC,         // class modifiers
            "TestVerification", // fully qualified class name
            null,               // generic signature
            "java/lang/Object", // fully qualified superclass name
            new String[] { }    // implemented interfaces
        );

        MethodVisitor mv = cw.visitMethod(
            ACC_PUBLIC + ACC_STATIC,  // access modifiers
            "main",                   // method name
            "([Ljava/lang/String;)V", // method descriptor
            null,                     // generic signature
            null                      // exceptions
        );
        mv.visitCode();
        mv.visitInsn(ICONST_1);
        mv.visitTypeInsn(ANEWARRAY, "java/lang/Float");     // new Float[1]
        mv.visitTypeInsn(CHECKCAST, "[Ljava/lang/Object;"); // (Object[])
        mv.visitVarInsn(ASTORE, 0);
        mv.visitVarInsn(ALOAD, 0);
        mv.visitInsn(ICONST_0);
        mv.visitLdcInsn("a string");
        mv.visitInsn(AASTORE);                              // array[0] = "a string"
        mv.visitVarInsn(ALOAD, 0);
        mv.visitInsn(ICONST_0);
        mv.visitInsn(AALOAD);
        // Descriptor kept as originally written; Object.toString is really
        // ()Ljava/lang/String;, but aastore throws before this call executes.
        mv.visitMethodInsn(INVOKEVIRTUAL, "java/lang/Object", "toString", "()V");
        mv.visitInsn(RETURN);
        mv.visitMaxs(3, 1);
        mv.visitEnd(); // end method
        cw.visitEnd(); // end class

        byte[] clz = cw.toByteArray();
        FileOutputStream out = new FileOutputStream("TestVerification.class");
        out.write(clz);
        out.close();
    }
}

The result is:

Java bytecode

public class TestVerification extends java.lang.Object
  minor version: 0
  major version: 49
  Constant pool:
const #1 = Asciz        TestVerification;
const #2 = class        #1;     //  TestVerification
const #3 = Asciz        java/lang/Object;
const #4 = class        #3;     //  java/lang/Object
const #5 = Asciz        main;
const #6 = Asciz        ([Ljava/lang/String;)V;
const #7 = Asciz        java/lang/Float;
const #8 = class        #7;     //  java/lang/Float
const #9 = Asciz        [Ljava/lang/Object;;
const #10 = class       #9;     //  "[Ljava/lang/Object;"
const #11 = Asciz       a string;
const #12 = String      #11;    //  a string
const #13 = Asciz       toString;
const #14 = Asciz       ()V;
const #15 = NameAndType #13:#14;//  toString:()V
const #16 = Method      #4.#15; //  java/lang/Object.toString:()V
const #17 = Asciz       Code;

{
public static void main(java.lang.String[]);
  Code:
   Stack=3, Locals=1, Args_size=1
   0:   iconst_1
   1:   anewarray       #8; //class java/lang/Float
   4:   checkcast       #10; //class "[Ljava/lang/Object;"
   7:   astore_0
   8:   aload_0
   9:   iconst_0
   10:  ldc     #12; //String a string
   12:  aastore
   13:  aload_0
   14:  iconst_0
   15:  aaload
   16:  invokevirtual   #16; //Method java/lang/Object.toString:()V
   19:  return

}

This time the code can in fact be expressed directly in Java source as well, namely:

Java code

public class TestVerification {
    public static void main(String[] args) {
        Object[] array = (Object[]) new Float[1];
        array[0] = "a string"; // the problem is here
        array[0].toString();
    }
}

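It compiles without any problem. The code fully conforms to the Java specification and also satisfies the type requirements of the JVM's static verification, so verification at load time passes too.

But when you run it...

Command prompt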

Exception in thread "main" java.lang.ArrayStoreException: java.lang.String
        at TestVerification.main(Unknown Source)

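Obviously we cannot store a String object into a Float[]. But because Java arrays are covariant, Java's static type system lets us write this, and the exception is only thrown at run time.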
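To make the contrast concrete, here is a minimal sketch of the same store going through an Object[] view, next to the invariant behaviour of generics. The class name CovarianceDemo and the helper tryStore are made up for this illustration and are not part of the original example.

```java
import java.util.ArrayList;
import java.util.List;

class CovarianceDemo {
    // Hypothetical helper: attempts a store through an Object[] reference and
    // reports whether the JVM's run-time store check accepted it.
    static boolean tryStore(Object[] array, Object value) {
        try {
            array[0] = value; // compiles for any Object; checked when it executes
            return true;
        } catch (ArrayStoreException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Object[] floats = new Float[1]; // allowed: Float[] is a subtype of Object[]
        System.out.println(tryStore(floats, 1.0f));       // accepted
        System.out.println(tryStore(floats, "a string")); // rejected at run time

        List<Float> list = new ArrayList<>();
        // List<Object> objects = list; // does not compile: generics are invariant
        List<? extends Object> view = list; // wildcard view; adding elements is rejected by javac
        System.out.println(view.size());
    }
}
```

With generics the bad assignment is refused by the compiler, whereas with arrays the same mistake survives until the store executes.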

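.NET unfortunately copied this feature from Java and made its arrays covariant as well. As a result the CLI, just like the JVM, has to perform a dynamic type check at run time on stores into arrays (JVM: aastore; CLI: stelem). That is naturally not great for performance, and it also makes the VM implementation more complex... sigh.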
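Roughly speaking, the dynamic check behind aastore compares the value being stored with the array's run-time component type. The following is only a sketch of that idea written with reflection; StoreCheckSketch, canStore and checkedStore are hypothetical names, and a real VM implements the check internally rather than like this.

```java
class StoreCheckSketch {
    // Approximation of the store check: null is always storable; any other
    // value must be an instance of the array's run-time component type.
    static boolean canStore(Object[] array, Object value) {
        Class<?> component = array.getClass().getComponentType();
        return value == null || component.isInstance(value);
    }

    // A store that performs the check explicitly before writing the element.
    static void checkedStore(Object[] array, int index, Object value) {
        if (!canStore(array, value)) {
            throw new ArrayStoreException(value.getClass().getName());
        }
        array[index] = value;
    }

    public static void main(String[] args) {
        Object[] array = new Float[1];
        System.out.println(canStore(array, 1.0f));       // Float into Float[]
        System.out.println(canStore(array, "a string")); // String into Float[]
        checkedStore(array, 0, 2.5f);
        System.out.println(array[0]);
    }
}
```

The cost the text complains about is visible here: every store of a reference into an array needs the equivalent of canStore unless the VM can prove it unnecessary.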

The second-to-last paragraph on page 289 of the reprint edition of "Virtual Machines: Versatile Platforms for Systems and Processes" says:

Quote

Hence, if an object is accessed, the field information for the access can also be checked statically (there is an exception for arrays, given in the next paragraph).

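Then, in the paragraph that follows, the book only mentions the dynamic bounds check on array accesses and says nothing about the static type hole caused by covariance. I think the covariance problem deserves a mention there. After all, array length is not part of Java's static types, so checking it can only be left to run time (although a VM can eliminate many array bounds checks and null checks through data-flow analysis). Type covariance, by contrast, is part of the static type system, yet because of the hole it still has to be checked at run time, and that is the annoying part.

Here is what Martin Odersky said about Java's covariant arrays in a recent interview:

Martin Odersky wrote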

Bill Venners: You said you found it frustrating at times to have the constraints of needing to be backwards compatible with Java. Can you give some specific examples of things you couldn’t do when you were trying to live within those constraints, which you were then able to do when you changed to doing something that’s binary but not source compatible?

Martin Odersky: In the generics design, there were a lot of very, very hard constraints. The strongest constraint, the most difficult to cope with, was that it had to be fully backwards compatible with ungenerified Java. The story was the collections library had just shipped with 1.2, and Sun was not prepared to ship a completely new collections library just because generics came about. So instead it had to just work completely transparently.

That’s why there were a number of fairly ugly things. You always had to have ungenerified types with generified types, the so called raw types. Also you couldn’t change what arrays were doing so you had unchecked warnings. Most importantly you couldn’t do a lot of the things you wanted to do with arrays, like generate an array with a type parameter T, an array of something where you didn’t know the type. You couldn’t do that. Later in Scala we actually found out how to do that, but that was possible only because we could drop in Scala the requirement that arrays are covariant.

Bill Venners: Can you elaborate on the problem with Java’s covariant arrays?

Martin Odersky: When Java first shipped, Bill Joy and James Gosling and the other members of the Java team thought that Java should have generics, only they didn’t have the time to do a good job designing it in. So because there would be no generics in Java, at least initially, they felt that arrays had to be covariant. That means an array of String is a subtype of array of Object, for example. The reason for that was they wanted to be able to write, say, a “generic” sort method that took an array of Object and a comparator and that would sort this array of Object. And then let you pass an array of String to it. It turns out that this thing is type unsound in general. That’s why you can get an array store exception in Java. And it actually also turns out that this very same thing blocks a decent implementation of generics for arrays. That’s why arrays in Java generics don’t work at all. You can’t have an array of list of string, it’s impossible. You’re forced to do the ugly raw type, just an array of list, forever. So it was sort of like an original sin. They did something very quickly and thought it was a quick hack. But it actually ruined every design decision later on. So in order not to fall into the same trap again, we had to break off and say, now we will not be upwards compatible with Java, there are some things we want to do differently.
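The two points in that answer can be illustrated with a small sketch: a pre-generics style sort that relies on covariance, and an array of List<String> that gets polluted because the run-time store check only sees the erased element type. GenericArrayDemo, sortAny and pollute are names invented for this illustration, and the unchecked cast stands in for the `new List<String>[1]` expression that javac rejects.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class GenericArrayDemo {
    // A "generic" sort in the pre-generics sense: it accepts any Object[],
    // so a String[] can be passed in thanks to array covariance.
    static void sortAny(Object[] items, Comparator<Object> order) {
        Arrays.sort(items, order);
    }

    // Shows why generic arrays are problematic: the store below passes the
    // run-time check, and the failure only appears later as a ClassCastException.
    @SuppressWarnings("unchecked")
    static String pollute() {
        List<String>[] strings = (List<String>[]) new List[1]; // unchecked cast
        Object[] objects = strings;         // covariance again
        objects[0] = Arrays.asList(42);     // accepted: the element is a List
        try {
            String s = strings[0].get(0);   // compiler-inserted cast to String fails
            return s;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        String[] names = { "b", "c", "a" };
        sortAny(names, (x, y) -> x.toString().compareTo(y.toString()));
        System.out.println(Arrays.toString(names));
        System.out.println(pollute());
    }
}
```

In the second method the array store check cannot help, since after erasure a List<Integer> and a List<String> are both just List at run time.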

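P.S. If you don't know what covariance is, you can read the entry on Wikipedia.

P.P.S. For those who don't know Martin Odersky, take note: whenever you use Java 5 generics, your code carries his mark. He designed the Pizza language and later took part in the design of GJ (Generic Java), which became the foundation of generics in Java 5. Martin also designed Scala << far more people surely know Scala than know Pizza...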
