Hadoop Serialization: Mechanism and Examples

Serialization

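1. What is serialization? The process of converting structured objects into a byte stream so that they can be sent over a network or written to persistent storage.
2. What is deserialization? The reverse process: converting a byte stream back into a series of structured objects.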

Uses of serialization:

1. As a persistence format.
2. As a data format for communication.
3. As a mechanism for copying or cloning data.

Java serialization and deserialization

1. Create a class that implements Serializable.
2. Serialize with ObjectOutputStream.writeObject(obj); deserialize with ObjectInputStream.readObject(), which returns the deserialized object.
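The two steps above can be sketched as an in-memory round trip. This is a minimal illustration using only the JDK; Person and JavaSerializationDemo are hypothetical names chosen for the example and do not come from the original article.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class JavaSerializationDemo {
    // Step 2a: serialize an object into a byte array with ObjectOutputStream.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Step 2b: deserialize. readObject() returns a newly created object.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new Person("liguodong", 22));
        Person copy = (Person) deserialize(bytes);
        System.out.println("name=" + copy.name + ",age=" + copy.age);
    }
}

// Step 1: a class that implements Serializable (hypothetical example type).
class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}
```

Note that every call to deserialize hands back a fresh Person instance.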

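Why doesn't Hadoop use Java serialization directly?

Hadoop's serialization mechanism differs from Java's. Both write objects to a stream, but Java deserialization creates a new object for every record it reads, whereas Hadoop's mechanism lets the user reuse an existing object. Reuse reduces object allocation and garbage collection, which makes applications more efficient.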
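The reuse idea can be sketched with plain JDK streams. IntRecord below is a hypothetical stand-in that has the same write/readFields shape as a Hadoop Writable; it is not a Hadoop class, and ReuseDemo is likewise an illustrative name. The point is that one instance is allocated and then overwritten for every record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ReuseDemo {
    // Write every value as one 4-byte record, reusing a single IntRecord.
    static byte[] writeAll(int[] values) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            IntRecord record = new IntRecord();
            for (int v : values) {
                record.value = v;
                record.write(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read `count` records back into ONE reused instance and sum them.
    static long sumAll(byte[] bytes, int count) {
        IntRecord record = new IntRecord(); // allocated once
        long sum = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            for (int i = 0; i < count; i++) {
                record.readFields(in); // overwrites the same object
                sum += record.value;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] bytes = writeAll(new int[] {1, 2, 3});
        System.out.println("bytes=" + bytes.length + ", sum=" + sumAll(bytes, 3));
    }
}

// Hypothetical stand-in mirroring the Writable method shape (not a Hadoop class).
class IntRecord {
    int value;

    void write(DataOutput out) throws IOException {
        out.writeInt(value);
    }

    void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }
}
```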

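Hadoop serialization

Hadoop does not use Java serialization; it implements its own mechanism through the Writable interface. Writable itself provides no comparison capability, so Hadoop combines it with Java's Comparable interface into a single interface, WritableComparable, for types that need custom comparison.

The Writable interface declares two methods, write and readFields.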

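    package org.apache.hadoop.io;

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    public interface Writable {
        // Serialize the fields of this object to out.
        void write(DataOutput out) throws IOException;

        // Deserialize the fields of this object from in.
        void readFields(DataInput in) throws IOException;
    }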

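Types that need to be compared must implement the WritableComparable interface.

    public interface WritableComparable<T> extends Writable, Comparable<T> {}

For example, MapReduce sorts records by key, so key types need to implement it.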

Hadoop provides several important serialization interfaces and implementing classes. The external comparators are:

RawComparator<T> and WritableComparator

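    package org.apache.hadoop.io;

    import java.util.Comparator;

    public interface RawComparator<T> extends Comparator<T> {
        // Compare two serialized objects directly on their bytes:
        // (buffer, start offset, length) for each side.
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
    }

    public class WritableComparator implements RawComparator {
        private final Class<? extends WritableComparable> keyClass;
        // ... (the remaining members are missing from the source)
    }

Classes that implement WritableComparable (custom comparison)

Interface: WritableComparable<T>
Superinterfaces: Comparable<T>, Writable
Basic implementations: BooleanWritable, ByteWritable, ShortWritable, IntWritable, VIntWritable, LongWritable, VLongWritable, FloatWritable, DoubleWritable
Advanced implementations: MD5Hash, NullWritable, Text, BytesWritable. ObjectWritable and GenericWritable are usually listed alongside these, although they implement only Writable.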
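To illustrate what the byte-level compare method of RawComparator is for, the sketch below orders two serialized 4-byte big-endian ints without deserializing them into objects. It uses only the JDK and merely borrows the parameter shape of RawComparator.compare; RawCompareDemo and its helper methods are hypothetical names, and this is not Hadoop's own implementation.

```java
import java.nio.ByteBuffer;

class RawCompareDemo {
    // Same parameter shape as RawComparator.compare: (buffer, start, length) twice.
    // The lengths are unused here because every record is a fixed 4-byte int.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int left = readInt(b1, s1);
        int right = readInt(b2, s2);
        return Integer.compare(left, right);
    }

    // Decode a big-endian int straight from the buffer.
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
                | ((bytes[start + 1] & 0xff) << 16)
                | ((bytes[start + 2] & 0xff) << 8)
                | (bytes[start + 3] & 0xff);
    }

    // Helper for the demo: serialize an int as 4 big-endian bytes.
    static byte[] toBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    public static void main(String[] args) {
        byte[] three = toBytes(3);
        byte[] five = toBytes(5);
        System.out.println(compare(three, 0, 4, five, 0, 4) < 0); // 3 sorts before 5
    }
}
```

Comparing on the raw bytes avoids creating a key object for every comparison during a sort.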

Classes that implement only Writable

Interface: Writable
All known subinterfaces: Counter, CounterGroup, CounterGroupBase<T>, InputSplit, InputSplitWithLocationInfo, WritableComparable<T>
Classes that implement only Writable:
Arrays: ArrayWritable, TwoDArrayWritable
Maps: AbstractMapWritable, MapWritable, SortedMapWritable

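Writable interface

Text
Text is a Writable for UTF-8 strings and can be thought of as the Writable counterpart of java.lang.String. It replaces the older UTF8 class. Text is mutable: its value can be changed by calling set(). It can store up to 2 GB.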
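Because Text holds UTF-8 bytes, its length is a byte count rather than a count of Java chars. The JDK-only sketch below shows how the two can differ; Utf8LengthDemo and utf8Length are illustrative names and no Hadoop classes are involved.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthDemo {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "liguodong";
        String cjk = "\u7537"; // a single CJK character
        System.out.println(ascii.length() + " chars, " + utf8Length(ascii) + " bytes");
        System.out.println(cjk.length() + " chars, " + utf8Length(cjk) + " bytes");
    }
}
```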

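NullWritable
NullWritable is a special Writable whose serialized length is zero, which makes it useful as a placeholder.

BytesWritable
BytesWritable wraps an array of binary data. Its serialized form is a 4-byte int holding the number of bytes, followed by the bytes themselves. For example, a byte array of length 2 containing the values 3 and 5 serializes as follows: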

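    // The method name was lost in the source; testBytesWritable is a placeholder.
    // SerializeUtils is a helper class that the source does not show.
    public void testBytesWritable() throws IOException {
        BytesWritable bytesWritable = new BytesWritable(new byte[]{3, 5});
        byte[] bytes = SerializeUtils.serialize(bytesWritable);
        // 00000002 is the length; 03 and 05 are the data bytes.
        Assert.assertEquals("000000020305", StringUtils.byteToHexString(bytes)); // passes
    }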

BytesWritable is mutable; its value can be changed by calling set().
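The length-prefixed layout described for BytesWritable can be reproduced with plain JDK streams. The sketch below is a stand-in that writes a 4-byte length followed by the payload; LengthPrefixedDemo and its methods are hypothetical names, and the code does not call Hadoop.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class LengthPrefixedDemo {
    // Write a 4-byte big-endian length, then the payload bytes.
    static byte[] serialize(byte[] payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(payload.length);
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Render bytes as lowercase hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints 000000020305: length 00000002, then bytes 03 and 05.
        System.out.println(toHex(serialize(new byte[] {3, 5})));
    }
}
```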

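ObjectWritable
ObjectWritable is useful when a field can hold values of more than one type.

Writable collections
1. ArrayWritable and TwoDArrayWritable hold arrays and two-dimensional arrays.
2. MapWritable and SortedMapWritable are the Writable counterparts of Map and SortedMap.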

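Custom Writable

1. Implement the WritableComparable interface.
2. Implement its methods:
   A. write(): converts the object to a byte stream and writes it to the output stream out.
   B. readFields(): reads a byte stream from the input stream in and deserializes it into the object.
   C. compareTo(o): compares this object with the object o.

The following example defines a custom Student class: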

    package Writable;

    import java.io.BufferedInputStream;
    import java.io.BufferedOutputStream;
    import java.io.DataInput;
    import java.io.DataInputStream;
    import java.io.DataOutput;
    import java.io.DataOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;

    // The name of this class was lost in the source; StudentDemo is a placeholder.
    public class StudentDemo {
        public static void main(String[] args) throws IOException {
            // Serialize a Student to a file.
            Student student = new Student("liguodong", 22, "男");
            BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(new File("g:/liguodong.txt")));
            DataOutputStream dos = new DataOutputStream(bos);
            student.write(dos);
            dos.flush();
            dos.close();
            bos.close();

            // Deserialize the file into a second Student.
            Student student2 = new Student();
            BufferedInputStream bis = new BufferedInputStream(new FileInputStream(new File("g:/liguodong.txt")));
            DataInputStream dis = new DataInputStream(bis);
            student2.readFields(dis);
            dis.close();
            System.out.println("name=" + student2.getName() + ",age=" + student2.getAge() + ",sex=" + student2.getSex());
        }
    }

    class Student implements WritableComparable<Student> {
        private Text name = new Text();
        private IntWritable age = new IntWritable();
        private Text sex = new Text();

        public Student() {
        }

        public Student(String name, int age, String sex) {
            super();
            this.name = new Text(name);
            this.age = new IntWritable(age);
            this.sex = new Text(sex);
        }

        public Text getName() {
            return name;
        }

        public void setName(Text name) {
            this.name = name;
        }

        public IntWritable getAge() {
            return age;
        }

        public void setAge(IntWritable age) {
            this.age = age;
        }

        public Text getSex() {
            return sex;
        }

        public void setSex(Text sex) {
            this.sex = sex;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            // Fields must be written in the same order they are read.
            name.write(out);
            age.write(out);
            sex.write(out);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // With a plain Java field such as String name, the only option
            // would be: this.name = in.readUTF();
            name.readFields(in);
            age.readFields(in);
            sex.readFields(in);
        }

        @Override
        public int compareTo(Student o) {
            // Order by name, then age, then sex.
            int result = 0;
            if ((result = this.name.compareTo(o.getName())) != 0) {
                return result;
            }
            if ((result = this.age.compareTo(o.getAge())) != 0) {
                return result;
            }
            if ((result = this.sex.compareTo(o.getSex())) != 0) {
                return result;
            }
            return 0;
        }
    }

Output:

    name=liguodong,age=22,sex=男
